
Volume 11, Number 2

CONTENTS

May 2016

SHAILVI GUPTA, THOMAS A. GROEN, BARCLAY T. STEWART, SUNIL SHRESTHA, DAVID A. SPIEGEL, BENEDICT C. NWOMEH, REINOU S. GROEN, ADAM L. KUSHNER
The spatial distribution of injuries in need of surgical intervention in Nepal .......... 77

OCKERT LOUIS VAN SCHALKWYK, EVA M. DE CLERCQ, CLAUDIA DE PUS, GUY HENDRICKX, PETER VAN DEN BOSSCHE, DARRYN L. KNOBEL
Heterogeneity in a communal cattle-farming system in a zone endemic for foot and mouth disease in South Africa .......... 83

DAVID GIKUNGU, JACOB WAKHUNGU, DONALD SIAMBA, EDWARD NEYOLE, RICHARD MUITA, BERNARD BETT
Dynamic risk model for Rift Valley fever outbreaks in Kenya based on climate and disease outbreak data .......... 95

YING WANG, YONGLI YANG, XUEZHONG SHI, SAICAI MAO, NIAN SHI, XIAOQING HUI
The spatial distribution pattern of human immunodeficiency virus/acquired immune deficiency syndrome in China .......... 104

BENN SARTORIUS, KURT SARTORIUS
How much incident lung cancer was missed globally in 2012? An ecological country-level study .......... 110

RENKE LÜHKEN, JÖRN MARTIN GETHMANN, PETRA KRANZ, PIA STEFFENHAGEN, CHRISTOPH STAUBACH, FRANZ J. CONRATHS, ELLEN KIEL
Comparison of single- and multi-scale models for the prediction of the Culicoides biting midge distribution in Germany .......... 119

KYUNGSOO HAN, SEJIN PARK, JÜRGEN SYMANZIK, SOOKHEE CHOI, JEONGYONG AHN
Trends in obesity at the national and local level among South Korean adolescents .......... 130

RHEA BAGGENSTOS, TOBIAS DAHINDEN, PAUL R. TORGERSON, HANSRUEDI BÄR, CHRISTINA RAPSCH, GABRIELA KNUBBEN-SCHWEIZER
Validation of an interactive map assessing the potential spread of Galba truncatula as intermediate host of Fasciola hepatica in Switzerland .......... 137

TIMOTHY SHIELDS, JESSIE PINCHOFF, JAILOS LUBINDA, HARRY HAMAPUMBU, KELLY SEARLE, TAMAKI KOBAYASHI, PHILIP E. THUMA, WILLIAM J. MOSS, FRANK C. CURRIERO
Spatial and temporal changes in household structure locations using high-resolution satellite imagery for population assessment: an analysis in southern Zambia, 2006-2011 .......... 144

CRISTIAN DOMȘA, ATTILA D. SÁNDOR, ANDREI D. MIHALCA
Climate change and species distribution: possible scenarios for thermophilic ticks in Romania .......... 151

KRISTIN MESECK, MARTA M. JANKOWSKA, JASPER SCHIPPERIJN, LOKI NATARAJAN, SUNEETA GODBOLE, JORDAN CARLSON, MICHELLE TAKEMOTO, KATIE CRIST, JACQUELINE KERR
Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution? .......... 157

DUSTIN T. DUNCAN, RYAN R. RUFF, BASILE CHAIX, SEANN D. REGAN, JAMES H. WILLIAMS, JOSEPH RAVENELL, MARIE A. BRAGG, GBENGA OGEDEGBE, BRIAN ELBEL
Perceived spatial stigma, body mass index and blood pressure: a global positioning system study among low-income housing residents in New York City .......... 164

SARSENBAY K. ABDRAKHMANOV, AKHMETZHAN A. SULTANOV, KANATZHAN K. BEISEMBAYEV, FEDOR I. KORENNOY, DOSYM B. KUSHUBAEV, ABLAIKHAN S. KADYROV
Zoning the territory of the Republic of Kazakhstan as to the risk of rabies among various categories of animals .......... 174

SEONG-YONG PARK, JIN-MI KWAK, EUN-WON SEO, KWANG-SOO LEE
Spatial analysis of the regional variation of hypertensive disease mortality and its socio-economic correlates in South Korea .......... 182

SU YUN KANG, SUSANNA M. CRAMB, NICOLE M. WHITE, STEPHEN J. BALL, KERRIE L. MENGERSEN
Making the most of spatial information in health: a tutorial in Bayesian disease mapping for areal data .......... 190

SARSENBAY K. ABDRAKHMANOV, KANATZHAN K. BEISEMBAYEV, FEDOR I. KORENNOY, GULZHAN N. YESSEMBEKOVA, DOSYM B. KUSHUBAEV, ABLAIKHAN S. KADYROV
Revealing spatio-temporal patterns of rabies spread among various categories of animals in the Republic of Kazakhstan, 2010-2013 .......... 199

ROBERTO CONDOLEO, VINCENZO MUSELLA, MARIA PAOLA MAURELLI, ANTONIO BOSCO, GIUSEPPE CRINGOLI, LAURA RINALDI
Mapping, cluster detection and evaluation of risk factors of ovine toxoplasmosis in Southern Italy .......... 206

RAHELEH SANIEI, ALI ZANGIABADI, MOHAMMAD SHARIFIKIA, YOUSEF GHAVIDEL
Air quality classification and its temporal trend in Tehran, Iran, 2002-2012 .......... 213

ERNESTO PASCOTTO, DIEGO CAPRARO, PAOLO TOMÈ, MAURO SPANGHERO
Topographic distribution of gastritis in heavy pigs investigated by a geographic information system approach .......... 221

ISSN 1827-1987

Health Applications in Geospatial Science

Geospatial Health Vol. 11, 2 (2016) 77-224


Available online www.geospatialhealth.net

Official Journal of the International Society of Geospatial Health www.GnosisGIS.org


GEOSPATIAL HEALTH

Aims and Scope
Geospatial Health is the official journal of the International Society of Geospatial Health (www.GnosisGIS.org). The journal was founded in 2006 at the University of Naples Federico II by Giuseppe Cringoli, John B. Malone, Robert Bergquist and Laura Rinaldi. The focus of the journal is on all aspects of the application of geographical information systems, remote sensing, global positioning systems, spatial statistics and other geospatial tools in human and veterinary health. The journal publishes two issues per year.

Submission of manuscripts
Submission to Geospatial Health will be done electronically via the Internet through the website: www.geospatialhealth.net
Submission of a manuscript implies that it is original and is not being considered for publication elsewhere. Submission also implies that all authors have approved the paper for release and are in agreement with its content.

Preparation of manuscripts
- Manuscripts should be written in English.
- Manuscripts should have numbered lines, with wide margins and double spacing throughout. Every page of the manuscript, including the title page, should be numbered.
- Manuscripts should be organized in the following order:
  Title
  Name(s) of author(s)
  Complete postal address(es) of affiliations
  Full telephone, Fax no. and E-mail address of the corresponding author
  Abstract (no longer than 250 words)
  Keywords (3-5)
  Introduction
  Materials and Methods (with subheadings if necessary)
  Results (with subheadings if necessary)
  Discussion
  Conclusions
  Acknowledgements
  References
  Titles of tables and figures
  Tables (separate file(s))
  Figures (separate file(s), in JPG or TIFF format)

Publisher - Editorial Office
PAGEPress srl - Via Giuseppe Belli 7 - 27100 Pavia - www.pagepress.org

Editorial Staff
Lucia Zoppi, Managing Editor
Cristiana Poggi, Production Editor
Tiziano Taccini, Technical Support

Printed by Press Up srl, Via La Spezia 118/C, 00055 Ladispoli (RM), Italy
Logo on front cover by Vincenzo Musella

References
- References cited in the text should be presented in a list in the section “References”.
- In the text refer to the author's name (without initial) and year of publication.
- If reference is made in the text to a publication written by more than two authors the name of the first author should be used followed by "et al.". This indication, however, should never be used in the list of references. In this list names of first author and co-authors should be mentioned.
- References cited together in the text should be arranged chronologically.
- The references should be listed in alphabetical order and they should contain: surname and initials of each author, year of publication, title of the paper, abbreviated name and volume of the journal (for books, title and publisher), first and last page of the paper.
- For example:
  for journals
  Cringoli G, Taddei R, Rinaldi L, Veneziano V, Musella V, Cascone C, Sibilio G, Malone JB, 2004. Use of remote sensing and geographical information systems to identify environmental features that influence the distribution of paramphistomosis in sheep from the southern Italian Apennines. Vet Parasitol 122:15-26.
  for books
  Elliott P, Wakefield J, Best N, Briggs D, 2000. Spatial epidemiology. Methods and applications. Oxford University Press, Oxford, UK.



GEOSPATIAL HEALTH Volume 11, Number 2, May 2016

Geospatial Health is indexed in the CABI International, in MEDLINE/PubMed databases and in the Master Journal List of Thomson Reuters, publishers of Web of Science. Impact Factor - 2014: 1.194 © Journal Citation Reports 2015, Published by Thomson Reuters



Join the International Society of Geospatial Health, GnosisGIS

Member benefits:
- Free issues of the journal Geospatial Health if attending the Annual Meeting
- Free access to databases covering data from North and South America, West Africa, East Africa and South East Asia including China
- Access to a global network of colleagues interested in geospatial health technology
- Assistance with GIS and remote sensing problems

Membership Dues:
50$ for developed country members
25$ for members from developing countries
10$ for students

For more information check our web site: www.GnosisGIS.org

Publication charges = 500 Euros per paper. Any regular member of the International Society of Geospatial Health (www.gnosisgis.org), who is simultaneously either first author or corresponding author of the paper, is eligible for a 50% discount per paper. For more information visit the following page: http://geospatialhealth.net/index.php/gh/pages/view/payments

© 2016 GnosisGIS - UNINAVP



GEOSPATIAL HEALTH An International Scientific Journal Founded in 2006 by J.B. Malone, G. Cringoli, R. Bergquist and L. Rinaldi

Editor-in-Chief
Robert Bergquist, Ingeröd 407, S-45494 Brastad, Sweden - Tel. +46 523 43336 (editor@geospatialhealth.net)

Associate Editors
Laura Rinaldi, University of Naples Federico II, Naples, Italy - Tel. +39 081 2536281 (lrinaldi@unina.it)
Anna-Sofie Stensgaard, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark - Tel. +45.35.336588 (asstensgaard@snm.ku.dk)
Jürg Utzinger, Swiss Tropical and Public Health Institute, Basel, Switzerland - Tel. +41 61 2848129 (juerg.utzinger@unibas.ch)

Editorial Board
Sherif Amer, Department of Urban and Regional Planning and Geo-information Management, University of Twente, Enschede, The Netherlands
Vicente Belizario, Department of Health, University of the Philippines, Manila, Philippines
Annibale Biggeri, University of Florence, Florence, Italy
Mark Booth, School of Medicine, Pharmacy and Health, Durham University, Durham, UK
Dolores Catelan, University of Florence, Florence, Italy
Pietro Ceccato, The Earth Institute at Columbia University, Palisades, NY, USA
Archie C.A. Clements, The Australian National University, Canberra, Australia
Ulisses Confalonieri, Instituto de Pesquisas René Rachou, FIOCRUZ, Belo Horizonte, MG, Brazil
Giuseppe Cringoli, University of Naples Federico II, Naples, Italy
Dustin T. Duncan, Department of Population Health, New York University School of Medicine, New York, NY, USA
Els Ducheyne, AVIA-GIS, Zoersel, Belgium
Augustin Estrada-Pena, Zaragoza University, Zaragoza, Spain
Màrius V. Fuentes, University of València, Burjassot-València, Spain
Claudio Genchi, University of Milan, Milan, Italy
Nicholas Hamm, Department of Earth Observation Science, Faculty of Geo-Information Science and Earth Observation, University of Twente, Enschede, The Netherlands
Richard Kiang, NASA Goddard Space Flight Center, Greenbelt, MD, USA
Uriel Kitron, University of Illinois, Urbana, IL, USA
Fedor Korennoy, Federal Center for Animal Health, Vladimir, Russia
Thomas K. Kristensen, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
Lydia Leonardo, University of the Philippines, College of Public Health, Manila, Philippines
Jeffrey C. Luvall, NASA Marshall Space Flight Center, Huntsville, AL, USA
John B. Malone, Louisiana State University, Baton Rouge, LA, USA
Samuel O. Manda, South African Medical Research Council, Pretoria, South Africa
Santiago Mas-Coma, University of València, Burjassot-València, Spain
Samson Mukaratirwa, University of KwaZulu-Natal, Pietermaritzburg, South Africa
Naoko Nihei, National Institute of Infectious Diseases, Tokyo, Japan
Judy Omumbo, The Earth Institute at Columbia University, Palisades, NY, USA
Nicolas M. Oreskovic, Harvard Medical School, Boston, MA, USA
Ximena Porcasi Gómez, Instituto de Altos Estudios Espaciales Mario Gulich, Centro Espacial Teófilo Tabanera, CONAE, Cordoba, Argentina
Peter Steinmann, Swiss Tropical and Public Health Institute, Basel, Switzerland
Yves M. Tourre, Centre National d'Etudes Spatiales, Toulouse, France
Penelope Vounatsou, Swiss Tropical and Public Health Institute, Basel, Switzerland
Arve Lee Willingham, School of Veterinary Medicine, Ross University, Basseterre, Federation of St. Kitts and Nevis
Guojing Yang, Jiangsu Institute of Parasitic Diseases, Wuxi, China
Xiaonong Zhou, National Institute of Parasitic Diseases, Shanghai, China



Geospatial Health 2016; volume 11:359

The spatial distribution of injuries in need of surgical intervention in Nepal

Shailvi Gupta,1 Thomas A. Groen,2 Barclay T. Stewart,3,4 Sunil Shrestha,5 David A. Spiegel,6 Benedict C. Nwomeh,7 Reinou S. Groen,8 Adam L. Kushner9

1 University of California, San Francisco East Bay and Surgeons OverSeas, Oakland, CA, USA
2 University of Twente, Enschede, The Netherlands
3 University of Washington, Seattle, WA, USA
4 Department of Interdisciplinary Sciences, Stellenbosch University, Cape Town, South Africa
5 Nepal Medical College, Kathmandu, Nepal
6 Children’s Hospital of Pennsylvania, Philadelphia, PA, USA
7 Nationwide Children’s Hospital and Surgeons OverSeas, Columbus, OH
8 Johns Hopkins University School of Medicine and Surgeons OverSeas, Baltimore, MD
9 Johns Hopkins Bloomberg School of Public Health and Surgeons OverSeas, Baltimore, MD, USA

Abstract
Geographic information system modelling can accurately represent the geospatial distribution of disease burdens to inform health service delivery. Given the dramatic topography of Nepal and a high prevalence of unmet surgical needs, we explored the consequences of topography on the prevalence of surgical conditions. The Nepalese Surgeons OverSeas Assessment of Surgical Need (SOSAS) is a validated, countrywide, cluster randomised survey that assesses surgical need in low- and middle-income countries; it was performed in Nepal in 2014. Data on conditions potentially affected by topography (e.g. fractures, hernias, injuries, burns) were extracted from the database. A national digital elevation model was used to determine altitude, aspect, slope steepness and curvature of the SOSAS survey sites. Forward stepwise linear regression was performed with prevalence of each surgical condition as the response variable and topographic data as explanatory variables. The best-performing models were those predicting hernias and fractures, each explaining 21% of the variance. The model fitted to death due to fall became significant when an outlier was excluded (P<0.001; R2=0.27). For burn injury, excluding the outlier led the stepwise regression to a better-fitting model without any explanatory variables. Other models trended towards a correlation, but did not have sufficient power to detect a difference. This study identified a slight correlation between elevation and the prevalence of hernias and fall injuries. Further investigation of the effects of topography and geography on surgical conditions is needed to help determine whether such data would be useful for directing the allocation of surgical resources.

Correspondence: Shailvi Gupta, University of California, San Francisco East Bay and Surgeons OverSeas, 1411 East 31st Street, 94602 Oakland, CA, USA. Tel: +1.972.841.0242 - Fax: +1.510.437.5127. E-mail: Shailvi.gupta@gmail.com

Key words: Global surgery; Geographic information system; Public health; Topography; Nepal.

Acknowledgments: this study would not have been possible without the help of the Nepali enumerators and field supervisors who graciously donated their time and expertise. Special thanks to Surgeons OverSeas and the Association for Academic Surgery who provided the funding to make this study possible.

Received for publication: 23 March 2015.
Revision received: 24 September 2015.
Accepted for publication: 16 October 2015.

©Copyright S. Gupta et al., 2016
Licensee PAGEPress, Italy
Geospatial Health 2016; 11:359
doi:10.4081/gh.2016.359

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Introduction
With increasing empirical data globally, surgical care is gaining momentum as a public health priority (Groen et al., 2012; Petroze et al., 2013; Gupta et al., 2015; Bae et al., 2011; Meara et al., 2014). Global estimates indicate that 73.6% of all surgical operations are carried out in the wealthiest nations, while only 3.5% are performed in the world’s poorest nations (Weiser et al., 2008). Approximately 2 billion people lack access to an operating room (Funk et al., 2010). Despite the recent interest in global surgical needs, there is still a paucity of objective data needed to define which barriers make access to surgical care so difficult in the world’s poorest nations, and whether surgical conditions vary with geospatial factors.

Geographic information systems (GIS) are increasingly used to help model the geospatial distribution of diseases (Rocha et al., 2014; Faierman et al., 2014; Tollefson et al., 2015). Such modelling can provide an accurate geographic representation of the spatial variation of disease burdens, which can help to guide policymakers in developing future service plans and so address this aspect of healthcare access. In addition to planning the location of healthcare services with the aim of providing support where the need is greatest, geospatial modelling can also be used to investigate the extent to which biophysical conditions, such as the presence of parasite reservoirs or the topography of a country, affect the prevalence of medical conditions (Osei et al., 2010).

Nepal is a South Asian country with just over 27 million people. Situated between China in the north and India in the south, west and east, Nepal is a landlocked nation with diverse geography, housing eight of the world’s ten highest mountains in addition to numerous lower hills as well as lakes (World Bank, 2013). The terrain rises from a low of 59 m above mean sea level in the Terai Region at the northern rim of the Gangetic Plain to 8848 m at Mt. Everest, Earth’s highest peak (World Bank, 2013). This wide range of altitudes, and the associated variation in topographic conditions (from gentle flat areas to steep, rugged terrain), provides a useful case study to explore the possible consequences of topography on the prevalence of surgical conditions, such as fractures, hernias and burns. The importance of this aspect was vividly brought home by the devastation of the country in April 2015 by the Gorkha Earthquake, one of the worst natural disasters in its history, which killed more than 8000 people and injured more than 23,000 (Central News Network, 2015). Such an event in an already low-income and politically fragile country such as Nepal further complicates access to surgical care. Prior to the 2015 earthquake, the Surgeons OverSeas Assessment of Surgical Need (SOSAS) survey was used in Nepal in 2014 to assess the prevalence of surgical conditions and avoidable deaths. The objective of the present study is to use these data to explore the effect of topography on the prevalence of surgical conditions in Nepal.

Materials and Methods
The SOSAS survey, a validated, countrywide, cluster randomised, cross-sectional survey, was performed in Nepal from May 25 to June 12, 2014. It consists of two parts, the first of which concerns household demographic data, access to healthcare and household member deaths within the past year. In the second part, two randomly selected household members undergo a verbal head-to-toe examination focusing on six anatomical regions: i) face, head and neck; ii) chest and breast; iii) abdomen; iv) groin and genitalia; v) back; and vi) extremities. Symptoms or experiences associated with a general spectrum of surgical conditions, such as wounds, swellings, deformities, burns and injuries, are elicited verbally from each respondent. The SOSAS survey has been described in more detail previously (Groen et al., 2012).

Sampling
A two-stage cluster sampling was performed. First, 15 of the 75 districts were randomly selected with probability proportional to population, after which 45 Village Development Committees (VDCs) were randomly selected (Figure 1), three for each district, after stratification for the urban to rural population distribution (two rural to one urban). This methodology was similar to that used by the Demographic and Health Surveys normally carried out in Nepal (Nepal Ministry of Health and Population, Department of Health Services, 2012). A total sample size of 1350 households was used. The sample size was estimated from a 5% prevalence of unmet surgical needs reported by a pilot SOSAS study in Nepal in January 2014 (Gupta et al., 2014). All surveys were administered in Nepalese and the responses recorded in English on paper forms. Appropriate ethical approval was obtained prior to study execution. Verbal consent was obtained from all respondents prior to the survey (parental consent, oral assent and/or parental permission was obtained for individuals younger than 18 years). Individuals noted by household members to be cognitively impaired were excluded from the study.

Figure 1. Distribution of topographical conditions that were used in this study from a random selection of 1000 locations in Nepal (i.e. the background) and sites sampled by the Surgeons OverSeas Assessment of Surgical Need (SOSAS). Darker red represents overlapping between background and SOSAS distribution.

The Surgeons OverSeas Assessment of Surgical Need database
By consensus among the authors, surgical conditions that might be affected by altitude were chosen for inclusion in this study. Given the increased strenuous exercise needed at higher altitudes, the augmented risk of falling from heights, and lower temperatures (leading to the necessity of fire for warmth), data on fractures, hernias, fall injuries (both fatal and non-fatal), burn injuries, and unmet surgical needs in general were extracted from the SOSAS database and used for the analysis. Unmet surgical need was defined as a current condition (within one month) that the affected individual perceived as needing at least a surgical consultation, which could not be accessed. Prevalence was calculated as the number of recordings of a medical condition relative to the total number of people surveyed.

The spatial database was constructed as follows. A digital elevation model (DEM) for the whole of Nepal with a resolution of 90 by 90 m (Figure 2A), created by the shuttle radar topography mission (SRTM), was downloaded from the United States Geological Survey (USGS) (http://srtm.usgs.gov/). In addition to altitude, aspect (degrees from north), slope steepness (in degrees) and curvature (the rate at which the slope changes) were calculated from the DEM. For every location where SOSAS data had been collected, the mean and standard deviation of these factors were calculated within a 5 km radius of the site. Aspect was then converted to northness and eastness, which express the direction in degrees from north as factors with a more straightforward interpretation. Thus, minimum northness (i.e. straight south) is -1, while maximum northness (i.e. straight north) is +1. Similarly, low values of eastness indicate west (i.e. straight west is minimum eastness = -1) and high values indicate east (i.e. straight east = +1).
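The conversion of aspect to northness and eastness described above can be sketched in a few lines. The paper does not state the formula used; the sketch below assumes the common convention of taking the cosine and sine of the aspect angle (measured clockwise from north), which reproduces the stated range of -1 to +1. The function names are illustrative only and do not come from the study.

```python
import math


def aspect_to_northness_eastness(aspect_deg):
    """Convert aspect (degrees clockwise from north) to (northness, eastness).

    Assumed convention, not confirmed by the paper:
    northness = cos(aspect), eastness = sin(aspect).
    """
    theta = math.radians(aspect_deg % 360.0)
    return math.cos(theta), math.sin(theta)


def mean_northness_eastness(aspects_deg):
    """Average northness and eastness over several cells, e.g. all DEM cells
    within a buffer around a survey site (hypothetical helper)."""
    pairs = [aspect_to_northness_eastness(a) for a in aspects_deg]
    n = len(pairs)
    return (sum(p[0] for p in pairs) / n, sum(p[1] for p in pairs) / n)
```

Averaging the two components rather than the raw angle avoids the wrap-around problem at north: aspects of 350° and 10° average to a northness close to +1 and an eastness of about 0, whereas the arithmetic mean of the angles (180°) would wrongly point south.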

Statistical analyses
Simple forward stepwise linear regression was performed in R (R Development Core Team, 2014) with prevalence of surgical conditions as the response and topographic variables as explanatory variables. Model selection was based on the Akaike Information Criterion (AIC) (Akaike, 1978). The AIC indicates goodness of fit while accounting for the number of explanatory variables included and the sample size of the dataset; preference is given to models with fewer explanatory variables. Before fitting the models, collinearity was checked by means of variance inflation factors (VIFs). A VIF above 10 was taken to indicate excessive collinearity (Quinn and Keough, 2002). After the final models were fitted, normality of residuals was checked to ensure that the distribution of the data was in accordance with model assumptions. Additionally, the degree to which the SOSAS clusters represent the topography in the region was visualised by creating histograms of the tested topographic determinants against a background set of 1000 random points in the same area.

Results
The representativeness of the SOSAS sample for topography is shown in Figure 2B. The SOSAS database covers the distribution of topographic conditions well, except for very high (i.e. >5000 m) and steep (i.e. >34°) areas. Table 1 describes the variables retained by the stepwise procedure and the evidence for association with the selected surgical conditions. Because variable selection was based on AIC, not every retained variable is necessarily significant (P<0.05). In addition to the variables selected and their significance, overall model performance is expressed by means of the variance explained (R2) and the overall model significance based on an F-test.

Figure 2. Digital elevation model of Nepal based on United States Geological Survey shuttle radar topography mission data (A) and location of clusters included in the Surgeons OverSeas Assessment of Surgical Need database (B).

Table 1. Model summary for each surgical condition analysed.
Intercept Mean altitude (m asl) Standard altitude (m asl) Mean slope (degrees) Standard slope (degrees) Mean curvature (ratio) Northness (ratio) Eastness (ratio) Model R2 Overall P
Fractures
Hernias
Death due to fall
Fall
Surgical needs
Burning injury
<0.001 0.01
0.01 0.06
0.07 0.08
<0.001 0.03
<0.001
0.007
0.007 0.08
0.01
0.11
0.13
0.03 0.18 0.21 0.05
0.21 0.02
0.07 0.08
0.11 0.10
na na
0.05 0.13
asl, above the mean sea level; na, not available. Values indicate P values of the partial slopes of every model based on t-tests, except for R2 (variance explained by the model) and the overall P value, which is based on an F-test (best fit statistical comparison). Italics indicates significance at P<0.05.

For a graphical assessment of model fits, observed versus modelled scatter plots are provided in Figure 3. Overall model significance was low. Model performance was highest for the models explaining hernias and fractures (both 21% explained variance). For the models fitted to fall as a cause of death (Figure 3C) and burn injury (Figure 3F), a single location had a large leverage on the model. When the outlier was excluded, the model fitted to fall injury as a cause of death became significant (P<0.001; R2=0.27), whereas excluding the outlier did not make the model fitted to burn injury significant.
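The modelling steps described under Statistical analyses (collinearity screening with VIFs, forward selection by AIC) and the leverage of single sites noted in the Results can be illustrated with a minimal sketch. The authors worked in R and their scripts are not reproduced in the paper; the Python/NumPy code below is an independent illustration under stated assumptions: ordinary least squares, a Gaussian AIC computed up to an additive constant as n·log(RSS/n) + 2k, and leverage taken as the diagonal of the hat matrix. All function and variable names are hypothetical.

```python
import numpy as np


def _with_intercept(X):
    """Prepend a column of ones to the predictor matrix."""
    return np.column_stack([np.ones(X.shape[0]), X])


def _rss(design, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2), where R^2 comes
    from regressing that column on all remaining columns."""
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = _with_intercept(np.delete(X, j, axis=1))
        tss = float(np.sum((target - target.mean()) ** 2))
        r2 = 1.0 - _rss(others, target) / tss
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)


def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k = number of coefficients."""
    return n * np.log(rss / n) + 2 * k


def forward_stepwise_aic(X, y, names):
    """Start from the intercept-only model and repeatedly add the predictor
    that lowers AIC the most; stop when no addition lowers it."""
    n = len(y)
    selected = []
    best = aic(_rss(np.ones((n, 1)), y), n, 1)
    while len(selected) < X.shape[1]:
        trials = []
        for j in range(X.shape[1]):
            if j in selected:
                continue
            design = _with_intercept(X[:, selected + [j]])
            trials.append((aic(_rss(design, y), n, design.shape[1]), j))
        score, j = min(trials)
        if score >= best:
            break
        best = score
        selected.append(j)
    return [names[j] for j in selected], best


def leverage(X):
    """Diagonal of the hat matrix; large values flag sites that can dominate
    the fit, as reported for single locations in the Results."""
    q, _ = np.linalg.qr(_with_intercept(X))
    return np.sum(q ** 2, axis=1)
```

In use, `vif` would be applied to the matrix of topographic predictors first (dropping or combining columns with values above 10), `forward_stepwise_aic` would then be run once per surgical condition with its prevalence as `y`, and `leverage` would be inspected for the selected predictors to see whether a single survey site has a disproportionate influence. Because selection is driven by AIC rather than by P values, a retained predictor need not be significant at P<0.05, which matches the remark in the Results.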

Discussion The concept of access to healthcare has been defined as those dimensions that describe the potential and actual entry of a given population to the health care delivery system. The availability of health care is a component in this concept and refers to the volume and distribution of medical resources in an area (Andersen et al., 1983). Much research on global surgery has equated access of adequate surgical care with the availability of trained surgeons and nurses as well as

physical facilities, such as operating theatres, autoclaves, utensils, etc., while other research papers haves focused on characteristics of the population, such as demographics, income, perceived need for surgical care, transport to a primary healthcare facility and severity of symptoms (Groen et al., 2012; Kushner et al., 2010; LeBrun et al., 2014; Ologunde et al., 2014). However, apart from patients, staff, buildings, instruments, affordability and cultural acceptability of care there is another important component: i.e. geographical accessibility, specifically targeted towards areas with particular high needs for care. Nepal is a country known for a great topographical variation and the SOSAS survey found a 10% prevalence of unmet surgical needs countrywide, and that was prior to the 2015 earthquake (Gupta et al., 2015). Given the wide range of altitudes and the variation in topographic conditions in Nepal along with a high prevalence of unmet surgical needs, our aim was to explore the possible consequences of topography on the prevalence of surgical conditions. Though not conclusive, our study reveals a correlation (though fairly low) between elevation and the

Figure 3. Graphical representations of predicted vs observed prevalence rates of medical conditions as functions of topographical conditions. For C) and F), outliers that had a large leverage on the fitted model are identified. Phidim and Tharpu are Village Development Committees within Nepal that were sampled in this study.

[page 80]

[Geospatial Health 2016; 11:359]


gh-2016_2.qxp_Hrev_master 31/05/16 11:44 Pagina 81

Article

prevalence of hernias and fall injuries. Geospatial analysis is a useful tool to identify the land cover of particular diseases to: prioritie riskprone areas; examine healthcare misdistribution; and target areas for disease prevention. For example, a study from Bihar state in India mapped out risk prone areas for visceral leishmaniasis or kala-azar sing a GIS approach. This study demonstrated that rural villages surrounded by a higher proportion of transitional swamps and sugarcane plantations had higher sand-fly abundance; and thus, were considered risk-prone areas for man-vector contact (Sudhakar et al., 2006; SalahiMoghaddam et al., 2010). Another study from sub-Saharan Africa used geospatial modelling to optimize rollout plans for anti-retroviral therapy in South Africa (Gerberry et al., 2014). This study found significant geographic variation in the efficiency of interventions in reducing the transmission of the human immunodeficiency virus using geographic targeting to help maximise geographic equity in access to interventions. Such data provide empirical evidence to plan targeted intervention. Similarly, this study sought to identify risk-prone areas for selected conditions requiring surgery, as such information might contribute to the development of targeted interventions to reduce unmet surgical needs in disproportionately affected populations. Furthermore, geospatial analysis can help explore if changes in altitude have a causal association for surgical conditions from a physiological basis. While this study was not equipped to explore a causal relationship between altitude and surgical conditions, previous studies have explored this relationship with regard to infectious diseases, such as pulmonary tuberculosis (Vargas et al., 2004). A Mexican study reveals that altitude had a strong inverse relationship to pulmonary tuberculosis, perhaps related to the well-known changes in alveolar oxygen pressure at different altitudes. 
Such techniques can help explore whether certain surgical conditions are more likely at higher altitudes from a physiological perspective. Our results show that in Nepal, slope correlated with both the number of fractures and hernia prevalence, while altitude was correlated with fractures and death due to fall injuries, though not necessarily with all fall injuries (non-fatal included). Other data analysed trended toward a correlation, but this study, as an exploratory analysis, did not have sufficient statistical power to detect a difference with respect to other conditions. The correlations noted may be explained, in part, by: i) the physiology of living at higher elevations; ii) a greater relative risk of incurring surgical conditions among populations living in more complex terrain (e.g. falling leading to fracture; repeated strenuous activity leading to hernia); or iii) having less access to appropriate surgical care. However, we did not find evidence for a difference in the prevalence of any unmet surgical need at differing elevations or terrain ruggedness. Clearly, the large effect that a few observations had on the overall performance of some of the models (Phidim and Tharpu) shows that the spatial spread of these observations and the total sample size were probably not optimal for this study. Nevertheless, these data demonstrate that there may be some relationships between terrain and surgical conditions that need to be better defined. This calls for efforts to collate databases, such as the SOSAS surveys, to analyse these relationships, which in turn can be used to prioritise the allocation of scarce surgical resources to areas where they are most needed. As geospatial technology continues to improve, such technologies should be included in current data collection strategies and evaluations.
Our study was carried out before the 2015 Gorkha earthquake; many more individuals in mountainous regions are therefore likely to lack access to healthcare than were noted in our study. To confront this issue, Nepal instituted emergency mobile health clinics post-disaster (World Health Organization, 2015), thus allowing more Nepalese to have access to healthcare.

This study has several limitations. Limitations inherent to all cross-sectional studies with similar random sampling methodology should be considered. Despite the use of data from a survey with a robust sample size, resources only allowed for 15 of Nepal's 75 districts to be selected. Because these districts were selected proportional to population, more densely populated areas had a higher chance of being selected, with the result that less developed and less densely populated areas of Nepal were underrepresented. These more rural areas in Nepal tend to be more mountainous and at higher elevations. Thus, the SOSAS study is a good start to explore such topographical relationships, but a dedicated study focusing on both the higher and lower elevations in Nepal is likely needed to reach more conclusive results. SOSAS relies on verbally self-reported data, which may be prone to recall bias. However, as part of SOSAS Nepal, a visual physical examination was performed, which validated these verbal reports (Gupta et al., 2015). Despite those limitations, the present results provide a valuable starting point for future studies to evaluate the effect of topographical difference on the prevalence of surgical conditions.

Conclusions

Given the wide range of altitudes and the variation in topographic conditions in Nepal, we explored the possible consequences of topography on the prevalence of surgical conditions, such as fractures, hernias and burns. Though not conclusive, the study reveals a correlation between elevation and the prevalence of hernias and fall injuries. Further investigation on the effects of topography and geography on surgical conditions is needed to help direct the allocation of resources for surgical care.

References

Akaike H, 1978. A Bayesian analysis of the minimum AIC procedure. Ann I Stat Math 30:9-14.
Andersen RM, McCutcheon A, Aday LA, Chiu GY, Bell R, 1983. Exploring dimensions of access to medical care. Health Serv Res 18:49-74.
Bae JY, Groen RS, Kushner AL, 2011. Surgery as a public health intervention: common misconceptions versus the truth. B World Health Organ 89:394.
Central News Network, 2015. Death toll in Nepal earthquake tops 8,000. Available from: http://www.cnn.com/2015/05/10/asia/nepal-earthquake-death-toll/
Faierman ML, Anderson JE, Assane A, Bendix P, Vaz F, Rose JA, Funzamo C, Bickler SW, Noormahomed EV, 2014. Surgical patients travel longer distances than non-surgical patients to receive care at a rural hospital in Mozambique. Hum Res Dev 7:60-6.
Funk LM, Weiser TG, Berry WR, Lipsitz SR, Merry AF, Enright AC, Wilson IH, Dziekan G, Gawande AA, 2010. Global operating theatre distribution and pulse oximetry supply: an estimation from reported data. Lancet 376:1055-61.
Gerberry DJ, Wagner BG, Garcia-Lerma JG, Heneine W, Blower S, 2014. Using geospatial modeling to optimize the rollout of antiretroviral-based pre-exposure HIV interventions in sub-Saharan Africa. Nat Commun 5:5454.
Groen RS, Samai M, Petroze RT, Kamara TB, Yambasu SE, Calland JF, Kingham TP, Guterbock TM, Choo B, Kushner AL, 2012. Pilot testing of a population-based surgical survey tool in Sierra Leone. World J Surg 36:771-4.
Groen RS, Samai M, Stewart K, Cassidy LD, Kamara TB, Yambasu SE, Kingham TP, Kushner AL, 2012. Untreated surgical conditions in Sierra Leone: a cluster randomised cross-sectional, countrywide survey. Lancet 380:1082-7.
Gupta S, Ranjit A, Shrestha R, Wong EG, Robinson WC, Shrestha S, Nwomeh BC, Groen RS, Kushner AL, 2014. Surgical needs of Nepal: pilot study of a population based survey in Pokhara, Nepal. World J Surg 38:3041-6.
Gupta S, Shrestha S, Ranjit A, Nagarajan N, Groen RS, Kushner AL, Nwomeh BC, 2015. Surgical care in Nepal: conditions, preventable deaths, procedures and validation of a countrywide survey. Brit J Surg 102:700-7.
Kushner AL, Cherian MN, Noel L, Spiegel DA, Groth S, Etienne C, 2010. Addressing the Millennium Development Goals from a surgical perspective: essential surgery and anesthesia in 8 low- and middle-income countries. Arch Surg 145:154-9.
LeBrun DG, Chackungal S, Chao TE, Knowlton LM, Linden AF, Notrica MR, Solis CV, McQueen KA, 2014. Prioritizing essential surgery and safe anesthesia for the Post-2015 Development Agenda: operative capacities of 78 district hospitals in 7 low- and middle-income countries. Surgery 155:365-73.
Meara JG, Hagander L, Leather AJM, 2014. Surgery and global health: a Lancet Commission. Lancet 383:9911.
Nepal Ministry of Health and Population, Department of Health Services, 2012. Annual report. Available from: dohs.gov.np/wpcontent/uploads/2014/04/Annual_Report_2068_69.pdf
Ologunde R, Maruthappu M, Shanmugarajah K, Shalhoub J, 2014. Surgical care in low and middle-income countries: burdens and barriers. Int J Surg 12:858-63.
Osei FB, Duker AA, Augustijn PWM, Stein A, 2010. Spatial dependency of cholera prevalence on potential cholera reservoirs in an urban area, Kumasi, Ghana. Int J Appl Earth Obs 12:331-9.
Petroze R, Groen RS, Niyonkuru F, Mallory M, Ntaganda E, Joharifard S, Guterbock T, Kushner AL, Kyamanywa P, Calland FJ, 2013. Estimating operative disease prevalence in a low income country: results of a nationwide population survey in Rwanda. Surgery 153:457-64.
Quinn GP, Keough MJ, 2002. Experimental design and data analysis for biologists. Cambridge University Press, Cambridge, UK.
R Development Core Team, 2014. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available from: http://www.R-project.org/
Rocha CM, Kruger E, Whyman R, Tennant M, 2014. Predicting geographically distributed adult dental decay in the greater Auckland region of New Zealand. Community Dent Hlth 31:85-90.
Salahi-Moghaddam A, Mohebali M, Moshfae A, Habibi M, Zarei Z, 2010. Ecological study and risk mapping of visceral leishmaniasis in an endemic area of Iran based on a geographical information systems approach. Geospat Health 5:71-7.
Sudhakar S, Srinivas T, Palit A, Kar SK, Battacharya SK, 2006. Mapping of risk prone areas of kala-azar (visceral leishmaniasis) in parts of Bihar State, India: an RS and GIS approach. J Vector Dis 43:115-22.
Tollefson TT, Shaye D, Durbin-Johnson B, Mehdezadeh O, Mahomva L, Chidzonga M, 2015. Cleft lip palate in Zimbabwe: estimating the distribution of surgical burden of disease using GIS. Laryngoscope 125(Suppl.1):1-14.
Vargas MH, Furuya MEY, Perez-Guzman C, 2004. Effect of altitude on the frequency of pulmonary tuberculosis. Int J Tuberc Lung D 8:1321-4.
Weiser TG, Regenbogen SE, Thompson KD, Haynes AB, Lipsitz SR, Berry WR, Gawande AA, 2008. An estimation of the global volume of surgery: a modeling strategy based on available data. Lancet 372:139-44.
World Bank, 2013. Nepal country at a glance. Available from: http://www.worldbank.org/en/country/nepal

[Geospatial Health 2016; 11:359]



Geospatial Health 2016; volume 11:338

Heterogeneity in a communal cattle-farming system in a zone endemic for foot and mouth disease in South Africa

Ockert Louis van Schalkwyk,1,2 Eva M. De Clercq,3,4 Claudia De Pus,5 Guy Hendrickx,3 Peter van den Bossche,5 Darryn L. Knobel6,7

1Centre for Veterinary Wildlife Studies, University of Pretoria, Onderstepoort; 2Office of the State Veterinarian, Skukuza, South Africa; 3Avia-GIS, Zoersel; 4Research Fellow FNRS, Earth and Life Institute, Université Catholique de Louvain, Louvain-la-Neuve; 5Institute of Tropical Medicine, Antwerpen, Belgium; 6Department of Veterinary Tropical Diseases, University of Pretoria, Onderstepoort, South Africa; 7Center for Conservation Medicine and Ecosystem Health, Ross University School of Veterinary Medicine, Basseterre, St Kitts and Nevis

Abstract

In South Africa, communal livestock farming is predominant in the foot and mouth disease control zone adjacent to the Greater Kruger National Park (KNP), where infected African buffaloes are common. During routine veterinary inspections of cattle in this area, a large number of production and demographic parameters are recorded. These data were collated for a five-year period (2003-2007) in three study sites to better understand the temporal dynamics and spatial heterogeneity in this system. A decreasing gradient from South to North with respect to both human and cattle population densities was observed. Rainfall and human population density alone could explain 71% of the variation in cattle density. Northern and central sites showed an overall decrease in total cattle numbers (15.1 and 2.9%, respectively), whereas a 28.6% increase was recorded in the South. The number of cattle owners in relation to cattle numbers remained stable during the study period. Only 4.0% of households in the South own cattle, compared to 13.7 and 12.7% in the North and Centre. The overall annual calving rate was 23.8%. Annual mortality rates ranged from 2.4 to 3.2%. Low calf mortality (2.1%) was recorded in the North compared to the South (11.6%). Annual off-take in the form of slaughter averaged 0.2, 11.7, and 11.0% in the North, Central and South sites, respectively. These figures provide valuable baseline data and demonstrate considerable spatial heterogeneity in cattle demography and production at this wildlife-livestock interface, which should be taken into consideration when performing disease risk assessments or designing disease control systems.

Correspondence: Ockert Louis van Schalkwyk, Office of the State Veterinarian, PO Box 12, Skukuza 1350, South Africa. Tel. +27.013.735.5641. E-mail: lvs@vodamail.co.za

Key words: Communal livestock farming; Wildlife-livestock interface; Disease control; Spatial heterogeneity; South Africa.

Remembrance: we would like to dedicate this paper to Prof. Peter Van den Bossche who was a friend and colleague and who tragically left us before this paper was published. We all knew Peter from different angles and found great pleasure in working with him. He was very supportive of this research and very interested in how outcomes may help improve livestock management at the wildlife-livestock interface.

Acknowledgements: this work was funded by the Belgian Science Policy Office (Belspo) STEREO II Programme, Project SR/00/102: EPISTIS. The research was also supported by the Peace Parks Foundation. The authors would like to thank the State Veterinary Services of Mpumalanga and Limpopo Provinces in South Africa for their assistance in accessing archived livestock registers. General statistical advice received from Drs. Dirk Berkvens and Luigi Sedda is greatly appreciated.

Received for publication: 26 February 2015. Revision received: 23 July 2015. Accepted for publication: 8 August 2015.

©Copyright O.L. van Schalkwyk et al., 2016. Licensee PAGEPress, Italy. Geospatial Health 2016; 11:338. doi:10.4081/gh.2016.338

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Introduction

A large area next to the Kruger National Park (KNP) and its adjoining private and provincial nature reserves (APNR) in South Africa is used for communal farming of both livestock and crops. Although the World Organisation for Animal Health (OIE) considers South Africa to be free from foot and mouth disease (FMD) in areas where vaccination is practised, the KNP and APNR are endemic for FMD (Brückner et al., 2002; Vosloo et al., 2002; Scoones et al., 2010), owing to the presence of the reservoir host of the disease, the African buffalo (Syncerus caffer) (Vosloo et al., 2006). To contain the disease, a fence was erected around the KNP in the 1960s (Joubert, 2007; Scoones et al., 2010), and compulsory, weekly cattle inspections and bi- to tri-annual vaccination against FMD take place at government livestock inspection points (IPs) within the area directly adjacent to the infected zone (the so-called protection zone with vaccination). Livestock in the remainder of the protection zone (protection zone without vaccination) are inspected every fortnight and are not vaccinated. In addition, strict movement controls, enforced through a movement permit system, enable traceability of animal movements to, from and between IPs inside the protection zone (Brückner et al., 2002; Scoones et al., 2010; Vosloo et al., 2002). This regular monitoring presents a unique opportunity to gain insight into some of the basic production and demographic parameters of this communally farmed livestock system.

Land in these areas remains public property and is managed under communal tenure. Communal farming is an inherently different agricultural system to commercial farming enterprises (Behnke Jr., 1985; Giannecchini et al., 2007) and direct comparisons are not advisable (Abel, 1997; Swanepoel et al., 2000). Nevertheless, productivity comparisons of these systems have been widely attempted, with the finding that, on a per animal basis, communal farming compares poorly to commercial enterprises, but on a per hectare basis, it performs as well, if not better (Barrett, 1991; Shackleton et al., 2005).

Close to three quarters of the income of subsistence farming households in South Africa is generated by wages and salaries, often earned through jobs in urban centres, or social grants unrelated to the farming operation, with very little income derived directly from crops or livestock (Kirsten and Moldenhauer, 2006; Goqwana et al., 2008). The benefits of livestock, cattle in particular, are not restricted to their financial value, and include various direct-use benefits such as meat, milk, manure, draught power, transport, and hides (Barrett, 1991; Scoones, 1992b; Swanepoel et al., 2000; Shackleton et al., 2005; Dovie et al., 2006). Much of the financial value of cattle is locked up as potential value, such that the animals act as a living savings account (Randolph et al., 2007; Shackleton et al., 2001).
Livestock have the further advantage over crops that they can be utilised at any time during the year (Vandamme et al., 2010). Although a number of studies have been conducted within the communal livestock farming area along the western boundary of the KNP (Swanepoel et al., 2000; Dovie et al., 2003, 2006; Shackleton et al., 2005; Nthakheni, 2006; Stroebel et al., 2008, 2011), none of these considered the heterogeneity within the larger area explicitly, since they mainly focused on individual communities. This paper aims to provide quantitative data on the heterogeneity and spatio-temporal trends of the cattle population of roughly 350,000 animals found in the control zone described. It was felt that baseline demographic information and population trends could aid disease risk assessments, surveillance and control in the region, activities which currently do not explicitly consider such heterogeneities.

Materials and Methods

Study sites

The study was conducted in three sites (North, Centre and South) along the western boundary of the KNP and APNR. Figure 1 shows the location of these sites, which were originally selected based on a combination of areas perceived by local experts to be at high risk of FMD outbreaks (due to African buffaloes straying from game reserves and coming in contact with cattle) as well as the perceived differences between the sites with regard to land use, population density and general environmental conditions. All three sites are situated in the Lowveld region of northeastern South Africa, a low altitude area characterised by hot summers, during which the peak rainfall occurs (November to February), and mild, dry winters. Prior to 1994, these sites formed part of the former Venda and Gazankulu homeland states. The three study sites comprised 38 IPs in total (North: 10; Centre: 12; South: 16), all of which are in the FMD protection zone with vaccination.

Census data

Data on the number of people and households were obtained from the 2001 national census (Statistics South Africa, 2001). Numbers per IP were derived using the centroids of census polygons falling inside the IP boundaries as determined by the density analyses.

Livestock data

During the compulsory weekly inspection of cattle for FMD, cattle population data are collected by animal health technicians of the provincial state veterinary services. These data are kept in a livestock register on a per owner basis and aggregated per month at an IP level. Data for this study were retrieved from routine monthly aggregate hard copy reports of the livestock register and entered into an electronic database for the period January 2003 to December 2007 (Table 1).

Figure 1. Location of study sites.


Density

Areas used for the calculation of cattle densities in the southern site were based on historic farm boundaries (so-called parent farms). No such farm boundaries existed for the northern and central sites, so areas here were based on Thiessen polygons derived from the location of the IPs. Thiessen polygons define the area around sample points, so that any location within each polygon is closer to its sample point than to any other (ESRI, 2009). The choice of area delineation was confirmed by cattle tracking data collected in a related study in all three sites as described by Van Schalkwyk (2015). Where subdivisions of IPs existed (e.g. where two groups visited the same IP location but under different IP designations, such as Ireagh A and Ireagh B), cattle numbers were aggregated for density calculations.

Derived variables

Calving rate
The livestock register data did not contain detailed information on the age and gender composition of the cattle population (other than calf numbers) and therefore the calving rate reported represents calves born as a proportion of the total cattle population, rather than just cows, as is the convention.

Conception time series
Based on the calving data, a conception time series was created using a nine-month gestation period lag.

Net permit movements
The net movement of animals by means of permits was calculated by subtracting outgoing permits from incoming permits at each IP. Negative permit figures therefore reflect a net movement away from an IP.

Off-take
Since some of the data only contained net permit movements, we could not separate incoming and outgoing permits to work out crude off-take/import rates. Hence, we only calculated net off-take, being the sum of net permit movements and slaughter per year in relation to the median number of cattle at a particular IP within the same year.

Own/local consumption
This was calculated as the total number of cattle slaughtered per year in relation to the median number of owners at that IP during the same year.

Cattle/owner ratio
Since these data were aggregated per IP, we did not have numbers at the owner level. The cattle to owner ratio was thus used as a proxy for the mean herd size at a given IP location, even though a large amount of the variability in individual herd size was inherently lost (Behnke Jr., 1987; Swanepoel et al., 2000; Mapiye et al., 2009a).

Proportion of households keeping cattle
The number of cattle owners divided by the census households in the area of the IP can be used as an indication of the proportion of census households keeping cattle (making the assumption that no more than one cattle owner resides in each household).

Missing data

In the three study sites (North, Centre and South), 68.8, 84.3 and 71.6% of the monthly records, respectively, were complete for the study period (74.9% overall). Missing data points were imputed by linear interpolation and by carrying forward or backward the closest observation where missing values were at the beginning or end of a time series. Missing net permit movements for the current month ($P_{net,t}$) were calculated after interpolation of the total number of cattle ($N_t$) in the current month ($t$), the number of cattle in the previous month ($N_{t-1}$), the births in the current month ($B_t$), the mortality in the current month ($M_t$) and the number of animals locally sold/slaughtered in the current month ($S_t$) using the following formula:

$$P_{net,t} = N_t + B_t - M_t - S_t - N_{t-1}$$

Linear interpolation had no influence on the variance of the final dataset. Two IPs opened and one closed during the study period; data for the periods when these IPs did not function were not imputed and instead retained as true missing values.
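The imputation rule described for missing monthly records can be illustrated with a short sketch. It assumes a monthly series held as a Python list with `None` marking a missing month; the function name `impute_series` and the example values are hypothetical and not taken from the livestock register. Months in which an IP was not operating would have to be excluded before calling it, since those were kept as true missing values.

```python
from typing import List, Optional


def impute_series(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill gaps by linear interpolation; extend the nearest observation at either end."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)  # nothing to anchor the interpolation on
    filled = list(values)
    first, last = known[0], known[-1]
    # Leading gap: carry the first observation backward.
    for i in range(first):
        filled[i] = values[first]
    # Trailing gap: carry the last observation forward.
    for i in range(last + 1, len(values)):
        filled[i] = values[last]
    # Interior gaps: straight line between the neighbouring observations.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical monthly cattle counts for one inspection point.
monthly_cattle = [None, 100.0, None, None, 130.0, 128.0, None]
print(impute_series(monthly_cattle))
```

Interior gaps are filled on a straight line between the two nearest records, while gaps at the start or end simply repeat the closest record.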

Environmental data

The normalised difference vegetation index (NDVI) is a ratio of the

Table 1. Attributes recorded per livestock inspection point from the monthly aggregates of the livestock register for each of the study sites.

Attribute | Remarks
Number of cattle owned | Number of cattle registered per IP
Number of calves in herd | Estimate of the number of calves up to one year of age at each IP
Number of calves born since last inspection | –
Mortality since last inspection | No cause of death recorded. It also includes stock theft
Calf mortality since last inspection | No cause of death recorded. It also includes stock theft
Local sales and animals slaughtered | Home slaughter for own consumption or selling to, and slaughter by, a local butcher within the same IP area, which requires no movement permit
Permits | It could include permits for buying/selling, giving/receiving or borrowing/lending of livestock to/from someone outside the IP area. No registered abattoirs/feedlots exist in any of the study sites, and any commercial sale to abattoirs/feedlots would thus also fall under permit movements, rather than local sales and slaughtered.

IP, inspection point.
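Several of the derived variables in the Methods are simple ratios of the register attributes listed in Table 1. The sketch below shows one possible way to compute them; all function names and example numbers are hypothetical, and the choice of denominators (for example, which cattle total is used for an annual calving rate) is left to the caller because the text does not spell it out for every variable.

```python
from statistics import median
from typing import List


def calving_rate(calves_born: float, total_cattle: float) -> float:
    """Calves born as a proportion of the whole cattle population (not cows only)."""
    return calves_born / total_cattle


def net_permit_movement(incoming: int, outgoing: int) -> int:
    """Incoming minus outgoing permits; negative means net movement away from the IP."""
    return incoming - outgoing


def cattle_owner_ratio(total_cattle: float, owners: float) -> float:
    """Proxy for mean herd size at an inspection point."""
    return total_cattle / owners


def local_consumption(slaughtered_per_year: float, monthly_owners: List[float]) -> float:
    """Cattle slaughtered in a year relative to the median number of owners that year."""
    return slaughtered_per_year / median(monthly_owners)


def conception_series(monthly_births: List[int], gestation_months: int = 9) -> List[int]:
    """Conceptions in month t are taken to be the births recorded in month t + lag."""
    return monthly_births[gestation_months:]


# Hypothetical figures for one inspection point.
print(calving_rate(50, 200))
print(net_permit_movement(3, 10))
print(conception_series(list(range(12))))
```

Shifting births back by nine months shortens the conception series by nine entries, so the first value of the result corresponds to the first month of the birth record.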


red and near infrared reflectance as measured by the Moderate Resolution Imaging Spectroradiometer (MODIS) and is generally used to assess the health and density of vegetation (Carroll et al., 2004; Klingseisen et al., 2013). NDVI data from 2001 to 2008 with a 16-day interval were obtained from the United States Geological Survey (2013) and de-noised using a cubic spline interpolation (Scharlemann et al., 2008). Land cover data at the 30 m resolution (GeoterraImage, 2008) derived from Landsat 5 (http://landsat.gsfc.nasa.gov/?P=3180) were recategorised into four categories: i) areas potentially suitable as grazing areas, i.e. open woodland, open sparse bush-land, open sparse grassland and bush-land and thicket (here called Grazing); ii) urban areas (here called Urban); iii) non-wet bare areas, which have low potential for grazing (here called Bare); and iv) dry-land subsistence cultivation areas (here called Subsistence). Rainfall per IP was based on its nearest weather station and aggregated per month from daily data obtained from the South African Weather Service for the period 1998-2008. The year refers here to the 12-month period from 1 July through 30 June. This was used in the analysis of the livestock data rather than calendar years to avoid splitting rainy and/or calving seasons, which peak around the beginning/end of the calendar year.

Statistical analysis

General statistical analyses were conducted using R (R Core Team, 2013) and Microsoft Excel 2007 (Microsoft Corporation), while spatial analyses were done using IDRISI Andes (Eastman, 2006), ArcGIS, version 9.3.1 (ESRI, 2009) and Hawth’s Analysis Tools for ArcGIS (Beyer, 2004). Space-time cluster analysis was done using a discrete Poisson space-time scan statistic in the SaTScan software package, version 9.1.1 (Kulldorff, 1997). Figures reported in text refer to the median followed by the interquartile range in brackets, unless otherwise specified.

Time series decomposition

This was used to split time series values for livestock data into i) a trend component (calculated by a moving average: symmetric window with equal weights); ii) a repeating pattern component within each year (also known as the seasonal component), which averages each point in the time series over the entire period, after trend removal; and iii) the remainder (noise) (decompose function of the R Stats package; Cowpertwait and Metcalfe, 2009).

Gini index

The Gini index was used to assess seasonality in birth, mortality and off-take. It measures statistical dispersion in a dataset, and can be seen as a quantitative measure of clustering in data (temporal in this case). A Gini index of zero signifies perfect equality, e.g. all months having equal calving frequencies, compared to one, where all births would occur in a single month (Gastwirth, 1972; Lee, 1996).

Cross-correlation

The cross-correlation function in the Stats package of R was used to compute the correlation between two univariate time series spanning the same period, each at a different lag period. Where necessary, time series were log transformed. Since we were also interested in correlation of the seasonal components of these time series, seasonal components in the time series were not removed (Cowpertwait and Metcalfe, 2009).
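The lagged correlation between two monthly series can be sketched in a few lines. This is an illustrative implementation only, not the R routine used in the study: the function name `cross_correlation` is hypothetical, and the lag convention (lag k pairs x[t + k] with y[t], so a peak at a positive lag means y leads x) and the normalisation by the full-series variances are assumptions stated here for clarity.

```python
from math import sqrt
from typing import Dict, List


def cross_correlation(x: List[float], y: List[float], max_lag: int) -> Dict[int, float]:
    """Correlation between x[t + k] and y[t] for every lag k in [-max_lag, max_lag]."""
    n = len(x)
    if n != len(y):
        raise ValueError("series must span the same period")
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    # Normalise by the lag-zero sums of squares of both series.
    scale = sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    result = {}
    for k in range(-max_lag, max_lag + 1):
        # Pair x[t + k] with y[t] wherever both indices fall inside the series.
        pairs = [(dx[t + k], dy[t]) for t in range(n) if 0 <= t + k < n]
        result[k] = sum(a * b for a, b in pairs) / scale
    return result


# Hypothetical example: a rainfall pulse in month 3 followed by a conception pulse in month 5.
rainfall = [0, 0, 0, 10, 0, 0, 0, 0, 0, 0]
conceptions = [0, 0, 0, 0, 0, 10, 0, 0, 0, 0]
lags = cross_correlation(conceptions, rainfall, max_lag=4)
print(max(lags, key=lags.get))
```

In this toy example the largest correlation falls at lag 2, matching the two-month delay built into the data. A constant series has zero variance and would raise a division error, so such inputs need to be screened out first.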

Table 2. Overview of the biophysical and demographic characteristics of the study sites.

Characteristic | North | Central | South
Area (km2) | 405 | 965 | 321
Mean elevation (m)* | 366 | 403 | 413
Mean annual rainfall (range) (mm)° | 310 (200-400) | 520 (400-700) | 680 (550-850)
Mean temperature (range) (°C)° | 23.5 (10.2-34.5) | 22.2 (8.8-32.8) | 21.5 (8.1-31.5)
Population density (people/km2)# | 22.4 | 62.0 | 565.7
Household density (households/km2)# | 5.5 | 13.3 | 122.5

*Aster 30m Digital Elevation Model (2009) [Ministry of Economy, Trade, and Industry (METI) of Japan and the United States National Aeronautics and Space Administration (NASA)]; °Hijmans et al. (2005); #Statistics South Africa (2001).

Table 3. Predominant land cover of the three study sites (%).

Study site | Grazing | Urban | Bare | Subsistence cultivation | Other
North | 91.4 | 0.8 | 3.1 | 3.9 | 0.8
Central | 78.4 | 4.0 | 4.8 | 11.1 | 1.6
South | 43.9 | 16.7 | 10.3 | 24.8 | 4.4


Results

Land use

Table 2 gives an overview of the biophysical and demographic characteristics of the study sites. A strong latitudinal gradient in human population densities can be observed, with the lowest densities occurring in the North. This gradient is also reflected in the land cover (Table 3) of the three study sites. The relatively high proportion of subsistence cultivation found in the southern site was not proportional to the increase in population, with the area of subsistence cultivation per household higher in the northern [1.10 ha (0.57-2.16)] and central site [0.90 ha (0.54-1.33)] than in the South [0.24 ha (0.17-0.30)]. Although subsistence cultivation areas provide potential seasonal fodder (Herrero et al., 2010; Rocha et al., 1991) and bare areas are lush green with annual grass and herbaceous plants in the early rainy season (personal observation), they were not considered as potential grazing areas because of their inconsistency in this respect. Less than half (43.9%) of the southern site could be regarded as potential grazing area, compared to 91.4% in the northern area and 78.4% in the centre.

Cattle numbers and densities

Total cattle numbers for each study site are shown in Figure 2 with the dashed line depicting the temporal trend after removal of seasonal fluctuations through time series decomposition of the data.
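A classical decomposition of a monthly series into trend, seasonal and remainder components can be sketched as follows. This is an illustration rather than the routine used in the study: the function name `decompose_additive` is hypothetical, an additive model is assumed, and for an even period the two end points of the moving-average window are given half weight so that the window stays centred on each month.

```python
from typing import List, Optional, Tuple


def decompose_additive(
    series: List[float], period: int = 12
) -> Tuple[List[Optional[float]], List[float], List[Optional[float]]]:
    """Split a series into trend, seasonal and remainder parts (additive model)."""
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full cycles")
    half = period // 2
    trend: List[Optional[float]] = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: half weights on the two end points keep the window symmetric.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    # Average the detrended values for each position in the cycle.
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += series[t] - trend[t]
            counts[t % period] += 1
    figure = [sums[p] / counts[p] for p in range(period)]
    centre = sum(figure) / period  # centre the seasonal figure on zero
    seasonal = [figure[t % period] - centre for t in range(n)]
    remainder = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, remainder


# Hypothetical herd counts: linear growth plus a repeating within-year pattern.
pattern = [5, 3, 0, -2, -4, -6, -5, -3, 0, 2, 4, 6]
herd = [100 + 2 * t + pattern[t % 12] for t in range(36)]
trend, seasonal, remainder = decompose_additive(herd)
```

The trend is undefined for the first and last half-cycle because the moving-average window does not fit there; those positions are returned as `None`.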

Figure 2. Total cattle numbers in each study site (temporal trend shown as a dashed line).

The northern and central sites showed an overall decrease of 15.1 and 2.9%, respectively, in total cattle numbers over the five-year study period, whereas a 28.6% increase in the number of cattle was recorded in the South, corresponding to an annual growth rate of 5.15%. Notwithstanding, fluctuations of 35-42% were recorded in all three study sites over the study period. Time series decomposition revealed a seasonal pattern in the cattle numbers of all three study sites. The fluctuation ascribed to seasonal variation, evaluated as a percentage of the median number of cattle in the study area, showed that seasonal variation was only responsible for 4-6% of the fluctuation in the northern and central sites, and less than two percent in the South. The median proportion of calves in the cattle population was significantly higher in the central site, at 19.37% (14.16-23.40), than in the northern and southern sites, at 11.91% (8.21-16.98) and 10.71% (6.95-16.53), respectively (Kruskal-Wallis rank sum test; P<0.01). The median cattle density for the northern, central and southern sites during the study period was 11.6 (9.4-14.2), 18.6 (15.1-22.4) and 32.7 (27.5-37.1) animals per km2, respectively. A combination of human population density and annual rainfall could explain 71% of the variation in cattle density recorded per IP (R2=0.71; P<0.01). When considering all IPs across the three study sites, a 68.2% correlation between the cattle density and area of subsistence cultivation per household was evident (t=12.56, P<0.01). Within each study site, however, this correlation was less pronounced, and it was only significant in the northern site (33.6%, Pearson's product moment coefficient; t=2.34, P<0.02).
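The conversion of a multi-year change in cattle numbers into an equivalent annual growth rate can be checked with a one-line compound-growth calculation. The function name `annual_growth_rate` is hypothetical, and compound (geometric) growth over five years is an assumption about how the reported annual figure was obtained.

```python
def annual_growth_rate(total_change: float, years: float) -> float:
    """Compound annual rate equivalent to a fractional change over several years."""
    return (1.0 + total_change) ** (1.0 / years) - 1.0


# 28.6% increase over the five-year study period in the southern site.
rate = annual_growth_rate(0.286, 5)
print(f"{rate:.2%}")
```

Applied to the rounded 28.6% increase, this gives a rate of roughly 5.2% per year, in line with the reported 5.15%; the small gap presumably reflects rounding of the inputs.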

Owner ratio

The median number of cattle owners in the northern, central and southern sites was 292 (280-295), 1150 (1122-1168) and 1282 (1270-1290), respectively. These numbers remained remarkably stable during the study period in comparison with the cattle numbers. The maximum fluctuation in the number of owners reached 20% in the North and Centre, while the southern site was more stable, recording a maximum fluctuation of only six percent. The median cattle/owner ratio in the North, Centre and South was 16.9 (12.7-21.9), 12.7 (11.2-14.0) and 8.1 (6.8-10.3), respectively, with a significant difference between the three study sites (Kruskal-Wallis rank sum test; P<0.01). A significantly lower percentage of households own cattle in the South at 4.0% (2.4-4.6), compared to 13.7% (10.0-33.2) and 12.7% (9.3-15.1) in the North and Centre, respectively (Kruskal-Wallis rank sum test; P<0.01).

Calves and calving patterns

Calving peaked around December/January of each year, with more than half (55.4%) of the annual calf crop born from November to the end of February (four months). The period between the beginning of June and the end of August (three months) had the lowest calving percentages, producing only 14.5% of the annual crop. This temporal pattern was more pronounced in the central site, where 62.9% of all calves were born from November to February. A median annual calving rate per IP of 23.82% (17.19-29.91) was recorded for the entire study area. Although the calving rates varied between the years studied, there were no significant differences in annual calving rates between the study sites over the total study period, other than the significantly higher calving rate for the central site during 2006/2007 compared to the other two study sites (Kruskal-Wallis rank sum test, P<0.01). Table 4 shows the calving rate and the Gini index outcome per study site for each year. It is noteworthy that the three peak calving months in all study sites over the four calving seasons were the same (January, December and February, in that order). To assess the influence of rainfall and/or NDVI on conception rates, a cross-correlation analysis of the conception rate time series (log transformed) against the NDVI (mean) and rainfall (log transformed) time series was conducted. This revealed a significant relationship between conception rate and both these covariates (P<0.05), with both covariates leading conception in time but displaying different time lags. Table 5 shows the lag between peak rainfall/NDVI and peak conceptions.
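The Gini index reported alongside the calving rates summarises how concentrated events are within the year: 0 means an even spread over the months and values approaching 1 mean that nearly all events fall in a single month. A minimal sketch of one common way to compute it from a Lorenz curve is given below. It assumes twelve months of equal length and uses hypothetical monthly calf counts, so it illustrates the idea rather than reproducing the study's calculation.

```python
def seasonal_gini(monthly_counts):
    """Gini index of a monthly distribution via the trapezoidal Lorenz curve."""
    total = float(sum(monthly_counts))
    if total <= 0:
        raise ValueError("need at least one event")
    n = len(monthly_counts)
    area = 0.0          # area under the Lorenz curve
    previous = 0.0      # cumulative share of events before this month
    cumulative = 0.0
    # Sort months from lowest to highest count, then accumulate shares.
    for count in sorted(monthly_counts):
        cumulative += count
        current = cumulative / total
        area += (previous + current) / (2.0 * n)
        previous = current
    return 1.0 - 2.0 * area


# Hypothetical calf births for July through June (not study data).
births = [4, 5, 7, 10, 18, 30, 32, 20, 9, 6, 5, 4]
print(f"Gini index: {seasonal_gini(births):.2f}")
```

With twelve equal periods the index cannot exceed 11/12, which is reached when every event falls in one month.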

Mortality

Annual mortality rates recorded per IP for the northern, central and southern sites during the study period were 2.75% (1.61-4.06), 2.35%

Table 4. Calving rate, median Gini index and top three calving months.

Site / measure    2003/2004         2004/2005         2005/2006         2006/2007         All years
North
  Calving rate    0.21 (0.15-0.28)  0.31 (0.28-0.32)  0.21 (0.15-0.23)  0.23 (0.17-0.28)  0.24 (0.16-0.30)
  Gini index      0.39              0.51              0.63              0.57              0.53
  Peak months     Mar, Jan, Feb     May, Feb, Jan     Jan, Dec, Feb     Jan, Dec, Nov     Jan, Dec, Feb
Central
  Calving rate    0.25 (0.14-0.26)  0.25 (0.23-0.28)  0.18 (0.14-0.23)  0.33 (0.29-0.35)  0.25 (0.20-0.31)
  Gini index      0.46              0.62              0.55              0.55              0.55
  Peak months     Jan, Nov, Dec     Dec, Jan, Nov     Dec, Feb, Jan     Jan, Dec, Feb     Jan, Dec, Feb
South
  Calving rate    0.22 (0.18-0.28)  0.27 (0.20-0.42)  0.21 (0.11-0.27)  0.22 (0.18-0.25)  0.23 (0.17-0.28)
  Gini index      0.48              0.53              0.34              0.46              0.47
  Peak months     Jan, May, Feb     Jan, Dec, Apr     Dec, Jan, Sep     Dec, Jan, Nov     Jan, Dec, Feb

Table 5. Time lag (months) between peak rainfall/normalised difference vegetation index and peak conceptions.

              Rainfall                                  NDVI
Study site    Lag (months preceding)   Correlation*     Lag (months preceding)   Correlation*
North         3                        0.44             2                        0.60
Central       2                        0.65             0                        0.64
South         2                        0.34             1                        0.43

*Correlation coefficient (P<0.05). NDVI, normalised difference vegetation index.



(1.70-3.54) and 3.16% (2.31-5.08), respectively. This was, however, highly variable from year to year, with the central site reporting a mortality rate of 14.42% (3.37-35.31) during 2005/2006. A space-time cluster analysis, using a discrete Poisson model, revealed a significant cluster of increased mortality (relative risk: 21.32; P<0.01) in the northernmost six IPs of the central site for the period October 2005 to April 2006, while a secondary cluster (relative risk: 15.47; P<0.01), spanning exactly the same period and completely overlapping the primary cluster, extended further north to include the entire northern site. Surprisingly, none of the concurrent NDVI or rainfall values for the affected IPs were significantly lower than those of the IPs outside either cluster. However, rainfall and NDVI values from the previous season were significantly lower in these clusters than in the rest of the Centre and all of the two other study sites (P<0.01 for both clusters). Another latitudinal gradient was observed in calf mortality, with the North recording significantly lower calf mortality than the other two study sites (Kruskal-Wallis rank sum test; P<0.01). Actual numbers aggregated for the entire northern site over the four years gave a remarkably low median calf mortality of 2.1%, while at an IP level, across all years, three quarters of the reported annual calf mortality rates were below 2.9%. This is in stark contrast to the South, where the median calf mortality per IP was 11.6% (5.2-21.4). In the central site, very high calf mortality was reported in 2005 and 2006 [14.6% (6.3-21.3)] compared to the other three years studied [6.3% (3.3-9.3)]. The northern and southern sites did not show any particular temporal pattern in calf mortality within the years, while in the central site, calf mortality was most concentrated in the short period between January and February (Gini index: 0.28).

While no significant relationship between cattle density and annual mortality (all age groups) was found, calf mortality did show a positive correlation with cattle density, even though cattle density only accounted for 21% of the variation in calf mortality (R²: 0.21; P<0.01). Adding the cattle/owner ratio, with which calf mortality was negatively correlated, increased the explained variation in calf mortality to 30% (R²: 0.30; P<0.01), whilst the same combination could explain very little of the variation in the annual mortality rates for the total population (R²: 0.06, P<0.05). This effect was more pronounced in the South compared to the other study sites.

Off-take

The number of animals slaughtered as a percentage of the median number of animals per IP during the same time period showed a significant difference between the North and the other two study sites, with annual slaughter percentages of 0.2% (0-1.4), 11.7% (9.2-16.5) and 11.0% (7.6-15.2) for the North, Centre and South, respectively (Kruskal-Wallis rank sum test; P<0.01). Although no distinct temporal (monthly) pattern was detected in the slaughter behaviour of the IPs, the greatest number of animals slaughtered over the entire study period was reported in January (10.3%). The highest Gini index was detected in the North (0.34), where slaughter peaked equally in January and April. Annually, and in relation to the median number of owners per IP, very low local meat consumption rates were observed in the North [0.04 cattle slaughtered per owner (0-0.16)], compared to the Central [1.6 (1.1-2.1)] and South [1 (0.6-1.1)] sites. To get a better idea of the net movement of animals, as opposed to own/local consumption, we looked at the per-IP aggregated net permit movements per year, divided by the median number of owners at the same IP during the same period. Figure 3 clearly shows the tendency of owners to move cattle away from the IP in the North [-1 (-2.8 to 0.05)] and Centre [-0.4 (-1.2 to 0.1)], compared to the South [0.1 (-0.1 to 0.4)], where owners were slightly more inclined to move animals towards the IP than away from it, i.e. buy or receive animals. In general, these permit movements and own/local consumption data indicate that owners from IPs in the North were 26 times more likely to move an animal away (sell/give) from the IP than to slaughter it locally. This contrasts with the central site, where an owner was four times more likely to slaughter an animal than move it away from the IP, and with the southern site, where an owner was twelve times more likely to slaughter one of his own animals than to move an animal towards the IP (buy/receive). Significant correlation (P<0.05) was found between slaughter and mortality in the central and southern sites. Cross-correlation analysis (Figure 4) showed that in the central site, slaughter and permit movements led mortality rates by up to five months. In the South, slaughter and mortality peaked at the same time, or peak mortality preceded slaughter by one month. In the central site, at ten and twelve months after peak mortality, a significant influx of animals through permits occurred, possibly indicating some level of restocking. The northern site only showed a significant correlation at lag periods of three and five months, with mortality leading outgoing permit movements. The central site had the highest median net off-take rate at 16.09% (7.75-22.78), compared to the North at 7.23% (0.66-20.77) and the South at 8.43% (3.96-16.62). The median off-take rate across all sites and years was 10.8% (4.35-19.20). Some concentration of off-take occurred during the winter months, mostly in the North (Gini index: 0.31).

Figure 3. Net permit movements at livestock inspection points during the study period in each study site.

Discussion

In just over a decade since 1993, the urban areas at our southern site increased by 39%, resulting in both expanded and denser settlements with concomitant losses in natural vegetation and restriction of rangelands (Lambin et al., 2001; Coetzer et al., 2010; Herrero et al., 2010). The development poses a potential threat to the natural resource base that is crucial not just to subsistence livestock farming, but also to the many people whose livelihood depends on the area's wide range of natural products (Shackleton et al., 2001; Dovie et al., 2003; Coetzer et al., 2010). This urban expansion was not necessarily a result of human population increase; the population in fact decreased slightly over a similar period (Statistics South Africa, 2008). Importantly, however, the cattle population in the southern site grew about five times more than the number of cattle owners during the same period. An increase of 168% in cattle numbers has been reported previously for a six-year period in a specific village in Bushbuckridge (albeit a period immediately after the severe drought of 1992) (Dovie et al., 2006); similar to our findings, it was mainly due to an increase in herd size rather than in the number of livestock owners (Scogings et al., 1999; Shackleton et al., 2005). Herd accumulation in communal systems when optimal conditions prevail may thus act as a form of insurance against adverse events (McPeak and Barrett, 2001).

The cattle densities we report (11.6-32.7 animals per km²), driven primarily by annual rainfall and human population density (Scoones, 1992b; Bourn and Wint, 1994; Wint and Robinson, 2007), fall within a range similar to that reported elsewhere. We also found a positive correlation between cattle density and subsistence cultivation at a broad scale (across all study sites), which has been suggested as an important determinant of cattle distribution (Bourn and Wint, 1994; Wint and Robinson, 2007) through provision of post-harvest supplementary fodder (Rocha et al., 1991; Düvel and Afful, 1996; Herrero et al., 2010). Regardless, no association between mortality or calving rates and the area of subsistence cultivation available (Mukhebi et al., 1991) could be detected in this study. The actual utilisation of crops as supplementary fodder appeared to be low, as noted previously by Scoones (1992a) and Dovie et al. (2006), and buying of supplementary fodder was not common either.

Our approximation of the proportion of households owning cattle was lower than reported by others (Barrett, 1991; Shackleton et al., 2005; Moll, 2005; Dovie et al., 2006), especially in the South, where people appeared less inclined to practice agriculture compared to, for example, the northern site. Supporting evidence comes from the lower proportion of cattle-owning households, the smaller cattle to owner ratio, and the less than one third of a hectare of land cultivated per household. While this could be ascribed to other income generating opportunities in these highly populated areas in the South, it does not necessarily equate to a lower dependence on livestock (Shackleton et al., 2005; Dovie et al., 2006), especially when considering the much higher local meat consumption, most likely demand driven, in the South compared to the North.

Concentrated or seasonal calving as a consequence of wet season conceptions that closely follow rainfall and NDVI increases, similar to what we observed, has been reported previously in cattle where no manipulation of reproduction occurs (Nqeno et al., 2010; Rocha et al., 1991). A poor conception year followed by a higher than normal one, e.g. in the Centre during 2006/2007, a phenomenon previously described (Scoones, 1992a), is most likely related to the number of cows that are not pregnant or lactating following a low conception year and consequently not available for mating during the following year. This comes at a potential cost, since highly fertile cows that do conceive during these difficult years often succumb due to the nutritional strain demanded by pregnancy or lactation, a situation that effectively selects against higher fertility (Moyo, 1996; Swanepoel et al., 2000; Desta and Coppock, 2002; Stroebel, 2004; Stroebel et al., 2008). Since our calving rates are based on the entire herd, not just the cows, it is difficult to compare our results directly to published figures, although studies in areas similar to our study sites suggest that the cow proportion of herds is approximately 50% (Swanepoel et al., 2000; Shackleton et al., 2005; Stroebel et al., 2011). If so, our calving rates are very similar to what others have reported in communal herds (Behnke Jr., 1985; Rocha et al., 1991; Nthakheni, 2006; Angassa and Oba, 2007; Ba et al., 2011). A 50% calving rate, suggesting an average of one calf per cow every two years, is quite good in reproductive terms, especially considering the general lack of forced weaning (Rocha et al., 1991) and the low number of farmers who provide supplementary feed (Scoones, 1992a; Dovie et al., 2006). We were, however, unable to quantify embryonic loss, abortion or stillbirths; neither could we directly estimate the role of disease, if any, in reproduction.

Figure 4. Cross-correlation between mortality and slaughter as well as net permit movements during the study period. A positive lag signifies mortalities leading to slaughter/permit movements. Outgoing permit movements were recorded as negative values and would thus result in negative cross-correlation values when outgoing permit movements predominated. Dashed horizontal lines signify the 95% significance level with dark bars falling within this significance threshold.
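The lagged relationships reported in this study (rainfall/NDVI against conceptions, and mortality against slaughter and permit movements) rest on cross-correlation of monthly time series. The sketch below shows the basic computation under stated assumptions: the sign convention (a positive lag means the first series leads the second) is chosen to match the Figure 4 caption, the approximate 95% bound of 1.96/sqrt(n) is the usual white-noise threshold rather than a value taken from the paper, and the two series are hypothetical.

```python
import math


def cross_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag], scaled by full-series variances."""
    n = len(x)
    if len(y) != n or abs(lag) >= n:
        raise ValueError("series must have equal length and |lag| < length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    if lag >= 0:
        pairs = zip(x[: n - lag], y[lag:])
    else:
        pairs = zip(x[-lag:], y[: n + lag])
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in pairs) / n
    return covariance / (sd_x * sd_y)


def significance_bound(n):
    """Approximate 95% bound for a cross-correlation of two unrelated series."""
    return 1.96 / math.sqrt(n)


# Hypothetical monthly series over two years: `slaughter` repeats the
# `mortality` pattern two months later (not study data).
mortality = [0, 0, 5, 9, 4, 1, 0, 0, 0, 0, 0, 0] * 2
slaughter = mortality[-2:] + mortality[:-2]

bound = significance_bound(len(mortality))
for lag in range(-6, 7):
    r = cross_correlation(mortality, slaughter, lag)
    flag = "*" if abs(r) > bound else ""
    print(f"lag {lag:+d}: {r:+.2f} {flag}")
```

In this toy example the largest coefficient should appear at a lag of +2, i.e. mortality leading slaughter by two months.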
Studies in the vicinity of our central and northern sites have reported 20-23% calves in the population, with lower proportions observed in drier years (Swanepoel et al., 2000; Mahabile et al., 2005; Stroebel et al., 2011), which is similar to our findings in the central site. The annual mortality rates reported in the three study sites were relatively low (0-16%) compared to those reported in the literature (0-30%) (Rocha et al., 1991; Scoones, 1992b; Shackleton et al., 2005). Even though we did not experience any drought during the study (which can cause mortality rates of up to 70%; Mukhebi et al., 1991; Moyo, 1996; Swanepoel et al., 2000; Barrett, 2001; Desta and Coppock, 2002; McPeak and Barrett, 2001), the significant relationship between elevated mortality and the previous season's rainfall/NDVI, rather than that of the concurrent season, was an interesting finding. Evidently, mortality does not follow a linear relationship with precipitation and NDVI (as proxy for available fodder); rather, the animals are pushed to the brink of their nutritional resilience (Desta and Coppock, 2002; Angassa and Oba, 2007) as a longer-term consequence of these factors. Although the majority of mortality cases can generally be attributed to poor nutrition, especially during dry periods (Rocha et al., 1991; Desta and Coppock, 2002; Shackleton et al., 2005), we cannot confirm this, as our data did not distinguish between nutritional and disease-related mortality. Since we could not quantify stock theft (which is also recorded as mortality), it is important to keep it in mind as a confounder of reported mortality figures, especially in the more densely populated areas, where it is increasing (Rocha et al., 1991; Ainslie et al., 2002; Shackleton et al., 2005). A positive correlation between calf mortality and both cattle and human density, similar to that found in this study, has been reported before (Lybbert et al., 2004). These authors further reported a negative correlation between herd size and calf mortality, which concurs with our findings at an IP level. These phenomena could stem from competition for milk, both with other calves and with humans (Wilson and Clarke, 1976; Cossins and Upton, 1987), especially in the densely populated areas with limited grazing areas, such as in the southern site.

Off-take rates recorded during our study are slightly elevated when compared with other studies (Mukhebi et al., 1991; Rocha et al., 1991; Scoones, 1992b; Swanepoel et al., 2000; Shackleton et al., 2005; Mahabile et al., 2005; Stroebel et al., 2011). In the South, where there is high local demand, slaughter rates were correspondingly high; however, the off-take often surpassed what natural production could replace, hence the net import of animals to IPs observed there. This high off-take rate is further exacerbated by small herd sizes and high calf mortality. The high proportion of meat consumption by the owners of small herds in densely populated areas, which is not unexpected (Mukhebi et al., 1991), might be regarded as an encouraging increase in the integration of livestock into local markets (Jones and Thornton, 2009). In the North, off-take was dominated by movement of animals away from the IP, with local slaughter rates very low and peaking close to the Christmas/New Year period, the start of the first school term and the Easter holidays, which might indicate some level of market intelligence. In the Centre, on the other hand, local consumption was sufficiently low in relation to natural production to allow movements away from the IPs as an additional off-take strategy. The central site had the highest off-take rate of the three study sites, confirming that overall the rates are not necessarily the highest in areas with the largest herds (Musemwa et al., 2008; Ba et al., 2011). Our findings are most likely explained by the general lack of local markets in combination with the inverse relation between herd size and human density, limiting local opportunities to sell or slaughter for those owners with bigger herds. However, our findings of a general inverse association between movements/sales (i.e. not local/own consumption) and rainfall concur with the findings of others (Shackleton et al., 2005).
A great variety of responses of livestock owners to adverse conditions, such as increased slaughter, increased and decreased sales, increased movements, reduction in herd size and even complete destocking, have been reported in a number of communal areas in the past (Scoones, 1992a; McPeak, 2004; Shackleton et al., 2005). What is noteworthy in this study is the apparent proactive risk aversion strategy of owners in the Centre, which occurred both through slaughter and movements away from IPs a number of months before mortality peaked during an extended dry period, as well as ostensible restocking through increased importation of animals in the more favourable months that followed. In the South, on the other hand, the timing of the response to increased mortality rates seemed to be more a case of salvage (or distress) slaughter than risk aversion, possibly because the smaller herd sizes in this study site did not allow large-scale pre-emptive sales/slaughter without complete, or near-complete, destocking. Surprisingly, the northern site seemed almost indifferent towards anticipated mortality, which might be a consequence of their larger herds that could buffer losses, as well as the low demand for meat from local slaughter. Off-take through (non-local) sales is not a simple matter in these study sites, especially considering the lack of commercial abattoirs or feedlots as well as the distances involved in travelling to sell cattle, often for only a few animals (Musemwa et al., 2008). Although very low levels of competition often lead to poor prices, this problem is frequently overcome through speculators and auctions; hence the tendency to sell locally when demand allows it (Nkhori, 2004). From the disease risk perspective, the sloping cattle density gradient from north to south is notable, especially where direct contact with FMD must be considered, both in terms of the likelihood of interaction with buffaloes and of cattle-to-cattle spread. Also, the seasonality of calving is useful in prospectively determining the period during which most young animals would be losing their maternal immunity and would be most susceptible to disease. This is helpful in spatio-temporal risk profiling as well as in the implementation of vaccination programmes,


especially in the case of FMD, where buffaloes and cattle tend to show very similar seasonality in their calving patterns (Ryan et al., 2007) with possible linkages between young animals and infectivity (Bengis et al., 1986; Thomson, 1996; Vosloo et al., 2005). Furthermore, off-take strategies could influence disease spread, especially over larger distances, while local meat consumption patterns could influence zoonotic disease risk.

Conclusions

The findings we present here clearly show that a great deal of heterogeneity exists in the communal livestock component of the KNP and APNR wildlife-livestock-human interface, not only in its physical attributes, but also in the way people and animals respond to, and interact with, these attributes and each other. Such activities are important to consider, especially in disease control strategies and disease risk assessments, where they are often oversimplified or unknowingly ignored due to their indirect nature. While a number of findings could be broken down to show more details, they already increase our insight into how this unique system operates, enabling us to employ risk assessment and control strategies that are not only more effective from a disease prevention point of view, but also have the least possible negative impact on the system itself.

References

Abel N, 1997. Mis-measurement of the productivity and sustainability of African communal rangelands: a case study and some principles from Botswana. Ecol Econ 23:113-33.
Ainslie A, Kepe T, Ntsebeza L, Ntshona Z, Turner S, 2002. Programme for land and agrarian studies: cattle ownership and production in the communal areas of the Eastern Cape, South Africa. University of the Western Cape, Bellville, Cape Town. Available from: http://www.plaas.org.za/sites/default/files/publicationspdf/RR10.pdf
Angassa A, Oba G, 2007. Relating long-term rainfall variability to cattle population dynamics in communal rangelands and a government ranch in southern Ethiopia. Agric Syst 94:715-25.
Ba A, Lesnoff M, Poccard-Chapuis R, Moulin CH, 2011. Demographic dynamics and off-take of cattle herds in southern Mali. Trop Anim Health Pro 43:1101-9.
Barrett JC, 1991. Pastoral development network papers: the economic role of cattle in communal farming systems in Zimbabwe. Overseas Development Institute, London, UK.
Behnke Jr RH, 1985. Measuring the benefits of subsistence vs commercial livestock production in Africa. Agric Syst 16:109-35.
Behnke Jr RH, 1987. Cattle accumulation and the commercialization of the traditional livestock industry in Botswana. Agric Syst 24:1-29.
Bengis RG, Thomson GR, Hedger RS, De Vos V, Pini A, 1986. Foot-and-mouth disease and the African buffalo (Syncerus caffer). 1. Carriers as a source of infection for cattle. Onderstepoort J Vet Res 53:69-73.
Beyer HL, 2004. Hawth's analysis tools for ArcGIS. Version: 3.27. Available from: http://www.spatialecology.com/htools
Bourn D, Wint W, 1994. Pastoral development network papers: livestock, land use and agricultural intensification in Sub-Saharan Africa. Overseas Development Institute, London, UK.
Brückner GK, Vosloo W, Du Plessis BJA, Kloeck PELG, Connoway L, Ekron MD, Weaver DB, Dickason CJ, Schreuder FJ, Marais T, Mogajane ME, 2002. Foot and mouth disease: the experience of South Africa. Available from: www.oie.int/doc/ged/d497.pdf
Carroll ML, DiMiceli CM, Sohlberg RA, Townshend JRG, 2004. 250m MODIS normalized difference vegetation index. University of Maryland, College Park, MA, USA.
Coetzer KL, Erasmus BFN, Witkowski ETF, Bachoo AK, 2010. Land-cover change in the Kruger to Canyons Biosphere Reserve (1993-2006): a first step towards creating a conservation plan for the subregion. S Afr J Sci 106:1-10.
Cossins NJ, Upton M, 1987. The Borana pastoral system of Southern Ethiopia. Agric Syst 25:199-218.
Cowpertwait PSP, Metcalfe AV, 2009. Introductory time series with R. Springer-Verlag, New York, NY, USA.
Desta S, Coppock DL, 2002. Cattle population dynamics in the southern Ethiopian rangelands, 1980-97. J Range Manage 55:439-51.
Dovie DBK, Shackleton CM, Witkowski ETF, 2006. Valuation of communal area livestock benefits, rural livelihoods and related policy issues. Land Use Policy 23:260-71.
Dovie DBK, Witkowski ETF, Shackleton CM, 2003. Direct-use value of smallholder crop production in a semi-arid rural South African village. Agric Syst 76:337-57.
Düvel GH, Afful DB, 1996. Sociocultural constraints on sustainable cattle production in some communal areas of South Africa. Development Southern Africa 13:429-40.
Eastman JR, 2006. Idrisi Andes. Clark University, Worcester, MA, USA.
ESRI, 2009. Available from: http://www.esri.com/
Gastwirth JL, 1972. The estimation of the Lorenz curve and Gini index. Rev Econ Stat 54:306-16.
GeoterraImage, 2008. Land-cover classification for peace parks foundation: Greater Limpopo Transfrontier Park priority plus Banhine plus Kruger National Park (West) dataset. GeoterraImage (Pty) Ltd, Pretoria, South Africa.
Giannecchini M, Twine W, Vogel C, 2007. Land-cover change and human-environment interactions in a rural cultural landscape in South Africa. Geogr J 173:26-42.
Goqwana WM, Machingura C, Mdlulwa Z, Mkhari R, Mmolaeng O, Selomane AO, 2008. A facilitated process towards finding options for improved livestock production in the communal areas of Sterkspruit in the Eastern Cape province, South Africa. Afr J Range Forage Sci 25:63-9.
Herrero M, Thornton PK, Gerber P, van der Zijpp A, van de Steeg J, Notenbaert AM, Lecomte P, Tarawali S, Grace D, 2010. The way forward for livestock and the environment. In: Swanepoel F, Stroebel A, Moyo S, eds. The role of livestock in developing communities: enhancing multifunctionality. University of the Free State and CTA, Cape Town, South Africa, pp 51-76.
Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A, 2005. Very high resolution interpolated climate surfaces for global land areas. Int J Climatol 25:1965-78.
Jones PG, Thornton PK, 2009. Croppers to livestock keepers: livelihood transitions to 2050 in Africa due to climate change. Environ Sci 12:427-37.
Joubert S, 2007. The era 1946 to 1960. In: Joubert S, ed. The Kruger National Park: a history. High Branching (Pty) Ltd, Johannesburg, South Africa. Available from: http://www.thekruger.com/stories/history_of_kruger_plant_life.htm
Kirsten JF, Moldenhauer W, 2006. Measurement and analysis of rural household income in a dualistic economy: the case of South Africa. Agrekon 45:60-77.
Klingseisen B, Stevenson M, Corner R, 2013. Prediction of Bluetongue virus seropositivity on pastoral properties in northern Australia using remotely sensed bioclimatic variables. Prev Vet Med 110:159-68.
Kulldorff M, 1997. A spatial scan statistic. Commun Stat Theory Methods 26:1481-96.
Lambin EF, Turner BL, Geist HJ, Agbola SB, Angelsen A, Bruce JW, Coomes OT, Dirzo R, Fischer G, Folke C, George PS, Homewood K, Imbernon J, Leemans R, Li X, Moran EF, Mortimore M, Ramakrishnan PS, Richards JF, Skånes H, Steffen W, Stone GD, Svedin U, Veldkamp TA, Vogel C, Xu J, 2001. The causes of land-use and land-cover change: moving beyond the myths. Global Environ Change 11:261-9.
Lee WC, 1996. Analysis of seasonal data using the Lorenz curve and the associated Gini Index. Int J Epidemiol 25:426-34.
Lybbert TJ, Barrett CB, Desta S, Coppock DL, 2004. Stochastic wealth dynamics and risk management among a poor population. Econ J 114:750-77.
Mahabile M, Lyne MC, Panin A, 2005. An empirical analysis of factors affecting the productivity of livestock in southern Botswana. Agrekon 44:99-117.
Mapiye C, Chimonyo M, Dzama K, 2009a. Seasonal dynamics, production potential and efficiency of cattle in the sweet and sour communal rangelands in South Africa. J Arid Environ 73:529-36.
McPeak J, 2004. Contrasting income shocks with asset shocks: livestock sales in northern Kenya. Oxford Econ Pap 56:263-84.
McPeak JG, Barrett CB, 2001. Differential risk exposure and stochastic poverty traps among East African pastoralists. Am J Agr Econ 83:674-9.
Moll HAJ, 2005. Costs and benefits of livestock systems and the role of market and nonmarket relationships. Agr Econ 32:181-93.
Moyo S, 1996. The productivity of indigenous and exotic beef breeds and their crosses at Matopos, Zimbabwe. PhD Thesis. University of Pretoria, Pretoria, South Africa.
Mukhebi AW, Knipscheer HC, Sullivan G, 1991. The impact of foodcrop production on sustained livestock production in semi-arid regions of Kenya. Agric Syst 35:339-51.
Musemwa L, Mushunje A, Chimonyo M, Fraser G, Mapiye C, Muchenje V, 2008. Nguni cattle marketing constraints and opportunities in the communal areas of South Africa: review. Afr J Agric Res 3:239-45.
Nkhori PA, 2004. The impact of transaction costs on the choice of cattle markets in Mahalapye District, Botswana. MSc thesis. University of Pretoria, Pretoria, South Africa. Available from: repository.up.ac.za/bitstream/handle/2263/26363/00dissertation.pdf?sequence=1&isAllowed=y
Nqeno N, Chimonyo M, Mapiye C, Marufu MC, 2010. Ovarian activity, conception and pregnancy patterns of cows in the semiarid communal rangelands in the Eastern Cape Province of South Africa. Anim Reprod Sci 118:140-7.
Nthakheni ND, 2006. A livestock production systems study amongst resource-poor livestock owners in the Vhembe District of Limpopo Province. PhD Thesis. University of the Free State, Bloemfontein, South Africa.
R Core Team, 2013. R: a language and environment for statistical computing. Version: 2.13. R Foundation for Statistical Computing, Vienna, Austria.
Randolph TF, Schelling E, Grace D, Nicholson CF, Leroy JL, Cole DC, Demment MW, Omore A, Zinsstag J, Ruel M, 2007. Invited review: role of livestock in human nutrition and health for poverty reduction in developing countries. J Anim Sci 85:2788-800.
Rocha A, Starkey P, Dionisio AC, 1991. Cattle production and utilisation in smallholder farming systems in Southern Mozambique. Agric Syst 37:55-75.
Ryan SJ, Knechtel CU, Getz WM, 2007. Ecological cues, gestation length, and birth timing in African buffalo (Syncerus caffer). Behav Ecol 18:635-44.
Scharlemann JPW, Benz D, Hay SI, Purse BV, Tatem AJ, Wint GRW, Rogers DJ, 2008. Global data for ecology and epidemiology: a novel algorithm for temporal Fourier processing MODIS data. PLoS One 3:e1408.
Scogings P, de Bruyn T, Vetter S, 1999. Grazing into the future: policy making for South African communal rangelands. Dev South Afr 16:403-14.
Scoones I, Bishi A, Mapitse N, Moerane R, Penrith ML, Sibanda R, Thomson G, Wolmer W, 2010. Foot-and-mouth disease and market access: challenges for the beef industry in southern Africa. Pastoralism 1:135-64.
Scoones I, 1992a. Coping with drought: responses of herders and livestock in contrasting Savanna environments in southern Zimbabwe. Hum Ecol 20:293-314.
Scoones I, 1992b. The economic value of livestock in the communal areas of southern Zimbabwe. Agric Syst 39:339-59.
Shackleton CM, Shackleton SE, Cousins B, 2001. The role of land-based strategies in rural livelihoods: the contribution of arable production, animal husbandry and natural resource harvesting in communal areas in South Africa. Dev South Afr 18:581-604.
Shackleton CM, Shackleton SE, Netshiluvhi TR, Mathabela FR, 2005. The contribution and direct-use value of livestock to rural livelihoods in the Sand River catchment, South Africa. Afr J Range Forage Sci 22:127-40.
Statistics South Africa, 2001. Census 2001: small area statistics. Statistics South Africa, Pretoria, South Africa.
Statistics South Africa, 2008. Community survey 2007: statistical release basic results municipalities. Statistics South Africa, Pretoria, South Africa.
Stroebel A, 2004. Socio-economic complexities of smallholder resource-poor ruminant livestock production systems in Sub-Saharan Africa. PhD Thesis. University of the Free State, Bloemfontein, South Africa.
Stroebel A, Swanepoel FJC, Nthakheni ND, Nesamvuni AE, Taylor G, 2008. Benefits obtained from cattle by smallholder farmers: a case study of Limpopo Province, South Africa. Aust J Exp Agric 48:825-8.
Stroebel A, Swanepoel FJC, Pell AN, 2011. Sustainable smallholder livestock systems: a case study of Limpopo Province, South Africa. Livest Sci 139:186-90.
Swanepoel FJC, Stroebel A, Nthakheni D, 2000. Productivity measures in small-holder livestock production systems and social development in southern Africa. Asian-Aus J Anim Sci 13:321-4.
Thomson GR, 1996. Foot and mouth disease in the African buffalo. In: Penzhorn BL, ed. African buffalo as a game ranch animal. University of Pretoria, Pretoria, South Africa, pp 113-6.
Vandamme E, D’Haese M, Speelman S, D’Haese L, 2010. Livestock against risk and vulnerability: multifunctionality of livestock keeping in Burundi. In: Swanepoel F, Stroebel A, Moyo S, eds. The role of livestock in developing communities: enhancing multifunctionality. University of the Free State and CTA, Cape Town, South Africa, pp 106-121.
Van Schalkwyk OL, 2015. A spatio-temporal probability model of cattle and African buffalo (Syncerus caffer) contact as a proxy for foot-and-mouth disease risk: a case study at the wildlife-livestock interface of the Kruger National Park, South Africa. PhD Thesis. University of Pretoria, Pretoria, South Africa.
Vosloo W, Bastos ADS, Boshoff CI, 2006. Retrospective genetic analysis of SAT-1 type foot-and-mouth disease outbreaks in southern Africa. Arch Virol 151:285-98.
Vosloo W, Bastos ADS, Sahle M, Sangare O, Dwarka RM, 2005. Virus topotypes and the role of wildlife in foot and mouth disease in Africa. In: Osofsky SA, Cleaveland S, Karesh WB, Kock MD, Nyhus PJ, Starr L, Yang A, eds. Conservation and development interventions at the wildlife-livestock interface: implications for wildlife, livestock and human health. IUCN - The World Conservation

[page 94]

Union, Gland, Switzerland, pp 67-74. Vosloo W, Bastos ADS, Sangare O, Hargreaves SK, Thomson GR, 2002. Review of the status and control of foot and mouth disease in subSaharan Africa. Available from: http://web.oie.int/boutique/ extrait/11vosloo.pdf Wilson RT, Clarke SE, 1976. Studies on the livestock of Southern Darfur, Sudan. II. Production traits in cattle. Trop Anim Health Pro 8:47-57. Wint GRW, Robinson TP, 2007. Gridded livestock of the world 2007. Food and Agriculture Organization, Rome, Italy. Available from: ftp://ftp.fao.org/docrep/fao/010/a1259e/a1259e00.pdf

[Geospatial Health 2016; 11:338]



Geospatial Health 2016; volume 11:377

Dynamic risk model for Rift Valley fever outbreaks in Kenya based on climate and disease outbreak data
David Gikungu,1 Jacob Wakhungu,2 Donald Siamba,2 Edward Neyole,2 Richard Muita,1 Bernard Bett3
1Kenya Meteorological Service, Nairobi; 2Masinde Muliro University of Science and Technology, Kakamega; 3International Livestock Research Institute, Nairobi, Kenya

Abstract
Rift Valley fever (RVF) is a mosquito-borne viral zoonotic disease that occurs throughout sub-Saharan Africa, Egypt and the Arabian Peninsula, with heavy impact in affected countries. Outbreaks are episodic and related to climate variability, especially rainfall and flooding. Despite great strides towards better prediction of RVF epidemics, there is still no warning system based on observed climate data that offers sufficient lead time for appropriate response and mitigation. We present a dynamic risk model based on historical RVF outbreaks and observed meteorological data. The model uses 30 years of data on rainfall, temperature, relative humidity, normalised difference vegetation index and sea surface temperature as predictors. Our research on RVF focused on Garissa, Murang’a and Kwale counties in Kenya using a research design based on a correlational, experimental, and evaluational approach. The weather data were obtained from the Kenya Meteorological Department, while the RVF data were acquired from the International Livestock Research Institute and the Department of Veterinary Services. Performance of the model was evaluated by using the first 70% of the data for calibration and the remaining 30% for validation. The assessed components of the model accurately predicted already observed RVF events. The Brier score for each of the models (ranging from 0.007 to 0.022) indicated high skill. The coefficient of determination (R²) was higher in Garissa (0.66) than in Murang’a (0.21) and Kwale (0.16). The discrepancy was attributed to data distribution differences and varying ecosystems. The model outputs should complement existing early warning systems to detect risk factors that predispose for RVF outbreaks.

Correspondence: David Gikungu, Kenya Meteorological Service, P.O. Box 25725, 00603 Nairobi, Kenya. Tel: +254.722.624691 - Fax: +254.203.876955. E-mail: dgikungu@gmail.com; irungu@meteo.go.ke
Key words: Rift Valley fever; Prediction model; Livestock; Early warning systems; Kenya.
Acknowledgements: this work was done by means of archived secondary data within the affiliate institutions of some of the authors and did not, therefore, require direct funding. We are grateful to the Kenya Meteorological Department (KMD) for providing the weather data required for this work. We thank the International Research Institute for Climate and Society (IRI) for availing various forms of remote sensing rainfall and temperature data. We also thank the team of experts from the Center for Disaster Management and Humanitarian Assistance (CDMHA) of Masinde Muliro University of Science and Technology for their relevant comments. The views expressed herein are those of the authors and do not necessarily reflect the views of KMD or any of the other institutions mentioned. We also thank Geoffrey Ogutu and Andrew Njogu, both from KMD, for their input in the statistical analyses.
Received for publication: 22 May 2015. Revision received: 15 July 2015. Accepted for publication: 18 October 2015.
©Copyright D. Gikungu et al., 2016. Licensee PAGEPress, Italy. Geospatial Health 2016; 11:377. doi:10.4081/gh.2016.377
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Introduction
Rift Valley fever (RVF) is a viral zoonosis that has had pronounced health and economic impacts in much of sub-Saharan Africa (Anyamba et al., 2010). This arbovirus has been responsible for devastating outbreaks of severe human and animal disease, which have gone beyond Africa, reaching the Arabian Peninsula in 2000 (Bird et al., 2008). The last major outbreak in East Africa took place in 2006-2007 and is reported to have resulted in economic losses exceeding USD 60 million (Anyamba et al., 2010). RVF is a vector-borne disease caused by a virus of the family Bunyaviridae, genus Phlebovirus. It affects domestic livestock such as sheep, cattle, camels and goats, in which it causes abortions (Cook and Zumla, 2003) associated with high neonatal mortality (Davies and Martin, 2003). The RVF virus also infects humans (Soti et al., 2012). It is transmitted transovarially by Aedes or Culex mosquitoes (Hightower et al., 2012). Since its isolation and characterisation in Kenya in 1931, RVF has been seen to disproportionately affect vulnerable communities with poor resilience to economic and environmental challenges (WHO, 2009; Osman et al., 2013). Major outbreaks were experienced in Egypt in 1977-1978 and 1993, in the Senegal River Valley in 1987, in Madagascar in 1990, 1992 and 2008, in northern Kenya and Somalia in 1997, 1998 and 2007, in Saudi Arabia and Yemen in 2000, in Sudan in 2007 and in Southern Africa in 2010 (Soti et al., 2012), often with devastating consequences. For example, the outbreaks in Somalia and Ethiopia led to a loss of USD 132 million following Saudi Arabia’s imposition of a trade ban on live animals from Ethiopia, Somalia and Kenya (Rich and Wanyoike,


2010). In Kenya, the spread of RVF has been rising systematically, from the confines of a single district in the Rift Valley area in the 1912-1950 period to 55% of the national districts in 2007. As a result, various parts of the country have gradually become enzootic, resulting in periodic epizootics following the first report of the symptoms connected with this disease in 1912 (Murithi et al., 2010). The areas in Kenya where RVF is enzootic include Nakuru, Nairobi, Thika, Maragua, Laikipia, Uasin Gishu, Trans Nzoia, Kiambu, Machakos, Kilifi, and Kwale. Maragua sub-County in Murang’a has had 23 outbreaks since the first report in 1951; Garissa Central sub-County in Garissa has had 21 outbreaks since 1961, when the disease was first reported there, while Kwale in Kwale County has experienced 21 outbreaks since 1961 (Murithi et al., 2010). Early warning messages for RVF outbreaks, especially in East Africa, are often given by international institutions such as the Emergency Prevention System (EMPRES-i), the National Aeronautics and Space Administration (NASA) and the World Health Organization (WHO). In Kenya, Somalia and Tanzania, an RVF model based on satellite measurements of sea surface temperatures (SST) and normalised difference vegetation index (NDVI) data (a measure of greenness that can vary between -1 and +1) was used to provide a two to six weeks early warning for December 2006 to May 2007 (Anyamba et al., 2009), while other studies have predicted longer lead times. For example, Linthicum et al. (1999) showed that RVF can be predicted 5 months in advance using SST anomalies and satellite NDVI data. The numerous studies and predictions on the risk of RVF outbreaks have mainly been based on satellite data, whereas the current work has used observed climate data from three study sites in Kenya.
The general objective of this study was to develop a dynamic model for predicting the risk of RVF outbreaks in Kenya with a lead-time of at least three months in epizootic areas of the country. The choice of model type and variables was largely informed by the documented findings of other scientists, in an attempt to fill the obvious gaps, with special reference to rainfall, temperature, relative humidity and the normalised difference vegetation index (NDVI).

Materials and Methods
Study sites
The study sites were the counties of Garissa (0°27’25”S, 39°39’30”E), Murang’a (0°45’S, 37°7’E) and Kwale (4°10’S, 39°27’E). The specific locations selected were Garissa Central and Bura in Garissa county, Makuyu and Maragua in Murang’a county and Kwale and Kinango in Kwale county, as shown in Figure 1. The three sites are at varying altitudes, with Garissa at 138 m, Murang’a (Thika) at 1501 m and Kwale at 422 m. Meteorological data for Murang’a were obtained from Thika Meteorological Station (0°01’S, 37°06’E). The three sites were selected because they are key RVF-prone geographical areas of Kenya that are known to have experienced serious RVF outbreaks in the past, as they are endemic for the disease (Murithi et al., 2010). All three counties are in the zone of Kenya that generally experiences two main rainfall seasons, the long rains from March to May and the short rains from October to December, as shown in Figure 2. The mean maximum and minimum monthly temperature patterns for the study sites are also shown in the figure. As Kwale station does not record temperatures, we used mean monthly temperature data from Moi International Airport (Mombasa), approximately 28 km away from the county headquarters, as proxy data for Kwale.


Primary data
The study sites were drawn from RVF hotspots in Kenya that had experienced at least 21 years of RVF outbreaks since the year of its introduction in the respective sub-counties, and at least 45% of years of involvement in national outbreaks after the RVF introduction. Stratified random sampling was done to obtain two villages per study site, where 50 farmers were then selected for sampling, ensuring that those selected all kept livestock. The sampled respondents provided information, through questionnaires, on their knowledge of the climatic seasons in their geographical areas, the types of livestock kept, and the types of diseases their animals had suffered during the preceding ten years. The information thus obtained was validated by means of interviews with key informants from each of the study sites.

Dynamic risk prediction
A dynamic risk model based on historical RVF outbreaks and climate data, intended to guide veterinary and public health policies on prevention and control of RVF outbreaks, was developed using an experimental research design. It is a logistic regression model (within the framework of generalised linear models) with RVF cases as the response variable, where 1 denotes occurrence and 0 non-occurrence (Quinn and Keough, 2004). The predictor variables were rainfall, NDVI, relative humidity at 06:00 and at 12:00 GMT, maximum and minimum temperatures and SST. The model was constructed and run by means of the R statistical software (R Core Team, 2014). All predictors were investigated with a three-month lag period except NDVI, for which a four-month lag period was considered, as it is an indicator for rainfall in the preceding month (Hightower et al., 2012). The fitted logistic regression model for each location was the following:

glm(formula) = Cases ~ factor(Month) + Rain + NDVI + RH06 + RH12 + Tmax + Tmin + SSTs

where glm (generalised linear model) is the function used to fit the model and describes how the response variable relates to the linear predictor. Cases indicates occurrence (1) or non-occurrence (0) of RVF; Rain is the monthly total rainfall in mm; NDVI denotes the mean monthly NDVI figure (between -1 and +1); RH06 and RH12 are the observed mean monthly relative humidity data (expressed as %) at 06:00 GMT and 12:00 GMT; Tmax and Tmin are the mean monthly maximum and minimum temperatures in °C; SSTs is sea surface temperature in °C. Month is included as a factor, rather than as a numeric predictor, because it takes the repeating values 1-12.
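Written out as an equation, the model formula above can be sketched as follows. This is a reconstruction from the stated formula and lags, not notation used by the authors: the symbols $p_t$ (probability of an RVF occurrence in month $t$), $\eta_t$ (linear predictor), $\alpha_{m(t)}$ (effect of the calendar month), the coefficients $\beta_k$, and the lags $\ell$ (three months) and $\ell'$ (four months, for NDVI) are labels introduced here for illustration.

```latex
\begin{aligned}
\eta_t &= \beta_0 + \alpha_{m(t)}
        + \beta_1\,\mathrm{Rain}_{t-\ell}
        + \beta_2\,\mathrm{NDVI}_{t-\ell'}
        + \beta_3\,\mathrm{RH06}_{t-\ell}
        + \beta_4\,\mathrm{RH12}_{t-\ell} \\
       &\quad + \beta_5\,\mathrm{Tmax}_{t-\ell}
        + \beta_6\,\mathrm{Tmin}_{t-\ell}
        + \beta_7\,\mathrm{SSTs}_{t-\ell}, \\
\Pr(\mathrm{Cases}_t = 1) &= p_t = \frac{1}{1 + e^{-\eta_t}}
\quad\Longleftrightarrow\quad
\ln\frac{p_t}{1 - p_t} = \eta_t .
\end{aligned}
```

Under this reading, a positive coefficient raises the predicted log-odds of an outbreak in month $t$ when the corresponding lagged variable increases, and a negative coefficient lowers it.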

Model training and validation
The model was trained on 70% of the weather data and validated on the remaining 30%. The training data were from 1981 to 2000, while the validation data were from 2001 to 2010. The RVF prediction models for Garissa, Murang’a (Thika municipality) and Kwale were based on the variables confirmed as significant predictors in the initial run. The adequacy of each of the regression models was checked by examining the goodness of fit and determining how similar the observed response values were to the expected or predicted values (Quinn and Keough, 2004).
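The chronological split described above can be illustrated with a short Python sketch. The authors worked in R, so this is not their code; the function name `split_by_year` and the placeholder records are assumptions made for illustration. Splitting by calendar year keeps all validation months strictly later than the training months. With the stated years, the training period covers 20 of the 30 years, roughly two-thirds of the monthly records.

```python
def split_by_year(records, last_training_year=2000):
    """Split monthly records chronologically into training and validation sets."""
    training = [r for r in records if r["year"] <= last_training_year]
    validation = [r for r in records if r["year"] > last_training_year]
    return training, validation


# Hypothetical monthly records for 1981-2010 (placeholders, not study data).
records = [{"year": y, "month": m} for y in range(1981, 2011) for m in range(1, 13)]

training, validation = split_by_year(records)
print(len(training), len(validation))  # 240 120
```

A chronological split of this kind, rather than a random one, avoids using information from later years when fitting a model that is meant to forecast ahead in time.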

Figure 1. Location of Kenyan Rift Valley fever hotspots exemplified by Garissa, Murang’a and Kwale counties. The figure at top-right is the map of Kenya and the boundaries of its 47 counties. The connecting lines indicate locations of Murang’a, Kiambu, Kwale (bottom left) and Garissa (bottom right) counties. The three counties are a sample of Rift Valley fever hotspots in Kenya. Thika municipality is marked in purple, near Kakuzi, in the top map (Kiambu and Murang’a).

Figure 2. Mean monthly rainfall and temperature patterns: A) Garissa, B) Thika, C) Kwale.

Model skill
To evaluate the quality of our models, we used the summary output of the models, the coefficient of determination (R²), which measures the proportion of the variation in the dependent variable explained by the variation in the independent variables (Keller, 2011), and the Brier score (BS). The latter is a measure of model error, where BS=0 indicates best skill and BS=1 no skill (Wilks, 2011). This was done by employing the logistic regression function in the R statistical software.
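As an illustration of the Brier score mentioned above, the following Python sketch computes it as the mean squared difference between forecast probabilities and binary outcomes. The function name `brier_score` and the four example months are made up for illustration and are not data from the study, whose calculations were done in R.

```python
def brier_score(forecast_probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecast_probs) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecast_probs, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical forecasts for four months; 1 marks a month with an outbreak.
probs = [0.05, 0.10, 0.80, 0.02]
observed = [0, 0, 1, 0]

score = brier_score(probs, observed)
print(round(score, 4))  # 0.0132
```

Because outbreak months are rare in a 30-year monthly series, a forecast that is near zero in most months will already produce a small Brier score, which is worth bearing in mind when comparing scores across sites.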

Results
Tables 1-3 show the outputs of the model for the Garissa, Thika and Kwale meteorological stations. They present the estimate, standard error, Z value and P value for each predictor variable (rainfall, NDVI, relative humidity, air temperatures and sea surface temperature). The variables that the generalised linear model identified as significant (P<0.05) were re-run in the model and the prediction outputs plotted as shown in Figures 3-5. The predictors in the second run were therefore selected on the basis of their significance in the first run, as shown in Tables 1-3. The model for Garissa (Table 1) indicated rainfall, NDVI, RH12, minimum temperature and SST as the stronger predictors (P<0.05). Table 2 presents the model output for Thika. It shows NDVI and minimum temperature as the only two significant variables in this run of the model. Table 3 presents the model output for Kwale. It shows that the significant predictors in this zone are relative humidity at 06:00 GMT, minimum temperature and SST. This observation is expected as relative humidity increases with

Table 1. Garissa model output.

| Variable (3-month lag) | Estimate | Standard error | Z value | P value | Significance |
|---|---|---|---|---|---|
| Rain | -7.279 | 3.345 | -2.176 | 0.0295 | Positive |
| NDVI | -10.285 | 4.609 | -2.232 | 0.0256 | Positive |
| RH06 | -14.916 | 8.369 | -1.782 | 0.0747 | Negative |
| RH12 | 34.972 | 17.435 | 2.006 | 0.0449 | Positive |
| Tmax | 22.998 | 12.457 | 1.846 | 0.0649 | Negative |
| Tmin | 14.587 | 7.492 | 1.947 | 0.0515 | Positive |
| SSTs | 15.270 | 7.384 | 2.068 | 0.0387 | Positive |

R² = 0.66; Brier score = 0.007.

NDVI, normalised difference vegetation index; RH06, relative humidity at 06:00 GMT; RH12, relative humidity at 12:00 GMT; Tmax, temperature maximum; Tmin, temperature minimum; SSTs, sea surface temperatures.
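The Z and P values in Table 1 are related to the estimates and standard errors in the usual way for a Wald test: Z is the estimate divided by its standard error, and the two-sided P value follows from the standard normal distribution. The Python sketch below checks this for three rows of Table 1. It is an illustration only (the function name `wald_test` is introduced here), and because the table entries are rounded, the recomputed values can differ from the published ones in the last digit.

```python
import math


def wald_test(estimate, std_error):
    """Return the Wald Z statistic and its two-sided normal P value."""
    z = estimate / std_error
    # For a standard normal variable, P(|Z| > z) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# (estimate, standard error) pairs copied from Table 1 (Garissa).
table1 = {
    "Rain": (-7.279, 3.345),
    "NDVI": (-10.285, 4.609),
    "SSTs": (15.270, 7.384),
}

results = {name: wald_test(est, se) for name, (est, se) in table1.items()}
for name, (z, p) in results.items():
    print(f"{name}: Z = {z:.3f}, P = {p:.4f}")
```

The same check can be applied to Tables 2 and 3 by substituting their estimates and standard errors.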

Table 2. Thika model output.

| Variable (3-month lag) | Estimate | Standard error | Z value | P value | Significance |
|---|---|---|---|---|---|
| Rain | -0.070 | 0.596 | -0.118 | 0.906 | Negative |
| NDVI | 1.825 | 0.674 | 2.708 | 0.007 | Positive |
| RH06 | 1.294 | 0.989 | 1.307 | 0.191 | Negative |
| RH12 | 0.601 | 1.510 | 0.398 | 0.690 | Negative |
| Tmax | 1.188 | 1.649 | 0.720 | 0.471 | Negative |
| Tmin | -2.535 | 1.280 | -2.065 | 0.039 | Positive |
| SSTs | 84.832 | 136.108 | 0.623 | 0.533 | Negative |

R² = 0.21; Brier score = 0.022.

NDVI, normalised difference vegetation index; RH06, relative humidity at 06:00 GMT; RH12, relative humidity at 12:00 GMT; Tmax, temperature maximum; Tmin, temperature minimum; SSTs, sea surface temperatures.

Table 3. Kwale model output.

| Variable (3-month lag) | Estimate | Standard error | Z value | P value | Significance |
|---|---|---|---|---|---|
| Rain | 0.363 | 0.832 | 0.436 | 0.663 | Negative |
| NDVI | -742.813 | 416.474 | -1.784 | 0.075 | Negative |
| RH06 | -55.101 | 21.067 | -2.615 | 0.009 | Positive |
| RH12 | 31.865 | 18.440 | 1.728 | 0.084 | Negative |
| Tmax | 17.743 | 93.146 | 0.190 | 0.849 | Negative |
| Tmin | -137.579 | 61.699 | -2.230 | 0.026 | Positive |
| SSTs | 364.209 | 153.384 | 2.374 | 0.018 | Positive |

R² = 0.16; Brier score = 0.018.

NDVI, normalised difference vegetation index; RH06, relative humidity at 06:00 GMT; RH12, relative humidity at 12:00 GMT; Tmax, temperature maximum; Tmin, temperature minimum; SSTs, sea surface temperatures.


decrease in temperature reaching its highest value very early in the morning under conditions of an unchanging dew point temperature. It is also noteworthy that the minimum temperature variable was found to be significant in all the respective runs of the model in Garissa, Murang’a and Kwale. NDVI seemed to be a stronger predictor than rainfall, as it is a measure of greenness recorded every month compared to dry periods that at times record no rainfall in a month. The SST variable was depicted as a significant predictor for Garissa and Kwale but not for Murang’a, a fact that could be attributed to the proximity of the two sites to the Indian Ocean. Proximity to the sea puts Garissa and Kwale within the same surface wind regimes. An earlier study by Indeje and Semazzi (1999) confirms the existence of a strong correlation between rainfall over parts of East Africa and the lower equatorial stratospheric zonal wind during the months of March-May and June-August. Wolff et al. (2011) observe the broad inverse relation between rainfall and windiness in their description of the characteristic surface ocean warming in the western Indian Ocean that leads to intensification and shifts of the Inter-Tropical Convergence Zone (ITCZ), resulting in increased precipitation over East Africa and weakening of the local surface winds.

Rift Valley fever prediction for Garissa county
Figure 3A is the training model output, while Figure 3B is the validation model output for Garissa county. Figure 3A shows that the model accurately predicts the outbreaks of Rift Valley fever in 1997 and 1998, and Figure 3B that the validation model predicts the outbreak observed at the end of 2006 and beginning of 2007. These outputs were based on rainfall, NDVI, relative humidity at 12:00 GMT, minimum temperature and SST, the variables found to be significant predictors in the initial run of the model, which was aimed at determining the strength of the meteorological predictors (Table 1). The outcome suggests that the selected parameters are among the variables that may be related to the development of meteorological systems that are conducive to the development of the RVF vectors as well as the virus that causes RVF. As a measure of vegetation, NDVI is related to rainfall, which is especially evident in rain-fed natural ecosystems and agricultural areas (Anyamba et al., 2010). Warm minimum temperatures facilitate the development of mosquito larvae in flooded waters following heavy rainfall episodes. This argument is supported by the temperature patterns that characterise the Garissa area.

Figure 3. Rift Valley fever prediction in Garissa: model outputs. A) training, B) validation. The red lines are for predicted outbreaks while the blue ones represent actual occurrences.

Rift Valley fever prediction for Murang’a county
Figure 4 presents the RVF prediction for Murang’a county. The prediction model was trained using weather data from 1981 to 2000, resulting in the output shown in Figure 4A. The significant parameters in the initial run of the model were NDVI and minimum temperature (Table 2). The model was validated by means of weather data from 2001 to 2010, resulting in the output shown in Figure 4B. The outcomes of both models depict accurate prediction, as the Department of Veterinary Services (DVS) reports confirm that there were RVF outbreaks in 1983, 1989, 1993, 1997 and 1998, as indicated in Figure 4A, and in 2006-07, as shown in Figure 4B. Murang’a county, at an average altitude of 1200 m, lies close to the central highlands, where greenness may be assumed even during long periods without rain. NDVI may therefore be a stronger predictor than rainfall, though it may not ensure availability of vector habitats without water. Elevated minimum temperatures during the peak months of the rainfall seasons are conducive to mosquito larval development and may therefore be a factor that enhances the predictive value of the parameter.

Figure 4. Rift Valley fever prediction in Murang’a: model outputs. A) training, B) validation. The red lines are for predicted outbreaks while the blue ones represent actual occurrences.

Rift Valley fever prediction for Kwale county
The outputs of the training and validation models for RVF in Kwale county are shown in Figure 5A and B, respectively. This output is based on relative humidity at 06:00 GMT, minimum temperature and SST, the variables that were found to be significant predictors in the initial run of the model, which was aimed at determining the strength of the meteorological predictors (Table 3). The training model accurately predicted the RVF outbreaks of 1983, 1993 and 1998, although it gives very weak signals for 1989 and 1997, when RVF outbreaks also occurred according to the DVS records. The validation model was found to predict the 2006 RVF outbreak fairly accurately (Figure 5B).

Discussion
Control measures need to be applied on a massive scale at the earliest indications of elevated rainfall and flooding, and sustained mosquito larval control involves huge costs. We therefore felt it important to consider the accuracy of the data used in modelling.

General performance of the model
Varying coefficients of determination were obtained: R²=0.66 for Garissa, R²=0.21 for Murang’a and R²=0.16 for Kwale. While the topographical differences between the three sites may be appreciated, the low coefficients of determination for Murang’a and Kwale suggest the need for further refinement of the model. In order to improve the skill of the model, the climatological patterns should be considered with regard to the difference in dates of onset of the rainfall seasons by geographical and ecological zone. Further research on season-based patterns of discrete weather variables with respect to RVF outbreaks could yield important results. With respect to risk mapping, thresholds could also vary with geographical location: while some areas may flood upon receiving seasonal rainfall in excess of 400 mm, others may do so at only 200 mm in a season (Anyamba et al., 2010). These differences could explain at least part of the variation in model output.

The training and validation models also show peaks when the weather conditions could have favoured RVF outbreaks. Where such peaks do not coincide with observed or reported outbreaks, the absence of outbreaks could be attributed to acquired immunity. It was noted that besides the positive predictions that were confirmed by reports of actual outbreaks, especially the 1983, 1989, 1993, 1997-98 and 2006-07 RVF outbreaks, the model output included positive predictions where there were no outbreaks. Even though we could not explain all the false positives, we found that most of them coincided with periods of extreme rainfall events. Some of the notable years in this respect are 1987-88, 1991-92 and 2000-01, as shown in Figure 6A-C, which presents the rainfall anomaly time series from 1981 to 2010 at Garissa, Thika and Kwale, respectively. In our view, the model predicts risk during these periods based on the favourable factors, rainfall being a key variable. It may also be pointed out that interventions, following regular surveillance by the DVS, may have provided immunity to livestock, thereby reducing the risk of an RVF outbreak in spite of meteorological factors favourable for infection indicated by the model. Besides the acquired immunity suggested here, there may also be undetected, yet significant, demographic and spatial expansion of the RVF virus during the intervening period between one outbreak and the next (Bird et al., 2008).

Figure 5. Rift Valley fever prediction in Kwale: model outputs. A) training, B) validation. The red lines are for predicted outbreaks while the blue ones represent actual occurrences.

Figure 6. A time series plot of rainfall anomalies and predicted Rift Valley fever epidemics in the period January 1981 to December 2010. A) Garissa, B) Thika, C) Kwale.

Variables of importance
Variables thought to have an influence on the epizootic include the Southern Oscillation Index (SOI) (http://www.cgd.ucar.edu/cas/catalog/climind/soi.html) and the SST. The SOI has been considered in the effort to determine the best predictors of RVF by means of autoregressive integrated moving average (ARIMA) models (http://people.duke.edu/~rnau/411arim.htm). Past RVF events in East Africa have been found to be closely linked to the occurrence of the warm phase of the El Niño Southern Oscillation (ENSO) (http://climate.ncsu.edu/climate/patterns/ENSO.html), a phenomenon that is accompanied by prolonged periods of warm SSTs in the central and eastern equatorial Pacific Ocean and the Indian Ocean (Kelly-Hope and Thomson, 2008). These conditions lead to heavy rainfall and flooding of vast areas that serve as habitats for Culex and Aedes mosquitoes, the primary vectors of the RVF virus (Anyamba et al., 2010). It is on this basis that SST was included as a variable in the models.

Anyamba et al. (2010) used NDVI as a proxy for ecological dynamics and rainfall. This requires a lag of about one month, as NDVI is a measure of vegetation greenness, which in turn is closely related to rainfall. In an attempt to develop a model for the prediction of RVF in the East African region with a lead time of 2 to 5 months, Linthicum et al. (1999) used equatorial Pacific and Indian Ocean SST as well as NDVI anomaly data. We used NDVI data lagged by one month as input in the model for comparison purposes, since this study was largely based on observed meteorological data. As a proxy for rainfall, NDVI may, however, be misleading if applied to areas that rely on irrigation or where there have been sudden or recent changes of land use. While NDVI and SST components are commonly referred to, both in this study and the one by Linthicum et al. (1999), satellite-generated data on weather variables such as rainfall, relative humidity and temperature must also be taken into account.

Most RVF outbreaks in Kenya are associated with above-normal rainfall, which favours breeding of the vector mosquitoes by creating sufficient amounts of surface water. The fact that mosquito eggs can only hatch in water underlines the role of rainfall as one of the major factors influencing the transmission of RVF. In arid and semi-arid zones, such as Garissa, where the average monthly rainfall does not exceed 120 mm (Figure 2A), rainfall is the limiting factor for RVF transmission. It is thus an important component in the model owing to its impact on the vectorial capacity of RVF.

Temperature is recognised as another key variable that also influences the vectorial capacity through its dual effect on the vector mosquito and the replication of the RVF virus in its body. Temperature variations influence the extrinsic incubation period (EIP) of the virus, effects that vary not only with the mosquito species but also with the virus genotype (Reisen et al., 2006). Temperature also affects the development rates of mosquito larvae, the gonotrophic cycle and the survivorship of both the adults and the larvae (Ceccato et al., 2012). In Kwale county (coast) and Garissa [arid and semi-arid lands (ASAL)], temperature is not the limiting factor for the development of the vector, as average temperatures rarely go below 18°C, unlike in Murang’a, where marked seasonal variations are observed. The three models presented portray minimum temperature as an important variable, while maximum temperature is downplayed, lending credence to the minimum temperature variable as an important factor in the development of mosquito larvae. Figure 2A-C shows that April, the peak month of the Long Rains season, is also the month with the highest average mean minimum temperature, a situation that clearly favours the development of mosquito larvae (Muturi et al., 2007).

We considered relative humidity an important component because different Culex species have generally been found not to live long enough to complete their transmission cycle when the relative humidity is consistently below 60% (Grover-Kopec et al., 2006; Muturi et al., 2007). Relative humidity is therefore an important predictor alongside temperature and rainfall. Common Culex habitats are ponds, bamboo, fallen logs, leaf axils, streams and rock pools (Muturi et al., 2007). One of the features observable in these habitats is their high capacity for holding water as well as retaining humidity. Relative humidity is a limiting factor in Garissa, as the monthly average never goes beyond 62% even in April and November, which are the wettest months. It is notable that the model picks relative humidity at 12:00 GMT as a significant predictor for Garissa and relative humidity at 06:00 GMT for Kwale county, given that it is a limiting factor in Garissa (ASAL) but not in Kwale (coast).

The presence of RVF may have been sustained and often amplified by the ease of movement of animals within different ecological zones. The Kenyan coast, where Kwale county is located, is listed among zones considered to be outside the potential epizootic area mask with regard to the 2006-07 RVF epidemic (Anyamba et al., 2010). This observation was corroborated by respondents and key informants during interviews conducted for this study.

Conclusions A dynamic high-skill model for the prediction of RVF outbreaks in three specific Kenyan counties was developed and validated. The model outcomes varied with geographical location and meteorological variables, such as rainfall, NDVI, temperature, relative humidity and SST at a lead-time of three months. The differences were also attributed to data distribution differences as well as the varying ecosystems represented by the three sites. Besides rainfall, minimum temperature was found to be the most significant predictor in each of the models. The results suggest the need for consideration of strategic uses of the dynamic risk prediction models with respect to geographical zoning and other meteorological predictors. The installation of automatic weather stations, especially in areas without meteorological stations, would improve the accuracy of the risk prediction considerably. Strategic uses of this model approach include timing of mitigation programmes such as vaccination, guided by the dynamic prediction model. Awareness programmes on the predictive signals from the model would also go a long way towards reaching livestock herders and farmers.

References Anyamba A, Chretien JP, Small J, Tucker CJ, Formenty PB, Richardson JH, Britch SC, Schnabel DC, Erickson RL, Linthicum KJ, 2009. Prediction of a Rift Valley fever outbreak. P Natl Acad Sci USA 106:955-9. Anyamba A, Linthicum KJ, Small J, Britch SC, Pak E, Rocque SL, Formenty P, Hightower AW, Breiman RF, Chretien JP, Tucker CJ, Schnabel D, Sang R, Haagsma K, Latham M, Lewandowski HB, Magdi SO, Mohamed MA, Nguku PM, Reynes JM, Swanepoel R, 2010. Prediction, assessment of the Rift Valley fever activity in East and Southern Africa 2006-2008 and possible vector control strategies. Am J Trop Med Hyg 83(Suppl.2):43-51. Bird BH, Githinji JWK, Macharia JM, Kasiiti JL, Muriithi RM, Gacheru SG, Musaa JO, Towner JS, ARSerena, Oliver JB, Stevens TL, Erickson BR, Morgan LT, Khristova ML, Hartman AL, Comer JA, Rollin PE, Ksiazek TG, Nichol ST, 2008. Multiple virus lineages sharing recent common ancestry were associated with a large Rift Valley fever outbreak among livestock in Kenya during 2006-2007. J Virology 82:11152-66. Ceccato P, Vancutsem C, Klaver R, Rowland J, Connor SJ, 2012. A vectorial capacity product to monitor changing malaria transmission potential in epidemic regions of Africa. J Trop Med 2012:595948. Cook GC, Zumla A, 2003. Manson’s tropical diseases. 21st ed. Saunders, Philadelphia, PA, USA. Davies FG, Martin V, 2003. Recognizing Rift Valley fever. FAO, Rome, Italy. Grover-Kopec EK, Blumenthal MB, Ceccato P, Dinku T, Omumbo JA, Connor SJ, 2006. Web-based climate information resources for malaria control in Africa. Malaria J 5:38. Hightower A, Kinkade C, Nguku PM, Anyangu A, Mutonga D, Omolo J, Njenga MK, Feikin DR, Schnabel D, Ombok M, Breiman RF, 2012. Relationship of climate, geography, and geology to the incidence of Rift Valley fever in Kenya during the 2006-2007 outbreak. Am J Trop Med Hyg 86:373-80. Indeje M, Semazzi FHM, 1999. Relationships between QBO in the lower equatorial stratospheric zonal winds and East African seasonal rainfall. Meteorol Atmos Phys 73:227-44. Keller G, 2011. Statistics for management and economics. 10th ed. Cengage Learning, Boston, MA, USA. Kelly-Hope L, Thomson MC, 2008. Climate and infectious diseases. Seasonal forecasts, climate change and human health. Springer, Amsterdam, The Netherlands. Linthicum KJ, Anyamba A, Tucker CJ, Kelley PW, Myers MF, Peters CJ, 1999. Climate and satellite indicators to forecast Rift Valley fever epidemics in Kenya. Science 285:397-400.

Murithi RM, Munyua P, Ithondeka PM, Macharia JM, Hightower A, Luman ET, Breiman RF, Kariuki Njenga M, 2010. Rift Valley fever in Kenya: history of epizootics and identification of vulnerable districts. Cambridge University Press, Cambridge, UK. Muturi EJ, Shililu JI, Weidong GU, Jacob BG, Githure JI, Novak RJ, 2007. Larval habitat dynamics and diversity of Culex mosquitoes in rice agro-ecosystem in Mwea, Kenya. Am J Trop Med Hyg 76:95-102. Osman D, McIntyre S, Hogarth S, Heymann D, 2013. Rift Valley Fever and a new paradigm of research and development for zoonotic disease control. Emerg Infect Dis 19:189-93. Quinn GP, Keough MJ, 2004. Experimental design and data analysis for biologists. Cambridge University Press, Cambridge, UK. R Core Team, 2014. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available from: http://www.R-project.org/ Reisen WK, Fang Y, Martinez VM, 2006. Effects of temperature on the transmission of West Nile virus by Culex tarsalis (Diptera: Culicidae). J Med Entomol 43:309-17. Rich M, Wanyoike F, 2010. An assessment of the regional and national socio-economic impacts of the 2007 Rift Valley fever outbreak in Kenya. Am J Trop Med Hyg 83(Suppl.2):52-7. Soti V, Tran A, Degenne P, Chevalier V, Lo Seen D, Thiongane Y, Diallo M, Guégan JF, Fontenille D, 2012. Combining hydrology and mosquito population models to identify the drivers of Rift Valley fever emergence in semi-arid regions of West Africa. PLoS Negl Trop Dis 6:e1795. WHO, 2009. Rift Valley fever outbreaks forecasting models. FAO-WHO, Rome, Italy. Wilks DS, 2011. Statistical methods in the atmospheric sciences. Academic Press, Cambridge, MA, USA. Wolff C, Haug GH, Timmermann A, Damsté JSS, Brauer A, Sigman DM, Cane MA, Verschuren D, 2011. Reduced interannual rainfall variability in East Africa during the last ice age. Science 333:743-7.

[Geospatial Health 2016; 11:377]


Geospatial Health 2016; volume 11:414

The spatial distribution pattern of human immunodeficiency virus/acquired immune deficiency syndrome in China
Ying Wang,1 Yongli Yang,1 Xuezhong Shi,1,2 Saicai Mao,1 Nian Shi,3 Xiaoqing Hui1
1School of Public Health, Zhengzhou University, Zhengzhou; 2Editorial Department, Zhengzhou University, Zhengzhou, China; 3Cardiovascular Department of Internal Medicine, University College London Medical School, London, UK

Abstract Human immunodeficiency virus (HIV) infection and the acquired immune deficiency syndrome (AIDS) exhibit variable patterns among the provinces of China. Knowledge of the geographical distribution of the HIV/AIDS epidemic is needed for the prevention and control of AIDS. Thus, the cumulative number of reported cases of HIV/AIDS for the period 1985-2013 and the incidence rate of AIDS in 2013 were determined. Spatial autocorrelation analysis and hotspot analysis were conducted using ArcGIS 10.2 to explore the spatial distribution of the HIV/AIDS epidemic. Both the thematic map and the global spatial autocorrelation Moran’s I statistics revealed a clustered spatial pattern. A local spatial autocorrelation analysis indicated hotspots of AIDS incidence rate that were confined to the provinces of Guangxi, Yunnan and Sichuan. For the cumulative number of reported cases, the hotspots encompassed Guangxi and Yunnan, while Henan Province displayed a negative autocorrelation, with a large number of cases but neighbouring provinces with few. The Getis-Ord Gi* statistics identified 6 hotspots and 8 coldspots for the incidence of AIDS, and 7 hotspots and 1 coldspot for the cumulative number of reported cases of HIV/AIDS. The spatial distribution pattern of the HIV/AIDS epidemic in China is clustered, with hotspots located in the Southwest. Specific interventions targeting provinces with severe HIV/AIDS epidemics are urgently needed.

Correspondence: Xuezhong Shi, Department of Health Statistics, School of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, Henan 450001, China. Tel: +86.371.67781728 - Fax: +86.371.67781728. E-mail: xzshi@126.com

Key words: HIV/AIDS; Geographic information system; Spatial autocorrelation analysis; Hot spot analysis.
Contributions: YW, YLY and XZS conceived the study, performed the statistical analysis and drafted the manuscript. YW, SCM, NS and XQH assisted with the data collection and statistical analysis. YW and YLY contributed equally to this work. All authors contributed to the interpretation of the data and preparation of the manuscript. All authors read and approved the final manuscript.
Conflict of interest: the authors declare no potential conflict of interest.
Funding: this study was funded by the national major project of science and technology of the 12th five-year plan of China (2012ZX10004905-001) and partially by the Henan province medical science and technology project (201303003).
Acknowledgments: the authors thank the staff in the surveillance system of the Chinese Center for Disease Control and Prevention and all involved individuals. We also give special thanks to all the experts who participated in the expert consultation.
Received for publication: 24 September 2015. Revision received: 27 November 2015. Accepted for publication: 27 November 2015.
©Copyright Y. Wang et al., 2016 Licensee PAGEPress, Italy Geospatial Health 2016; 11:414 doi:10.4081/gh.2016.414
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the


Introduction Since the discovery of human immunodeficiency virus (HIV) in the early 1980s, the disease caused by the infection, i.e. the acquired immune deficiency syndrome (AIDS), has posed a serious threat to human health. The HIV/AIDS epidemic has become a serious public health problem because of its rapid worldwide spread (Tanser et al., 2009). Studies investigating HIV/AIDS in China and abroad have shown that the epidemic varies geographically (Al-Ahmadi and Al-Zahrani, 2013). Based on available figures from 2012, sub-Saharan Africa has the most serious epidemic of this virus in the world, with roughly 25 million people living with HIV or AIDS, while the Middle East and North Africa (MENA) region has one of the lowest prevalence rates, with an estimated 260,000 people living with HIV (UNAIDS, 2013). Even within a country, the epidemic can vary widely (Li et al., 2014b). For example, in China, the cumulatively reported number of HIV infections for the period 1985 to 2013 in the provinces of Yunnan, Guangxi, Sichuan and Henan exceeded 50,000, while fewer than 1000 HIV infections were reported in Qinghai Province and the Autonomous Region of Ningxia, which are situated near each other but separated by Gansu Province in the centre-northwest of the country. This finding demonstrates the strong geographical variation of the epidemic. The infectiousness of HIV governs its occurrence, which is also related to the local environment, social economy and culture, among other factors. Because of this, the development and prevalence of AIDS is a spatial phenomenon with interacting diffusion characteristics. For example, individuals infected with HIV are more likely to develop tuberculosis (TB), resulting in increasing numbers of TB-HIV co-infections, and HIV can be transmitted from high-risk groups to
the general population (Silva and Torres, 2015). To understand these developments, the relevant data need to be analysed in terms of the spatial properties of the infection. Compared with traditional statistical methods, spatial statistical methods have certain advantages. For example, spatial statistics can be used to analyse correlations in space, and clustering analysis can be used to describe the occurrence of HIV infection in specified areas (Qian et al., 2014). Geographic information systems (GIS) represent an important tool for spatial analysis that plays a vital role in strengthening the whole process of epidemiological surveillance, information management and analysis (Kandwal et al., 2009). They can effectively manage spatial data, describe the geographical distribution and variation of diseases and analyse spatial and temporal trends (Tang et al., 2014). Advances in GIS technology provide new opportunities for epidemiologists to study associations between demographics and the spatial distribution of disease (Vanmeulebrouk et al., 2008). Some scholars have applied GIS technology to explore temporal and spatial differences in infectious diseases such as malaria and severe acute respiratory syndrome, among other infections (Shiode et al., 2015). However, studies on the spatial distribution of HIV infection are limited. Peng et al. (2011) described the geographical characteristics of HIV/AIDS in 125 counties of Yunnan Province based on spatial statistics, using data from the HIV/AIDS sentinel surveillance system of the Chinese Center for Disease Control and Prevention (China CDC) from 1991 to 2009. The results indicated that the epidemic was most severe in the far western and north-eastern regions of Yunnan Province. Zhang et al. (2015) analysed the spatial-temporal clustering of the HIV/AIDS epidemic in Chongqing, a large city in south-western China that previously belonged to Sichuan Province but now constitutes a province-level municipality of its own. They used data from the annual reports of the Chongqing Municipal Center for Disease Control from 2006 to 2012. The results revealed epidemic hotspots distributed in 15 mid-western counties. Since the spatial distribution and potential HIV/AIDS clusters at the provincial level in mainland China have not been clarified, an improved understanding of the distribution pattern of the epidemic is needed. Because the incubation period of AIDS is long (average estimated period of 7-10 years), it is difficult to determine the incidence rate of HIV. On the other hand, we have a good source of data on AIDS incidence, as HIV carriers are followed up every 6 months. In the present study, province-level AIDS incidence rates and the cumulative number of reported cases of HIV/AIDS in China were used to examine the spatial distribution and hotspots of the HIV/AIDS epidemic. Our goal was to provide a more comprehensive understanding of the epidemic and its geographical distribution as a rational basis for the development of regional policy on AIDS prevention and control (Xiao et al., 2013).

Materials and Methods

Data sources We obtained data on the cumulative number of reported cases of HIV/AIDS in mainland China for the period 1985 to 2013 from the surveillance system of the China CDC, also called the HIV network direct reporting information system, which was established by the Ministry of Health on the disease prevention and control information system platform to improve the quality and relevance of reporting. Data on the incidence of AIDS in 2013 were obtained from the China Public Health Statistics Yearbook (2014).

Statistical approach In order to create a thematic map, data on the incidence rates of AIDS and the cumulative number of reported cases of HIV/AIDS in the 31 provinces were loaded into China’s provincial administrative boundary vector map. The data were classified by the natural breaks method, which identifies the partitions that minimise the differences between values in the same class and maximise the differences between values in different classes (Bivand, 2013).

General spatial autocorrelation Moran’s autocorrelation coefficient (I) was used to measure the correlation among neighbouring observations and the levels of spatial clustering among neighbouring districts (Zulu et al., 2014). It indicates the direction and degree of a single variable that is strictly attributable to a relatively close position and can therefore be used to explore overall spatial patterns (Osei and Duker, 2008). The general Moran’s I was calculated as follows:

$$I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}(d)\,(x_i - \bar{x})(x_j - \bar{x})}{\left(\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}(d)\right) \sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad \text{(eq. 1)}$$

where $n$ is the number of study areas, $w_{ij}(d)$ represents the adjacency weight matrix based on the distance $d$, $x_i$ and $x_j$ refer to the incidence rate of AIDS (or the cumulative number of reported cases of HIV/AIDS) in the ith or jth province, and $\bar{x}$ is the mean of these values over all provinces. Under the null hypothesis of spatial randomness, the index is approximately normally distributed and is usually standardised as a Z score. The value of Moran’s I ranges between approximately -1 and 1. The magnitude of I stands for the strength of the spatial autocorrelation, where I=1 denotes a strong positive spatial autocorrelation between study regions and I=-1 a strong negative spatial autocorrelation.

Local spatial autocorrelation The global Moran’s I statistic only discloses whether a distribution pattern is clustered or random, without giving detailed cluster information about sites or features, such as high-high or low-low clusters and high-low or low-high outliers (Moraga and Montes, 2011). Local spatial autocorrelation analysis compensates for this limitation of the global Moran’s I statistic. It is used to examine whether there are regional patterns involving specific areas (Barrell and Grant, 2013). The local Moran’s I index was used to analyse cluster features and the local Getis-Ord Gi* statistics (Getis and Ord, 1992) to test the focal provinces of the HIV/AIDS epidemic. The local Moran’s I was calculated as follows:

$$I_i = \frac{n\,(x_i - \bar{x}) \sum_{j=1}^{n} w_{ij}(d)\,(x_j - \bar{x})}{\sum_{k=1}^{n} (x_k - \bar{x})^2} \qquad \text{(eq. 2)}$$

The value of the local Moran’s I ranges from +1 (indicating high-high or low-low clusters) through 0 (=random pattern) to -1 (indicating high-low or low-high outliers). The local G index was calculated as follows:

eq. 3

The meaning of the parameters in both equations is the same as described for the general Moran’s I formula. If Gi* is positive, the local clusters are identified as high-value correlations, i.e. statistically significant hotspots; if Gi* is negative, the local clusters can be identified as low-value correlations, i.e. statistically significant coldspots. ArcGIS software version 10.2 (ESRI, Redlands, CA, USA; http://www.esri.com/software/arcgis/arcgis-for-desktop) was used to map the province-level HIV/AIDS epidemic in China. The data were subjected to spatial autocorrelation analysis and hotspot analysis. The level of significance was 0.05.

Results

The thematic map of the HIV/AIDS epidemic in China in 2013, stratified by province and classified into five levels, is shown in Figure 1. In terms of the AIDS incidence rate, Guangxi and Yunnan had the highest rates and were thus classified at the fifth level, while the eight provinces with the lowest rates, including Inner Mongolia, Gansu and Ningxia, were classified at the first level. Based on the cumulative number of reported cases of HIV/AIDS, Guangxi and Yunnan showed the largest figures and were likewise placed at the fifth level, while the eight provinces with the smallest numbers of cases, again including Inner Mongolia, Gansu and Ningxia, were positioned at the first level. In general, the regions with the most severe HIV/AIDS epidemics were concentrated in south-western China, while regions with a low prevalence were located in north-eastern China, the north-central regions and the western plateaus, including Qinghai and Tibet. Table 1 shows the indexes for the global spatial autocorrelation. There was a statistically significant positive spatial autocorrelation

Figure 1. Thematic map of the human immunodeficiency virus/acquired immune deficiency syndrome (HIV/AIDS) epidemic in China. A) AIDS incidence rate; B) cumulatively reported cases of HIV/AIDS.

Table 1. Global Moran’s I statistics for the human immunodeficiency virus/acquired immune deficiency syndrome epidemic in China.

Variable                               Moran’s I   E(I)°    Var(I)#   Z§      P
AIDS incidence rate                    0.203       -0.033   0.002     4.746   <0.001
Cumulative number of HIV/AIDS cases    0.075       -0.033   0.003     2.140   0.032

AIDS, acquired immune deficiency syndrome; HIV, human immunodeficiency virus. °Expected index value; #theoretical variance; §standard normal deviation. P<0.05 is considered significant.
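The entries in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below is not the authors' ArcGIS workflow. It assumes the standard definitions of the global Moran's I, of its expected value under spatial randomness, E(I) = -1/(n-1), of the Z score, Z = [I - E(I)]/sqrt(Var(I)), and of a two-sided normal P value, with n = 31 provinces taken from the Methods. The five-region data set and all function names are hypothetical and serve only to illustrate the computation.

```python
import math


def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def expected_i(n):
    """Expected Moran's I under spatial randomness."""
    return -1.0 / (n - 1)


def implied_variance(i_obs, e_i, z):
    """Variance implied by an observed I, its expectation and a Z score."""
    return ((i_obs - e_i) / z) ** 2


def two_sided_p(z):
    """Two-sided P value of a standard normal deviate."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical toy data: five regions in a row, each adjacent to its
# immediate neighbours (binary contiguity weights, zero diagonal).
toy_values = [10, 9, 2, 1, 1]
toy_weights = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
toy_i = morans_i(toy_values, toy_weights)

# Cross-check of Table 1, using the rounded values printed there.
e_i = expected_i(31)                                   # 31 provinces
var_incidence = implied_variance(0.203, e_i, 4.746)
var_cumulative = implied_variance(0.075, e_i, 2.140)
p_cumulative = two_sided_p(2.140)

print(f"toy Moran's I: {toy_i:.3f}")
print(f"E(I) for n = 31: {e_i:.3f}")
print(f"implied Var(I): {var_incidence:.4f}, {var_cumulative:.4f}")
print(f"two-sided P for Z = 2.140: {p_cumulative:.3f}")
```

With these inputs the expected value is about -0.033, the implied variances round to 0.002 and 0.003, and the P value for Z = 2.140 is about 0.032, in line with Table 1. Recomputing Z directly from the printed variances gives somewhat different values, because those variances are rounded to three decimals.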


gh-2016_2.qxp_Hrev_master 31/05/16 11:44 Pagina 107

Article

between the incidence rate of AIDS and the cumulative number of HIV/AIDS cases. These findings indicate that the distribution of the HIV/AIDS epidemic in China was clustered distribution throughout the entire study region. Based on the incidence rate of AIDS, high-high regions included Guangxi, Yunnan and Sichuan, which not only demonstrated a high incidence rate of AIDS but also were surrounded by provinces with high incidence rates of the disease. Based on the cumulative number of reported cases of HIV/AIDS, Guangxi and Yunnan were located in high-

high regions. Henan was located in a high-low region, which presented a large number of cases, but was surrounded by provinces with few cumulative numbers of reported cases. In general, high-high regions were located in south-western China (Figure 2). Figure 3 shows the hotspots, six of which were identified based on the incidence of AIDS, including the provinces of Guangxi, Sichuan, Yunnan, Guizhou, Hainan and Hunan, which represented the higher transmission of HIV and were located in the South of the country. Eight coldspots, including Beijing and the provinces of Tianjin, Hebei,

A

B

Figure 2. Local spatial autocorrelation of the human immunodeficiency virus/acquired immune deficiency syndrome (HIV/AIDS) epidemic in China. A) AIDS incidence rate; B) cumulatively reported cases of HIV/AIDS.

A

B

Figure 3. Hotspot analysis of the human immunodeficiency virus/acquired immune deficiency syndrome (HIV/AIDS) epidemic in China. A) AIDS incidence rate; B) cumulatively reported cases of HIV/AIDS.

[Geospatial Health 2016; 11:414]

[page 107]


gh-2016_2.qxp_Hrev_master 31/05/16 11:44 Pagina 108

Article

Shanxi, Shandong, Jiangsu, Inner Mongolia and Liaoning in northeastern China were identified. Based on the cumulative number of reported cases of HIV/AIDS, Chongqing and the Guangxi, Sichuan, Yunnan, Guizhou, Hunan and Hainan provinces were hotspots, while Inner Mongolia was a coldspot.
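The hotspot and coldspot labels reported above depend on the sign and significance of the Gi* statistic. As an illustration only, the sketch below computes Gi* in its standardised (z-score) form, which is an assumption here because the original equation is not recoverable from this text, and labels regions at the 0.05 level used in the study (two-sided critical value 1.96). The five-region data, the binary weights and the function names are hypothetical and do not reproduce the provincial analysis.

```python
import math


def gi_star(values, weights):
    """Standardised Getis-Ord Gi* (z-score) for every region.

    weights[i][j] is the spatial weight between regions i and j. For Gi*
    each region counts as its own neighbour, so weights[i][i] is non-zero.
    The values must not all be equal, and no region may be a neighbour of
    every region with equal weights, otherwise a denominator becomes zero.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for row in weights:
        sum_w = sum(row)
        sum_w2 = sum(w * w for w in row)
        numerator = sum(w * x for w, x in zip(row, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


def label(z, critical=1.96):
    """Classify a z-score at the two-sided 0.05 level."""
    if z > critical:
        return "hotspot"
    if z < -critical:
        return "coldspot"
    return "not significant"


# Hypothetical toy data: five regions in a row; each region is a neighbour
# of itself and of the regions directly next to it.
toy_values = [10, 9, 2, 1, 1]
toy_weights = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
z_scores = gi_star(toy_values, toy_weights)
labels = [label(z) for z in z_scores]

for value, z, name in zip(toy_values, z_scores, labels):
    print(f"value={value:>2}  Gi*={z:+.3f}  {name}")
```

In this toy example the first region, which has a high value next to another high value, is labelled a hotspot, and the fourth region, surrounded by low values, is labelled a coldspot. Results for real provinces would depend on the weight matrix chosen, which this text does not specify.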

Discussion In the present study, we took advantage of advances in GIS to explore geographic distribution patterns and hotspots of the HIV/AIDS epidemic in China. The main results indicated that there was a clustered spatial distribution pattern of the HIV/AIDS epidemic; hotspots were located in the south-western part of the country and coldspots in the Northeast. Based on the spatial autocorrelation analysis, we found that the HIV/AIDS epidemic exhibited geographical variation and a clustered distribution pattern, indicating the presence of hotspots. Figure 1 shows that the most severe HIV epidemic areas were located in the Southwest, which indicates that specific interventions targeting these areas should be strengthened and regional health planning and resource allocation adopted. The results of the local spatial autocorrelation analysis revealed that three provinces were autocorrelated based on the incidence rate of AIDS and two based on the cumulative number of reported cases of HIV/AIDS. Henan was found to be located in a high-low region, with more HIV/AIDS cases there but fewer in its neighbouring provinces. Owing to its poorer economic status, Henan has accepted paid blood/plasma donation since the 1990s (Yan et al., 2013), which could have had an impact on the number of cases there. Indeed, an increasing number of HIV/AIDS cases have been reported due to blood transfusion, which is the primary transmission route of HIV in Henan (Wu et al., 2006). In other provinces, risky sexual behaviour and intravenous drug use are the main routes of transmission (Zhang et al., 2011). Another reason for the larger number of HIV/AIDS cases in Henan is that the province has a large population, ranking third in China in 2010 (Hu et al., 2015). It is also worth mentioning that the incidence rate of AIDS in Henan has remained relatively stable in recent years.
A plausible explanation for this is the extensive measures implemented by the government to minimise the illegal blood trade. The Getis-Ord statistics further support the findings obtained with the local spatial autocorrelation analysis. Hotspot regions were identified in six provinces based on the AIDS incidence rate and in seven provinces based on the cumulative number of reported cases of HIV/AIDS. There are two probable explanations for the severe HIV epidemic in Guangxi and Yunnan. One explanation concerns geographical factors (Shan et al., 2013). Guangxi and Yunnan are located on the heroin trafficking route, which begins at the Golden Triangle and continues to the south-western or western provinces of the country, potentially expanding the transmission of HIV/AIDS. In addition, frontier trade and travel contribute to the growing number of HIV/AIDS cases (Li et al., 2014a). Another explanation is the presence of a large mobile population, which is associated with a high risk of HIV transmission due to poor knowledge of the disease and thus lower rates of condom use and/or other high-risk behaviour (Zhou et al., 2014). Sichuan Province is adjacent to Yunnan Province, where there is an active drug trade. Due to its less developed economy, there is a large outflow of rural residents to places like Yunnan, who may acquire drug-related habits (Li et al., 2015). In addition, some of the ethnic minorities in Sichuan have tolerant attitudes regarding sexual behaviour (Liu et al., 2013). These high-risk factors have most probably played a role in the spread of AIDS. Some other risk factors may be present in other hotspots, including risky sexual behaviour (homosexual and bisexual contacts), disorderly management of blood and blood products, unsafe medical service, low levels of AIDS-related knowledge and poor economic status (Zhuang et al., 2012). In contrast to the hotspots, coldspots were confined to eight provinces based on AIDS incidence rates and to one province based on the cumulative number of reported cases of HIV/AIDS. There are several likely explanations for these observations: a highly developed economy and culture, a high-speed flow of information, a high level of public awareness and knowledge of AIDS, and a more widespread understanding of the importance of self-protection (Teng and Shao, 2011). It is also worth noting that there was an increased proportion of men having sex with men (MSM) among the annually reported HIV cases in these regions (Liu et al., 2015). The concealed nature of MSM constitutes a dangerous bridge for HIV transmission from high-risk groups to the general population. Comparatively, the proportions of blood transmission, mother-to-child transmission and intravenous drug abuse transmission are decreasing and stabilising at low levels in these provinces, thus maintaining a small epidemic of HIV/AIDS (He et al., 2011). The spatial analysis found a spatial autocorrelation of HIV/AIDS at the provincial level, which was consistent with the spatial distribution of the HIV/AIDS epidemic in Yunnan and Chongqing at the county level. Meanwhile, the hotspots observed in the south-western part of China provide a rational basis for making regional policy concerning AIDS prevention and control. This analysis has a few limitations. First, our analysis is based on the cumulative numbers of reported cases of HIV/AIDS and may not be a true representation of the actual spread.
However, the cumulative number has the strength of reflecting the disease burden and the effect of long-term intervention, which have a direct influence on the allocation of health resources. Second, due to limitations associated with the surveillance system data sources and the unclear high-risk population base, we could not obtain detailed cumulative numbers of HIV/AIDS cases and incidence rates classified by transmission route. Future studies will attempt to estimate the high-risk population base and the incidence rate by transmission route to better explore the spatial distribution of HIV/AIDS, with the goal of providing early warning of potential new epidemics.

Conclusions The present study analysed the spatial distribution and hotspots of the HIV/AIDS epidemic, thereby providing a scientific basis for the development of regional policies for the prevention and control of HIV/AIDS epidemics.

References Al-Ahmadi K, Al-Zahrani A, 2013. Spatial autocorrelation of cancer incidence in Saudi Arabia. Int J Environ Res Public Health 10:7207-28. Barrell J, Grant J, 2013. Detecting hot and cold spots in a seagrass landscape using local indicators of spatial association. Landscape Ecol 28:2005-18. Bivand R, 2013. Spatial statistics: geospatial information modeling and thematic mapping. Environ Plann B 40:189.
Getis A, Ord JK, 1992. The analysis of spatial association. Geogr Anal 24:189-206. He Q, Xia YH, Raymond HF, Peng R, Yang F, Ling L, 2011. HIV trends and related risk factors among men having sex with men in mainland China: findings from a systematic literature review. Southeast Asian J Trop Med Public Health 42:616-33. Hu SB, Wang F, Yu CH, 2015. Evaluation and estimation of the provincial infant mortality rate in China’s sixth census. Biomed Environ Sci 28:410-20. Kandwal R, Garg PK, Garg RD, 2009. Health GIS and HIV/AIDS studies: perspective and retrospective. J Biomed Inform 42:748-55. Li L, Assanangkornchai S, Duo L, McNeil E, Li J, 2014a. Risk behaviors, prevalence of HIV and hepatitis C virus infection and population size of current injection drug users in a China-Myanmar border city: results from a respondent-driven sampling survey in 2012. PLoS One 9:e106899. Li L, Wei DY, Hsu WL, Li TY, Gui T, Wood C, Liu YJ, Li HP, Bao ZY, Liu SY, Wang XL, Li JY, 2015. CRF07_BC strain dominates the HIV-1 epidemic in injection drug users in Liangshan prefecture of Sichuan, China. AIDS Res Hum Retrov 31:479-87. Li M, Shen Y, Jiang X, Li Q, Zhou X, Lu H, 2014b. Clinical epidemiology of HIV/AIDS in China from 2004-2011. Biosci Trends 8:52-8. Liu GW, Lu HY, Wang J, Xia DY, Sun YM, Mi GD, Wang LM, 2015. Incidence of HIV and syphilis among men who have sex with men (MSM) in Beijing: an open cohort study. PloS One 10:e0138232. Liu S, Wang QX, Nan L, Wu CL, Wang ZF, Bai ZZ, Liu L, Cai P, Qin S, Luan RS, 2013. The changing trends of HIV/AIDS in an ethnic minority region of China: modeling the epidemic in Liangshan prefecture, Sichuan Province. Biomed Environ Sci 26:562-70. Moraga P, Montes F, 2011. Detection of spatial disease clusters with LISA functions. Stat Med 30:1057-71. Osei FB, Duker AA, 2008. Spatial and demographic patterns of cholera in Ashanti region - Ghana. Int J Health Geogr 7:44. 
Peng ZH, Cheng YJ, Reilly KH, Wang L, Qin QQ, Ding ZW, Ding GW, Ding KQ, Yu RB, Chen F, Wang N, 2011. Spatial distribution of HIV/AIDS in Yunnan province, People’s Republic of China. Geospat Health 5:177-82. Qian S, Guo W, Xing J, Qin Q, Ding Z, Chen F, Peng Z, Wang L, 2014. Diversity of HIV/AIDS epidemic in China: a result from hierarchical clustering analysis and spatial autocorrelation analysis. AIDS 28:1805-13. Shan D, Sun JP, Anna Y, Chen ZD, Yuan JH, Li T, Zhang G, Yang X, Wei M, Duan S, Xiang LF, Ye RH, Yang YC, 2013. Comprehensive evaluation of AIDS spending in Dehong prefecture of Yunnan province in 2010. Zhonghua Yu Fang Yi Xue Za Zhi 47:991-5.

Shiode N, Shiode S, Rod-Thatcher E, Rana S, Vinten-Johansen P, 2015. The mortality rates and the space-time patterns of John Snow’s cholera epidemic map. Int J Health Geogr 14:21. Silva CJ, Torres DFM, 2015. A Tb-Hiv/Aids coinfection model and optimal control treatment. Discret Contin Dyn S 35:4639-63. Tang F, Cheng Y, Bao C, Hu J, Liu W, Liang Q, Wu Y, Norris J, Peng Z, Yu R, Shen H, Chen F, 2014. Spatio-temporal trends and risk factors for Shigella from 2001 to 2011 in Jiangsu Province, People’s Republic of China. PLoS One 9:e83487. Tanser F, Barnighausen T, Cooke GS, Newell ML, 2009. Localized spatial clustering of HIV infections in a widely disseminated rural South African epidemic. Int J Epidemiol 38:1008-16. Teng T, Shao Y, 2011. Scientific approaches to AIDS prevention and control in China. Adv Dent Res 23:10-2. UNAIDS, 2013. Global Report 2013. Available from: http://www.unaids.org/sites/default/files/en/media/unaids/contentassets/documents/epidemiology/2013/gr2013/UNAIDS_Global_Report_2013_en.pdf Vanmeulebrouk B, Rivett U, Ricketts A, Loudon M, 2008. Open source GIS for HIV/AIDS management. Int J Health Geogr 7:53. Wu ZY, Sun XH, Sullivan SG, Detels R, 2006. Public health: HIV testing in China. Science 312:1475-6. Xiao Y, Tang S, Zhou Y, Smith RJ, Wu J, Wang N, 2013. Predicting the HIV/AIDS epidemic and measuring the effect of mobility in mainland China. J Theor Biol 317:271-85. Yan J, Xiao S, Zhou L, Tang Y, Xu G, Luo D, Yi Q, 2013. A social epidemiological study on HIV/AIDS in a village of Henan Province, China. AIDS Care 25:302-8. Zhang M, Chu ZX, Wang HL, Xu JJ, Lu CM, Shang H, 2011. A rapidly increasing incidence of HIV and Syphilis among men who have sex with men in a major city of China. AIDS Res Hum Retrov 27:1139-40. Zhang YQ, Xiao Q, Zhou L, Ma DH, Liu L, Lu RR, Yi DL, Yi D, 2015. The AIDS epidemic and economic input impact factors in Chongqing, China, from 2006 to 2012: a spatial-temporal analysis. BMJ Open 5:e006669.
Zhou YB, Liang S, Wang QX, Gong YH, Nie SJ, Nan L, Yang AH, Liao Q, Song XX, Jiang QW, 2014. The geographic distribution patterns of HIV-, HCV- and co-infections among drug users in a national methadone maintenance treatment program in Southwest China. BMC Infect Dis 14:134. Zhuang X, Liang YX, Chow EPF, Wang YF, Wilson DP, Zhang L, 2012. HIV and HCV prevalence among entrants to methadone maintenance treatment clinics in China: a systematic review and meta-analysis. BMC Infect Dis 12:130.

[Geospatial Health 2016; 11:414]


Geospatial Health 2016; volume 11:396

How much incident lung cancer was missed globally in 2012? An ecological country-level study

Benn Sartorius,1 Kurt Sartorius1,2

1School of Nursing and Public Health, University of KwaZulu-Natal, Durban; 2Faculty of Commerce, University of the Witwatersrand, Johannesburg, South Africa

Abstract
Lung cancer incidence is increasing in many low-to-middle-income countries and is significantly under-reported in Africa, which could potentially mislead policy makers when prioritising disease burden. We employed an ecological correlation study design using country-level lung cancer incidence data and associated determinant data. Lagged prevalence of smoking and other exposure data were used to account for exposure-disease latency. A multivariable Poisson model was employed to estimate missed lung cancer in countries lacking incidence data. Projections were further refined to remove potential deaths from infectious/external competing causes. Global lung cancer incidence was much lower among females than males (13.6 vs 34.2 per 100,000). Distinct spatial heterogeneity was observed for incident lung cancer, which appeared concentrated in contiguous regions. Our model predicted a revised global lung cancer incidence in 2012 of 23.6 per 100,000 compared to the Globocan 2012 estimate of 23.1, amounting to ~38,101 missed cases (95% confidence interval: 28,489-47,713). The largest relative under-estimation was predicted for Africa, Central America and the Indian Ocean regions (Comoros, Madagascar, Mauritius, Mayotte, Reunion, Seychelles). Our results suggest substantial under-reporting of lung cancer incidence, specifically in developing regions (e.g. Africa). The missed cost of treating these cases could amount to >US$ 130 million based on recent developing setting costs for treating earlier stage lung cancer. The full cost is not only under-estimated, but also requires substantial additional social/family inputs, as evidenced in more developed settings like the European Union. This represents a major public health problem in developing settings (e.g. Africa) with limited healthcare resources.

Correspondence: Benn Sartorius, Discipline of Public Health Medicine, 2nd floor George Campbell Building, Howard College Campus, University of KwaZulu-Natal, Durban 4001, South Africa. Tel: +27.031.2604459 - Fax: +27.031.2604211. E-mail: Sartorius@ukzn.ac.za
Key words: Under reporting; Incident lung cancer; Ecological determinant model.
Received for publication: 3 July 2015. Revision received: 12 October 2015. Accepted for publication: 19 October 2015.
©Copyright B. Sartorius and K. Sartorius, 2016
Licensee PAGEPress, Italy
Geospatial Health 2016; 11:396
doi:10.4081/gh.2016.396
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Introduction
Lung cancer remains the most common cancer worldwide, and recent surveys illustrate persistently high levels of lung cancer and an increasing trend in female prevalence in many developed regions like North America and Europe (WHO-IARC, 2014; Islami et al., 2015; Lortet-Tieulent et al., 2014). Many low-to-middle-income countries also show increasing levels of male-led incidence in regions like Eastern and Central Europe, particularly in Hungary, Poland and Serbia (WHO-IARC, 2014; Lortet-Tieulent et al., 2014; Tesfaye et al., 2007). A recent study in 27 countries belonging to the European Union (EU) indicates that the economic costs of lung cancer for 2009 were €18.8 bn, with the biggest contribution coming from economic productivity (Edge et al., 2010). Meanwhile, the potential cost of lung cancer in developing countries is projected to increase further because of the trend of rising incidence in the next decade (Farnsworth et al., 2004; Murray, 1997; Nishida and Kudo, 2013). Recent estimates report that developing countries account for 58 per cent of the global burden of lung cancer incidence (WHO-IARC, 2014) and contain an estimated 1.38 billion smokers, who have a 7-fold higher risk of acquiring lung cancer (Nishida and Kudo, 2013). Given the high economic cost, rising lung cancer incidence in developing countries is likely to further strain the limited health resources of these regions (Shorrocks, 2013). The complexity of lung cancer incidence is illustrated by the stratified nature of its biology and geno-environmental determinants (Minagawa et al., 2007; Tesfaye et al., 2007), which show both inter- and intra-country variations.
The Globocan 2012 report shows marked inter-country incidence differences that are compounded by intra-country differences across regions, ethnic groups (WHO-IARC, 2014; Lortet-Tieulent et al., 2014), gender (Islami et al., 2015), a long latency period (30 years) and the degree of inhalation (Farnsworth et al., 2004). Although smoking is widely accepted as the primary agent that triggers lung cancer (Bruce et al., 2000; Cohen et al., 2005), evidence shows that the disease is also linked to other determinants like passive smoking, indoor cooking, occupational hazards, motor vehicle emissions and industrial pollution (Lary et al., 2014; Youlden et al., 2008). A combination of factors therefore suggests diverse inter- and intra-country differences in the incidence of lung cancer that are further magnified by the inequality gap between socioeconomic groups (Deaton, 2013). In addition, the management of lung cancer is compromised because its incidence may be significantly under-reported in many developing countries, where data are lacking or of poor quality (WHO-IARC, 2014; Ferlay et al., 2010). Although the geography of lung cancer incidence is well recorded in the literature (Tesfaye et al., 2007), the under-reporting of its incidence in many countries could potentially mislead policy makers when prioritising disease burden. A need therefore exists to identify geographic areas that lack or significantly under-report lung cancer data, as this has implications for disease prioritisation, resource allocation and tailored policy interventions (Islami et al., 2015; Tesfaye et al., 2007). Lung cancer management in developing countries will continue to be compromised by its complexity as well as by under-estimation of the impact of the disease because of poor or lacking data.

The aim of this study was to quantify potential under-estimation of lung cancer using Globocan 2012 data and available lagged determinant (risk factor) data. The level and implications of this under-estimation are discussed in order to propose some relevant policy implications.

Materials and Methods
This study employed an ecological country-level correlation design.

Incidence data
Age- and gender-standardised lung cancer incidence as well as raw counts for 184 countries were extracted from the GLOBOCAN 2012 database (WHO-IARC, 2014). A detailed description of the data sources, methodologies and potential limitations has been published previously (WHO-IARC, 2014). Data from countries are classified according to availability and quality, namely: A [high quality national data or high quality regional (coverage greater than 50%)], B [high quality regional (coverage between 10% and 50%)], C [high quality regional (coverage lower than 10%)], D [national data (rates)], E [regional data (rates)], F (frequency data), G (no data).

Determinants
The following key determinants were included in the analysis. Published data for historical levels and trends in age-standardised prevalence of smoking by country were utilised (Ng et al., 2014). These data cover 187 countries and provide estimated age-standardised prevalence of smoking in 1980, 1996, 2006 and 2012. We used estimates from 1980 in our model as this would most closely correspond with the 30-year latency period described below. Missing smoking prevalence estimates for countries were imputed using a Bayesian spatial conditional autoregressive (CAR) model, which draws on neighbouring country values. Historical outdoor air pollution data, with reference to particulate matter less than 10 microns in diameter (PM10) expressed as micrograms per m3, were extracted from the World Bank Data repository for the period 1990-2012 (World Bank, 2013). We used estimates for 209 countries from 1990 in our model (data were not available prior to that in the repository). Proportions of households using solid fuel for cooking (indoor pollution) by country in 1990, 2000 and 2010 were extracted from Bonjour et al. (2013); these are based on data from the World Health Organization (WHO) Household Energy Database (WHO, 2009, 2012). Data used by Bonjour et al. (2013) covered 155 countries with at least one survey per country between 1974 and 2010. For countries lacking solid-fuel data but classified as high-income countries, they assumed the proportion using solid fuels for cooking to be <5% (Bonjour et al., 2013). We used estimates for 184 countries from 1990 in our model. Gross domestic product (GDP) data per capita (in US$) by country in 1980 were also extracted from the World Bank Data repository (World Bank, 2013). Furthermore, mortality data for competing causes of death (specifically communicable disease and injury) were extracted for each country from the Global Burden of Disease Project Study 2013 (Naghavi et al., 2015) to realistically reduce projected under-estimation from the model, as some incident lung cancer would not have been realised (Figure 1).

Figure 1. Fraction of mortality due to competing infectious disease and injury by country. The values on the vertical axis represent ISO 3166 country codes.

Latency period between exposure and lung cancer
A review of available literature suggests an approximate 30-year population latency period between smoking prevalence and subsequent lung cancer mortality (Burbank, 1972; Devesa et al., 1987; Finkelstein, 1991; Polednak, 1974; Weiss, 1997). This estimate was used to set the lag between historical population-level smoking exposure and its subsequent impact on current lung cancer incidence.

Data analysis
Spatial analysis
Local indicators of spatial autocorrelation (LISA) (Anselin, 1996) were estimated using the freely available GeoDa software (Anselin et al., 2006). The local Moran’s I statistic based on rates (cases and associated populations at risk) was used to identify the existence of significant spatial clustering of lung cancer by gender. GeoDa implements the recommended Empirical Bayes (EB) standardisation procedure (Assuncao and Reis, 1999) in its global Moran scatter plot and LISA maps, i.e. standardisation of the raw rates. Significance was set at 5% after 99,999 iterations.
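The local Moran’s I statistic named under Spatial analysis can be illustrated with a minimal sketch. It is not the GeoDa implementation: it assumes row-standardised contiguity weights, omits the Empirical Bayes rate standardisation, and uses a two-sided conditional permutation test. The four areas, their rates and their neighbour lists are hypothetical.

```python
import random
from statistics import fmean


def local_morans_i(values, neighbours):
    """Local Moran's I per area, with row-standardised contiguity weights.

    Assumes the values are not all identical (otherwise m2 would be zero).
    """
    n = len(values)
    mean = fmean(values)
    z = [v - mean for v in values]          # deviations from the mean
    m2 = sum(d * d for d in z) / n          # second moment of the deviations
    stats = []
    for i in range(n):
        nbrs = neighbours[i]
        # Spatial lag: average deviation among the neighbours of area i.
        lag = fmean([z[j] for j in nbrs]) if nbrs else 0.0
        stats.append(z[i] * lag / m2)
    return stats


def pseudo_p_value(values, neighbours, i, permutations=999, seed=0):
    """Conditional permutation test: keep area i fixed, shuffle the rest."""
    rng = random.Random(seed)
    observed = abs(local_morans_i(values, neighbours)[i])
    others = [v for k, v in enumerate(values) if k != i]
    hits = 0
    for _ in range(permutations):
        rng.shuffle(others)
        shuffled = others[:i] + [values[i]] + others[i:]
        if abs(local_morans_i(shuffled, neighbours)[i]) >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)


# Hypothetical example: four areas on a line, two high rates next to two low ones.
rates = [10.0, 9.0, 1.0, 2.0]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
for area, stat in enumerate(local_morans_i(rates, neighbours)):
    print(area, round(stat, 3), pseudo_p_value(rates, neighbours, area, permutations=99))
```

A positive statistic marks an area whose value resembles its neighbours (high-high or low-low), which is the pattern the LISA cluster maps display.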


Statistical analyses
Analyses were performed using STATA software version 13.0 (StataCorp, 2011). Using observed numbers of age-standardised incident lung cancer cases and the associated population at risk, we calculated incidence rates and associated 95% Poisson confidence limits. Country-level lung cancer incidence was considered significantly above average if the lower 95% confidence interval (CI) limit (α=0.025) of the incidence proportion was above the global average (Pickle et al., 1987). We employed a regression framework, as utilised previously, to identify and quantify determinants associated with lung cancer. Specifically, we used an ecological (country-level) generalised linear Poisson modelling approach with robust standard errors (to correctly adjust the standard errors and not over-estimate significance) to estimate risk ratios (RR) for each determinant vs lung cancer incidence. Factors significant at the 10% level in the bivariate regressions were selected for inclusion in a multivariable model. The final multivariable model was then used to predict potentially missed lung cancer cases based on observed covariate values, along with 95% confidence intervals (uncertainty). We only predicted missing lung cancer incidence in countries with potentially under-reported or missing national-level cancer data, as described in the Incidence data section above.
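The rule for flagging a country as significantly above the global average can be sketched as follows. This is an illustration only: it uses Byar's approximation to the Poisson confidence limits rather than whatever routine Stata applied, and the function names and example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def poisson_rate_ci(cases, population, per=100_000, level=0.95):
    """Rate per `per` population with approximate Poisson limits (Byar's method)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    if cases > 0:
        lower = cases * (1 - 1 / (9 * cases) - z / (3 * sqrt(cases))) ** 3
    else:
        lower = 0.0
    d = cases + 1
    upper = d * (1 - 1 / (9 * d) + z / (3 * sqrt(d))) ** 3
    scale = per / population
    return cases * scale, lower * scale, upper * scale


def significantly_above(cases, population, global_rate):
    """True when the lower confidence limit exceeds the global average rate."""
    _, lower, _ = poisson_rate_ci(cases, population)
    return lower > global_rate


# Hypothetical country: 100 cases in a population of 100,000,
# compared against the reported global rate of 23.1 per 100,000.
print(poisson_rate_ci(100, 100_000))
print(significantly_above(100, 100_000, 23.1))
```

Using the lower limit rather than the point estimate keeps small countries with unstable rates from being flagged on the basis of a handful of cases.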

Results
Global incidence
The overall global estimated incidence rate for lung cancer in 2012 was 23.1 per 100,000 population (before accounting for potential under-reporting), with male incidence more than twice that of females. The highest overall age-standardised incidence of lung cancer was in Hungary (51.6 per 100,000), followed by Serbia, the Democratic People's Republic of Korea and Macedonia with rates of 45.6, 44.2 and 40.8 per 100,000, respectively (Figure 2). Six of the top 20 are classified as developed countries. China and USA were the leading countries in terms of absolute burden of new cases in 2012 with 652,842 and 214,226 cases, respectively, followed by Japan (94,855), India (70,275) and Russia (55,805). Very low incidence rates were observed in Middle and Western Africa (likely as a result of under-estimation due to poor data quality), with the lowest incidences estimated in Niger (0.2 per 100,000), Tanzania (0.7), Malawi (0.9) and the Democratic Republic of Congo (1.0).

Figure 2. Countries above the global average age-standardised lung cancer incidence (plus 95% confidence intervals) per 100,000 population in 2012. The horizontal red line represents the overall global incidence.

Figure 3. Map depicting observed age-standardised lung cancer incidence per 100,000 population by gender (A, female; B, male) in 2012.

Spatial risk by gender
Global and country-level incidence was generally much lower among females than males (globally 13.6 vs 34.2 per 100,000, respectively). The highest female incidence rates were mainly found in developed regions like North America [Canada (rank 2) and USA (3)] and Western Europe [Denmark (1), Hungary (5), Netherlands (6), Iceland (7), Ireland (8), Norway (9) and United Kingdom (10)] (Figure 3A). The highest burden of absolute incident female cases was observed in China (193,347) and USA (102,172). Conversely, high levels of male incidence (Figure 3B) were more widespread and included Europe, North America, the Eastern Mediterranean and Asia, while lower levels were seen in Africa (except for South Africa). The highest male incidence was largely concentrated in south-eastern and south-western Europe and Central Asia. The 10 leading countries (in descending order) were: Hungary, Armenia, Macedonia, Serbia, Turkey, Montenegro, Poland, Kazakhstan, Romania and the Democratic People's Republic of Korea. The highest burden of absolute incident male cases was again observed in China (459,495) and USA (112,054).

Prominent differences in the clustering of gender-specific incidence (based on neighbouring contiguity) were also observed (Figure 4). Significant spatial clustering of higher male lung cancer incidence was largely concentrated in Central-Eastern Europe and Northern Asia (Russia), with isolated significant high-risk countries in Africa (South Africa, Libya and Morocco) and South America (Argentina). Significant spatial clustering of higher female incidence was largely concentrated in Central-Eastern Europe, with isolated significant high-risk countries in Africa (South Africa and Morocco, i.e. similar to males) and Cuba. Significant clustering of low lung cancer incidence was largely confined to Africa for both genders (again, see the potential under-estimation noted in the Materials and Methods section).

Lung cancer incidence and associated historical determinants
Summary statistics for the selected determinants are presented in Table 1. Global prevalence of age-standardised smoking in 1980 was estimated to be 25.9% (95% CI: 25.1-26.6) by Ng et al. (2014). However, this varied by continent, with the highest historical prevalence of smoking and historical GDP per capita in Europe (~31% and US$ 12,389, respectively). The highest historical exposure to indoor pollution was observed in Africa (~77%).

Figure 4. Significant (P<0.05) spatial clustering of age-standardised lung cancer incidence (per 100,000 population) by gender (A, female; B, male) in 2012.


Causal associations
Age-standardised smoking prevalence in 1980 as well as indoor and outdoor pollution levels in 1990 were all significantly associated with lung cancer incidence based on bivariate regressions. Figure 5 shows the strong positive correlation between lagged prevalence of smoking (in 1980) and observed lung cancer incidence in 2012. Following multivariable adjustment, increasing smoking prevalence and outdoor pollution remained significant risk factors for current age-standardised lung cancer incidence, with risk ratios (RR) of 1.12 and 1.01 per unit increase, respectively (Table 2). Figure 6 depicts the distribution of predicted age-standardised incidence of lung cancer for countries with missing or poor (under-estimated) data.
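Because a Poisson model is multiplicative, a per-unit risk ratio compounds over larger differences in exposure. The sketch below shows that arithmetic for the adjusted estimates reported in Table 2. It assumes one unit means one percentage point of smoking prevalence, and the helper name is hypothetical.

```python
from math import exp, log

RR_SMOKING = 1.12  # adjusted RR per unit of 1980 smoking prevalence (Table 2)
RR_PM10 = 1.01     # adjusted RR per unit of 1990 PM10 concentration (Table 2)


def rr_for_difference(rr_per_unit, units):
    """Risk ratio implied by a difference of `units` in the determinant."""
    beta = log(rr_per_unit)      # regression coefficient on the log scale
    return exp(beta * units)     # equals rr_per_unit ** units


# Risk ratio implied by a 10-unit difference in each determinant.
print(round(rr_for_difference(RR_SMOKING, 10), 2))
print(round(rr_for_difference(RR_PM10, 10), 2))
```

The calculation holds the other determinants fixed and ignores the uncertainty in the coefficient, so it is a reading aid rather than a prediction.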

Table 1. Summary of historical determinant values by continent.

| Continent | Age-standardised smoking prevalence (%) in 1980 | Indoor pollution (% of households using solid fuel for cooking) in 1990 | Outdoor pollution (PM10, micrograms per m3) in 1990 | GDP per capita (US$) in 1980 |
|---|---|---|---|---|
| Africa | 12.74 (11.23, 14.26) | 77.08 (68.88, 85.27) | 60.88 (48.06, 73.71) | 821 (517, 1126) |
| Americas* | 17.11 (13.93, 20.3) | 32.83 (24.91, 40.74) | 42.93 (35.35, 50.51) | 3171 (2100, 4242) |
| Asia | 23.66 (21.28, 26.03) | 43.2 (32.32, 54.09) | 84.81 (66.19, 103.44) | 6571 (2508, 10,635) |
| Europe | 30.78 (29.08, 32.48) | 13.38 (8.16, 18.59) | 51.21 (43.73, 58.69) | 12,389 (8267, 16,511) |
| Oceania° | 30.33 (25.01, 35.64) | 44.67 (15.9, 73.44) | 31.24 (23.85, 38.62) | 4028 (1097, 6958) |

GDP, gross domestic product. Values are expressed as mean (95% confidence interval). *Includes North, Central and South America; °comprising Australia and proximate islands.

Table 2. Bivariate and multivariable adjusted (using a Poisson regression model) determinants associated with lung cancer in 2012.

| Determinant | Unadjusted risk ratio (95% CI) | P | Adjusted risk ratio (95% CI) | P |
|---|---|---|---|---|
| Age-standardised prevalence of smoking in 1980 (%) | 1.09 (1.03, 1.16) | 0.004 | 1.12 (1.05, 1.18) | <0.001 |
| Outdoor pollution in 1990 (PM10, micrograms per m3) | 1.02 (1.01, 1.03) | 0.001 | 1.01 (1.00, 1.02) | 0.020 |
| Indoor pollution (% of households using solid fuel for cooking) | 1.03 (1.01, 1.04) | <0.001 | 1.01 (0.99, 1.00) | 0.193 |
| GDP per capita (US$) | 0.999 (0.999, 0.999) | 0.007 | 0.999 (0.999, 1.000) | 0.613 |

CI, confidence interval; GDP, gross domestic product.

Figure 5. Relationship between lagged age-standardised prevalence of smoking (1980) and current observed age-standardised lung cancer incidence (2012). The solid red line represents the locally weighted scatterplot smoothing (LOWESS) fitted line, the solid green line the linear trend, and the dashed red lines the 95% confidence limits around the LOWESS fitted line. Marginal boxplots are also presented alongside the respective axes.

Under-estimation of lung cancer incidence by region, country and additional direct cost
Predictions from our model suggest that the true global incidence of lung cancer in 2012 may have been 23.6 per 100,000 (95% CI: 23.4-23.7), even after adjustment for competing causes of mortality, compared to reported estimates of 23.1 per 100,000, i.e. ~2.2% or 0.5 additional incident cases per 100,000 population globally. In absolute terms, this could have resulted in potentially 38,101 missed cases (95% CI: 28,489-47,713) in 2012. The under-estimation by WHO region, illustrated in Figure 7A, compares the ratio of predicted incidence by the model vs observed incidence, where a value significantly in excess of 1 suggests under-estimation. Our model suggests that lung cancer incidence has been significantly under-estimated in a number of regions, particularly Central Africa, Central America, Western Africa, Eastern Africa and the Pacific region. In absolute numbers, the burden of under-estimated cases is largest in South Asia (5900 cases) followed by Central America (5812 cases) and South America (3312 cases). The under-estimation of the burden of lung cancer incidence, illustrated in Figure 7B, compares the ranked predicted incidence to the observed incidence in all the WHO sub-regions. The largest absolute burden increase was predicted in Central America, with potentially 3.5 missed lung cancer cases per 100,000 population. This was followed by the Pacific region (2.6 missed cases per 100,000), Eastern Europe (1.9 missed cases per 100,000), South-Eastern Europe (1.6 missed cases per 100,000) and Central Africa (1.1 missed cases per 100,000). The model suggests that the highest absolute numbers of potentially missed cases of lung cancer were in Nepal, Bolivia and Papua New Guinea (Table 3). In relative terms, the highest under-estimation was predicted for Cabo Verde followed by Bolivia, Comoros and Sierra Leone (Table 4).

Figure 6. Model predicted age-standardised incidence of lung cancer in countries with missing or poor quality data in 2012.

Figure 7. A) Ratio of and comparative absolute age-standardised incidence predicted by the model vs observed age-standardised incidence by region in 2012. Blue line represents the relative underestimation ratio (primary x axis); red line is the null underestimation, i.e. ratio=1 (primary y axis); green line shows the absolute number of missing cases (secondary y axis). B) Under-estimation of the burden of lung cancer incidence.

Table 3. Top ten largest absolute number of potentially missed incident lung cancer cases by country in 2012.

| Country | Region | Missed cases* | 95% CI |
|---|---|---|---|
| Nepal | South Asia | 2902 | 2833-2970 |
| Bolivia | South America | 1463 | 1401-1526 |
| Papua New Guinea | Pacific | 803 | 763-844 |
| Greece | South East Europe | 587 | 569-604 |
| Congo | Central Africa | 526 | 497-554 |
| Lao | South East Asia | 439 | 417-461 |
| Kosovo | South East Europe | 380 | 342-420 |
| Burundi | Central Africa | 225 | 204-247 |
| Sierra Leone | Western Africa | 147 | 129-166 |
| Macao, China | East Asia | 118 | 97-141 |

CI, confidence interval. *Absolute number.

Table 4. Top ten largest relative number of potentially missed incident lung cancer cases by country in 2012.

| Country | Region | Number of cases* | Predicted number of cases° | Relative under-estimation | 95% CI |
|---|---|---|---|---|---|
| Cabo Verde | Western Africa | 5 | 16.1 | 3.3 | 1.9-5.4 |
| Bolivia | South America | 659 | 2121.2 | 3.2 | 3.1-3.4 |
| Comoros | Indian Ocean | 11 | 26.9 | 2.5 | 1.7-3.7 |
| Sierra Leone | Western Africa | 110 | 257.0 | 2.3 | 2.1-2.6 |
| Papua New Guinea | Pacific | 706 | 1508.4 | 2.1 | 2.0-2.2 |
| Burundi | Central Africa | 206 | 431.0 | 2.1 | 1.9-2.3 |
| Djibouti | Eastern Africa | 29 | 58.8 | 2.1 | 1.6-2.7 |
| Nepal | South Asia | 4160 | 7060.6 | 1.7 | 1.7-1.7 |
| Solomon Islands | Pacific | 51 | 85.9 | 1.7 | 1.3-2.1 |
| Congo | Central Africa | 809 | 1334.7 | 1.7 | 1.6-1.7 |

CI, confidence interval. *Observed number of age-standardised cases; °predicted by the model.
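The relative and absolute under-estimation measures reported in Tables 3 and 4 are linked by simple arithmetic: the ratio is predicted over observed cases, and the missed cases are their difference. A minimal sketch, using a subset of rows copied from Table 4 (the function name is hypothetical):

```python
# (country, observed cases, model-predicted cases), copied from Table 4
TABLE_4_ROWS = [
    ("Bolivia", 659, 2121.2),
    ("Sierra Leone", 110, 257.0),
    ("Papua New Guinea", 706, 1508.4),
    ("Burundi", 206, 431.0),
    ("Nepal", 4160, 7060.6),
]


def under_estimation(observed, predicted):
    """Return (relative under-estimation ratio, absolute missed cases)."""
    return predicted / observed, predicted - observed


for country, observed, predicted in TABLE_4_ROWS:
    ratio, missed = under_estimation(observed, predicted)
    print(f"{country}: ratio {ratio:.1f}, missed cases about {missed:.0f}")
```

The differences computed this way agree with the missed cases in Table 3 to within about two cases, the gap being attributable to rounding in the tabulated values.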

Discussion
Our results confirm some commonly established determinants of lung cancer, including lagged smoking prevalence (Allison, 1978), indoor pollution (Bruce et al., 2000) and outdoor pollution (Cohen et al., 2005), which we used to develop a predictive model for potential under-reporting. The multivariable adjusted results confirm historical prevalence of smoking as the most prominent risk factor for current lung cancer incidence. Previous literature suggests that smoking is linked to 85 per cent of lung cancer incidence (Nishida and Kudo, 2013). We used the age-standardised prevalence of smoking in 1980 (estimated to be 25.9% globally) as our primary predictor of incidence (Ng et al., 2014). This lagged prevalence was based on the assumption of an approximate 30-year latency period between exposure and lung cancer incidence (Gandini et al., 2008; Weiss, 1997). The multivariable (adjusted) results also indicated a higher risk of lung cancer related to indoor pollution, a positive association that has been found in a majority of studies related to the use of coal stoves (Lissowska et al., 2005; Mumford et al., 1987) and passive smoking (Hirayama, 2000). The association, however, excluded the high prevalence of indoor pollution in Africa due to wood and coal stoves because of the presumed high level of under-reporting of lung cancer incidence in this region (Ferlay et al., 2010; Lam et al., 2004). The bivariate (unadjusted) results confirmed a highly significant association with outdoor pollution (PM10), which has also been suggested as a potential trigger for lung cancer (Lewtas, 2007; Raaschou-Nielsen et al., 2013; Youlden et al., 2008). This particular kind of air pollution could be influenced by a large range of pollutants that includes radon, asbestos, chromium, cadmium and arsenic, organic chemicals, radiation and coal smoke.
A recent study has also demonstrated the potential of remotely sensed and meteorological data products as a reliable representation of global observations of PM2.5 for epidemiological studies (Lary et al., 2014). Our bivariate results also indicated a positive association between GDP and lung cancer incidence, illustrating that rising affluence in developing countries may be associated with an increase in smoking (Ezzati et al., 2005). Developing countries, moreover, now account for 80% of all smokers and lung cancer is projected to increase markedly in these countries (Mathers and Loncar, 2011; Murray, 1997; Tesfaye et al., 2007).

Predictions from our model suggest that the true global incidence of lung cancer in 2012 may have been at least 23.6 per 100,000 compared to reported estimates of 23.1 per 100,000, resulting in potentially ~38,101 missed cases of lung cancer. The high levels of under-estimation reflect the need to improve the quality of cancer incidence data noted in the data and methods section of the Globocan 2012 report. In the recent Globocan 2012 report, 62 out of 184 countries had no cancer incidence data available and only 66 reported the availability of high-quality data (WHO-IARC, 2014). The estimated incidence rate of lung cancer in many countries with poor or absent data relies on a wide range of different algorithms (Curado et al., 2007; Forman et al., 2013). The countries lacking data were distributed among the WHO regions as follows: Africa (19), the Americas (17), Europe (13), South East Asia (5), Eastern Mediterranean (4) and Western Pacific (4). A majority of the 62 countries that had no data were also developing countries whose resources for cancer management programmes are likely to be limited. Our results reflect low-low levels of incidence clustering in Africa, which may explain the under-estimation of incidence in countries in this region if neighbouring country data were used to extrapolate incidence. Further analysis of the 10 countries with the highest numbers of absolute missed lung cancer cases suggested that many of them were African or Central-South American countries, where high levels of poor or lacking cancer data were to be found (Curado et al., 2007; Forman et al., 2013). Interestingly, separate projections indicating that lung cancer will be a leading cause of mortality in Africa by 2020 may contradict the low Globocan 2012 incidence levels in many African countries (Mathers and Loncar, 2011; Murray, 1997).

Another concern is that the suggested high levels of under-reporting of lung cancer incidence in developing countries like Lesotho, Sierra Leone, Namibia, Madagascar, Rwanda and Burundi are projected against a backdrop of limited healthcare resources (Narasimhan et al., 2004). A recent study in 27 EU countries suggests that the costs of lung cancer for 2009 were €18.8 bn and that a considerable portion of this expense had to be covered by friends and relatives (Edge et al., 2010).
The full cost of lung cancer in the EU is not only under-estimated, but also currently requires substantial inputs by friends and relatives (Jemal et al., 2011). This has implications for developing regions like Africa, where a rising incidence of the disease is projected in coming years (Youlden et al., 2008). A recent study from Northern Africa (Tachfouti et al., 2012) attempted to cost the treatment of early and advanced stage lung cancer for the first year after diagnosis. Based on that study, the above-mentioned missed cases at the global level could have cost ~US$ 175,264,600 (95% CI: US$ 131,049,400-219,479,800) assuming an average early stage cost of US$ 4600 per patient for all cases (worst case scenario), and ~US$ 130,305,420 (95% CI: US$ 97,432,380-163,178,460) assuming an average late stage cost of US$ 3420 per patient for all cases (best case scenario from a cost perspective). These estimates would be far greater if first world treatment costs were factored in, as well as indirect costs (e.g. social and lost productivity).
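The cost range above follows directly from multiplying the estimated missed cases, and the limits of their confidence interval, by the assumed cost per patient. A minimal sketch of that arithmetic, using only figures quoted in the text (the variable and function names are hypothetical):

```python
# Missed cases in 2012 and their 95% confidence limits, as reported in the text.
MISSED_CASES = {"estimate": 38_101, "lower": 28_489, "upper": 47_713}

# Assumed first-year treatment cost per patient in US$ (Tachfouti et al., 2012).
COST_PER_PATIENT = {"early stage": 4_600, "late stage": 3_420}


def total_costs(cases, unit_cost):
    """Scale each case count by a single per-patient cost."""
    return {label: count * unit_cost for label, count in cases.items()}


for stage, unit_cost in COST_PER_PATIENT.items():
    costs = total_costs(MISSED_CASES, unit_cost)
    print(
        f"{stage}: US$ {costs['estimate']:,} "
        f"(95% CI: US$ {costs['lower']:,}-{costs['upper']:,})"
    )
```

Each scenario applies one cost to every missed case, so the two totals bracket the direct first-year cost under these assumptions and exclude indirect costs.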

Limitations of the study
Given the ecological study design, the potential for ecological fallacy cannot be discounted. As this is a secondary data analysis, missing or incorrect incidence and lagged determinant data could potentially impact the model and our projections. The potential limitations of the Globocan data are discussed in more detail elsewhere (WHO-IARC, 2014). National cancer reporting systems vary by country, and data quality may thus have varied between regions. Multivariable models allow confounding factors to be taken into account (Benichou, 2001), but the model needs to be as complete as possible. We therefore cannot discount that potentially important missing determinants of lung cancer may have confounded our findings. Poor quality of the included determinants could also have impacted the estimated number of missed cases in particular countries, leading to over- or under-estimation of lung cancer incidence.


Conclusions
The WHO Framework Convention on Tobacco Control should continue its work and expand the comprehensive implementation of the framework to all 168 countries that had ratified this policy by March 2010 (Tesfaye et al., 2007). In particular, the results suggest significant clusters of high-high male and female incidence across neighbouring countries that could provide synergies for WHO regional policy development. Nevertheless, changing inter-country and gender patterns will pose differential levels of future risk for lung cancer, requiring additional resources. Global environmental policies to reduce emissions continue to be hampered by a lack of participation from the world’s leading polluters, including North America and China. The signatory parties of the Kyoto Protocol are increasingly being questioned because of the absence of binding resolutions to reduce emissions that are fair to all. International and national health agencies should be urged to geographically target countries that reflect under-reported incidence, to develop and/or strengthen programmes, and to establish cancer registries in order to prioritise interventions given the limited resources in many developing countries (Edge et al., 2010; Tesfaye et al., 2007). The high-high clusters of countries showing an elevated level of incidence in our results suggest that regional efficiencies could be obtained by international health bodies if they coordinate programmes for these clusters. Blocks of countries in Central-South America, Asia and Africa, for instance, could be targeted in order to develop synergistic interventions both to manage the disease and to better implement programmes like the WHO Framework Convention on Tobacco Control.

References
Allison PD, 1978. Measures of inequality. Am Sociol Rev 43:865-80.
Anselin L, 1996. Local Indicators of Spatial Association - LISA. Geogr Anal 27:93-115.
Anselin L, Ibnu S, Youngihn K, 2006. GeoDa: an introduction to spatial data analysis. Geogr Anal 38:5-22.
Assuncao RM, Reis EA, 1999. A new proposal to adjust Moran’s I for population density. Stat Med 18:2147-62.
Benichou J, 2001. A review of adjusted estimators of attributable risk. Stat Methods Med Res 10:195-216.
Bonjour S, Adair-Rohani H, Wolf J, Bruce N, Mehta S, Prüss-Ustün A, Lahiff M, Rehfuess E, Mishra V, Smith K, 2013. Solid fuel use for household cooking: country and regional estimates for 1980-2010. Environ Health Persp 121:784-90.
Bruce N, Perez-Padilla R, Albalak R, 2000. Indoor air pollution in developing countries: a major environmental and public health challenge. B World Health Organ 78:1078-92.
Burbank F, 1972. US lung cancer death rates begin to rise proportionately more rapidly for females than for males: a dose-response effect? J Chronic Dis 25:473-9.
Cohen AJ, Ross Anderson H, Ostro B, Pandey KD, Krzyzanowski M, Künzli N, Gutschmidt K, Pope A, Romieu I, Samet JM, 2005. The global burden of disease due to outdoor air pollution. J Toxicol Env Heal A 68:1301-7.
Curado M, Edwards B, Shin H, Storm H, Ferlay J, Heanue M, Boyle P, 2007. Cancer incidence in five continents. Vol. IX. IARC, Lyon, France.
Deaton A, 2013. What’s wrong with inequality? Lancet 381:363.
Devesa SS, Silverman DT, Young JL, Pollack ES, Brown CC, Horm JW, Percy CL, Myers MH, McKay FW, Fraumeni JF, 1987. Cancer incidence and mortality trends among whites in the United States, 1947-84. J Natl Cancer I 79:701-70.
Edge SB, Byrd DR, Compton CC, Fritz AG, Greene FL, Trotti A, 2010. AJCC cancer staging manual. Springer, New York, NY, USA.
Ezzati M, Vander Hoorn S, Lawes CM, Leach R, James WPT, Lopez AD, Rodgers A, Murray CJ, 2005. Rethinking the “diseases of affluence” paradigm: global patterns of nutritional risks in relation to economic development. PLoS Med 2:e133.
Farnsworth N, Fagan SP, Berger DH, Awad SS, 2004. Child-Turcotte-Pugh vs MELD score as a predictor of outcome after elective and emergent surgery in cirrhotic patients. Am J Surg 188:580-3.
Ferlay J, Shin HR, Bray F, Forman D, Mathers C, Parkin DM, 2010. Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. Int J Cancer 127:2893-917.
Finkelstein MM, 1991. Use of “time windows” to investigate lung cancer latency intervals at an Ontario steel plant. Am J Ind Med 19:229-35.
Forman D, Bray F, Brewster D, Gombe Mbalawa C, Kohler B, Piñeros M, Steliarova-Foucher E, Swaminathan R, Ferlay J, 2013. Cancer incidence in five continents. Vol. X. IARC, Lyon, France.
Gandini S, Botteri E, Iodice S, Boniol M, Lowenfels AB, Maisonneuve P, Boyle P, 2008. Tobacco smoking and cancer: a meta-analysis. Int J Cancer 122:155-64.
Hirayama T, 2000. Non-smoking wives of heavy smokers have a higher risk of lung cancer: a study from Japan. B World Health Organ 78:940-2.
Islami F, Torre LA, Jemal A, 2015. Global trends of lung cancer mortality and smoking prevalence. Transl Lung Cancer Res 4:327.
Jemal A, Bray F, Center MM, Ferlay J, Ward E, Forman D, 2011. Global cancer statistics. CA-Cancer J Clin 61:69-90.
Lam W, White N, Chan-Yeung M, 2004. Lung cancer epidemiology and risk factors in Asia and Africa state of the art. Int J Tuberc Lung D 8:1045-57.
Lary DJ, Faruque F, Malakar N, Moore A, Roscoe B, Adams Z, Eggelston Y, 2014. Estimating the global abundance of ground level presence of particulate matter (PM2.5). Geospat Health 8:611-30.
Lewtas J, 2007. Air pollution combustion emissions: characterization of causative agents and mechanisms associated with cancer, reproductive, and cardiovascular effects. Mutat Res-Rev Mutat 636:95-133.
Lissowska J, Bardin-Mikolajczak A, Fletcher T, Zaridze D, Szeszenia-Dabrowska N, Rudnai P, Fabianova E, Cassidy A, Mates D, Holcatova I, 2005. Lung cancer and indoor pollution from heating and cooking with solid fuels. The IARC international multicentre case-control study in Eastern/Central Europe and the United Kingdom. Am J Epidemiol 162:326-33.
Lortet-Tieulent J, Soerjomataram I, Ferlay J, Rutherford M, Weiderpass E, Bray F, 2014. International trends in lung cancer incidence by histological subtype: adenocarcinoma stabilizing in men but still increasing in women. Lung Cancer 84:13-22.
Mathers C, Loncar D, 2011. Updated projections of global mortality and burden of disease, 2002-2030: data sources, methods and results. World Health Organization, Geneva, Switzerland.
Minagawa M, Ikai I, Matsuyama Y, Yamaoka Y, Makuuchi M, 2007. Staging of hepatocellular carcinoma: assessment of the Japanese TNM and AJCC/UICC TNM systems in a cohort of 13,772 patients in Japan. Ann Surg 245:909.
Mumford J, He X, Chapman R, Harris D, Li X, Xian Y, Jiang W, Xu C, Chuang J, 1987. Lung cancer and indoor air pollution in Xuan Wei, China. Science 235:217-20.
Murray CJL, Lopez AD, 1997. Alternative projections of mortality and disability by cause 1990-2020: Global Burden of Disease Study. Lancet 349:1498.
Naghavi M, Wang H, Lozano R, Davis A, Liang X, Zhou M, Vollset SEV, Abbasoglu Ozgoren A, Norman RE, Vos T, 2015. Global, regional, and national age-sex specific all-cause and cause-specific mortality for 240 causes of death, 1990-2013: a systematic analysis for the Global Burden of Disease Study 2013. Lancet 385:117-71.
Narasimhan V, Brown H, Pablos-Mendez A, Adams O, Dussault G, Elzinga G, Nordstrom A, Habte D, Jacobs M, Solimano G, 2004. Responding to the global human resources crisis. Lancet 363:1469-72.
Ng M, Freeman MK, Fleming TD, Robinson M, Dwyer-Lindgren L, Thomson B, Wollum A, Sanman E, Wulf S, Lopez AD, 2014. Smoking prevalence and cigarette consumption in 187 countries, 1980-2012. J Am Med Assoc 311:183-92.
Nishida N, Kudo M, 2013. Recent advancements in comprehensive genetic analyses for human hepatocellular carcinoma. Oncology 84(Suppl.1):93-97.
Pickle LW, Mason T, Howard N, Hoover R, Fraumeni J, 1987. Atlas of US cancer mortality among whites: 1950-1980. US Department of Health and Human Services, Public Health Service, National Institutes of Health, Washington, DC, USA.
Polednak AP, 1974. Latency periods in neoplastic diseases. Am J Epidemiol 100:354-56.
Raaschou-Nielsen O, Andersen ZJ, Beelen R, Samoli E, Stafoggia M, Weinmayr G, Hoffmann B, Fischer P, Nieuwenhuijsen MJ, Brunekreef B, Xun WW, Katsouyanni K, Dimakopoulou K, Sommar J, Forsberg B, Modig L, Oudin A, Oftedal B, Schwarze PE, Nafstad P, De Faire U, Pedersen NL, Ostenson CG, Fratiglioni L, Penell J, Korek M, Pershagen G, Eriksen KT, Sørensen M, Tjønneland A, Ellermann T, Eeftens M, Peeters PH, Meliefste K, Wang M, Bueno-de-Mesquita B, Key TJ, de Hoogh K, Concin H, Nagel G, Vilier A, Grioni S, Krogh V, Tsai MY, Ricceri F, Sacerdote C, Galassi C, Migliore E, Ranzi A, Cesaroni G, Badaloni C, Forastiere F, Tamayo I, Amiano P, Dorronsoro M, Trichopoulou A, Bamia C, Vineis P, Hoek G, 2013. Air pollution and lung cancer incidence in 17 European cohorts: prospective analyses from the European Study of Cohorts for Air Pollution Effects (ESCAPE). Lancet Oncol 14:813-22.
Shorrocks AF, 2013. Decomposition procedures for distributional analysis: a unified framework based on the Shapley value. J Econ Inequal 11:99-126.
StataCorp, 2011. Stata statistical software: release 12. StataCorp LP, College Station, TX, USA.
Tachfouti N, Belkacemi Y, Raherison C, Bekkali R, Benider A, Nejjari C, 2012. First data on direct costs of lung cancer management in Morocco. Asian Pac J Cancer Prev 13:1547-51.
Tesfaye F, Nawi N, Van Minh H, Byass P, Berhane Y, Bonita R, Wall S, 2007. Association between body mass index and blood pressure across three populations in Africa and Asia. J Hum Hypertens 21:28-37.
Weiss W, 1997. Cigarette smoking and lung cancer trends. A light at the end of the tunnel? Chest 111:1414-6.
WHO, 2009. Global health observatory data repository. World Health Organization, Geneva, Switzerland. Available from: http://apps.who.int/gho/data/node.main
WHO, 2012. WHO household energy database. World Health Organization, Geneva, Switzerland. Available from: http://www.who.int/indoorair/health_impacts/he_database/en/index.html
WHO-IARC, 2014. GLOBOCAN: estimated cancer incidence, mortality, and prevalence worldwide in 2012. Available from: http://globocan.iarc.fr/Default.aspx
World Bank, 2013. World development indicators (WDI). Available from: http://data.worldbank.org/indicator
Youlden DR, Cramb SM, Baade PD, 2008. The international epidemiology of lung cancer: geographical distribution and secular trends. J Thorac Oncol 3:819-31.


Geospatial Health 2016; volume 11:405

Comparison of single- and multi-scale models for the prediction of the Culicoides biting midge distribution in Germany
Renke Lühken,1,2 Jörn Martin Gethmann,3 Petra Kranz,3 Pia Steffenhagen,4 Christoph Staubach,3 Franz J. Conraths,3 Ellen Kiel2
1Bernhard Nocht Institute for Tropical Medicine, WHO Collaborating Centre for Arbovirus and Hemorrhagic Fever Reference and Research, Hamburg; 2Research Group Aquatic Ecology and Nature Conservation, Department of Biology and Environmental Sciences, Carl von Ossietzky University of Oldenburg, Oldenburg; 3Friedrich-Loeffler-Institut, Greifswald-Insel Riems; 4Institute of Environmental Planning, Leibniz University of Hannover, Hannover, Germany

Abstract
This study analysed Culicoides presence-absence data from 46 sampling sites in Germany, where monitoring was carried out from April 2007 until May 2008. Culicoides presence-absence data were analysed in relation to land cover data, in order to study whether the prevalence of biting midges is correlated to land cover data with respect to the trapping sites. We differentiated eight scales, i.e. buffer zones with radii of 0.5, 1, 2, 3, 4, 5, 7.5 and 10 km, around each site, and chose several land cover variables. For each species, we built eight single-scale models (i.e. predictor variables from one of the eight scales for each model) based on averaged, generalised linear models and two multi-scale models (i.e. predictor variables from all of the eight scales) based on averaged, generalised linear models and generalised linear models with random forest variable selection. There were no significant differences between performance indicators of models built with land cover data from different buffer zones around the trapping sites. However, the overall performance of multi-scale models was higher than that of the alternatives. Furthermore, these models mostly achieved the best performance for the different species as measured by the area under the receiver operating characteristic curve. However, as also presented in this study, the relevance of the different variables could differ significantly between scales, including the number of species affected and the positive or negative direction of the effect. This is an even more severe problem where multi-scale models are concerned, in which one model can include the same variable at different scales but with different directions, i.e. a negative and a positive direction of the same variable at different scales. Nevertheless, multi-scale modelling is a promising approach to model the distribution of Culicoides species, as it accounts much better for the ecology of biting midges, which use different resources (breeding sites, hosts, etc.) at different scales.

Correspondence: Renke Lühken, Bernhard Nocht Institute for Tropical Medicine, WHO Collaborating Centre for Arbovirus and Hemorrhagic Fever Reference and Research, Bernhard-Nocht-Straße 74, 20359 Hamburg, Germany. Tel: +49.040.42818959 - Fax: +49.04042818959. E-mail: renkeluhken@gmail.com
Key words: Ceratopogonidae; Culicoides; Species distribution model; Multi-scale model.
Acknowledgements: this work was financially supported as an internal call project (IC 6.7 BT-DYNVECT) by the EU network of Excellence (EPIZONE, contract nr FOOD-CT-2006 016236). We want to thank Dr. G. Liebisch (ZeckLab), Prof. G.A. Schaub (University of Bochum), Dr. M. Geier and Dr. T. Hörbrand (Biogents), who permitted us to analyse Culicoides data they sampled during the German BTD-monitoring, which was funded by the German Federal Ministry of Food and Agriculture (BMEL). We would like to express our sincere gratitude to Prof. Dr. H.-J. Bätza (BMEL), who initiated the German BTD-monitoring. Finally, many thanks to Dr. Ute Bradter (University of Leeds), who provided the R code for the multi-scale variable selection and the random forest variable selection as published in Bradter et al. (2013).
Received for publication: 6 August 2015. Revision received: 20 October 2015. Accepted for publication: 21 October 2015.
©Copyright R. Lühken et al., 2016 Licensee PAGEPress, Italy Geospatial Health 2016; 11:405 doi:10.4081/gh.2016.405
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Introduction
Bluetongue disease (BTD) is a reportable, non-contagious viral infection of ruminants, which occurred in Germany for the first time in late summer 2006 (Conraths et al., 2012). Several species of the genus Culicoides (Diptera: Ceratopogonidae) are considered to be potential vectors of the bluetongue virus (BTV) (Meiswinkel et al., 2007; Dijkstra et al., 2008), while the actual vector competence of the different species is still unresolved. In 2006 and 2007, a massive spread of BTD was observed in Germany and, by the end of 2007, nearly all federal states were affected. By spring 2008, more than 17,000 cattle, sheep and goats had died from this disease, resulting in total costs of approximately 250 million Euros. Hence, Germany decided to start a compulsory vaccination programme in 2008. The recent epidemic of the Schmallenberg virus in Europe again highlights the importance of Culicoides species as capable vectors (Beer et al., 2013), as these are considered to be the main vectors here as well (De Regge et al., 2012; Rasmussen et al., 2012).

Although there is a huge lack of knowledge about the causal connection between environmental variables and the distribution of biting midges, several studies have modelled biting midge distribution and phenology using different sets of environmental data (e.g. Purse et al., 2004, 2011; Calvete et al., 2008; Rigot et al., 2012; Kluiters et al., 2013). These modelling approaches used environmental data from various scales, e.g. all environmental data at one scale (e.g. 1 km, Kluiters et al., 2013) or at different scales (e.g. between 1 and 8 km, Calvete et al., 2008; Purse et al., 2011). However, as previously shown (Hamer and Hill, 2000), the selection of the spatial scales affects the outcome of the modelling, e.g. by decreasing the variance explained or biasing regression coefficients, which might result in wrong conclusions and interpretations (reviewed by Bradter et al., 2013). Therefore, selection of the appropriate scale is important to allow accurate species distribution modelling. Furthermore, as generally described by Bradter et al. (2013), species distribution can also be affected by land cover variables at multiple scales, e.g. if breeding sites, resting sites and hosts of Culicoides biting midges are distributed over several scales.

Land cover changes with distance to farm buildings, where sampling of biting midges commonly takes place (e.g. Kiel et al., 2009). The environment is generally modified most intensively around the main buildings in order to optimise farm management. The percentage of other, natural land cover types (e.g. forest) increases with increasing scale around the trapping sites. Depending on breeding and resting sites, host preferences or the species-specific flight range of Culicoides, this should result in different scale-specific variables that are useful for the prediction of biting midge distribution. Therefore, an impact of multiple spatial scales relative to the trapping sites might be expected. A land cover variable can be a predictor at several scales. For example, grassland may serve as a breeding site at the local scale, hosts may be scattered over the grassland at a medium scale, and resting sites may lie at the edge of the grassland, where the vegetation might be higher, at the largest scale. Moreover, different variables can be predictors at different scales, e.g. a species breeds in the forest at a large distance, the hosts are present in the direct vicinity of the trapping sites, and the resting sites are at a medium distance.

In this study, we investigated the performance of single- and multi-scale models to predict the distribution of Culicoides species on farms in western Germany with land cover variables at different scales. The objectives were: i) to evaluate which spatial scales give the best predictions for the distribution of different biting midge species; ii) to evaluate whether multi-scale models increase predictive ability; and iii) to determine the most important landscape variables for the prediction of Culicoides species distribution at the different scales.

Materials and Methods

Culicoides and landscape data
In this study, we analysed a dataset from 46 trapping sites (for trapping site information see Werner, 2010; Hoffmann et al., 2009), covering a gradient from northwest to southwest Germany. At every site, adult Culicoides were sampled for 14 months (April 2007 until May 2008). Sampling was conducted during the first seven consecutive days of each month, using the BG-sentinel trap with black light and following a standardised sampling protocol (Mehlhorn et al., 2009). All traps were placed in the immediate vicinity of the places where cattle were predominantly kept. The main objective of this monitoring was to document the distribution and spread of BTV; it did not address the distribution and abundance of the biting midge vectors. Therefore, most Culicoides samples were sorted at the group level only. Species identification was restricted to aliquots and based on morphological characters (Werner, 2010). These aliquots were restricted to a maximum of 10% of the total Culicoides sample. During the monitoring, the total number of trapped Culicoides ranged from zero to several thousand (Mehlhorn et al., 2009). Therefore, the number of Culicoides per species differed strongly between the study months and trapping sites. Thus, only presence-absence data aggregated over the time span of 14 months were analysed in this study. Furthermore, species with a prevalence of less than 10% or more than 90% were excluded.

Biting midge data were obtained from light-trap sampling and landscape variables from the Authoritative Topographic-Cartographic Information System (Amtliches Topographisch-Kartographisches Informationssystem, ATKIS®). As little is known about the ecology and flight range of Culicoides biting midges, an a priori selection of the appropriate scale for the modelling of species distribution was not possible. Therefore, we extracted the same landscape variables at eight different spatial scales (radii of 0.5, 1, 2, 3, 4, 5, 7.5 and 10 km), which were used separately for single-scale models or all together for multi-scale modelling approaches for the prediction of species distribution. In order to analyse the land cover of each trapping site, we referred to a selection of land cover attributes provided by ATKIS® that were assumed to be important for Culicoides biting midges. ATKIS® provides linear and polygon vector data with a resolution of 1:5000 and a positional accuracy of +/- 2.5 m. The same 14 landscape attributes were measured for all scales (Table 1). We extracted the percentage of surface per circular zone for each variable provided as polygon vector data and the line length per circular zone for each variable provided as linear vector data. This data collection was carried out using ArcGIS 9.2 (ESRI, Redlands, CA, USA).

Table 1. The ATKIS® land cover variables used for Culicoides species distribution modelling.
Groups: Forested areas/woodland; Agricultural and urban; Water bodies.

Land cover variable                                      Abbreviation   Data type
Deciduous forest/coniferous forest (undifferentiated)    Deco           Polygon
Deciduous forest                                         Deci           Polygon
Coniferous forest                                        Coni           Polygon
Other forest (unspecified)                               Othf           Polygon
Forest (sum of all forest)                               Fore           Polygon
Other vegetation (unspecified)                           Othe           Polygon
Arable land                                              Acre           Polygon
Grassland                                                Gras           Polygon
Garden                                                   Gard           Polygon
Fallow land                                              Fall           Polygon
Settlement                                               Sett           Polygon
Ditch length                                             Ditc           Line
Stream length                                            Stre           Line
Water                                                    Wate           Polygon
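The aggregation to presence-absence data and the prevalence filter described in this section can be illustrated with a minimal Python sketch. It is not the authors' code (the study used R); the function names, the toy data layout (species, then site, then a list of monthly counts) and the inclusive treatment of the 10% and 90% bounds are assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical layout: counts[species][site] = catches per sampling month.
Counts = Dict[str, Dict[str, List[int]]]


def presence_absence(monthly_counts: List[int]) -> int:
    """Aggregate all sampling months of one site to 1 (present) or 0 (absent)."""
    return 1 if any(c > 0 for c in monthly_counts) else 0


def prevalence(site_presence: List[int]) -> float:
    """Fraction of sites at which the species was recorded at least once."""
    return sum(site_presence) / len(site_presence)


def filter_species(counts: Counts, lower: float = 0.10,
                   upper: float = 0.90) -> Dict[str, Dict[str, int]]:
    """Keep species whose prevalence lies within [lower, upper].

    Assumption: the bounds are treated as inclusive, because the text only
    states that species below 10% or above 90% were excluded.
    """
    kept = {}
    for species, sites in counts.items():
        pa = {site: presence_absence(months) for site, months in sites.items()}
        if lower <= prevalence(list(pa.values())) <= upper:
            kept[species] = pa
    return kept


# Toy example with ten sites and two sampling months per site.
toy: Counts = {
    "everywhere": {f"s{i}": [2, 0] for i in range(10)},
    "half": {f"s{i}": ([0, 1] if i < 5 else [0, 0]) for i in range(10)},
    "nowhere": {f"s{i}": [0, 0] for i in range(10)},
}
kept_species = filter_species(toy)
```

In this toy data set only the species present at half of the sites is retained, because the other two have a prevalence of 100% and 0%.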

Statistical analyses

Single- and multi-scale models built with model-averaging
According to methods for species distribution modelling applied in other studies (e.g. Kattwinkel et al., 2009; Gray et al., 2010), single-scale and multi-scale generalised linear models were built in four steps. First, highly correlated variables (Spearman’s rho ≥0.7) were excluded. For each highly correlated pair, the variable with the largest mean correlation with other variables was dropped. Second, univariate binomial logistic regression models were calculated for all variables on all scales for each species. Variables with P≥0.15 were excluded from further analysis, as they were not regarded as statistically significant. Third, multivariate binomial logistic regression models were built with every combination of the variables remaining after the previous two steps. We considered all possible models and did not use a stepwise model selection strategy, which is often criticised, e.g. because the results of such methods depend on the order in which variables enter the model (Burnham and Anderson, 2002; Whittingham et al., 2006). The large number of variables in different buffer zones per species results in a very large number of possible models. Such a brute force method might therefore not be the optimal approach (Burnham and Anderson, 2002). However, we did not have enough information about the ecology of biting midges (e.g. breeding site preferences or resting sites) for an a priori exclusion of variables or restriction to a subset of possible models. Fourth, if several models were obtained for one species, model averaging was conducted (Burnham and Anderson, 2002). Model averaging approaches are considered to overcome problems such as overfitting or variable selection, which are found in modelling approaches aiming for a single best model (Burnham and Anderson, 2002). The Akaike weight, using AICc for calculation, can be interpreted as a measure of the strength of evidence for each model. We selected a 95% confidence set of models by sequentially summing the Akaike weights until 0.95 was reached. According to Burnham and Anderson (2002), this set of models can be interpreted as having a 95% confidence that the best approximating model is included. The final averaged models were built by multiplication of the estimated model coefficients with the corresponding Akaike weights (see Strauss and Biedermann, 2006 for an example). Weighted coefficients were summed for each variable including all models per species on each buffer zone (single-scale models) or all buffer zones (multi-scale models).

Selection of scales for the variables included in the multi-scale models
Selection of variables for the multi-scale models was applied as proposed by Bradter et al. (2013). This prior reduction of variables was chosen to prevent the inclusion of several variables at neighbouring scales, which are often highly correlated with each other (Figure 1). Another advantage of such an exclusion of variables is a significant reduction of computation time. We used univariate binomial logistic regression models for presence or absence data of each Culicoides species for each variable and all eight scales. Due to the small sample size (n=46), we used the corrected form of the Akaike information criterion (AICc), which indicates the best compromise between model complexity and likelihood for each model. The predictors of the different variables at the eight different scales were selected if i) the AICc was at least two lower than the AICc of the null model (intercept only); ii) the AICc was lower than at the next smaller or larger scale; and iii) the AICc was lower than the AICc at the second smaller or larger scale (not applicable for the smallest and largest scale). With this method, we selected all local minima of the AICc that differed by at least two from the null model for each predictor and each scale.

Figure 1. Correlation network of all one hundred and fourteen predictors. The eight different scales of the fourteen variables are grouped. All correlations with a Spearman rho≥0.7 are indicated by a connection (red=negative correlation, green=positive correlation). See Table 1 for the abbreviations of the variables, which start with Acre at the 12 o’clock position and continue clockwise as indicated in the legend.

Multi-scale models with random forest variable selection
Although the modelling approach with averaging of multiple generalised linear models is considered to be relatively robust against overfitting (Burnham and Anderson, 2002), the large number of potential land cover variables included in the multiple models might cause such problems. Additionally, the exclusion of highly correlated variables can lead to the inclusion of variables at spatial scales that are not meaningful. Therefore, for multi-scale models only, we used a second modelling approach based on random forests for the variable selection, which was found to be robust even if the number of response data is small in comparison to the number of predictors (Strobl et al., 2007). This variable selection method was applied as described in detail by Bradter et al. (2013). In the random forest approach, several classification or regression trees are built from random subsets of the dataset (Breiman, 2001; Liaw and Wiener, 2002). The procedure uses a selection based on the unscaled permutation importance (Genuer et al., 2010). Each predictor is permuted in turn and the prediction error, i.e. the out-of-bag (OOB) error, before and after permutation is used as a measure of variable importance (Liaw and Wiener, 2002; Strobl et al., 2008). A training set is created by sampling 2/3 of the data set (with replacement) for each classification tree, which is then used to predict the remaining 1/3 of the data. The proportion of misclassifications over all trees is the OOB error (Breiman, 2001; Liaw and Wiener, 2002). There were five steps to identify the number of predictors suitable for the model interpretation (Genuer et al., 2010): i) all predictors were ranked by the unscaled permutation importance (average value over 50 repetitions); ii) a regression tree was fitted to the curve of the plot of standard deviations of the importance measures ordered by their mean importance, and variables with a mean importance below the smallest predicted value of the regression tree model were discarded; iii) the OOB errors for the models (average over 50 repetitions) were computed by starting with the most important variables and adding the other predictors in order of their ranking; iv) the model with the smallest OOB error, augmented with the standard deviation of the 50 repetitions, was selected; and finally v) the nested model with the fewest predictors and an OOB error smaller than this value was selected. Parameters which have to be specified in the random forest were used as proposed by Genuer et al. (2010): the number of trees built in the forest was ntree=2000, the number of predictors available at each node split was mtry=p/2, with p denoting the number of predictors, and default values were used for the calculation of the OOB error.

Spatial autocorrelation
For all models built with model averaging, Moran eigenvector filtering was applied for the full model without highly correlated and non-significant variables (Powney et al., 2010). If significant, these eigenvectors were added to the model and included in the model averaging procedure. Furthermore, as recommended by Bradter et al. (2013), we applied Moran eigenvector filtering for all multi-scale models selected

with random forest variable selection (Dray et al., 2006; Griffith and Peres-Neto, 2006). Spatial eigenvectors were added until residual spatial autocorrelation was no longer significant at the P=0.05 level.

Performance assessment
Nagelkerke’s R squared (R2N) was used as a measure of model calibration (Hosmer and Lemeshow, 2000). Area under the receiver operating characteristic curve (AUC) was used to compare prediction performance (Fielding and Bell, 1997). AUC thresholds were interpreted as proposed by Hosmer and Lemeshow (2000): 0.7-0.8 is considered an acceptable prediction; 0.8-0.9 is excellent and >0.9 is outstanding. Although this index is criticised as unreliable by some authors (Lobo et al., 2008), we predominantly referred to AUC, because it is the most commonly used performance indicator for species distribution models. However, as recommended by Lobo et al. (2008), we present further accuracy indices: root mean square error (RMSE), overall correct classification rate (CCR), sensitivity (SENS), specificity (SPEC), positive predictive value (PPV), negative predictive value (NPV), true skill statistic (TSS), and Cohen’s kappa (KAPPA) (see Liu et al., 2009b for the formulae of the accuracy indices). For threshold-dependent indices (e.g. CCR or KAPPA) and prevalence prediction, which require binary results, presence and absence were differentiated using a threshold value set to achieve the observed prevalence in the training data set (Freeman and Moisen, 2008). We used bootstrapped 95% percentile confidence intervals to evaluate the statistical differences between the model performances on different scales (Pearman et al., 2008; Liu et al., 2009a). We generated 1000 bootstrap data sets (with replacement) for each species on each scale (single- and multi-scale models). Models were refitted with the bootstrap data set. The 95% confidence intervals (upper and lower 2.5% quantiles of the distribution) were calculated for each accuracy index. Non-overlapping confidence intervals were interpreted as significant differences between the scales. A threshold of 0.7 for the lower 2.5% quantile of the AUC, i.e. AUC2.5, was used to select acceptable models.

Figure 2. Performance of the Culicoides species models with area under the receiver operating characteristic curve (AUC) values ≥0.7. For each performance criterion [AUC, overall correct classification rate (CCR), Cohen’s kappa (KAPPA), negative predictive value (NPV), positive predictive value (PPV), root mean square error (RMSE), sensitivity (SENS), specificity (SPEC), true skill statistic (TSS)], the range and distribution of values for all models is shown. For each criterion, the left boxplot represents the single-scale model (single) and the right boxplot the multi-scale (multi) model.
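As a rough illustration of the AUC-based performance assessment with bootstrapped percentile confidence intervals, the following Python sketch computes the AUC and a 95% percentile interval. It is a simplification under stated assumptions and not the authors' R workflow: the study refitted each model on every bootstrap data set, whereas this sketch only resamples fixed observed/predicted pairs; all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of presence/absence pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def percentile(sorted_vals: List[float], q: float) -> float:
    """Percentile with linear interpolation; q is a fraction between 0 and 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_auc_ci(y_true: Sequence[int], scores: Sequence[float],
                     n_boot: int = 1000, seed: int = 1) -> Tuple[float, float]:
    """95% percentile interval of the AUC from resampling with replacement.

    Assumption: predictions are held fixed; refitting the model per resample,
    as described in the text, is omitted to keep the sketch self-contained.
    """
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class only
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    return percentile(stats, 0.025), percentile(stats, 0.975)


def is_acceptable(lower_bound: float, threshold: float = 0.7) -> bool:
    """Acceptance rule from the text: lower 2.5% quantile of the AUC >= 0.7."""
    return lower_bound >= threshold
```

With such helpers, two scales could be compared by checking whether their intervals overlap, and a model would be retained if `is_acceptable` holds for the lower bound returned by `bootstrap_auc_ci`.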

Software used
Data visualisation and statistical analyses were conducted with R (R Core Team, 2014) using functions from the packages ggplot2 (Wickham, 2009), plyr (Wickham, 2011), qgraph (Epskamp et al., 2012), randomForest (Liaw and Wiener, 2002), spdep (Bivand, 2014) and the built-in function glm for logistic regression models.

Results

Eighteen species of the 26 species in the available dataset had prevalence higher than 10% and lower 90%, and thus were used in this modelling study. From these, 57 models for thirteen species (C. albicans, C.

Figure 3. Area under the receiver operating characteristic curve values with 95% bootstrapped confidence intervals. Upper and lower 2.5% quantiles of the distribution for all Culicoides species with prevalence between 10 and 90% and the different models are shown. Single-scale models at eight different scales and multi-scale models [multi-scale model built with model-averaging (mGLM) and random forest variable selection (mRF)] are shown.


chiopterus, C. clastrieri, C. dewulfi, C. fagineus, C. grisescens, C. lupicaris, C. newsteadi, C. nubeculosus, C. riethi, C. stigma, C. scoticus, and C. vexans) fulfilled our performance criteria, i.e. at least one single- or multi-scale model with AUC2.5 ≥0.7. Only seven of these models provided a better model fit with spatial eigenvectors, i.e. three for C. albicans (0.5 km, 2 km, and multi-scale models built with model-averaging), one for C. lupicaris (3 km), one for C. newsteadi (0.5 km) and two for C. riethi (1 and 2 km). This result indicates that spatial autocorrelation has little or no influence on the presence-absence at the other scales. R2N ranged from 0.2 to 0.5, which can be considered to be good for logistic regression models (Hosmer and Lemeshow, 2000; Kattwinkel et al., 2009). Moreover, according to the other accuracy indices, the performance of these models was satisfactory and indicated a better prediction than occurrence by chance (Figure 2). Most of the Culicoides species studied here had a relatively high prevalence resulting in a higher specificity and positive predictive value compared

Figure 4. Percentage of Culicoides species influenced by each variable in the different models. Single-scale models on the eight different scales and multi-scale models [multi-scale model built with model-averaging (mGLM) and random forest variable selection (mRF)] are shown (gray=positive coefficient, black=negative coefficient). For the abbreviations of the coefficients see Table 1.


to sensitivity and negative predictive value. In general, the accuracy indices did not show statistically significant differences, i.e. they had overlapping confidence intervals for the different species and scales (Figure 3). Nevertheless, the mean accuracy indices were overall slightly higher for the multi-scale models than for the single-scale ones (Figures 2 and 3). The summary of the results with the multi- and single-scale models shows that nearly all of the species studied were influenced by agricultural/urban and forest variables, while around 50% of the species were also influenced by water-related variables. However, looking in more detail, the percentage of species showing correlations with the different land cover variables could differ strongly between the different models and scales (Figure 4), while the weights of the different variables in the models built with model averaging did not show this (Figure 5). For most of these species (9 out of 13), multi-scale models showed the best performance, i.e. the highest AUC2.5 value per species. According to the mean AUC, seven of these models were characterised by excellent and another six by outstanding performance. By way of example, these models were applied to three different artificial landscapes to evaluate the impact of the different land cover variables on the distribution of Culicoides species: i) increasing grassland-related and decreasing forest-related variables (Figure 6); ii) increasing arable-land type variables and decreasing forest-related ones (Figure 6); and

Figure 5. Range and distribution of factor weights for Culicoides single- and multi-scale models. Models built with model-averaging separately shown for multi-scale and single-scale models. For the abbreviations of the coefficients see Table 1.

Figure 6. The Culicoides species occurrence in relation to variables illustrated by single-species habitat models. Black bars signify occurrence of Culicoides species. The gradient refers to the 95%-percentile of the data distribution: from low (5%-percentile) to high (95%-percentile) or vice versa. A) The gradient from left to right at all scales runs from low to high values of grassland (variable grass) and from high to low values of all forest variables (coni, deci, deco, fore, and othf). Water variables (ditc, stre, and wate) are fixed to high values. All other values are fixed to mean values. B) The gradient from left to right at all scales runs from low to high values of arable land (variable acre), and from high to low values of all forest variables (coni, deci, deco, fore, and othf). Water variables (ditc, stre, and wate) are fixed to high values. All other values are fixed to mean values. C) The gradient from left to right at all scales runs from low to high values of all water variables (ditc, stre, and wate). All other values are fixed to mean values. For the abbreviations of the coefficients see Table 1.


iii) increasing water-related variables (Figure 6). Some of the Culicoides species responded with a wide range under these scenarios: e.g. C. grisescens in the scenario with increasing grassland type variables and decreasing forest ones (Figure 6) or C. dewulfi in the scenario with increasing water-related variables (Figure 6). In contrast, C. lupicaris did not occur under the three applied scenarios (Figure 6). In the best model, this species had a negative association with the proportion of fallow land at the 3-km scale, a variable that was not studied in the three scenarios. However, the other species showed a distinct response under at least one of the scenarios, e.g. C. chiopterus, C. scoticus and C. stigma were more restricted to the left of the gradient for the forest type of variables (low grassland/low arable-land, and high forest variables), while C. clastrieri was more restricted to the right end of the gradient (Figure 6). Under the same landscape scenarios, the presence-absence predictions depended on which single-scale model was applied at the different scales (Figure 7). In addition, the landscape context could be important. For example, the distribution of C. chiopterus is affected by grassland, but the prevalence predictions differ depending on the scale chosen (Figure 7).
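How a fitted presence-absence model can be applied along such an artificial landscape gradient is illustrated by the following minimal Python sketch. It assumes a plain logistic regression with a fixed classification threshold; the coefficients, variable ranges and threshold in the usage note are hypothetical and are not estimates from this study. Only the variable abbreviations (gras, fore, wate) follow the figure captions.

```python
import math


def logistic(x):
    """Inverse logit: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def gradient(start, end, steps):
    """Evenly spaced values from start to end (both included)."""
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]


def predict_along_gradient(intercept, coefs, ranges, fixed, threshold, steps=11):
    """Presence (True) or absence (False) at each step of an artificial gradient.

    coefs:  variable name -> regression coefficient (hypothetical values)
    ranges: variable name -> (value at left end, value at right end)
    fixed:  variable name -> constant value for variables held fixed
    """
    paths = {var: gradient(a, b, steps) for var, (a, b) in ranges.items()}
    presence = []
    for i in range(steps):
        eta = intercept
        for var, beta in coefs.items():
            value = paths[var][i] if var in paths else fixed[var]
            eta += beta * value
        presence.append(logistic(eta) >= threshold)
    return presence
```

For instance, with the hypothetical values `intercept=-1.0`, `coefs={"gras": 4.0, "fore": -3.0, "wate": 1.0}`, `ranges={"gras": (0.05, 0.60), "fore": (0.50, 0.02)}`, `fixed={"wate": 0.1}` and `threshold=0.5`, the sketch predicts absence at the forest-dominated left end of the gradient and presence at the grassland-dominated right end.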

Discussion
Species distribution modelling is the most important method to predict species distribution in general, including that of Culicoides biting midges. Since the availability of digital datasets of land cover, temperature, or potential hosts is continuously increasing, several studies have also used this kind of data to predict the prevalence of biting midge species, e.g. the normalised difference vegetation index (NDVI) (Purse et al., 2004; Calvete et al., 2008; Kluiters et al., 2013) or the CORINE land cover data (Kirkeby et al., 2009; Purse et al., 2011). These data are available or used on different scales, raising the question of which spatial scale, or scales, should be chosen to reach the best possible predictions for different biting midge species. At the same time, there are huge gaps in the knowledge of the ecology of Culicoides species that would be needed to choose the appropriate scale of predictors, e.g. information on the flight range or resting sites is missing. Therefore, an a priori selection of the appropriate scaling of the variables used for Culicoides distribution modelling is not possible. Active dispersal of Culicoides is generally expected to be limited. Kettle (1995) identifies the zone of about 500 m around the farmyard to be the most important, and a substantial reduction of the number of adult C. molestus and C. subimmaculatus was achieved by measures targeting breeding sites within this radius. Furthermore, Culicoides abundance was found to decrease with increasing distance to potential hosts or breeding sites (Kettle, 1995; Lühken and Kiel, 2012; Rigot et al., 2012; Kirkeby et al., 2013a). Moreover, it has been proposed that the direct surroundings of farms provide a huge number of potential breeding sites (Zimmer et al., 2008, 2014; Foxi and Delrio, 2010; González et al., 2013).
In this study, all traps were placed in the immediate vicinity of the predominant residences of the cattle directly on the farms. Thus, it might be expected that Culicoides species are captured near their breeding sites and that land cover information at the smaller scales around the light traps should have the highest predictive performance. However, in the majority of cases, the model performance did not differ significantly between the models based on variables from different buffer zones. This matches the study by Kirkeby et al. (2013a), where the covariate distance to the breeding site also did not explain differences in

Figure 7. Occurrence of Culicoides chiopterus illustrated by different habitat models at different scales (A) and illustrated by different single-scale models at different scales and multi-scale models (B). Occurrence of Culicoides chiopterus is signified by black bars. The gradient refers to the 95%-percentile of the data distribution: from low (5%-percentile) to high (95%-percentile) or vice versa. The multi-scale model shown in B was built with model-averaging (mGLM) and random forest variable selection (mRF). A) For each scale, the gradient from left to right runs from low to high values of grassland (variable gras), and from low to high values of water variables (ditc, stre, and wate) in the same scale are fixed to high values. Furthermore, the gradient over all scales runs from high to low values for all forest variables (coni, deci, deco, fore, and othf). All other values are fixed to mean values. B) The gradient from left to right runs from low to high values of grassland (variable gras), and from high to low values of all forest variables (coni, deci, deco, fore, and othf). Water variables (ditc, stre, and wate) are fixed to high values. All other values are fixed to mean values. For the abbreviations of the coefficients see Table 1.


Culicoides trapping. One explanation for this result might be that the dispersal of Culicoides is much higher than generally expected. Indeed, the small number of data available from mark-release-recapture studies indicates dispersal distances between two and six km (reviewed by Kirkeby et al., 2013b). Another explanation for the lack of higher performance of models at smaller scales could be the underlying data for Culicoides. They represent aggregated presence-absence data from a sampling conducted over several months. Therefore, the probability of trapping rare Culicoides species might have been high. Furthermore, it must also be taken into consideration that the analysed dataset was relatively small (N=46), which might result in reduced predictive accuracy (Stockwell and Peterson, 2002). It would perhaps be more appropriate to interpret the modelling results as descriptive rather than predictive (Williams and Hero, 2001; Stockwell and Peterson, 2002). A comprehensive interpretation of our modelling results is hampered by several circumstances. According to our data, the same variables, e.g. the forest-related ones, were significantly correlated with each other at different scales or with other variables. At the same time, highly correlated variables should not be included in the same statistical regression models, because small changes in the model or data can then result in strong changes of the coefficient estimates (reviewed by Dormann et al., 2013). Therefore, as done in this study, a threshold-based pre-selection to exclude highly correlated variables might be recommended. However, a preliminary exclusion of variables can cause problems for the interpretation of the final models, and omitted variables have to be considered in the conclusions to be drawn (Dormann et al., 2013).
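A threshold-based pre-selection of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cut-off of |r| = 0.7, the greedy keep-first rule and the variable names such as `fore_1km` are hypothetical and are not taken from the analysis in this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def preselect(columns, threshold=0.7):
    """Greedy pre-selection: walk through the variables in the given order and
    keep one only if its |r| with every variable kept so far is below threshold.

    columns:   dict mapping variable name -> list of values (one per site)
    threshold: assumed cut-off for 'highly correlated' (hypothetical default)
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical land cover proportions at six sites
example = {
    "fore_1km": [0.10, 0.20, 0.30, 0.40, 0.50, 0.60],
    "fore_2km": [0.12, 0.22, 0.29, 0.41, 0.52, 0.58],  # nearly the same signal
    "wate_1km": [0.30, 0.10, 0.40, 0.10, 0.50, 0.20],
}
selected = preselect(example)
```

Because the rule keeps the first variable of each correlated group, the order of the input decides which scale survives; the dropped variables still have to be borne in mind when the final model is interpreted.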
Furthermore, as presented in this study, several species were influenced by different land cover variables at different scales, or the same variables had a different algebraic sign (positive or negative) at different scales, e.g. a negative correlation with forest variables in the model at the local scale and a positive correlation with forest variables in the model at a larger scale. This causes problems for the interpretation, which increase further in multi-scale models, where one final model can include the same variable at different scales with different algebraic signs, e.g. a negative and a positive correlation with the forest variable at different scales in the same model. Our analysis was restricted to Culicoides presence-absence data from 46 sampling sites, collected as part of a wide-meshed monthly monitoring over 14 months in Germany that was not primarily focused on entomological data, but on virus detection in biting midges. However, additional data on species abundance or data covering longer time periods with shorter sampling intervals do not exist at present. Nevertheless, the available data give a first impression of the land cover variables explaining the distribution of the German Culicoides fauna. Moreover, the German land cover data ATKIS® were successfully used to develop species distribution models for thirteen Culicoides species, including C. chiopterus, C. dewulfi, and C. scoticus as potential vectors of the BTV (Meiswinkel et al., 2007; Dijkstra et al., 2008) and Schmallenberg virus (Meiswinkel et al., 2007; Dijkstra et al., 2008; De Regge et al., 2012; Rasmussen et al., 2012). Furthermore, our study showed that multi-scale modelling is a promising approach to model the distribution of Culicoides species. Although multi-scale models often did not show significant differences compared to single-scale models, the overall performance of these models was higher.
Furthermore, multi-scale models generally achieved the best performance for the different species according to the AUC values. A multi-scale approach offers the opportunity to include a diverse set of variables at different scales. This is especially important for haematophagous insects, e.g. when breeding sites, resting sites or host density, which are generally distributed across several scales, have to be taken into account for modelling.

Conclusions
Although several studies have increased our knowledge on the breeding sites and their colonisation by different Culicoides species (Foxi and Delrio, 2010; González et al., 2013; Harrup et al., 2013; Zimmer et al., 2014), the causal connections with environmental parameters mostly remain unknown. Therefore, besides the evaluation of different modelling techniques and the implementation of further environmental parameters, there is an urgent need for experimental studies on these relationships.

References
Beer M, Conraths FJ, Van Der Poel WHM, 2013. ‘Schmallenberg virus’: a novel orthobunyavirus emerging in Europe. Epidemiol Infect 141:1-8.
Bivand R, 2014. Spdep: spatial dependence: weighting schemes, statistics and models. Available from: https://cran.r-project.org/web/packages/spdep/index.html
Bradter U, Kunin WE, Altringham JD, Thom TJ, Benton TG, 2013. Identifying appropriate spatial scales of predictors in species distribution models with the random forest algorithm. Methods Ecol Evol 4:167-74.
Breiman L, 2001. Random forests. Mach Learn 45:5-32.
Burnham KP, Anderson DR, 2002. Model selection and multimodel inference: a practical information-theoretic approach. 2nd ed. Springer, New York, NY, USA.
Calvete C, Estrada R, Miranda MA, Borrás D, Calvo JH, Lucientes J, 2008. Modelling the distributions and spatial coincidence of bluetongue vectors Culicoides imicola and the Culicoides obsoletus group throughout the Iberian Peninsula. Med Vet Entomol 22:124-34.
Conraths FJ, Eschbaumer M, Freuling C, Gethmann J, Hoffmann B, Kramer M, Probst C, Staubach C, Beer M, 2012. Bluetongue disease: an analysis of the epidemic in Germany 2006-2009. In: Mehlhorn H, ed. Arthropods as vectors of emerging diseases. Springer, Heidelberg, Germany, pp 103-35.
De Regge N, Deblauwe I, De Deken R, Vantieghem P, Madder M, Geysen D, Smeets F, Losson B, van den Berg T, Cay AB, 2012. Detection of Schmallenberg virus in different Culicoides spp. by real-time RT-PCR. Transbound Emerg Dis 59:471-5.
Dijkstra E, van der Ven IJK, Meiswinkel R, Holzel DR, Van Rijn PA, Meiswinkel R, 2008. Culicoides chiopterus as a potential vector of bluetongue virus in Europe. Vet Rec 162:422.
Dormann CF, Elith J, Bacher S, Buchmann C, Carl G, Carré G, Marquéz JRG, Gruber B, Lafourcade B, Leitão PJ, Münkemüller T, McClean C, Osborne PE, Reineking B, Schröder B, Skidmore AK, Zurell D, Lautenbach S, 2013. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography 36:27-46.
Dray S, Legendre P, Peres-Neto PR, 2006. Spatial modelling: a comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM). Ecol Model 196:483-93.
Epskamp S, Cramer AOJ, Waldorp LJ, Schmittmann VD, Borsboom D, 2012. qgraph: network visualizations of relationships in psychometric data. J Stat Softw 48:1-18.
Fielding AH, Bell JF, 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ Conserv 24:38-49.
Foxi C, Delrio G, 2010. Larval habitats and seasonal abundance of Culicoides biting midges found in association with sheep in northern Sardinia, Italy. Med Vet Entomol 24:199-209.
Freeman EA, Moisen GG, 2008. A comparison of the performance of threshold criteria for binary classification in terms of predicted prevalence and kappa. Ecol Model 217:48-58.
Genuer R, Poggi J-M, Tuleau-Malot C, 2010. Variable selection using random forests. Pattern Recogn Lett 31:2225-36.
González M, López S, Mullens BA, Baldet T, Goldarazena A, 2013. A survey of Culicoides developmental sites on a farm in northern Spain, with a brief review of immature habitats of European species. Vet Parasitol 191:81-93.
Gray TNE, Phan C, Long B, 2010. Modelling species distribution at multiple spatial scales: gibbon habitat preferences in a fragmented landscape: gibbon distribution in fragmented landscapes. Animal Conserv 13:324-32.
Griffith DA, Peres-Neto PR, 2006. Spatial modeling in ecology: the flexibility of eigenfunction spatial analyses. Ecology 87:2603-13.
Hamer KC, Hill JK, 2000. Scale-dependent effects of habitat disturbance on species richness in tropical forests. Conserv Biol 14:1435-40.
Harrup LE, Purse BV, Golding N, Mellor PS, Carpenter S, 2013. Larval development and emergence sites of farm-associated Culicoides in the United Kingdom. Med Vet Entomol 27:441-9.
Hoffmann B, Bauer B, Bätza H-J, Beer M, Clausen PH, Geier M, Gethmann JM, Kiel E, Liebisch G, Liebisch A, Mehlhorn H, Schaub G, Werner D, Conraths FJ, 2009. Monitoring of putative vectors of bluetongue virus serotype 8, Germany. Emerg Infect Dis 15:1481-4.
Hosmer DW, Lemeshow S, 2000. Applied logistic regression. Wiley, New York, NY, USA.
Kattwinkel M, Strauss B, Biedermann R, Kleyer M, 2009. Modelling multi-species response to landscape dynamics: mosaic cycles support urban biodiversity. Landscape Ecol 24:929-41.
Kettle DS, 1995. Ceratopogonidae (biting midges). In: Kettle DS, ed. Medical and veterinary entomology. CAB International, Wallingford, UK, pp 152-76.
Kiel E, Liebisch G, Focke R, Liebisch A, Werner D, 2009. Monitoring of Culicoides at 20 locations in northwest Germany. Parasitol Res 105:351-7.
Kirkeby C, Bødker R, Stockmarr A, Enøe C, 2009. Association between land cover and Culicoides (Diptera: Ceratopogonidae) breeding sites on four Danish cattle farms. Entomol Fenn 20:228-32.
Kirkeby C, Bødker R, Stockmarr A, Lind P, 2013a. Spatial abundance and clustering of Culicoides (Diptera: Ceratopogonidae) on a local scale. Parasite Vector 6:43.
Kirkeby C, Bødker R, Stockmarr A, Lind P, Heegaard PMH, 2013b. Quantifying dispersal of European Culicoides (Diptera: Ceratopogonidae) vectors between farms using a novel mark-release-recapture technique. PLoS One 8:e61269.
Kluiters G, Sugden D, Guis H, McIntyre KM, Labuschagne K, Vilar MJ, Baylis M, 2013. Modelling the spatial distribution of Culicoides biting midges at the local scale. J Appl Ecol 50:232-42.
Liaw A, Wiener M, 2002. Classification and regression by randomForest. R News 2:18-22.
Liu C, White M, Newell G, 2009a. Assessing the accuracy of species distribution models more thoroughly. Available from: www.mssanz.org.au/modsim09/J1/liu_c_J1a.pdf
Liu C, White M, Newell G, 2009b. Measuring the accuracy of species distribution models: a review. Available from: www.mssanz.org.au/modsim09/J1/liu_c_J1b.pdf

Lobo JM, Jiménez-Valverde A, Real R, 2008. AUC: a misleading measure of the performance of predictive distribution models. Global Ecol Biogeogr 17:145-51.
Lühken R, Kiel E, 2012. Distance from the stable affects trapping of biting midges (Diptera, Ceratopogonidae). J Vector Ecol 37:453-7.
Mehlhorn H, Walldorf V, Klimpel S, Schaub G, Kiel E, Focke R, Liebisch G, Liebisch A, Werner D, Bauer C, Clausen H, Bauer B, Geier M, Hörbrand T, Bätza H-J, Conraths FJ, Hoffmann B, Beer M, 2009. Bluetongue disease in Germany (2007-2008): monitoring of entomological aspects. Parasitol Res 105:313-9.
Meiswinkel R, van Rijn P, Leijs P, Goffredo M, 2007. Potential new Culicoides vector of bluetongue virus in northern Europe. Vet Rec 161:564-5.
Pearman PB, Randin CF, Broennimann O, Vittoz P, van der Knaap WO, Engler R, Lay GL, Zimmermann NE, Guisan A, 2008. Prediction of plant species distributions across six millennia. Ecol Lett 11:357-69.
Powney GD, Grenyer R, Orme CDL, Owens IPF, Meiri S, 2010. Hot, dry and different: Australian lizard richness is unlike that of mammals, amphibians and birds: hot, dry and different. Global Ecol Biogeogr 19:386-96.
Purse BV, Falconer D, Sullivan MJ, Carpenter S, Mellor PS, Piertney SB, Mordue Luntz AJ, Albon S, Gunn GJ, Blackwell A, 2011. Impacts of climate, host and landscape factors on Culicoides species in Scotland. Med Vet Entomol 26:168-77.
Purse BV, Tatem AJ, Caracappa S, Rogers DJ, Mellor PS, Baylis M, Torina A, 2004. Modelling the distributions of Culicoides bluetongue virus vectors in Sicily in relation to satellite-derived climate variables. Med Vet Entomol 18:90-101.
Rasmussen LD, Kristensen B, Kirkeby C, Rasmussen TB, Belsham GJ, Bødker R, Bøtner A, 2012. Culicoides as vectors of Schmallenberg virus. Emerg Infect Dis 18:1204-5.
R Core Team, 2014. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Rigot T, Conte A, Goffredo M, Ducheyne E, Hendrickx G, Gilbert M, 2012. Predicting the spatio-temporal distribution of Culicoides imicola in Sardinia using a discrete-time population model. Parasite Vector 5:270.
Rigot T, Vercauteren Drubbel M, Delécolle J-C, Gilbert M, 2012. Farms, pastures and woodlands: the fine-scale distribution of Palearctic Culicoides spp. biting midges along an agro-ecological gradient. Med Vet Entomol 27:29-38.
Stockwell DR, Peterson AT, 2002. Effects of sample size on accuracy of species distribution models. Ecol Model 148:1-13.
Strauss B, Biedermann R, 2006. Urban brownfields as temporary habitats: driving forces for the diversity of phytophagous insects. Ecography 29:928-40.
Strobl C, Boulesteix A-L, Kneib T, Augustin T, Zeileis A, 2008. Conditional variable importance for random forests. Bioinformatics 9:307.
Strobl C, Boulesteix A-L, Zeileis A, Hothorn T, 2007. Bias in random forest variable importance measures: illustrations, sources and a solution. Bioinformatics 8:25.
Werner D, 2010. Forschungsvorhaben 2808HS007. Available from: http://download.ble.de/08HS007.pdf
Whittingham MJ, Stephens PA, Bradbury RB, Freckleton RP, 2006. Why do we still use stepwise modelling in ecology and behaviour? J Anim Ecol 75:1182-9.
Wickham H, 2009. Ggplot2: elegant graphics for data analysis. Springer, New York, NY, USA.
Wickham H, 2011. The split-apply-combine strategy for data analysis. J Stat Softw 40:1-29.
Williams SE, Hero J-M, 2001. Multiple determinants of Australian tropical frog biodiversity. Biol Conserv 98:1-10.
Zimmer J-Y, Brostaux Y, Haubruge E, Francis F, 2014. Larval development sites of the main Culicoides species (Diptera: Ceratopogonidae) in northern Europe and distribution of coprophilic species larvae in Belgian pastures. Vet Parasitol 205:676-86.
Zimmer J-Y, Haubruge E, Francis F, Bortels J, Simonon G, Losson B, Mignon B, Paternostre J, De Deken R, De Deken G, Deblauwe I, Fassotte C, Cors R, Defrance T, 2008. Breeding sites of bluetongue vectors in northern Europe. Vet Rec 162:131.


Geospatial Health 2016; volume 11:381

Trends in obesity at the national and local level among South Korean adolescents
Kyungsoo Han,1 Sejin Park,1 Jürgen Symanzik,2 Sookhee Choi,3 Jeongyong Ahn1
1Department of Statistics, Chonbuk National University, Jeonbuk, Korea; 2Department of Mathematics and Statistics, Utah State University, Logan, UT, USA; 3Woosuk University, Jeonbuk, Korea

Abstract
Obesity is a global phenomenon affecting all socioeconomic groups, irrespective of age, gender or ethnicity. In many countries, obesity trends are causing serious public health concerns threatening the viability of basic health care delivery. In this article, we examine the trends of adolescent obesity at the national level in South Korea introducing a new approach for visualising data at the local level based on linked micromap plots. Our analysis shows that the obesity rates for 2013 have only increased slightly since 2006 for South Korean adolescents of both genders in various age groups. However, considerable increases could be observed for the subgroup of adolescent males and adolescent females living in rural areas. Trends at the local level show a slight increase of the prevalence of obesity in most regions of the country, with the highest obesity prevalence found in the Northeast.

Introduction

The prevalence of obesity among children and adolescents has become a serious epidemic health problem that is now estimated to be the fifth leading cause of mortality in the world (James et al., 2008). Excess body weight can also affect the quality of life, education and income potential (Kolotkin et al., 2001; McLaren, 2007; Puhl and Heuer, 2009). In addition, obesity is also associated with various diseases and it increases the risk of premature death (Fontaine et al., 2003; Pichainarong et al., 2006). Analysing the data and reporting the statistics are therefore important for the identification of health trends and establishment of normal health patterns. Overweight and obesity are most likely the result of complex interactions between genes, lifestyles, dietary habits and socioeconomic factors. As the targets of many public health strategies, life-related factors are modifiable and have been highlighted in many investigations. It is evident that life-related factors, such as physical activity and eating habits, are associated with paediatric overweight and obesity (Han et al., 2010). In many cases, health data are presented in extensive data tables that are often sorted alphabetically. However, such tabular presentations tend to be uninformative as it is difficult to quickly find the extreme values (minimum and maximum) in a table and to identify table rows with similar contents. In case of geographic information, tables usually do not allow for a quick link between table rows representing specific geographic regions and the locations of these regions on a map. This fact suggests that the conversion of tabular data into a visual context would better illustrate patterns and relationships in the data as pointed out by Gebreab et al. (2008). In addition, interpreting and understanding health data usually requires the consideration of the appropriate spatial context because many health datasets come with geographic information. 
Some health studies in the past incorporated the spatial context in the data analysis (Edsall, 2003; Ezzati et al., 2006; Jacob et al., 2010). Most of the past studies, however, used only simple images and maps, such as aerial images, choropleth or isopleth maps for the visualisation of health data. Such images and maps have been popular in spite of several problems and limitations (Dent, 1993; Harris, 1999). The greatest problem is that it is difficult to show more than one statistical variable in an aerial image or a map. Another problem is that a viewer cannot easily observe trends, relationships and anomalies that may be present (Gebreab et al., 2008). To solve these problems, we use linked micromap (LM) plots, a statistical graph that combines multiple small maps and statistical data panels. LM plots were first introduced in the United States (USA) in 1996 by Carr and Pierson (1996) and have been used for various medical and public health applications in the past (Carr, 2001; Wang et al., 2002). Comprehensive overviews for the use and creation of LM plots have been published by Symanzik and Carr (2008) and Carr and Pickle (2010). LM plots are graphs that link statistical information to an

Correspondence: Jeongyong Ahn, Department of Statistics, Institute of Applied Statistics, Chonbuk National University, 561-756 Jeonbuk, Korea. Tel: +82.63.2703392. E-mail: jyahn@jbnu.ac.kr
Key words: Data visualisation; Geographic patterns; Linked micromap plots; Obesity; South Korea.
Acknowledgments: this paper was supported by research funds of Chonbuk National University in 2012. This research was also supported by the Basic Science Research Program through the National Research Foundation (NRF) of Korea funded by the Ministry of Education (NRF2012R1A1A4A01002729).
Received for publication: 29 May 2015. Revision received: 15 September 2015. Accepted for publication: 10 December 2015.

©Copyright K. Han et al., 2016
Licensee PAGEPress, Italy
Geospatial Health 2016; 11:381
doi:10.4081/gh.2016.381
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.


ordered set of small maps in order to easily explore and communicate patterns in the outcome variable, explanatory variables, geographic locations and the associations among them. LM plots make use of aggregated values, so each individual’s information is kept strictly confidential. In fact, LM plots are not designed to show specific data at a particular geographic location but, rather, to group information into meaningful geographical subregions, such as states, provinces, or districts. LM plots have been adapted for Korea only recently (Ahn, 2013; Han et al., 2014). Although they have been used for a variety of other medical and health data sets in the United States (Carr, 2001; Gebreab et al., 2008), to the best of our knowledge, they have not been used in the obesity context. In this study, we describe how spatial patterns of the prevalence of obesity and life-related indicators of adolescents in South Korea can be examined visually and simultaneously using LM plots. First, we examine the trends of adolescent obesity at the national level comparing the rates of obesity and other indicators in the two groups, metropolitan versus rural and male versus female. Second, we visualise the obesity data at the local level using LM plots, with which we can better explore and understand the relationships of the variables and geographic patterns, which are inherent in the data.

Materials and Methods

Data sources

To analyse and visualise the prevalence of obesity among adolescents in Korea, we used data from the Korean Youth Risk Behavior Web-based Survey (KYRBWS) (http://www.yhs.cdc.go.kr) for the period from 2006 to 2013. The data were obtained from the Korean Statistical Information Service (KOSIS) (http://www.kosis.kr). KOSIS is the gateway for Korea's official statistics and provides access to data from more than 120 statistical agencies covering more than 500 different topics. We aggregated the data into the 16 upper-level local autonomies of Korea using the traditional geographic breakdown prior to the creation of Sejong as a special autonomous city in 2007. As of 2006, Korea is divided into eight provinces, one special autonomous province, six metropolitan cities and one special autonomous city. The data set used in this article consists of eleven variables and a region name. Table 1 lists the eight indicator variables and three demographic variables from the data set used in this study.

Table 1. Variables from the Korean Statistical Information Service data set.
Category | Variable
Indicator | Obesity, physical activity, being without breakfast, fast-food consumption, stress, depression, smoking, drinking
Demographics | Gender, age, year
Location | Name of region

Figure 1. The prevalence of obesity by age and gender in 2013.

Figure 2. Trends of the prevalence of obesity by year, gender and age.

Metropolitan areas often provide better socioeconomic conditions (such as educational level, income, and wealth) than rural areas. Therefore, to determine whether the means of the two groups are equal, we divided the 16 upper-level local autonomies of Korea according to the extent of urbanisation into a metropolitan group (Seoul, Busan, Daegu, Incheon, Gwangju, Daejeon, Ulsan, and Gyeonggi) and a rural group (Gangwon, Chungbuk, Chungnam, Jeonbuk, Jeonnam, Gyeongbuk, Gyeongnam, and Jeju).

Visualisation

To visualise the geographically referenced data, LM plots use three or more sequence panels (micromap panel, label panel and one or more statistical summary panels) linked in parallel by location. The micromap panel contains miniature drawings outlining a region. This makes the shape and neighbourhood relationships of small geographic subregions more visible while still maintaining the relevant information in the miniature maps. The label panel gives the names of the geographic subregions, while the statistical panels can represent many different forms of statistical summaries, including box-plots, dot-plots, time series plots, confidence intervals (CI) and others. In addition, the geographic subregions in LM plots are sorted based on the statistical variable of interest, which improves perception among consecutive panels from the top to the bottom of the display. To focus the viewer's attention on one area at a time, LM plots divide the map regions into perceptual groups of five or fewer subregions that are highlighted on one of the small maps. The geographic locations, the names of the subregions and the statistical data are linked via a colour scheme. These features make it possible to visualise specific geographic patterns in the data that are often lost in other types of graphs and maps. A good explanatory example of an LM plot that further describes the overall concept can be found in Gebreab et al. (2008).
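The sorting and grouping step described above can be sketched in a few lines. This is a minimal illustration, not the layout algorithm of any particular LM plot package: the function name `perceptual_groups`, the balancing rule for group sizes and the placeholder region values are all assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple


def perceptual_groups(values: Dict[str, float],
                      max_size: int = 5) -> List[List[Tuple[str, float]]]:
    """Sort regions by value (highest first) and split them into
    consecutive groups of at most max_size, keeping group sizes balanced."""
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    n_groups = math.ceil(len(ranked) / max_size)
    base, extra = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + 1 if g < extra else base  # earlier groups take the remainder
        groups.append(ranked[start:start + size])
        start += size
    return groups


# Hypothetical values for 16 placeholder regions (NOT the KOSIS data).
example = {f"Region {chr(65 + i)}": 7.0 + 0.2 * i for i in range(16)}

for panel, group in enumerate(perceptual_groups(example), start=1):
    print(f"micromap {panel}:", ", ".join(name for name, _ in group))
```

With 16 regions and a maximum group size of five, this rule yields four groups of four, each of which would be highlighted on its own small map.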

Statistical analysis

To explore and analyse the data, common data analysis methods were used in this study. Exploratory data analysis methods are primarily used to explore the data before using more quantitative traditional methods. We used two-sample t-tests (Snedecor and Cochran, 1989) to compare the means of the variables in the two groups. In addition, we fitted regression models to statistically investigate the temporal trends of the prevalence of obesity. With such models, we could determine the relationship between obesity rate and time. Lastly, rank correlation analysis was used to measure the relationship between obesity and other indicators.
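As a rough illustration of the first two methods, the sketch below computes a two-sample t statistic and the t statistic of a regression slope using only the Python standard library. It assumes the pooled-variance form of the t-test and a simple linear regression of rate on year; the text does not state which variants were used, and the function names are hypothetical. Converting the statistics to P values would additionally require the t distribution, which the standard library does not provide.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample t statistic with pooled variance (an assumption here);
    returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, df


def slope_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, int]:
    """Least-squares slope of y on x, its t statistic and degrees of freedom.
    Requires at least three points that do not lie exactly on a line."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / se, n - 2


# Hypothetical yearly rates (NOT the study data), one value per year 2006-2013.
years = list(range(2006, 2014))
rates = [8.1, 8.4, 8.2, 8.6, 8.5, 8.9, 9.3, 9.0]
slope, t_slope, df = slope_t(years, rates)
print(f"slope={slope:.3f} per year, t={t_slope:.2f}, df={df}")
```

A one-sided test of whether the slope is positive, as used for the trends in this study, would compare the slope's t statistic with the upper tail of a t distribution with n-2 degrees of freedom (six for eight yearly values).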

Results

Data analysis at the national level

Figure 1 shows the prevalence of obesity for 2013 by age and gender. The prevalence of obesity for adolescent males in each age group is about twice as high as that for adolescent females. The causes of this difference are not immediately obvious. However, an examination of studies with attention to potential gender differences reveals that such differences are common, both before and during puberty (Reilly and Wilson, 2006; Wisniewski and Chernausek, 2009). Moreover, the prevalence of obesity for both genders at age 18 is about twice as high as the prevalence of obesity at age 13.

Figure 2 shows the temporal development of the obesity rate by gender and age group from 2006 to 2013. There were no statistically significant trends in any of the six age groups (13 to 18) for either gender. However, a slightly increasing trend from 2006 to 2013 was observed for ten of the twelve age/gender combinations. Only the obesity prevalence of adolescent males at the age of 14, and of adolescent females at the age of 18, decreased slightly from 2006 to 2013.

In 2013, the average income per household for the metropolitan group was about $43,310, while that of the rural group was about $38,350; the two differed significantly at the 5% level for the two-tailed test (P=0.014). The rates of obesity and the seven other indicator variables in the two groups, based on aggregated data from 2006 to 2013, were compared via two-sample t-tests. Table 2 shows the results of these comparisons. The rates for obesity, smoking and physical activity differed significantly between the two groups at the 5% level for the two-tailed test (P=0.001, 0.009 and 0.046, respectively). There were no statistically significant differences between the two groups for the remaining variables. However, it should be noted that the levels for all variables except depression and physical activity were (slightly) higher in the rural areas, even though the difference was not always statistically significant.

Figure 3 displays the temporal development in the prevalence of obesity for both genders in the regional subgroups. The tests were conducted to determine whether there was an increase over time, i.e., whether the slope of a regression line was positive. The P values for the slopes of adolescent males and adolescent females in rural areas were 0.012 and 0.005, respectively. Therefore, we can conclude that these slopes were greater than zero, which implies an increasing trend over the study period. The slopes for adolescent males and adolescent females in the metropolitan areas were not significantly greater than zero (P=0.304 and 0.131, respectively). Adolescent males in the rural areas had the highest absolute increase in the prevalence of obesity, while adolescent males in the metropolitan areas had the lowest increase. It is also noteworthy that adolescent females in the rural areas had a slightly higher obesity prevalence than adolescent females in the metropolitan areas from 2007 to 2013. For adolescent males, this was only the case for 2012 and 2013 and it was the other way around from 2006 to 2009.

Table 2. Comparison of mean by group (%).
Variable | Group A (metropolitan area) | Group B (rural area) | P
Obesity | 8.7 | 9.0 | 0.001*
Physical activity | 10.8 | 10.5 | 0.046*
Being without breakfast | 25.4 | 26.3 | 0.678
Fast-food consumption | 17.5 | 17.8 | 0.992
Stress | 43.2 | 43.6 | 0.716
Depression | 36.2 | 35.9 | 0.950
Smoking | 11.7 | 13.0 | 0.009*
Drinking | 21.8 | 22.8 | 0.480
*Statistically significant result at the 5% level of significance.

Figure 3. Trends of the prevalence of obesity by year, gender and regional subgroup.

Visualising data at the local level

The LM plots combine exploratory data analysis capabilities with traditional statistical graphics while maintaining the geographical context of the data. Figure 4 shows an LM plot used to explore the spatial patterns and relationships among obesity, stress, starting the day without breakfast, and depression at the local level. The numbers between the statistical panels indicate the rank of a local autonomous entity in the statistical panels to the right. The subregions were sorted top to bottom with respect to decreasing prevalence of obesity. The map panels of the LM plot exhibit the geographic pattern: the highest prevalence of obesity (shown in the statistical panel no. 1) was found in the North and East (with Jeju a spatial outlier); the lowest prevalence was found in the West and in the South. Obesity did not show any association with stress and being without breakfast (Spearman's rho: -0.122 and 0.024, P: 0.653 and 0.931, respectively) but obesity and depression had a negative association (Spearman's rho: -0.560, P: 0.024). In addition, the data in the statistical panel no. 2 (stress) and no. 3 (being without breakfast) as well as in the statistical panel no. 2 (stress) and no. 4 (depression) are highly correlated. In general, stress, eating behaviour and depression have been recognised as important determinants of obesity (Torres and Nowson, 2007; Luppino et al., 2010). However, as shown in Figure 4, obesity in adolescents in Korea was not (or was even negatively) associated with these factors. This result complements other studies also based on the KYRBWS data set (Yu, 2011; Byeon, 2013). Many past studies report that people engaged in any of the four factors mentioned in Table 1, i.e. physical activity, smoking, drinking and fast-food consumption, are more likely to also engage in other high-risk behaviour (Strauss, 2000; Brownell, 2004; Baek, 2008; Kleiser et al., 2009; Kries et al., 2014), but they had no significant effect in this study (Spearman's rho: -0.210, -0.186, 0.099, and -0.269, respectively).
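For readers unfamiliar with the rank correlation reported above, a minimal sketch of Spearman's rho follows: each variable is replaced by its ranks (ties receive the average rank) and the Pearson correlation of the ranks is computed. The function names and the example values are hypothetical; in the study the inputs would be the 16 regional averages of two indicators.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical regional averages (NOT the study data).
obesity = [9.4, 8.8, 9.1, 8.5, 9.0]
depression = [35.0, 36.5, 35.2, 37.1, 36.0]
print(round(spearman_rho(obesity, depression), 3))
```

Because the correlation is computed on one value per region, it describes the association between regional averages only, a point the authors return to in the Conclusions.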

Figure 4. Linked micromap plot showing the relationships among obesity, stress, starting the day without breakfast, and depression based on regional averages in the period 2006-2013.


Figure 5 shows the regional trends and increases of the prevalence of obesity for the period from 2006 to 2013. Jeju showed the largest absolute and relative increases from 2006 to 2013. The relative increases were higher in rural areas than in most metropolitan areas. In 2012, the rates in Gangwon, Jeonbuk and Jeonnam increased considerably, but they decreased considerably again in 2013, which could simply be due to a data-collection problem in these three subregions in 2012. The obesity rates in 2013, compared to 2006, had increased (slightly) in all regions except Seoul. Moreover, according to the regression analysis, the slopes for seven of the regions (Gwangju, Gyeongnam, Jeonbuk, Jeonnam, Chungnam, Gangwon, and Busan) were significantly greater than zero at the 5% level for the one-tailed test.

Table 3. Comparison of mean by gender (%).
Variable | Male | Female
Physical activity | 15.3 | 6.0
Being without breakfast | 26.0 | 26.1
Fast-food consumption | 19.0 | 16.1
Stress | 34.8 | 49.6
Depression | 31.4 | 41.8
Smoking | 16.5 | 7.2
Drinking | 24.9 | 19.6

Figure 5. Trends of the prevalence of obesity for the period 2006-2013.

Discussion

Obesity in children and adolescents is a major concern, not only because of health and social problems in the short term, but also because of the high risk that it continues into adulthood and affects long-term health. Over the past two decades, the prevalence of obesity has been increasing worldwide, although the proportion varies from country to country and between geographical areas within a country (Mohsin et al., 2012). In past studies, impoverished areas in developed countries were often associated with higher rates of obesity. For example, in an obesity study for the United States, Ogden et al. (2010) reported that children and adolescents from low-income families are more likely to be obese than children and adolescents from higher-income families. Therefore, the effects of rural areas and lower income on obesity were confounded in our study.

In this study, we examined the trends of adolescent obesity for the period from 2006 to 2013 in Korea and described how spatial patterns of obesity and other variables can be examined visually and simultaneously using LM plots. First, we investigated the temporal variation of the prevalence of obesity at the national level. Second, we divided the 16 local autonomies of Korea into two groups (metropolitan and rural areas) and compared the variables for these two groups. Finally, we explored the geographic patterns and the relationship between the variables using LM plots.

In summary, the prevalence of obesity of adolescent males was about twice as high as that of adolescent females (Figure 1). Thus, the prevalence of obesity of adolescents in Korea can be said to depend strongly on gender. Based on this observation, the differences in factors relevant to adolescent obesity (Table 3), derived from the KOSIS data, can be understood as follows: compared with adolescent females, adolescent males are physically more active; smoke and drink more frequently; have less stress and depression; and are approximately equal with respect to eating fast food and starting the day without breakfast. The obesity rates for 2013, compared to 2006, have increased slightly for Korean adolescents in general. Adolescent males living in rural areas had the highest absolute increase in the prevalence of obesity, while adolescent males living in metropolitan areas had the lowest increase. In addition, the obesity rates for 2013, compared to 2006, increased in all regions except Seoul. The trends at the local level show a slight increase in all regions of Korea. The regions with high obesity prevalence are all located in the Northeast. The approach used in this study should facilitate the understanding of a variety of geographically referenced data and the visualisation of richer statistical information.

Conclusions

It should be noted that the data used in this study had already been aggregated. Therefore, it was not possible for us to adjust for covariates such as age and gender in our analysis at the local level. Such an adjustment might eventually help to better explain some of the unexpected patterns observed in our study. In addition, the body mass index used in this study was calculated from self-reported weight and height and not from anthropometric measures. The self-reported measures could lead to a considerable underestimation of the obesity prevalence rates. Moreover, we are describing ecological correlations, i.e., correlations based on group means, rather than on the unaggregated data. On the one hand, this considerably reduces the sample size, e.g. to n=8 when comparing trends from 2006 to 2013. On the other, this also reduces the variation, compared to using the unaggregated data from several thousand participants of the KYRBWS study each year. Therefore, some of the trends we observed, which ultimately were not statistically significant, may become significant when working with the full unaggregated data set, and vice versa.

References

Ahn JY, 2013. Visualizing statistical information using Korean linked micromap plots. In: Cho SH, ed. Proceedings of IASC Satellite Conference for the 59th ISI WSC & The 8th Conference of IASCARS, Asian Regional Section of the IASC, pp 219-221.
Baek S, 2008. Do obese children exhibit distinguishable behaviours from normal weight children - based on literature review. Nutr Res Pract 13:386-95.
Brownell KD, 2004. Fast food and obesity in children. Pediatrics 113:132.
Byeon H, 2013. Relationship between self-perception of weight and depression experience in Korean adolescents. Adv Sci Tech Lett 40:66-9.
Carr DB, 2001. Designing linked micromap plots for states with many counties. Stat Med 20:1331-9.
Carr DB, Pickle LW, 2010. Visualizing data patterns with micromaps. Chapman & Hall/CRC: Boca Raton, FL, USA.
Carr DB, Pierson SM, 1996. Emphasizing statistical summaries and showing spatial context with micromaps. Stat Comp Graph News 7:16-23.
Dent BD, 1993. Cartography: thematic map design. William C. Brown, Dubuque, IA, USA.
Edsall R, 2003. Design and usability of an enhanced geographic information system for exploration of multivariate health statistics. Prof Geogr 55:605-19.
Ezzati M, Martin H, Skjold S, Hoorn SV, Murray C, 2006. Trends in national and state-level obesity in the USA after correction for self-report bias: analysis of health surveys. J Roy Soc Med 99:250-7.
Fontaine KR, Redden DT, Wang C, Westfall AO, Allison DB, 2003. Years of life lost due to obesity. J Am Med Assoc 289:187-93.
Gebreab S, Gillies RR, Munger RG, Symanzik J, 2008. Visualization and interpretation of birth defects data using linked micromap plots. Birth Defects Res A 82:110-9.
Han JC, Lawlor DA, Kimm SY, 2010. Childhood obesity. Lancet 375:1737-48.
Han KS, Park SJ, Mun GS, Choi SH, Symanzik J, Gebreab S, Ahn JY, 2014. Linked micromaps for the visualization of geographically referenced data. ICIC Expr Lett 8:443-8.
Harris RL, 1999. Information graphics. A comprehensive illustrated reference. Oxford University Press, New York, NY, USA.
Jacob BG, Krapp F, Ponce F, Gottuzzo E, Griffith DA, Novak RJ, 2010. Accounting for autocorrelation in multi-drug resistant tuberculosis predictors using a set of parsimonious orthogonal eigenvectors aggregated in geographic space. Geospat Health 4:201-17.
James WPT, Jackson-Leach R, Mhurchu CN, Kalamara E, Shayeghi M, Rigby NJ, Nishida C, Rodgers A, 2008. Overweight and obesity (high body mass index). In: Ezzati M, Lopez A, Rodgers A, Murray C, eds. Comparative quantification of health risks: Global and regional burden of disease attributable to selected major risk factors. World Health Organization, Geneva, Switzerland, pp 497-596.
Kleiser C, Rosario AS, Mensink G, Langenohl RP, Kurth BM, 2009. Potential determinants of obesity among children and adolescents in Germany: results from the cross-sectional KiGGS study. BMC Publ Health 9:46.
Kolotkin RL, Meter K, Williams GR, 2001. Quality of life and obesity. Obes Rev 2:219-29.
Kries R, Müller MJ, Heinrich J, 2014. Early prevention of childhood obesity: another promise or a reliable path for battling childhood obesity? Obes Facts 7:77-81.
Luppino FS, Wit LM, Bouvy PF, Stijnen T, Cuijpers P, Penninx BW, Zitman FG, 2010. Overweight, obesity, and depression: a systematic review and meta-analysis of longitudinal studies. Arch Gen Psych 67:220-9.
McLaren L, 2007. Socioeconomic status and obesity. Epidemiol Rev 29:29-48.
Mohsin F, Begum T, Azad K, Nahar N, 2012. An overview of childhood obesity. Birdem Med J 2:93-8.
Ogden CL, Lamb MM, Carroll MD, Flegal KM, 2010. Obesity and socioeconomic status in children and adolescents: United States, 2005-2008. National Center for Health Statistics, Hyattsville, MD, USA.
Pichainarong N, Mongkalangoon N, Kalayanarooj S, Chaveepojnkamjorn W, 2006. Relationship between body size and severity of dengue hemorrhagic fever among children aged 0-14 years. Southeast Asian J Trop Med Public Health 37:283-8.
Puhl RM, Heuer C, 2009. The stigma of obesity: a review and update. Obesity 17:941-64.
Reilly JJ, Wilson O, 2006. ABC of obesity: childhood obesity. Brit Med J 333:1207-10.
Snedecor GW, Cochran WG, 1989. Statistical methods. Iowa State University Press, Iowa City, IA, USA.
Strauss R, 2000. Childhood obesity and self-esteem. Pediatrics 105:e15.
Symanzik J, Carr DB, 2008. Interactive linked micromap plots for the display of geographically referenced statistical data. In: Chen C, Härdle W, Unwin A, eds. Handbook of data visualization. Springer, Heidelberg, Germany, pp 267-94.
Torres SJ, Nowson CA, 2007. Relationship between stress, eating behavior, and obesity. Nutrition 23:887-94.
Wang X, Chen JX, Carr DB, Bell BS, Pickle LW, 2002. Geographic statistics visualization: web-based linked micromap plots. Comput Sci Eng 4:90-4.
Wisniewski A, Chernausek S, 2009. Gender in childhood obesity: family environment, hormones, and genes. Gender Med 6:76-85.
Yu NS, 2011. A study on perceived weight, eating habits, and unhealthy weight control behavior in Korean adolescents. Int J Hum Ecol 12:13-24.


Geospatial Health 2016; volume 11:418

Validation of an interactive map assessing the potential spread of Galba truncatula as intermediate host of Fasciola hepatica in Switzerland

Rhea Baggenstos,1 Tobias Dahinden,2 Paul R. Torgerson,3 Hansruedi Bär,4 Christina Rapsch,1 Gabriela Knubben-Schweizer1,5

1Department of Farm Animals, University of Zurich, Zurich, Switzerland; 2Institute of Cartography and Geoinformatics, Leibniz University Hannover, Hannover, Germany; 3Section of Veterinary Epidemiology, University of Zürich, Zürich; 4Institute of Cartography and Geoinformation, Swiss Federal Institute of Technology in Zurich, Zurich, Switzerland; 5Clinic for Ruminants with Ambulatory and Herd Health Services, Ludwig-Maximilians-University Munich, Oberschleissheim, Germany

Abstract

Bovine fasciolosis, caused by Fasciola hepatica, is widespread in Switzerland. The risk regions were modelled in 2008 by an interactive map, showing the monthly potential risk of transmission of F. hepatica in Switzerland. As this map is based on a mathematical model, the aim of the present study was to evaluate the interactive map by means of a field survey taking different data sources into account. It was found that the interactive map has a sensitivity of 40.7-88.9%, a specificity of 11.4-18.8%, a positive predictive value of 26.7-51.4%, and a negative predictive value of 13.1-83.6%, depending on the source of the data. In conclusion, the grid of the interactive map (100 x 100 m) does not reflect enough detail and the underlying model of the interactive map is lacking transmission data.

Introduction

Bovine fasciolosis, caused by Fasciola hepatica, is widespread in Switzerland. A study by Rapsch et al. (2006) estimated a true prevalence of 18.0% in slaughtered cattle of all age groups. Median financial losses, mainly due to reduced milk production and reduced fertility, are estimated at €299 per infected animal per year (Schweizer et al., 2005a). Nevertheless, farmers are often unaware of the disease (Schweizer et al., 2005b). In order to enhance awareness among owners and give farmers and veterinarians a decision-making tool for investigating a herd for fasciolosis, an interactive map was designed, showing the monthly potential risk of transmission of F. hepatica in Switzerland (Rapsch et al., 2008).

The interactive map is based on a risk density distribution derived from environmental factors. Geographical (soil condition, forest cover) and meteorological (temperature, rainfall) data from Switzerland and biological data of the intermediate host Galba truncatula and the free-living stages of F. hepatica were accounted for. The model's output is an environmental relative risk measurement for the development of G. truncatula and the free-living stages of F. hepatica all over Switzerland. In order to visualise the density distribution, a grid of 100 x 100 m cells was created for the whole country and the risk density for each cell of the grid was visualised by a colour scale ranging from highest (red) to no (white) risk, representing five risk classes. The elements of the map (canton boundaries, water bodies, landform configuration, cities, forest, monthly risk, risk during the year) are grouped in layers, which can be activated or deactivated.

Maps produced by means of geographical information systems (GIS) are often used in human or veterinary medicine, for example to identify epidemiological patterns of disease outbreaks. GIS is especially helpful when trying to answer epidemiological questions concerning vector-borne diseases or diseases transmitted by intermediate hosts (Rinaldi et al., 2006) and, with increasing technical possibilities, the number of studies using GIS has also increased (Hendrickx et al., 2004; Simoonga et al., 2009). Risk maps for the transmission of F. hepatica have been designed for parts of Europe (Ollerenshaw, 1966; Rapsch et al., 2008), parts of the United States (Malone et al., 1987, 1992), Ethiopia (Yilma and Malone, 1998) and Cambodia (Tum et al., 2004). Some of the models were validated, mainly by means of definitive host prevalence surveys (Malone et al., 1992; Malone and Gommes, 1998; Tum et al., 2007). The aim of the present study was to evaluate the interactive map by means of a field survey based on the presence or absence of suitable snail habitats and the presence of snails.

Correspondence: Gabriela Knubben-Schweizer, Clinic for Ruminants with Ambulatory and Herd Health Services, Ludwig-Maximilians-University Munich, Sonnenstrasse 16, D-85764 Oberschleissheim, Germany. Tel: +49.89.218078850 - Fax: +49.89.218078851. E-mail: G.Knubben@lmu.de Key words: Fasciola hepatica; Galba truncatula; Risk modelling; Interactive map; Switzerland. Acknowledgements: we thank Dr. phil. nat. Simon Capt (Centre Suisse de Cartographie de la Faune, CSCF) for providing the CSCF data. Received for publication: 2 October 2015. Revision received: 17 December 2015. Accepted for publication: 18 December 2015. ©Copyright R. Baggenstos et al., 2016 Licensee PAGEPress, Italy Geospatial Health 2016; 11:418 doi:10.4081/gh.2016.418 This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.


Materials and Methods

In order to evaluate the risk model, findings of potential G. truncatula habitats and snail findings from three data sources were compared to the risk map.

Data sources

Snail findings emanating from a field survey in 2010

The basis of the interactive map designed by Rapsch et al. (2008), covering the whole of Switzerland, is a 100 x 100 m grid (containing a total of 4,152,874 grid fields). During a field survey in 2010, 361 grid fields, derived from 80 coordinate points in the Northeast of Switzerland chosen by a random generator (Java Random), were searched for the presence of potential G. truncatula habitats. Water bodies, forests and settlements were excluded. The region was chosen due to ease of surveillance (location of the Vetsuisse Faculty, Zurich) on the one hand and due to the overlap with the study region surveyed by Schweizer et al. (2007) on the other. The field survey was conducted from July to October 2010. The 80 randomly chosen points were visited and a minimal area of 100 x 100 m including the randomly chosen point was investigated. Due to the topography and the landscape, it was not possible to set up identical investigation areas, so the randomly chosen coordinate points did not always lie in the middle of the investigation area but were always included. If the 100 x 100 m area around the randomly chosen coordinate point was dry and no evidence for potential G. truncatula habitats could be seen nearby, the investigation was stopped and the area was recorded as negative. If the area around the randomly chosen point included potential habitats, e.g. drainage ditches, swampy areas or small streams as described by Schweizer et al. (2007), the coordinates of the potential habitats were recorded using a global positioning system (GPS) device (Garmin eTrex H) and the potential habitats were searched for snails for 30 minutes. This was usually an area of more than 100 x 100 m (Figure 1). Snails found were collected and frozen at -20°C until further identification by binocular loupe.

Snail findings emanating from the Centre Suisse de Cartographie de la Faune

From 1990 to 2010, a total of 749 G. truncatula findings from all over Switzerland were registered by the Centre Suisse de Cartographie de la Faune (CSCF). The snails were found in all months of the year. Of these, 66 findings had to be excluded due to incomplete recording of the coordinates and 236 findings were excluded because they were derived from forest or settlement areas. In such areas, cattle are not usually grazed and therefore the model gives no risk for transmission (Rapsch et al., 2008). Repeated findings in one month were utilised once. From the CSCF data, a total of 428 findings were taken.
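The counts in this paragraph can be reconciled with a short calculation. The sketch below also shows one way of counting repeated findings once; the record layout (easting, northing, month) and the function name are assumptions, and the number of collapsed repeats is inferred from the reported totals rather than stated in the text.

```python
from typing import Iterable, Set, Tuple

# Hypothetical record layout: (easting_m, northing_m, month)
Finding = Tuple[int, int, int]


def collapse_repeats(findings: Iterable[Finding]) -> Set[Finding]:
    """Keep one record per location and month, so repeats count once."""
    return set(findings)


# Counts reported in the text.
registered = 749
incomplete_coordinates = 66
forest_or_settlement = 236
used = 428

after_exclusions = registered - incomplete_coordinates - forest_or_settlement
# Inferred: assumes collapsing repeats is the only remaining step.
implied_repeats = after_exclusions - used

print(after_exclusions, implied_repeats)

# Hypothetical records: the second repeats the first within the same month.
demo = [(683200, 248000, 6), (683200, 248000, 6), (683200, 248000, 7)]
print(len(collapse_repeats(demo)))
```

Under that assumption, 447 findings remain after the two exclusions, and 19 of them would be repeats within the same month.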

Comparison of the risk map with the field survey data For the study at hand, the risk class 0 of the map created by Rapsch et al. (2008) was defined as no risk and the risk classes 1 to 5 were summarised as risk. In order to calculate the sensitivity, specificity,

Earlier snail findings From a regional survey in northeastern Switzerland investigating the prevalence of F. hepatica in the intermediate host, the coordinates of naturally occurring habitats were taken. Wells were excluded, as these are man-made habitats. The field surveys were conducted from 1999 to 2002 in the months of May to October. A total of 54 habitat findings in the Northeast of Switzerland were taken from the study of Schweizer et al. (2007).

Figure 1. A diagram of the investigated area (grey) around a randomly chosen starting point (•). Underlying is the 100 m x 100 m grid from the interactive map. In this example a total of 9 grid cells were investigated.

Table 1. Data sources for the evaluation of the original interactive map. Data source Field survey 2010

Subject

Information

Randomly chosen fields from Northeast Switzerland (n=361)

1. Fields without risk of transmission (no potential habitats) 2. Fields with potential risk of transmission 2.1. without G. truncatula 2.2. with G. truncatula Potential risk of transmission Potential risk of transmission

Schweizer et al. (2007) G. truncatula findings from Northeast Switzerland (n=54) CSCF G. truncatula findings from all over Switzerland (n=428) G. truncatula, Galba truncatula; CSCF, Centre Suisse de Cartographie de la Faune.

[page 138]

[Geospatial Health 2016; 11:418]


gh-2016_2.qxp_Hrev_master 31/05/16 11:44 Pagina 139

Article

positive and negative predictive value of the map, the snail searches from three different surveys from Switzerland (Table 1) were then compared to the map-predicted risk (no risk vs risk). As the interactive map models potential habitats of G. truncatula on a monthly basis, the evaluation of the map also takes the month of finding into account. The investigated fields from the field surveys were classified either as negative, when no potential habitats were found, or as positive, when either potential habitats (without snails) or habitats with snails were found (Table 2). All coordinates were rounded to the next 100 m interval in order to match the 100 x 100 m grid of the interactive map. Subsequently, the monthly risk groups of the investigated areas were taken from the model. Multiple coordinates lying in one grid area were utilised once. The sensitivity, i.e. the true positives divided by the true positives + the false negatives; the specificity, i.e. the true negatives divided by the true negatives + the false positives; the positive predictive value, i.e. the true positives divided by the true positives + the false positives; and the negative predictive value, i.e. the true negatives divided by the true negatives + the false negatives, were calculated for the map. Two different calculations were made to obtain the sensitivity, specificity and predictive values of the interactive map. First, from all data sources the true monthly findings were compared to the modelled monthly risk group, e.g. the findings in January were compared to the January risk model. Second, from all data sources only the findings from July, August, September, and October were taken, the months during which the field survey in 2010 took place. The data from July and August were then compared to the corresponding monthly risk model, whereas the data from September and October were compared to the August risk model, as this model predicts the largest risk areas. As the overall model lacks an underlying transmission model, we assumed that the areas with the best environmental conditions for the survival and reproduction of G. truncatula, and the survival of the free-living stages of F. hepatica, in August would also have a higher risk for transmission in the following months (taking the lag of time for reproduction of the snail into account).
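The four validity measures defined above can be illustrated with a minimal Python sketch. The class and function names are hypothetical and chosen only for illustration; the counts are those reported in Table 3 for the field survey 2010 (80 and 10 positive grid fields in risk classes 1-5 and 0, respectively; 220 and 51 negative grid fields in risk classes 1-5 and 0).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2 x 2 counts: map prediction (risk / no risk) vs field finding."""
    tp: int  # risk predicted, potential habitat found
    fp: int  # risk predicted, no potential habitat found
    fn: int  # no risk predicted, potential habitat found
    tn: int  # no risk predicted, no potential habitat found


def validity(c: Confusion) -> dict:
    """Return sensitivity, specificity, PPV and NPV as percentages."""
    return {
        "sensitivity": 100 * c.tp / (c.tp + c.fn),
        "specificity": 100 * c.tn / (c.tn + c.fp),
        "ppv": 100 * c.tp / (c.tp + c.fp),
        "npv": 100 * c.tn / (c.tn + c.fn),
    }


# Counts taken from Table 3 (field survey 2010, compared per month).
survey_2010 = Confusion(tp=80, fp=220, fn=10, tn=51)
for name, value in validity(survey_2010).items():
    print(f"{name}: {value:.1f}%")
```

With these counts the sketch gives 88.9%, 18.8%, 26.7% and 83.6%, which agrees with the values reported for the field survey 2010.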

Results

Classification of the grid fields

From the field survey in 2010, 271 grid fields did not contain potential habitats and were therefore negative areas. Potential habitats (positive) were found in 90 grid fields. Out of these, 14 harboured a total of 133 G. truncatula specimens. A total of 89 snails were found in risk class 1 (66.9%), two in risk class 2 (1.5%), four in risk class 3 (3.0%), 28 in risk class 4 (21.1%) and 10 in risk class 5 (7.5%). The potential habitats in risk class 0 did not contain any G. truncatula specimens. Together with the findings from Schweizer et al. (2007) and the CSCF, a total of 496 grid fields with G. truncatula specimens was found.
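The grid-field classification depends on the rounding step described in the methods (coordinates matched to the 100 x 100 m grid, multiple coordinates in one grid area used once). The following Python sketch shows one way this could be done. It rests on assumptions not stated in the source: coordinates are taken to be projected and in metres, "rounded to the next 100 m interval" is read as rounding to the nearest 100 m, and a grid field is counted as positive if any finding inside it is positive. The function names and the example coordinates are hypothetical.

```python
def to_grid_cell(x_m, y_m, cell_m=100):
    """Round a projected coordinate (metres) to the nearest cell_m interval.

    Assumption: 'nearest' rounding; the source only says 'next 100 m interval'.
    """
    return (int(round(x_m / cell_m)) * cell_m,
            int(round(y_m / cell_m)) * cell_m)


def classify_grid_fields(findings):
    """Collapse findings to one record per grid field.

    findings: iterable of (x_m, y_m, is_positive) tuples.
    Assumption: a grid field is positive if any finding inside it is positive.
    """
    fields = {}
    for x_m, y_m, is_positive in findings:
        cell = to_grid_cell(x_m, y_m)
        fields[cell] = fields.get(cell, False) or is_positive
    return fields


# Hypothetical survey points: the first two fall into the same grid field.
points = [
    (723449.0, 251512.0, True),
    (723401.0, 251530.0, False),
    (723980.0, 251120.0, False),
]
for cell, positive in classify_grid_fields(points).items():
    print(cell, "positive" if positive else "negative")
```

Each resulting grid field can then be compared with the monthly risk class of the model at the same grid position.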

Validity of the interactive map

Validity based on the findings from the field survey 2010 compared per month

In 76 grid fields, consisting of swampy areas, ditches and other humid biotopes, no G. truncatula specimens were found. Nevertheless, these grid fields were classified positive, as they serve as potential habitats. When comparing the monthly-modelled risk class to the actual findings in the field (Table 3), the sensitivity of the interactive map was 88.9%, the specificity 18.8%, the positive predictive value 26.7% and the negative predictive value 83.6%.

Validity based on the findings of all data compared per month

When comparing the monthly-modelled risk groups with the actual findings in the field (Table 4), the sensitivity of the interactive map was 40.7%, the specificity 18.8%, the positive predictive value 51.4% and the negative predictive value 13.0%.

Validity based on the findings of all data from July to October

When comparing the findings from all sources from July to October

Table 2. Classification of results of the field surveys compared to the risk model.

| Interactive map | Field survey: positive (presence of potential habitats) | Field survey: negative (absence of potential habitats) |
|---|---|---|
| Risk (risk class 1 to 5) | True positive | False positive |
| No risk (risk class 0) | False negative | True negative |

Table 3. Comparison of results of the field survey 2010 with the risk classes in the model (interactive map) per month.

| Interactive map | Number of negative grid fields* (%) | Number of positive grid fields* (%) |
|---|---|---|
| Risk class 0 | 51 (18.8) | 10 (11.1) |
| Risk class 1 | 34 (12.6) | 28 (31.1) |
| Risk class 2 | 33 (12.2) | 12 (13.3) |
| Risk class 3 | 31 (11.4) | 11 (12.2) |
| Risk class 4 | 38 (14.0) | 5 (5.6) |
| Risk class 5 | 84 (31.0) | 24 (26.7) |
| Risk class 1-5 | 220 (81.2) | 80 (88.9) |
| Total | 271 (100) | 90 (100) |

*Negative grid fields, grid fields without potential habitats; positive grid fields, grid fields with potential habitats either with snails or without (but with typical vegetation).

[Geospatial Health 2016; 11:418]


(Table 5) with the modelled risk maps for July to October, the sensitivity was 53.7%, the specificity 18.8%, the positive predictive value 46.5% and the negative predictive value 23.6%. When comparing the findings from all sources from July and August with the corresponding risk model and the findings from September and October with the August risk model (Table 6; Figure 2), the sensitivity was 65.2%, the specificity 11.4%, the positive predictive value 49.2% and the negative predictive value 20.0%.

Discussion

The interactive map of Rapsch et al. (2008) models areas with suitable environmental conditions for the survival and reproduction of G. truncatula, including survival of the free-living stages of F. hepatica. The risk areas modelled are corroborated by a study by Ducheyne et al. (2015) modelling the spatial distribution of F. hepatica in dairy cattle in Europe, including Switzerland. The aim of the present study was to evaluate the risk map from Rapsch et al. (2008) by comparing the modelled risk areas with findings from different field surveys with respect to G. truncatula, the main intermediate host snail of F. hepatica. This approach was chosen because the risk map models the potential spread of G. truncatula and not the spread of F. hepatica itself (except its free-living stages). In comparable studies, risk maps are evaluated by cattle prevalence data (Malone et al., 1992; Malone and Gommes, 1998; Tum et al., 2007). In Switzerland there is excessive movement of animals, especially of young stock going to alpine pastures during the summer months. If these first and second summer grazing animals become infected with F. hepatica, they will shed eggs and secrete antibodies into their first and second lactation (Knubben-Schweizer and Torgerson, 2015). Positive results of bulk milk antibody tests are therefore not necessarily indicative of a new infection on the premises in question; the infection may have been acquired in a very different geographical region. It would therefore be more appropriate to search for potential snail habitats and snails than to compare the risk map to prevalence data in cattle. The search for potential snail habitats and snails poses several challenges. Even though typical habitats, as described by Schweizer et al. (2007), with putative suitable environmental conditions may be present, G. truncatula can still be absent. In the authors’ experience, this is especially true when habitats are visited only once, as the size of the

Table 4. Comparison of results of the monthly risk models (interactive map) with the results of the field surveys from January to December.

| Interactive map | Field surveys°: positive | Field surveys°: negative | Total |
|---|---|---|---|
| Risk (risk class 1-5) | 233 | 220 | 453 |
| No risk (risk class 0) | 339 | 51 | 390 |
| Total | 572 | 271 | 843 |

Positive, presence of potential habitats either with snails or without (but with typical vegetation); negative, absence of potential habitats. °Field survey 2010, data from Schweizer et al. (2007) and findings from Centre Suisse de Cartographie de la Faune.

Table 5. Comparison of results of the monthly risk models (interactive map) with the results of the field surveys from July to October.

| Interactive map | Field surveys°: positive | Field surveys°: negative | Total |
|---|---|---|---|
| Risk (risk class 1-5) | 191 | 220 | 411 |
| No risk (risk class 0) | 165 | 51 | 216 |
| Total | 356 | 271 | 627 |

Positive, presence of potential habitats either with snails or without (but with typical vegetation); negative, absence of potential habitats. °Field survey 2010, data from Schweizer et al. (2007) and findings from Centre Suisse de Cartographie de la Faune.

Table 6. Comparison of the findings from all sources in July and August compared to the corresponding risk model and the findings from all sources from September and October compared to the August risk model.

| Interactive map | Field surveys° (July to October): positive | Field surveys° (July to October): negative | Total |
|---|---|---|---|
| Risk (risk class 1-5) | 232 | 240 | 472 |
| No risk (risk class 0) | 124 | 31 | 155 |
| Total | 356 | 271 | 627 |

Positive, presence of potential habitats either with snails or without (but with typical vegetation); negative, absence of potential habitats. Findings from September and October are compared to the August model. °Field survey 2010, data from Schweizer et al. (2007) and findings from Centre Suisse de Cartographie de la Faune.


snail population depends on the cumulative environmental conditions of the previous months. For this reason, not only snail findings were assessed as positive but also potential habitats as can be seen in Figures 3 to 5. Another challenge is the identification of the snails. Those from the present study and from Schweizer et al. (2007) were identified by an experienced person by macroscopic inspection. In questionable cases (e.g. very small snails) the identification took place under a binocular loupe. For the data from CSCF, the way the snails were identified is unknown. Misidentification cannot be ruled out, especially for very small snails. In the current study, the infection status of the snails was not investigated as it was not of primary interest. In the study of Schweizer et al. (2007), the snails were examined using PCR to detect F. hepatica infection finding 7.0% of 4733 snails positive. The risk of infection of G. truncatula with F. hepatica depended on the source of the snails (type of habitat and type of pastured cattle, respectively). Another challenge in this field survey was that it was not always possible to survey the grid fields due to limitations, such as slope, water

course, street, building, land use etc. In such areas, it was therefore decided to choose a coordinate point and search the area around it. In the model, the surveyed area was then mapped and the number of grid fields covered by this area were analysed. Several approaches were used to calculate the sensitivity, specificity

Figure 3. Drainage ditch in a swampy area making a potentially suitable habitat for Galba truncatula.

Figure 2. A) Risk map for the transmission of Fasciola hepatica in Switzerland. B) Detail of Figure 2A: northeast of Switzerland, where the present survey was conducted. The risk for the month of August is shown. White areas are no risk (risk class 0), red shaded areas are risk (risk class 1 to 5). Additionally, the findings of Galba truncatula searches from three sources are shown: red dots are positive findings (either potential habitats or snails) and white dots are areas with no potential snail habitats.

Figure 4. A swampy area making a potentially suitable habitat for Galba truncatula; notice the rush as an indicator plant described by Rondelaud et al. (2011).


and predictive values of the interactive map. In general, the model is neither very sensitive nor very specific. When comparing the monthly findings of all surveys with the monthly risk models, 40.7% of the grid fields with potential habitats lay within the modelled risk areas. Additionally, in the field survey of 2010, only 18.8% of the grid fields without potential habitats fell into risk class 0, and no snails were found in any grid field of risk class 0. The chance of finding a potential habitat in an area where a risk is predicted by the model was 51.4% for any month of the year. In contrast, the chance of absence of habitats in an area where no risk was predicted by the model was 13.0% for any month of the year. When the conditions of the previous months were considered in calculating the risk, these results improved. Nevertheless, the sensitivity, specificity, positive and negative predictive values remained low. As the interactive map lacks an underlying transmission model, mean monthly temperatures below 10°C result in a risk 0 classification of the affected grid field. For this reason, the interactive map does not predict any potential risk areas for the survival and reproduction of G. truncatula, and for the survival of the free-living stages of F. hepatica, during the winter months. Nevertheless, snails can overwinter and can therefore be found all year round (Mehl, 1932). As a result, the snails found from November to March, when the map does not show any risk areas at all, lead to a decrease of the sensitivity and the negative predictive value for the 12-month model. A limitation of the interactive map is the grid size. Even though 100 x 100 m allows a resolution approaching the farm level, habitats are often much smaller. Swampy areas, drainage ditches and slow streams, which are the typical habitats for G. truncatula (Mehl, 1932; Schweizer et al., 2007), are not mapped on commercial maps (topographic map 1:25’000). This was considered by modelling the soil type (Wilson et al., 1982) in the interactive map, based on the fact that the water content depends on water permeability and how waterlogged the soil is (Rapsch et al., 2008). Furthermore, besides natural habitats, there are also man-made habitats, such as wells. Where the type of habitat was known, the wells were eliminated from the evaluation. Nevertheless, local microclimatic factors and ground condition as well as man-made habitats are almost impossible to model. Besides, habitats may vary over the years. As the present study was based on data deriving from a 20-year period (1990 to 2010), some alteration in habitats might have occurred. Another type of small habitat, also not considered in the interactive map, are areas characterised by spring water that generally occur on slopes. It was therefore assumed that the implementation of small watercourses and slopes might increase the sensitivity, specificity and predictive values of the interactive map.

Figure 5. Spring water making a potentially suitable habitat for Galba truncatula.

Conclusions

Models such as those assessing the potential spread of G. truncatula or the spatial distribution of F. hepatica can be very helpful tools for farmers and veterinarians. However, these are theoretical models. In order to validate the original Swiss model (Rapsch et al., 2008), a field survey was undertaken. This demonstrated moderate sensitivity, specificity and predictive values of the model. The next-step revision of the model will require the environmental factors to be examined in more detail (e.g. the role of slopes). Furthermore, a transmission model is in progress, which will be integrated into the map and help to reduce the problem of missing cumulative effects over the year.

References

Ducheyne E, Charlier J, Vercruysse J, Rinaldi L, Biggeri A, Demeler J, Brandt C, de Waal T, Selemetas N, Höglund J, Kaba J, Kowalczyk J, Hendrickx G, 2015. Modelling the spatial distribution of Fasciola hepatica in dairy cattle in Europe. Geospat Health 9:261-70.
Hendrickx G, Biesemans J, De Deken R, 2004. The use of GIS in veterinary parasitology. In: Durr P, Gatrell A, eds. GIS and spatial analysis in veterinary science. CABI Publishing, Wallingford, UK, pp 145-76.
Knubben-Schweizer G, Torgerson PR, 2015. Bovine fasciolosis: control strategies based on the location of Galba truncatula habitats on farms. Vet Parasitol 208:77-83.
Malone JB, Fehler DP, Loyacano AF, Zukowski SH, 1992. Use of LANDSAT MSS imagery and soil type in a geographic information system to assess site-specific risk of fascioliasis on Red River Basin farms in Louisiana. Ann NY Acad Sci 653:389-97.
Malone JB, Gommes R, 1998. A geographic information system on the potential distribution of Fasciola hepatica and F. gigantica in east Africa based on Food and Agriculture Organization database. Vet Parasitol 78:87-101.
Malone JB, Williams TE, Muller RA, Geaghan JP, Loyacano AF, 1987. Fascioliasis in cattle in Louisiana: development of a system to predict disease risk by climate, using the Thornthwaite water budget. Am J Vet Res 48:1167-70.
Mehl S, 1932. Die Lebensbedingungen der Leberegelschnecke (Galba truncatula Müller). Untersuchungen über Schale, Verbreitung, Lebensgeschichte, natürliche Feinde und Bekämpfungsmöglichkeiten. Dr. F.P. Datterer & Cie Publ., Munich, Germany.
Ollerenshaw CB, 1966. The approach to forecasting the incidence of fascioliasis over England and Wales. Agr Meteorol 3:35-53.
Rapsch C, Dahinden T, Heinzmann D, Torgerson PR, Braun U, Deplazes P, Hurni L, Bär H, Knubben-Schweizer G, 2008. An interactive map to assess the potential spread of Lymnaea truncatula and the free-living stages of Fasciola hepatica in Switzerland. Vet Parasitol 154:242-9.
Rapsch C, Schweizer G, Grimm F, Kohler L, Deplazes P, Braun U, Bauer C, Torgerson PR, 2006. Estimating the true prevalence of Fasciola hepatica in cattle slaughtered in Switzerland in the absence of an absolute diagnostic test. Int J Parasitol 36:1153-8.
Rinaldi L, Musella V, Biggeri A, Cringoli G, 2006. New insights into the application of geographical information systems and remote sensing in veterinary parasitology. Geospat Health 1:33-47.
Rondelaud D, Hourdin P, Vignoles P, Dreyfuss G, Cabaret J, 2011. The detection of snail host habitats in liver fluke infected farms by use of plant indicators. Vet Parasitol 181:166-73.
Schweizer G, Braun U, Deplazes P, Torgerson PR, 2005a. Estimating the financial losses due to bovine fasciolosis in Switzerland. Vet Rec 157:188-93.
Schweizer G, Hässig M, Braun U, 2005b. Das Problembewusstsein von Landwirten in Bezug auf die Fasciolose des Rindes. Schweiz Arch Tierheilk 147:253-7.
Schweizer G, Meli ML, Torgerson PR, Lutz H, Deplazes P, Braun U, 2007. Prevalence of Fasciola hepatica in the intermediate host Lymnaea truncatula detected by real time TaqMan PCR in populations from 70 Swiss farms with cattle husbandry. Vet Parasitol 150:164-9.
Simoonga C, Utzinger J, Brooker S, Vounatsou P, Appleton CC, Stensgard AS, Olsen A, Kristensen TK, 2009. Remote sensing, geographical information system and spatial analysis for schistosomiasis epidemiology and ecology in Africa. Parasitology 136:1683-93.
Tum S, Puotinen ML, Copeman DB, 2004. A geographic information system model for mapping risk of fasciolosis in cattle and buffaloes in Cambodia. Vet Parasitol 122:141-9.
Tum S, Puotinen ML, Skerratt LF, Chan B, Sothoeun S, 2007. Validation of a geographic information system model for mapping the risk of fasciolosis in cattle and buffaloes in Cambodia. Vet Parasitol 143:364-7.
Wilson RA, Smith G, Thomas MR, 1982. Fascioliasis. In: Anderson RM, ed. The population dynamics of infectious diseases: theory and applications. Chapman and Hall, London, UK, pp 262-319.
Yilma JM, Malone JB, 1998. A geographic information system forecast model for strategic control of fasciolosis in Ethiopia. Vet Parasitol 78:103-27.


Geospatial Health 2016; volume 11:410

Spatial and temporal changes in household structure locations using high-resolution satellite imagery for population assessment: an analysis in southern Zambia, 2006-2011

Timothy Shields,1 Jessie Pinchoff,1 Jailos Lubinda,2 Harry Hamapumbu,2 Kelly Searle,1 Tamaki Kobayashi,1 Philip E. Thuma,2 William J. Moss,1 Frank C. Curriero1

1Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA; 2Macha Research Trust, Choma, Zambia

Abstract

Satellite imagery is increasingly available at high spatial resolution and can be used for various purposes in public health research and programme implementation. Comparing a census generated from two satellite images of the same region in rural southern Zambia obtained four and a half years apart identified patterns of household locations and change over time. The length of time that a satellite image-based census is accurate determines its utility. Households were enumerated manually from satellite images obtained in 2006 and 2011 of the same area. Spatial statistics were used to describe clustering, cluster detection, and spatial variation in the location of households. A total of 3821 household locations were enumerated in 2006 and 4256 in 2011, a net change of 435 houses (11.4% increase). Comparison of the images indicated that 971 (25.4%) structures were added and 536 (14.0%) removed. Further analysis suggested similar household clustering in the two images and no substantial difference in concentration of households across the study area. Cluster detection analysis identified a small area where significantly more household structures were removed than expected; however, the amount of change was of limited practical significance. These findings suggest that random sampling of households for study participation would not induce geographic bias if based on a 4.5-year-old image in this region. Application of spatial statistical methods provides insights into the population distribution changes between two time periods and can be helpful in assessing the accuracy of satellite imagery.

Correspondence: Timothy Shields, Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, 615 N Wolfe Street, 21205 Baltimore, MD, USA. Tel: +1.410.502.9077 - Fax: +1.410.955.0105. E-mail: tshields@jhu.edu

Key words: Satellite imagery; Spatial epidemiology; GIS; Population targeting; Spatial statistics.

Contributions: TS assisted with analysis plan, derived and mapped all satellite imagery, and drafted the manuscript; JP assisted with data analysis and manuscript preparation; JL assisted with data management and mapping; HH ran the field team and all data collection; KS assisted with SaTScan; TK assisted with study design and preparation of manuscript; PET assisted with study design and coordination on site; WJM designed and coordinated the study and was involved with preparation of the manuscript; FCC designed the analysis plan and was involved in preparation of the manuscript.

Conflict of interest: the authors declare no potential conflict of interest.

Funding: this work was supported by the Johns Hopkins Malaria Research Institute, the Bloomberg Family Foundation and the Division of Microbiology and Infectious Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health as part of the International Centers of Excellence for Malaria Research (U19 AI089680).

Acknowledgements: the researchers would like to thank the Macha Research Trust and community of Macha without which this research could not have been carried out.

Received for publication: 17 September 2015.
Revision received: 26 December 2015.
Accepted for publication: 10 January 2016.

©Copyright T. Shields et al., 2016
Licensee PAGEPress, Italy
Geospatial Health 2016; 11:410
doi:10.4081/gh.2016.410

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.


Introduction

Incorporating data from high-resolution satellite images and global positioning systems (GPS) into geographical information systems (GIS) has become increasingly useful, accurate, and widely available. Spatial resolution has dramatically increased to less than two meters for multispectral images and less than half a meter for panchromatic images. Availability has also increased, with a growing number of companies launching satellites for commercial use as well as some data freely available via web services such as Google Earth™ and Bing™ (Belward and Skøien, 2014). Data from satellites are spatially precise and spatial accuracy can be validated by GPS (Lowther et al., 2009; Vazquez-Prokopec et al., 2009; Checchi et al., 2013). High-resolution satellite images have diverse applications, such as the measurement of land use, population movement, change in civil infrastructure, conservation, monitoring of humanitarian emergencies, and the study of infectious diseases (Radke et al., 2004; Dambach et al., 2009; Schmidt and Kedir, 2009; Checchi et al., 2013; Boyle et al., 2014). In public health, the use of high-resolution satellite imagery has been identified as a cost-effective approach to develop disease surveillance systems, monitor disease trends, and document topographical changes that may influence disease transmission (Fernandez, 2008; Lefer et al., 2008; Chang et al., 2009; Kamadieu, 2009; Lowther et al., 2009; Wei et al., 2012; Soti et al., 2013; Franke et al., 2015; Nsoesie et al., 2015).

Another use for satellite imagery is in the selection of households for targeted interventions. For example, satellite imagery was used in Zambia to enumerate structures and select target areas to receive indoor residual spraying for malaria control (Franke et al., 2015; Kamanga et al., 2015). Household enumeration based on high-resolution satellite imagery has been used to measure population changes in refugee camps, and has been identified as a practical method for generating a sampling frame for public health research in sub-Saharan Africa (Lowther et al., 2009; Moss et al., 2011; Wampler et al., 2013; Escamilla et al., 2014; Franke et al., 2015). Satellite image-based census enumeration has also been employed to create population distribution maps that can be useful for many epidemiological calculations and studies, as well as for public health planning and targeting interventions (Chang et al., 2009; Linard et al., 2012; Wampler et al., 2013; Kondo et al., 2014).

Unfortunately, existing census and demographic datasets for low-income countries, where disease burdens are commonly highest, are often based on outdated population enumeration data (Tatem et al., 2007; Linard et al., 2010, 2012). Recently, studies have validated the use of satellite imagery and GPS to provide sampling frames for ethnographic and public health surveys (Lowther et al., 2009; Wampler et al., 2013) and to estimate population size (Lowther et al., 2009; Checchi et al., 2013; Wampler et al., 2013; Hillson et al., 2014). However, the length of time a satellite image remains accurate and useful is unclear. Determining the accuracy of an image depends on the context, and varies based on the research question.
For example, imagery utilized for epidemiological studies relying on household locations for survey implementation may be more temporally sensitive than studies determining and involving general land cover characteristics. Describing changes in the distribution of household structures visualised on satellite images is a novel application; previously, this technique has been restricted mainly to the description of refugee camps or areas of conflict (Galway et al., 2012; Checchi et al., 2013). As a component of the Southern Africa International Centers of Excellence for Malaria Research (ICEMR), households are selected for enrolment into a prospective study of malaria transmission using simple random sampling from an enumerated list. Households in the sampling pool are identified and enumerated from a high-resolution satellite image and their coordinates confirmed by GPS at enrolment. For this sampling strategy to be effective, the coordinates of selected households must be accurate. Equally important is that the pool of enumerated households is accurately identified as collected field data are assumed to be representative of the target population. The temporal accuracy (shelf-life) of high-resolution satellite imagery was assessed by comparing images obtained in 2006 and 2011 of the study area in rural southern Zambia.

Materials and Methods

Study area

The catchment area of Macha Hospital in Choma District, Southern Province, Zambia is one of three sites of the ICEMR. The study site is a rural area of approximately 575 km² at an average elevation of 1100 meters and consists of open savannah woodland with land clearings for subsistence agriculture (Moss et al., 2011). All houses and non-residential structures are single story.

Geographical information system methodology

A satellite image task order was generated by DigitalGlobe Services, Inc. (Denver, CO, USA) and a multispectral 2.4-m resolution image was acquired on 01/12/2006. This image was pan-sharpened to 0.62-meter resolution using the resolution-merge function. A second task order was generated by Apollo Mapping (Boulder, CO, USA) of the same study area for acquisition of a GeoEye-1 image obtained in mid-2011 with a 0.5-meter resolution. Six tiles from the imagery archives, collected between April and July 2011 (21/04/2011, 24/04/2011, 13/05/2011, 16/05/2011, and 18/07/2011) were added into a mosaic covering the study area. Orthorectification, using rational polynomial coefficients, was performed to improve spatial accuracy. Image processing was conducted in Erdas Imagine 2010 (Hexagon Geospatial, Norcross, GA, USA). Each image was imported into ArcGIS 9.2 (ESRI, Redlands, CA, USA). Visual inspection of the imagery was performed during the onscreen digitising process, during which structures of appropriate size and shape were identified as potential households. A household was typically recognised by a clearing of the natural brush with one or more domestic structures. Smaller structures, such as cooking houses or animal kraals, might be present as well. A household was defined as one or more of these structures that function as a family unit. During this manual enumeration process, a map feature (point) was created for the centroid of each household. Comparison of the images allowed for the identification and coding of households that remained at the same location, were newly built, or were removed in the four and a half year period between the two images. As an alternative to the manual enumeration process described above, household identification was originally attempted, without success, using automated feature extraction software. 
These software algorithms incorporate spatial context while classifying object-specific features specified by the user. However, in this study area and similar study areas in developing countries, the assortment of materials used for roofs (bush material, asbestos sheets, and corrugated metal) and walls (mud brick and concrete) impeded the ability to accurately and reliably discern houses. Additionally, our malaria data is collected and mapped at a household level, which, as stated, is often a collection of individual houses of varying number and geographic expanse.

Statistical analyses

Spatial statistics were used to assess clustering, cluster detection and spatial variation in household location between the 2006 and 2011 satellite images to describe and quantify changes in spatial patterns of households and to identify geographic areas of significant change. Spatial clustering is the property that describes how tightly compact or dispersed a set of mapped locations is. The K-function, which estimates the expected number of other events within a range of distances of each event, was used to assess spatial clustering (Waller and Gotway, 2004). The K-function was estimated for both the 2006 and 2011 mapped household locations and the difference was plotted as a function of distance to assess change in the level of spatial clustering of household locations. Significant differences in spatial clustering were assessed using the Monte Carlo random labelling approach (Diggle, 2008). To complement the assessment of spatial clustering, spatial variation in the location of households was also explored. Spatial intensity, defined as the expected number of events per unit area, was estimated using the non-parametric kernel density approach and mapped to highlight spatial variation in the concentration of events (Waller et al., 2004; Diggle, 2008). Spatial intensity was estimated for both the 2006 and 2011 mapped household locations. A map of the difference in spatial intensity between 2006 and 2011 was generated to show


changes in spatial variation of household locations between these two time periods. K-function and spatial intensity analysis were performed using the R Statistical Software with contributed spatstat package (Baddeley and Turner, 2005; R Statistical Software, 2013). A cluster detection analysis was performed to assess clusters of significant change in the number households from 2006 to 2011. In comparison to the property of spatial clustering, a spatial cluster describes the local property of a subarea with a significant difference in the expected number of events. The existence of such a cluster may not be captured in the previously described analyses but could have profound effects on the sampling strategy and other related objectives that are based on enumerated satellite imagery. The study area was divided into 1-km grid cells. For each cell, the total number of newly added and removed households from 2006-2011, as well as the ratio of net change (difference in the added and removed houses) to the 2006 cell population, were determined. The cluster detection software SaTScan v9.4 (http://www.statscan.org) was used to search for clusters (contiguous sets of grid cells) with significantly high net change in household pop-

ulation from 2006 to 2011. The cluster detection was based on the SaTScan normal model to accommodate positive and negative net change and was performed controlling for proximity to roads (defined as the total length of all road segments in each grid cell). A tarred road was constructed in 2008 between the time points of the two images. Cluster detection analysis controlling for proximity to roads, a known driver of household settlement in this area, identifies clusters beyond what would have been explained by these features.
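The comparison of K-functions with a random labelling test can be sketched as follows. This is only an illustration in Python, not the spatstat workflow used in the study: the estimator below omits edge correction, the coordinates are simulated, and the function names (`k_function`, `k_difference`, `random_labelling_envelope`), window size and number of relabellings are all assumptions made for the example.

```python
import math
import random


def k_function(points, radii, area):
    """Naive Ripley's K estimate at each radius (no edge correction)."""
    n = len(points)
    counts = [0] * len(radii)
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            d = math.hypot(xi - xj, yi - yj)
            for k, r in enumerate(radii):
                if d <= r:
                    counts[k] += 1  # ordered pair (i, j) lies within distance r
    return [area * c / (n * (n - 1)) for c in counts]


def k_difference(points_a, points_b, radii, area):
    """K_a(r) - K_b(r) at each radius."""
    k_a = k_function(points_a, radii, area)
    k_b = k_function(points_b, radii, area)
    return [a - b for a, b in zip(k_a, k_b)]


def random_labelling_envelope(points_a, points_b, radii, area, n_sim=39, seed=1):
    """Min/max of the K difference when the year labels are shuffled at random."""
    rng = random.Random(seed)
    pooled = list(points_a) + list(points_b)
    n_a = len(points_a)
    sims = []
    for _ in range(n_sim):
        rng.shuffle(pooled)
        sims.append(k_difference(pooled[:n_a], pooled[n_a:], radii, area))
    lower = [min(s[k] for s in sims) for k in range(len(radii))]
    upper = [max(s[k] for s in sims) for k in range(len(radii))]
    return lower, upper


# Hypothetical example: two simulated patterns in a 10 x 10 km window.
rng = random.Random(42)
pts_2006 = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(40)]
pts_2011 = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(45)]
radii = [0.5, 1.0, 1.5, 2.0]
observed = k_difference(pts_2011, pts_2006, radii, area=100.0)
lower, upper = random_labelling_envelope(pts_2011, pts_2006, radii, area=100.0)
for r, d, lo, hi in zip(radii, observed, lower, upper):
    flag = "outside" if d < lo or d > hi else "inside"
    print(f"r={r:.1f} km  D(r)={d:+.2f}  envelope=[{lo:+.2f}, {hi:+.2f}]  {flag}")
```

An observed difference that stays inside the envelope at every distance is consistent with no change in clustering. With 39 relabellings, the minimum and maximum give a pointwise envelope at roughly the 5% two-sided level; a real analysis would normally use more simulations and an edge-corrected estimator.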

Results
A total of 3821 household structures were enumerated in 2006 and 4256 in 2011 (Table 1). Between 2006 and 2011, 971 (25.4%) structures were added and 536 (14.0%) structures were removed (no longer present) (Table 1). Thus, by mid-2011, there was a net increase of 435 (11.4%) household structures from 2006. All enumerated household structures as well as the change (added and removed households) were mapped (Figure 1).

There was no significant difference in the level of spatial clustering for the 2006 household locations compared to the 2011 household locations. The difference in K-functions for 2006 to 2011 remained close to the horizontal zero line of no difference and did not approach statistical significance in either direction (Figure 2). Assessment of the intensity maps suggested that the spatial variation in household concentrations was consistent from 2006 to 2011, although household density reached 32 houses per km2 in 2011 compared to 27 per km2 in 2006, reflecting the positive net change in households (Figure 3). The difference in intensity maps suggested that areas with the highest net change (both positive and negative) occurred where there were higher concentrations of households. An area of negative net change (more households removed than expected) appeared along the southern border of the northeast quadrant where there was moderate household density in both 2006 and 2011 (Figure 3).

The results of the SaTScan analysis identified one spatial cluster with a combined significantly negative net change (Figure 4). Within the cluster, significantly more household structures were removed between 2006 and 2011 than expected (P<0.001). Although the cluster was statistically significant, the weighted mean net change within it was -0.23 houses compared to 0.15 houses outside it.

Figure 1. Change in households between the enumerated 2006 and 2011 satellite images for the study area in Southern Province, Zambia.
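The counts and percentages reported in Table 1 can be checked with a few lines of arithmetic. The sketch below is illustrative only; the variable and function names are not from the study, and the percentages are taken relative to the 2006 count, as the reported figures imply.

```python
households_2006 = 3821
added = 971
removed = 536

# Households present in 2011 and the net change since 2006.
households_2011 = households_2006 + added - removed
net_change = added - removed


def pct_of_2006(count):
    """Express a count as a percentage of the 2006 household total."""
    return round(100 * count / households_2006, 1)


print("Households 2011:", households_2011)
print("Added (%):", pct_of_2006(added))
print("Removed (%):", pct_of_2006(removed))
print("Net increase:", net_change, f"({pct_of_2006(net_change)}%)")
print("Added per removed household:", round(added / removed, 2))
```

The same net increase can arise from very different amounts of turnover, which is why added and removed households are reported separately rather than as a single net figure.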

Table 1. Change with respect to the enumerated households for the 2006 and 2011 satellite imagery.

Household data                              N       %
Households 2006                             3821    na
Households 2011                             4256    na
Households added between 2006 and 2011      971     25.4
Households removed between 2006 and 2011    536     14.0
Net increase                                435     11.4

na, not available.

Figure 2. Difference in K functions comparing spatial clustering of enumerated household locations, 2006 to 2011.

Discussion
Satellite images depict the Earth's surface at a precise moment, providing a snapshot in time. Often missing from the literature, particularly in the field of public health, is an assessment of the degree of change over time and the potential impact such changes have on public health research, planning interventions, and population sampling. Assessment of change is also critical for longitudinal projects that involve planning, ongoing data collection, and outcome evaluations over space and time. Over a 4.5-year period, the number of households identified in a rural area of Zambia increased 11.4%; however, the household distribution patterns were maintained. These methodological approaches to examining changes in satellite imagery between two time periods can be used in other settings and for different research questions.

The cost of acquiring new satellite imagery, although decreasing, remains an obstacle to its use in public health studies. Researchers have to determine whether existing archived imagery, which is significantly less expensive, is suitable for the research project and for how long a purchased image will remain useful. In public health studies, population movement is often a concern. Triggers such as changes in access and availability of transportation (e.g. road construction), new industrial developments, and changes in government policy can provide an indication that the population distribution in a given area may be changing. This analysis demonstrates how, with the use of spatial statistical techniques, these features can be incorporated into an assessment of change across multiple high-resolution satellite images of the same area.

Identifying a net difference in the number of households between two time periods alone does not adequately describe the dynamics of household distribution. Further investigation highlighted that there were nearly twice as many households added as were removed. However, no significant change in the spatial distribution of household locations was identified in either large-scale spatial trends in the concentration of households or smaller-scale spatial clustering of households. Although a statistically significant cluster of lower than expected net change in households was identified, the magnitude of the difference was not deemed to be of practical significance for population sampling.

This study incorporated time-intensive manual identification of households that was necessitated by the varying materials used in the construction of these houses and the need to identify groups of houses rather than individual structures. Automated feature extraction, including identification of houses, has been successfully utilised in other studies (Tullis and Jensen, 2003; Lo, 2007; Lowther et al., 2009; Moss et al., 2011; Wampler et al., 2013; Escamilla et al., 2014; Franke et al., 2015; Kamanga et al., 2015). Regardless of the method used to identify the map feature of interest, the spatial statistical approach to understanding the changes in imagery between time periods remains applicable.

This study had some limitations. First, the assessment of household location change relied on observations from two images; no interim images were considered. Thus, longitudinal assessments at smaller temporal scales could not be determined. Second, the household enumeration process was based upon visual inspection of the images, potentially leading to the misclassification of non-residential structures as households or households as non-residential structures. Attempts were made to use automated feature extraction software but none were able to account for the differing nature of the household materials. However, misclassified households were likely to be few or to have resulted in non-differential misclassification as the same methods were used for each image. Third, observations were based on enumerating household structures, not actual people. While the number of household structures would likely be correlated with population size, on small spatial scales changes in population may not always be reflective of changes in household locations. Lastly, rural areas in southern Africa or in other developing countries may have more or less household movement over time, thus limiting the generalisability of our findings. However, the methods used to assess changes in household structure patterns can be applied in different settings.

Conclusions
Satellite imagery is increasingly used for activities such as study planning, data collection, distribution of resources, or targeting of activities. Understanding changes in the distribution of households over time is of importance to researchers relying on satellite imagery. Researchers should consider and evaluate the accuracy of satellite imagery as the time from acquisition to use increases.

Figure 3. A) Mapped spatial intensity of enumerated household locations from the 2006 and 2011 satellite images; B) map of the difference in spatial intensity from 2006 to 2011.

Figure 4. Net change in number of household structures within 1-km grid cells from 2006-2011, Choma District, Zambia.

References
Baddeley A, Turner R, 2005. spatstat: an R package for analyzing spatial point patterns. J Stat Softw 12:1-42.
Belward A, Skøien J, 2014. Who launched what, when and why; trends in global land-cover observation capacity from civilian earth observation satellites. ISPRS J Photogramm 103:115-28.
Boyle S, Kennedy C, Torres J, Colman K, Perez-Estigarribia P, 2014. High-resolution satellite imagery is an important yet underutilized resource in conservation biology. PLoS One 9:e86908.
Chang A, Parrales M, Jimenez J, Sobieszczyk M, Hammer S, Copenhaver D, Kulkarni R, 2009. Combining Google Earth and GIS mapping technologies in a dengue surveillance system for developing countries. Int J Health Geogr 8:49.
Checchi F, Stewart B, Palmer J, Grundy C, 2013. Validity and feasibility of a satellite imagery-based method for rapid estimation of displaced populations. Int J Health Geogr 12:4.
Dambach P, Sie A, Lacaux JP, Vignolles C, Machault V, Sauerborn R, 2009. Using high spatial resolution remote sensing for risk mapping of malaria occurrence in the Nouna district, Burkina Faso. Glob Health Action 2009:2.
Diggle P, 2008. Statistical analysis of spatial patterns. Oxford University Press, New York, NY, USA.
Escamilla V, Emch M, Dandalo L, Miller W, Martinson F, Hoffman I, 2014. Sampling at community level by using satellite imagery and geographical analysis. B World Health Organ 92:690-4.
Fernandez I, 2008. Use of Google earth to facilitate GIS based decision support systems for arthropod-borne diseases. Adv Dis Surveill 4:91-2.
Franke J, Gebreslasie M, Bauwens I, Deleu J, Siegert F, 2015. Earth observation in support of malaria control and epidemiology: MALAREO monitoring approaches. Geospat Health 10:335.
Galway L, Bell N, Shatari A, Hagopian A, Burnham G, Flaxman A, Weiss W, Rajaratnam J, Takaro T, 2012. A two-stage cluster sampling method using gridded population data, a GIS, and Google Earth™ imagery in a population-based mortality survey in Iraq. Int J Health Geogr 11:12.
Hillson R, Alejandre J, Jacobsen K, Ansumana R, Bockarie A, Bangura U, Lamin J, Malanoski A, Stenger D, 2014. Methods for determining the uncertainty of population estimates derived from satellite imagery and limited survey data: a case study of Bo city, Sierra Leone. PLoS One 9:e112241.
Kamadieu R, 2009. Tracking the polio virus down the Congo River: a case study on the use of Google Earth™ in public health planning and mapping. Int J Health Geogr 8:4.
Kamanga A, Renn S, Pollard D, Bridges DJ, Chirwa B, Pinchoff J, Larsen DA, Winters AM, 2015. Open-source satellite enumeration to map households: planning and targeting indoor residual spraying for malaria. Malaria J 14:435.
Kondo M, Bream K, Barg F, Branas C, 2014. A random spatial sampling method in a rural developing nation. BMC Public Health 14:1-8.
Lefer T, Anderson M, Fornari A, Lambert A, Fletcher J, Baquero M, 2008. Using Google Earth as an innovative tool for community mapping. Publ Health Rep 123:474-80.
Linard C, Alegana V, Noor A, Snow R, Tatem A, 2010. A high resolution spatial population database of Somalia for disease risk mapping. Int J Health Geogr 9:45.
Linard C, Gilbert M, Snow R, Noor A, Tatem A, 2012. Population distribution, settlement patterns and accessibility across Africa in 2010. PLoS One 7:e31743.
Lo CP, 2007. Automated population and dwelling unit estimation from high-resolution satellite images: a GIS approach. Remote Sens 16:17-34.
Lowther S, Curriero F, Shields T, Ahmed S, Monze M, Moss W, 2009. Feasibility of satellite image-based sampling for a health survey among urban townships of Lusaka, Zambia. Trop Med Int Health 14:7-78.
Moss W, Hamapumbu H, Kobayashi T, Shields T, Kamanga A, Clennon J, Mharakurwa S, Thuma P, Glass G, 2011. Use of remote sensing to identify spatial risk factors for malaria in a region of declining transmission: a cross-sectional and longitudinal community. Malaria J 10:163.
Nsoesie E, Butler P, Ramakrishnan N, Mekaru S, Brownstein J, 2015. Monitoring disease trends using hospital traffic data from high resolution satellite imagery: a feasibility study. Sci Rep 5:9112.
R Statistical Software, 2013. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Radke R, Andra S, Al-Kofahi O, Roysam B, 2004. Image change detection algorithms: a systematic survey. IEEE T Image Process 14:294-307.
Schmidt E, Kedir M, 2009. Urbanization and spatial connectivity in Ethiopia: urban growth analysis using GIS. International Food Policy Research Institute, Addis Ababa, Ethiopia.
Soti V, Chevalier V, Maura J, Begue A, LeLong C, Lancelot R, Thiongane Y, Tran A, 2013. Identifying landscape features associated with Rift Valley fever virus transmission, Ferlo region, Senegal, using very high spatial resolution satellite imagery. Int J Health Geogr 12:10.
Tatem AJ, Noor AM, Hagen Cv, Gregorio AD, Hay SI, 2007. High resolution population maps for low income nations: combining land cover and census in East Africa. PLoS One 2:e1298.
Tullis JA, Jensen JR, 2003. Expert system house detection in high spatial resolution imagery using size, shape, and context. Geocarto Intl 18:5-15.
Vazquez-Prokopec GM, Stoddard ST, Paz-Soldan V, Morrison AC, Elder JP, Kochel TJ, Scott TW, Kitron U, 2009. Usefulness of commercially available GPS data-loggers for tracking human movement and exposure to Dengue virus. Int J Health Geogr 8:68.
Waller LA, Gotway CA, 2004. Applied spatial statistics for public health data. Wiley, New York, NY, USA.
Walsh SJ, Shao Y, Mena CF, McCleary AL, 2008. Integration of hyperion satellite data and a household social survey to characterize the causes and consequences of reforestation patterns in the Northern Ecuadorian Amazon. Photogrammetric Eng Rem S 74:725-35.
Wampler P, Rediske R, Molla A, 2013. Using ArcMap, Google Earth, and global positioning systems to select and locate random households in rural Haiti. Int J Health Geogr 12:3.
Wei L, Kun Y, Feng W, Xiao-Nong Z, Le-Ping S, Jian-Feng Z, Guo-Jing Y, Hang DR, Yong-Sheng J, 2012. A real-time platform for monitoring schistosomiasis transmission supported by Google Earth and a webbased geographical information system. Geospat Health 6:195-203.



Geospatial Health 2016; volume 11:421

Climate change and species distribution: possible scenarios for thermophilic ticks in Romania

Cristian Domșa, Attila D. Sándor, Andrei D. Mihalca
Department of Parasitology and Parasitic Diseases, University of Agricultural Sciences and Veterinary Medicine, Cluj-Napoca, Romania

Abstract
Several zoonotic tick-borne diseases are emerging in Europe due to various factors, including changes of the cultural landscape, increasing human populations, variation of social habits and climate change. We have modelled the potential range changes for two thermophilic tick species (Hyalomma marginatum and Rhipicephalus annulatus) by use of MaxEnt® and 15 climatic predictors, taking into account the potential for future climatic change in Romania. Current models predict increased temperatures, both in the short term (up to 2050) and in the long term (up to 2070), together with possible changes in other climatic factors (e.g. precipitation), which may lead to higher zoonotic risks associated with an expansion of the range of the target species. Three different models were constructed (the present, 2050 and 2070) for four representative concentration pathways (RCPs) of greenhouse gas scenarios: RCP2.6, RCP4.5, RCP6, and RCP8.5. The most dramatic scenario (RCP8.5) produced the highest increase in the probable distribution range for both species. In concordance with similar continent-wide studies, both tick species displayed a shift of distribution towards previously cooler areas of Romania. In most scenarios, this would lead to wider ranges: from 9.7 to 43.1% for H. marginatum, and from 53.4 to 205.2% for R. annulatus. Although the developed models demonstrate a good predictive power, the issue of species ecology should also be considered.

Correspondence: Attila D. Sándor, Department of Parasitology and Parasitic Diseases, University of Agricultural Sciences and Veterinary Medicine, Cluj-Napoca, Calea Mănăştur 3-5, 400372 Cluj, Romania.
Tel. +40.264.596384 - Fax: +40.264.593792.
E-mail: attila.sandor@usamvcluj.ro

Key words: Climate change; Modelling; Hyalomma marginatum; Rhipicephalus annulatus; Romania.

Acknowledgements: the research was funded by POSDRU grant no. 159/1.5/S/136893 with title: Parteneriat strategic pentru creșterea calității cercetării științifice din universitățile medicale prin acordarea de burse doctorale și postdoctorale – DocMed.Net_2.0. The work of ADS was funded by PN-II-RU-TE-2014-4-1389, while ADM was beneficiary of UEFISCDI Grant PCE IDEI 236/2011. This paper was published under the frame of EurNegVec COST Action TD1303.

Received for publication: 10 October 2015.
Revision received: 21 January 2016.
Accepted for publication: 21 January 2016.

©Copyright C. Domșa et al., 2016
Licensee PAGEPress, Italy
Geospatial Health 2016; 11:421
doi:10.4081/gh.2016.421

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Introduction
Ticks are blood-sucking arthropods considered to be the most important vectors of both human and animal disease in the temperate areas of Europe. Due to various factors, including changes of the cultural landscape, increasing human populations, variation of social habits and climate change, several zoonotic tick-borne diseases are emerging in Europe (Gray et al., 2009). The distribution range of these ticks is influenced by complex interactions that include both biotic factors (the prevailing type of vegetation, host specificity, host community structure and abundance) and abiotic, climatic factors. Knowledge of tick ranges and their likely distribution changes is needed for the projection of future threats and for the formulation of a suitable epidemiologic policy with respect to ticks (Campbell-Lendrum et al., 2015).

The distribution of the Hyalomma marginatum and Rhipicephalus annulatus ticks is mainly determined by temperature and the availability of large animal hosts, particularly ruminants, although the larval stages of H. marginatum also commonly feed on small homeotherms, e.g. birds (European Centre for Disease Prevention and Control; ECDC, 2015). Both tick species present a southern distribution in Europe with noted range expansions in the last few decades (Estrada-Peña et al., 2012, 2013). Field data collections have shown that their distribution in Romania is influenced by temperature (the tick species under study have only recently appeared in new, now warmer areas). Currently, both species are mostly found in the southern and western regions of the country (Mihalca et al., 2012).

H. marginatum is considered to be the most important vector for the Crimean-Congo hemorrhagic fever virus in Europe and Asia (Hoogstraal, 1979; Ergonul and Whitehouse, 2007). In addition, Rickettsia aeschlimannii has been detected in specimens collected from migratory birds in several European countries (Rumer et al., 2011; Chochlakis et al., 2012; Hornok et al., 2013; Movila et al., 2013; Tomassone et al., 2013). A number of other viruses (Dhori, Bahig, Matruh) have also been found in H. marginatum, but its vectorial capacity for these viruses is not yet defined (Converse et al., 1974; Moussa et al., 1974; Filipe and Casals, 1979). The immature stages of H. marginatum readily feed on migratory birds, while the adults prefer large herbivores, which makes the tick prone to be carried over large distances (Jameson et al., 2012). R. annulatus, on the other hand, feeds specifically on cattle and is an important vector for economically important pathogens such as Babesia bigemina, B. bovis and Anaplasma marginale (Walker et al., 2000).

Current climate models predict a continuous increase in temperatures, both in the short term (up to 2050) and in the long term (up to 2070) (IPCC, 2007). Together with the potential change of other climatic factors, e.g. precipitation, this change could lead to a higher zoonotic risk as such climatic changes may support an expansion of the range of the vectors under study (Williams et al., 2015). We used the currently known distribution data (presence only) for the two tick species studied, first to model the present distribution, using available ecological data. Then, using the projected climate scenarios, the present models were expanded in order to fit the projected changes of environmental variables. To model the distribution we used the maximum entropy approach as described by Phillips et al. (2006), Porretta et al. (2013) and Signorini et al. (2014). We chose to model the impact of the projected climate change on the distribution of H. marginatum and R. annulatus, both distributed in the warmer areas of southern and southeastern Romania (Mihalca et al., 2012). All four available representative concentration pathways (RCPs) of climate scenarios, RCP2.6, RCP4.5, RCP6 and RCP8.5 (Weyant et al., 2009), were used with both time periods (2050 and 2070). For this paper, we used the Community Climate System Model, version 4 (CCSM4) (Gent et al., 2011). CCSM is a coupled climate model used to simulate Earth's climate system. It is composed of four separate models simultaneously simulating different components (atmosphere, ocean, land surface and sea ice) and one central coupler component that binds the different parts together.

Materials and Methods
Tick distributions
The tick distribution data used for this study were compiled from the published literature and downloaded from a freely accessible georeferenced database (http://www.geo-parasite.org) (Figure 1). The species selected (H. marginatum and R. annulatus) show a limited distribution in Romania, occurring principally in the South and the Southeast. Their distribution is likely limited by climatic factors as their main hosts (large herbivores for adults and rodents/birds for larvae and nymphs) are common and occur continuously all over the territory of Romania. The number of available records varied: records of H. marginatum were relatively numerous (n=171), while those of R. annulatus were much fewer (n=35). We attempted to develop models for both species despite the comparatively low number of records for the latter.

Environmental data
The models were built using climatic data provided by the WorldClim database (http://www.worldclim.org), which are presented as high-resolution, interpolated rasters. We used the bioclimatic dataset to construct the current and future probable distribution for both species. Environmental data were downloaded from the WorldClim database at the highest resolution provided (30 arc-seconds, i.e. one pixel=0.6 km2). The bioclimatic variables, derived from the monthly temperature and rainfall values in order to give the most biologically meaningful variables (Table 1), were downloaded and processed. Since our aim was not to evaluate the contribution of each variable to the model but to construct reliable, general models, we used all available variables. For the estimated future climate conditions we used the Community Climate System Model version 4 (CCSM4) (Gent et al., 2011), for two different time periods: 2050 (average for the years 2041-2060) and 2070 (average for the years 2061-2080). The outcome of the four available greenhouse gas scenarios, i.e. RCP2.6, RCP4.5, RCP6 and RCP8.5 (Weyant et al., 2009), was examined for each period. Thus, we had a succession of three different, probable distribution models for each scenario, i.e. for the present, for 2050 and for 2070.

Table 1. Variables used for creating the models.

Variable    H. marginatum (% contribution)    R. annulatus (% contribution)
BIO1        4.8                               5.1
BIO3        1.7                               0
BIO6        1.8                               0.2
BIO8        0.5                               0.3
BIO9        0.1                               0.2
BIO10       11.7                              1.2
BIO11       29.6                              0.4
BIO12       6.4                               0.1
BIO13       0.1                               1.8
BIO14       2.3                               0.1
BIO15       29.6                              77.4
BIO16       0.8                               1.2
BIO17       0.1                               3.7
BIO18       3.9                               8.3
BIO19       6.6                               0.1

H. marginatum, Hyalomma marginatum; R. annulatus, Rhipicephalus annulatus; BIO1, annual mean temperature; BIO3, isothermality (BIO2/BIO7)*100; BIO6, minimum temperature of coldest period; BIO8, mean temperature of wettest quarter; BIO9, mean temperature of driest quarter; BIO10, mean temperature of warmest quarter; BIO11, mean temperature of coldest quarter; BIO12, annual precipitation; BIO13, precipitation of wettest period; BIO14, precipitation of driest period; BIO15, precipitation seasonality; BIO16, precipitation of wettest quarter; BIO17, precipitation of driest quarter; BIO18, precipitation of warmest quarter; BIO19, precipitation of coldest quarter.

Figure 1. Occurrence points for the modelled species in Romania: A) Hyalomma marginatum; B) Rhipicephalus annulatus.
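The pixel size quoted for the 30 arc-second rasters (about 0.6 km2) can be reproduced with a short calculation. The sketch below assumes a spherical Earth with a mean radius of 6371 km and takes 46°N as a representative latitude for Romania; both are assumptions made for this example, and the function name is illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation (assumption)


def pixel_area_km2(lat_deg, res_arcsec=30.0):
    """Approximate ground area of one raster cell at a given latitude."""
    res_rad = math.radians(res_arcsec / 3600.0)
    height_km = EARTH_RADIUS_KM * res_rad                       # north-south extent
    width_km = height_km * math.cos(math.radians(lat_deg))      # east-west extent shrinks with latitude
    return height_km * width_km


for lat in (0.0, 43.6, 46.0, 48.3):
    print(f"latitude {lat:4.1f} deg: {pixel_area_km2(lat):.3f} km2 per pixel")
```

The cell area depends on latitude because meridians converge towards the poles, so a single figure such as 0.6 km2 is an approximation that holds for the latitudes of the study area rather than for the raster as a whole.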

Modelling methods deployed
The MaxEnt software (https://www.cs.princeton.edu/~schapire/maxent/) was used since it has been found to perform better than many other modelling methods (Phillips et al., 2006; Porretta et al., 2013). This technique utilizes presence-only records to estimate the potential distribution of the species based on the environmental variables and presents the results as a probability distribution (Phillips et al., 2006). Present and projected future probable distributions for the two target species were built from the presence data for the tick species in question and the climatic variables using a maximum entropy algorithm. We used 75% of the occurrence points to construct the model (training data) and the remaining 25% to test it (Pearson et al., 2007; Nenzén and Araújo, 2013; Pedersen et al., 2014). In our case, the MaxEnt model output was logistic, which provided an estimated relative probability of presence for any given location ranging from 0 to 1, where 0 signifies a very low relative probability of species presence and 1 a very high relative probability (Phillips et al., 2006; Signorini et al., 2014). For all the runs, the default parameters of MaxEnt were used. To evaluate the accuracy of the developed models, we used the area under the curve (AUC) of the receiver operating characteristic (ROC) provided by MaxEnt (Peterson et al., 2007; Porretta et al., 2013).

To calculate the future increase of the distribution area with respect to the present model, presence/absence maps based on the minimum training presence threshold were constructed using two geographical information systems (GIS) tools for analysis and management: QGIS (free software, https://www.qgis.org/) and SAGA (free software, http://www.sagagis.org/). Differences in future distribution were calculated as percentages of the current distribution. In addition, the contribution of each environmental variable to the model was calculated for both tick species under study (Table 1). These percentages have only informative value, since the highly correlated variable pairs were not accounted for. The raster data were processed and analyzed using the SAGA and QGIS software. The models were fitted using the MaxEnt software. The final distribution maps were constructed using ArcGIS (ESRI, Redlands, CA, USA).
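The evaluation steps described above (a 75/25 split of occurrence records, the AUC, and a presence/absence map cut at the minimum training presence threshold) can be sketched in plain Python. This is not MaxEnt code and does not reproduce its internals; the function names, the toy suitability scores and the cell area are hypothetical and serve only to make the logic concrete.

```python
import random


def split_occurrences(records, train_fraction=0.75, seed=0):
    """Randomly split occurrence records into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def auc(presence_scores, background_scores):
    """Probability that a presence site outscores a background site (ties count 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def minimum_training_presence(training_scores):
    """Lowest predicted suitability among the training presence sites."""
    return min(training_scores)


def suitable_area_km2(cell_scores, threshold, cell_area_km2):
    """Area of all raster cells whose suitability reaches the threshold."""
    return sum(1 for s in cell_scores if s >= threshold) * cell_area_km2


# Hypothetical usage with 171 record identifiers and toy suitability scores.
train, test = split_occurrences(range(171))
print(len(train), "training records,", len(test), "test records")

toy_presence = [0.9, 0.8, 0.4]
toy_background = [0.1, 0.5, 0.3, 0.2]
threshold = minimum_training_presence(toy_presence)
print("AUC:", round(auc(toy_presence, toy_background), 3))
print("threshold:", threshold)
print("suitable area:", suitable_area_km2([0.05, 0.4, 0.5, 0.95], threshold, 0.6), "km2")
```

The minimum training presence threshold is the most permissive choice that still classifies every training record as present, so suitable-area estimates based on it tend to be generous compared with stricter thresholds.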

Results Out of the four greenhouse gas scenarios, the highest degree of potential distribution change of range for both tick species was the most dramatic scenario (RCP8.5) both for 2050 and 2070 (range expansion and range shift in case of 2050 and mainly range shift in case of 2070). For H. marginatum, the AUC value was 0.880 indicating a good performance for our model. As reference for the calculation of the climatically suitable distribution area, the minimum training presence threshold provided by the model was 0.084. The variation of the climatically suitable distribution area for H. marginatum predicted by the model is presented in Table 2. The net area is given in km2 and the difference from the present conditions as a percentage of the present area. The maps for the climatically suitable distribution areas predicted by our model (Figure 2) show an expansion of the range towards the northern and central parts of Romania. The most dramatic increase of the area was in case of RCP 8.5, for the time period 2050. However, the area decreased in the 2070 period in this scenario. Although the expan-

Figure 2. Maps of the distribution of Hyalomma marginatum in Romania according to the model. Probability of occurrence from close to 0 (black) to close to 1 (white).

[Geospatial Health 2016; 11:421]

[page 153]


gh-2016_2.qxp_Hrev_master 31/05/16 11:45 Pagina 154

Article

sion of the area continued into the northern parts of Romania (Figure 2), the decrease was due to the fact that some areas in the southern part would become unsuitable in this scenario.

In the case of R. annulatus, the average value for AUC was 0.970, indicating a good performance for our model. As a reference, to calculate the climatically suitable distribution area, the minimum training presence threshold provided by the model was 0.053. The variation of the climatically suitable distribution area for R. annulatus predicted by the model is presented in Table 3. The net area is given in km² and the difference from the present conditions as a percentage of the present area. The maps for the climatically suitable distribution areas predicted by our model are shown in Figure 3. For this species too, the most dramatic increase in the suitable area was in relation to RCP 8.5, both for 2050 and 2070 (unlike the pattern for H. marginatum, most of the southern areas remained suitable). There was an expansion of the range towards the southern and western parts of Romania for all models (but not to the central region, as in the case of H. marginatum).

The MaxEnt software also provides an analysis of the contribution of each variable used for the construction of the model. Because we did not eliminate the highly correlated variables, the results must be treated cautiously. For H. marginatum the most important variables were the mean temperature of the coldest quarter (28.6%) and the precipitation seasonality (24.4%) (Table 1). For R. annulatus, the most important variable was the precipitation of the warmest quarter (69.9%), followed by the precipitation of the driest month (7.8%). There was also a difference in the distribution of variable contribution for the two species: the share of the variables was more evenly distributed for H. marginatum than for R. annulatus, where one variable (BIO18, precipitation of the warmest quarter) contributed more than 69%.

Table 2. The potential variation of the climatically suitable distribution area for Hyalomma marginatum.

Scenario   Present range (km²)   Range in 2050 (km²)   Change in 2050 (%)   Range in 2070 (km²)   Change in 2070 (%)
RCP 2.6    97,992                134,823               37.6                 131,554               34.2
RCP 4.5    97,992                107,465               9.7                  126,716               29.3
RCP 6.0    97,992                91,181                -7.0                 131,752               34.5
RCP 8.5    97,992                140,250               43.1                 112,630               14.9

RCP, representative concentration pathway.

Table 3. The potential variation of the climatically suitable distribution area for Rhipicephalus annulatus.

Scenario   Present range (km²)   Range in 2050 (km²)   Change in 2050 (%)   Range in 2070 (km²)   Change in 2070 (%)
RCP 2.6    28,181                43,236                53.4                 31,145                10.5
RCP 4.5    28,181                50,481                79.1                 43,517                54.4
RCP 6.0    28,181                27,835                -1.2                 46,046                63.4
RCP 8.5    28,181                86,023                205.2                94,289                234.6

RCP, representative concentration pathway.

Figure 3. Maps of the distribution of Rhipicephalus annulatus according to the model. Probability of occurrence from close to 0 (black) to close to 1 (white).
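The percentage columns of Tables 2 and 3 follow from the stated definition: the difference from present conditions expressed as a percentage of the present area. A minimal Python sketch of that arithmetic is given here, with the areas transcribed from the RCP 8.5 rows of the tables; the function name and dictionary layout are illustrative assumptions and not part of the original study.

```python
# Present and projected climatically suitable areas (km2), transcribed from
# the RCP 8.5 rows of Tables 2 and 3. Names and layout are illustrative only.
PRESENT_KM2 = {"H. marginatum": 97_992, "R. annulatus": 28_181}
PROJECTED_KM2 = {
    ("H. marginatum", "RCP 8.5", 2050): 140_250,
    ("H. marginatum", "RCP 8.5", 2070): 112_630,
    ("R. annulatus", "RCP 8.5", 2050): 86_023,
    ("R. annulatus", "RCP 8.5", 2070): 94_289,
}


def percent_change(present_km2, future_km2):
    """Change in suitable area as a percentage of the present area."""
    return round(100.0 * (future_km2 - present_km2) / present_km2, 1)


for (species, scenario, year), area in PROJECTED_KM2.items():
    change = percent_change(PRESENT_KM2[species], area)
    print(f"{species} {scenario} {year}: {change:+.1f}%")
```

Recomputed this way, the values agree with the tabulated percentages to within 0.1 percentage points; the only difference is the 2050 RCP 8.5 value for R. annulatus, which rounds to 205.3 against the tabulated 205.2.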

Discussion

Modelling the future distribution of animal organisms based on climate envelopes is useful for up-to-date studies of climate change adaptations and is often used for the formulation of epidemiologic policies (Pindyck, 2013). Predicting the geographical range changes caused by climate change has become an important tool for nature conservation (Parmesan et al., 2013), for developing climate-resilient agriculture practices (Howden et al., 2007) and for health-related policies (Wardrop et al., 2013). Range changes have been modelled from global to local scales for parasites and other infectious diseases (Khormi and Kumar, 2014; Lafaye et al., 2013). Tick distribution studies are at the forefront of climate-related distribution modelling, with models built for global and regional distributions, e.g. for H. marginatum (Estrada-Peña et al., 2012) and R. annulatus (Williams et al., 2015), while there are no models built at the national scale. This paper presents the first approach in Romania to model future distribution maps for tick species in the light of future climate scenarios.

Our purpose was to predict the expansion of the risk zones for two thermophilic tick species in Romania using models based on the most up-to-date climate scenarios. Given the AUC values provided by MaxEnt, our models demonstrate predictive power as good as that shown by other authors (Phillips et al., 2006; Porretta et al., 2013; Signorini et al., 2014). The general trend of areal expansion toward previously cooler zones is in line with other similar studies on different tick species (Ixodes ricinus, I. scapularis, R. sanguineus), both in Europe and North America (Ogden et al., 2006; Gray et al., 2009; Porretta et al., 2013).

In the case of H. marginatum, one of the clusters of its distribution extends from the Balkans (in the North) toward Turkey and the Middle East. Analyses of the climate ecological niche for this species indicate a temperature-related limiting factor (Estrada-Peña, 2008). The increase in temperature, which is predicted to occur in the near future (IPCC, 2007), is likely to affect the ecology and geographic distribution of many organisms, including ticks. In the case of H. marginatum, the most important climatic variables proved to be the mean temperature of the coldest quarter and the seasonality of precipitation (Estrada-Peña et al., 2012). For all scenarios studied, we foresee an increase in the potential distribution areas of both H. marginatum and R. annulatus. The general trend of increase is more prominent in the northern and central parts of the country. The RCP 2.6 scenario predicted around a 16% increase in the area for 2050, which would rise to 28% by 2070. An interesting evolution is shown in the case of RCP 8.5: after a large increase projected for 2050, a decrease was noticed with respect to the extent of suitable areas for the year 2070. The explanation is that although we may observe an expansion of the species range towards the north and centre, previously suitable areas in the south become unsuitable in the projected climate conditions. Hence, the net gain in suitable area should shrink by almost 30 percentage points between 2050 and 2070 (from 43.1% to 14.9% above the present range; Table 2). Furthermore, the vulnerability of the tick to drought, according to the European Food Safety Authority (EFSA, 2010), could have a negative effect on the species in the southernmost part of its present range of distribution in Romania (as seen in our case in the 2070 / RCP 8.5 scenario).

In the case of R. annulatus, the precipitation of the warmest quarter was found to be the most important variable, which contributed massively to the model (more than 69%). Regarding the evolution of the distribution range, the pattern is relatively similar. The general trend of area increase is shown to be more prominent in the southern and western parts of the country. All the scenarios predict a general increase. The most conservative one (RCP 2.6), however, predicts a relative decrease in 2070 (still a net increase compared to the present), while the most dramatic increase (more than 200%; Table 3) is predicted by the RCP 8.5 scenario both for 2050 and for 2070. In the latter case, the general pattern for R. annulatus was the expansion of the suitable distribution area combined with a levelling of the probability of occurrence throughout the range.

Having more data available usually increases the performance of species distribution modelling and also allows more data to be used for testing (hence better reliability for the predictions) (Phillips et al., 2006). The lower number of records available for R. annulatus may limit the significance of the conclusions for the species. Although the model performance in this case was high (AUC=0.970), this was because most distribution records shared a homogeneous set of climatic characteristics; it is not a general rule but an exception caused by the high similarity of records in Romania.

Conclusions

Modelling the geographical distribution patterns of vector species in ecological space, using presence data, the maximum entropy approach and GIS tools, and based on climatic and environmental data, is useful for understanding the ecological requirements of the vectors, and hence of the vectored pathogens. While vector-borne pathogens also require the presence of suitable reservoir hosts, the assessment of their epidemiologic impact cannot be decoupled from the geographic distribution of their vectors. Both studied species display a shift of their distribution towards presently cooler areas of Romania, in concordance with similar continent-wide studies. Although the developed models demonstrate good predictive power, the issue of species ecology should also be considered. The selected species, like all other ixodid ticks, depend for parts of their life on the availability of suitable hosts, so climate is only one essential determinant of their occurrence. As climate change will very likely affect the hosts as well, this in turn can affect the distribution of the tick species.

References

Campbell-Lendrum D, Manga L, Bagayoko M, Sommerfeld J, 2015. Climate change and vector-borne diseases: what are the implications for public health research and policy? Philos T Roy Soc B 370:20130552.
Chochlakis D, Ioannou I, Sandalakis V, Dimitriou T, Kassinis N, Papadopoulos B, Tselentis Y, Psaroulaki A, 2012. Spotted fever group Rickettsiae in ticks in Cyprus. Microbial Ecol 63:314-23.
Converse JD, Hoogstraal H, Moussa MI, Stek M Jr, Kaiser MN, 1974. Bahig virus (Tete group) in naturally- and transovarially-infected Hyalomma marginatum ticks from Egypt and Italy. Arch Gesamte Virusforsch 46:29-35.
ECDC, 2015. http://www.ecdc.europa.eu/en/healthtopics/vectors/ticks/Pages/hyalomma-marginatum-.aspx?preview=yes&pdf=yes#geo
EFSA, 2010. Scientific opinion on the role of tick vectors in the epidemiology of Crimean-Congo hemorrhagic fever and African swine fever in Eurasia. EFSA J 8:1703.


Ergonul O, Whitehouse CA, 2007. Crimean-Congo hemorrhagic fever: a global perspective. Springer, Dordrecht, The Netherlands.
Estrada-Peña A, 2008. Climate, maps and ticks. In: Proceedings of the ESCMID Conference on Viral Haemorrhagic Fevers (VHFs ’08), Istanbul, Turkey.
Estrada-Peña A, Farkas R, Jaenson TG, Koenen F, Madder M, Pascucci I, Salman M, Tarres-Call J, Jongejan F, 2013. Association of environmental traits with the geographic ranges of ticks (Acari: Ixodidae) of medical and veterinary importance in the western Palearctic. A digital data set. Exp Appl Acarol 59:351-66.
Estrada-Peña A, Sánchez N, Estrada-Sánchez A, 2012. An assessment of the distribution and spread of the tick Hyalomma marginatum in the western Palearctic under different climate scenarios. Vector Borne Zoonotic Dis 12:758-68.
Filipe AR, Casals J, 1979. Isolation of Dhori virus from Hyalomma marginatum ticks in Portugal. Intervirology 11:124-7.
Gent PR, Danabasoglu G, Donner LJ, Holland MM, Hunke EC, Jayne SR, Lawrence DM, Neale RB, Rasch PJ, Vertenstein M, Worley PH, Yang ZL, Zhang M, 2011. The community climate system model, version 4. J Climate 24:4973-91.
Gray JS, Dautel H, Estrada-Peña A, Kahl O, Lindgren E, 2009. Effects of climate change on ticks and tick-borne diseases in Europe. Interdiscip Perspect Infect Dis 2009:593232.
Hoogstraal H, 1979. The epidemiology of tick-borne Crimean-Congo hemorrhagic fever in Asia, Europe, and Africa. J Med Entomol 15:307-417.
Hornok S, Csorgo T, de la Fuente J, Gyuranecz M, Privigyei C, Meli ML, Kreizinger Z, Gönczi E, Fernández de Mera IG, Hofmann-Lehmann R, 2013. Synanthropic birds associated with high prevalence of tick-borne rickettsiae and with the first detection of Rickettsia aeschlimannii in Hungary. Vector Borne Zoonotic Dis 13:77-83.
Howden SM, Soussana JF, Tubiello FN, Chhetri N, Dunlop M, Meinke H, 2007. Adapting agriculture to climate change. P Natl Acad Sci USA 104:19691-6.
IPCC, 2007. Climate change 2007: the physical science basis. Available from: https://www.ipcc.ch/pdf/assessment-report/ar4/wg1/ar4_wg1_full_report.pdf
Jameson LJ, Morgan PJ, Medlock JM, Watola G, Vaux AG, 2012. Importation of Hyalomma marginatum, vector of Crimean-Congo haemorrhagic fever virus, into the United Kingdom by migratory birds. Ticks Tick Borne Dis 3:95-9.
Khormi HM, Kumar L, 2014. Climate change and the potential global distribution of Aedes aegypti: spatial modelling using GIS and CLIMEX. Geospat Health 8:405-15.
Lafaye M, Sall B, Ndiaye Y, Vignolles C, Tourre YM, Borchi F, Soubeyroux JM, Diallo M, Dia I, Ba Y, Faye A, Ba T, Ka A, Ndione JA, Gauthier H, Lacaux JP, 2013. Rift Valley fever dynamics in Senegal: a project for pro-active adaptation and improvement of livestock raising management. Geospat Health 8:279-88.
Mihalca AD, Dumitrache MO, Magdaş C, Gherman CM, Domşa C, Mircean V, Ghira IV, Pocora V, Cozma V, Ionescu DT, Sikó Barabási S, Sándor AD, 2012. Synopsis of the hard ticks (Acari: Ixodidae) of Romania with update on host associations and geographical distribution. Exp Appl Acarol 58:183-206.
Moussa MI, Imam IZ, Converse JD, El-Karamany RM, 1974. Isolation of Matruh virus from Hyalomma marginatum ticks in Egypt. J Egypt Public Health Assoc 49:341-8.
Movila A, Alekseev AN, Dubinina HV, Toderas I, 2013. Detection of tick-borne pathogens in ticks from migratory birds in the Baltic region of Russia. Med Vet Entomol 27:113-7.
Nenzén HK, Araújo MB, 2011. Choice of threshold alters projections of species range shifts under climate change. Ecol Model 222:3346-54.
Ogden NH, Maarouf A, Barker IK, Bigras-Poulin M, Lindsay LR, Morshed MG, O'Callaghan CJ, Ramay F, Waltner-Toews D, Charron DF, 2006. Climate change and the potential for range expansion of the Lyme disease vector Ixodes scapularis in Canada. Int J Parasitol 36:63-70.
Parmesan C, Burrows MT, Duarte CM, Poloczanska ES, Richardson AJ, Schoeman DS, Singer MC, 2013. Beyond climate change attribution in conservation and ecological research. Ecol Letters 16:58-71.
Pearson RG, Raxworthy CJ, Nakamura M, Peterson AT, 2007. Predicting species distributions from small numbers of occurrence records: a test case using cryptic geckos in Madagascar. J Biogeo 34:102-17.
Pedersen UB, Midzi N, Mduluza T, Soko W, Stensgaard AS, Vennervald BJ, Mukaratirwa S, Kristensen TK, 2014. Modelling spatial distribution of snails transmitting parasitic worms with importance to human and animal health and analysis of distributional changes in relation to climate. Geospat Health 8:335-43.
Peterson AT, Papeş M, Eaton M, 2007. Transferability and model evaluation in ecological niche modeling: a comparison of GARP and maxent. Ecography 30:550-60.
Phillips SJ, Anderson RP, Schapire RE, 2006. Maximum entropy modeling of species geographic distributions. Ecol Model 190:231-59.
Pindyck RS, 2013. Climate change policy: what do the models tell us? National Bureau of Economic Research, Cambridge, MA, USA.
Porretta D, Mastrantonio V, Amendolia S, Gaiarsa S, Epis S, Genchi C, Bandi C, Otranto D, Urbanelli S, 2013. Effects of global changes on the climatic niche of the tick Ixodes ricinus inferred by species distribution modelling. Parasite Vector 6:271.
Rumer L, Graser E, Hillebrand T, Talaska T, Dautel H, Mediannikov O, Roy-Chowdhury P, Sheshukova O, Mantke OD, Niedrig M, 2011. Rickettsia aeschlimannii in Hyalomma marginatum ticks, Germany. Emerg Infect Dis 17:325-6.
Signorini M, Cassini R, Drigo M, Frangipane di Regalbono A, Pietrobelli M, Montarsi F, Stensgaard AS, 2014. Ecological niche model of Phlebotomus perniciosus, the main vector of canine leishmaniasis in north-eastern Italy. Geospat Health 9:193-201.
Tomassone L, Grego E, Auricchio D, Iori A, Giannini F, Rambozzi L, 2013. Lyme borreliosis spirochetes and spotted fever group rickettsiae in ixodid ticks from Pianosa island, Tuscany Archipelago, Italy. Vector Borne Zoonotic Dis 13:84-91.
Walker JB, Keirans JE, Horak IG, 2000. The genus Rhipicephalus (Acari, Ixodidae): a guide to the brown ticks of the world. Cambridge University Press, Cambridge, UK.
Wardrop NA, Kuo CC, Wang HC, Clements AC, Lee PF, Atkinson PM, 2013. Bayesian spatial modelling and the significance of agricultural land use to scrub typhus infection in Taiwan. Geospat Health 8:229-39.
Weyant J, Azar C, Kainuma M, Kejun J, Nakicenovic N, Shukla PR, La Rovere E, Yohe G, 2009. Report of 2.6 versus 2.9 Watts/m2 RCPP evaluation panel. IPCC Secretariat, Geneva, Switzerland.
Williams HW, Cross DE, Crump HL, Drost CJ, Thomas CJ, 2015. Climate suitability for European ticks: assessing species distribution models against null models and projection under AR5 climate. Parasite Vector 8:1-15.


Geospatial Health 2016; volume 11:403

Is missing geographic positioning system data in accelerometry studies a problem, and is imputation the solution?

Kristin Meseck,1 Marta M. Jankowska,1 Jasper Schipperijn,2 Loki Natarajan,1 Suneeta Godbole,1 Jordan Carlson,3 Michelle Takemoto,1 Katie Crist,1 Jacqueline Kerr1

1Department of Family Medicine and Public Health, University of California, La Jolla, CA, USA; 2Department of Sports Science and Clinical Biomechanics, University of Southern Denmark, Odense, Denmark; 3Center for Children's Healthy Lifestyles and Nutrition, Children's Mercy Hospital-University of Missouri, Kansas City, MO, USA

Abstract

The main purpose of the present study was to assess the impact of global positioning system (GPS) signal lapse on physical activity analyses, discover any existing associations between missing GPS data and environmental and demographic attributes, and to determine whether imputation is an accurate and viable method for correcting GPS data loss. Accelerometer and GPS data of 782 participants from 8 studies were pooled to represent a range of lifestyles and interactions with the built environment. Periods of GPS signal lapse were identified and extracted. Generalised linear mixed models were run with the number of lapses and the length of lapses as outcomes. The signal lapses were imputed using a simple ruleset, and imputation was validated against person-worn camera imagery. A final generalised linear mixed model was used to identify the difference between the number of GPS minutes pre- and post-imputation for the activity categories of sedentary, light, and moderate-to-vigorous physical activity. GPS data lapses made up over 17% of the dataset. No strong associations were found between increasing lapse length and number of lapses and the demographic and built environment variables. A significant difference was found between the pre- and post-imputation minutes for each activity category. No demographic or environmental bias was found for length or number of lapses, but imputation of GPS data may make a significant difference for inclusion of physical activity data that occurred during a lapse. Imputing GPS data lapses is a viable technique for returning spatial context to accelerometer data and improving the completeness of the dataset.

Correspondence: Kristin Meseck, Department of Family Medicine and Public Health, University of California, 9500 Gilman Drive, La Jolla, CA 92093, USA. Tel: +1.858.246.1824 - Fax: +1.858.534.9404. E-mail: kmeseck@ucsd.edu

Key words: GPS; GIS; Missing data; Imputation; Accelerometer.

Contributions: JK, SG, KC, data collection; KM, MJ, data analysis; SG, LN, statistical design; KM, manuscript writing; MJ, JS, MT, KC, JK, JC, manuscript reviewing and revisions.

Funding: this research was funded by the National Institutes of Health, and National Cancer Institute research grant R21 CA169535 entitled Development and Validation of Novel Prospective GPS/GIS Based Exposure Measures.

Ethical statement: written informed consent was obtained from each participant and all institutional and governmental regulations concerning the ethical use of human volunteers were followed during all studies. All study procedures were approved by the research ethics board of the University of California, San Diego, CA, USA.

Acknowledgements: this research was supported/partially supported by the National Institutes of Health, National Cancer Institute research grant R21 CA169535 titled: Development and Validation of Novel Prospective GPS/GIS Based Exposure Measures; Jacqueline Kerr, principal investigator. The results of the present study do not constitute any form of endorsement by American College of Sports Medicine.

Received for publication: 30 July 2015. Revision received: 19 September 2015. Accepted for publication: 25 October 2015.

©Copyright K. Meseck et al., 2016. Licensee PAGEPress, Italy. Geospatial Health 2016; 11:403. doi:10.4081/gh.2016.403

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Introduction

The use of geographic positioning system (GPS) devices in physical activity (PA) and sedentary behaviour (SB) research has been steadily increasing because of their ability to determine where participants interact with the built environment (Kerr et al., 2011; Krenn et al., 2011). These data sources have the potential to have profound impacts on public health by establishing more specific and accurate measures of environmental influences on SB and PA, and by providing better evidence for public policy change (Jankowska et al., 2015). GPS can show when and how long participants are indoors or outdoors (Quigg et al., 2010; Lam et al., 2013), locate what routes they take for transport (Duncan and Mummery, 2007; Duncan et al., 2009) and identify PA and SB, such as walking, bicycling or driving, in specific environments (Troped et al., 2010; Oliver and Badland, 2010). However, missing GPS data due to signal lapse is a problem that may introduce significant bias into modelled relationships between environment and PA or SB. Currently no studies have assessed the bias of missing GPS data in PA and SB studies, and how that bias may influence study outcomes.

Signal lapse is inherent in GPS data, which is collected through a connection between the GPS device worn by study participants and multiple satellites in the sky to establish the geographic location of each individual. We define a signal lapse as the interruption of continuous GPS data collection, resulting in no collection for a variable amount of time, followed by reconnection to satellites and the recommencement of GPS data collection. Causes of signal lapse include built structures such as buildings and natural obstructions such as cloud cover or dense tree canopies (Costa, 2011). Examples of signal lapse are displayed in Figure 1 [georeferenced background satellite imagery photo provided by SANDAG (2014)]. For population-level PA and SB studies, a challenge for collection of GPS data is posed by free-living study participants, who move in and out of buildings, pass through dense urban areas and engage in a variety of activities that may involve environments that block GPS signals. Signal lapse is a significant concern for PA and SB studies, as GPS data is often time-matched to accelerometer data. GPS signal lapse can result in unsupported assumptions or misclassification (e.g. assuming that any period of signal lapse is due to the participant being indoors) and data elimination (e.g. the removal of a participant from analyses when too many data recordings are found to be missing), both of which may produce biased results.

Previous studies have managed signal lapses in GPS data in one of three ways. One method involves leaving the data as they were originally collected, with no alteration of the GPS data (e.g. Wheeler et al., 2010). While this method still allows other time-matched data collected during the lapse, such as accelerometer data, to be kept and utilised, such data lose spatial context. In spite of this limitation, this approach may be appropriate for studies focusing on locations of outdoor PA (e.g. Lachowycz et al., 2012). Another technique for dealing with missing GPS data is the complete removal of study participants or days of wear time when they do not meet minimum data criteria (e.g. Oliver and Badland, 2010). The third technique to manage missing GPS data is imputation (Ogle and Guensler, 2002; Stopher et al., 2008; Wiehe et al., 2008; Troped et al., 2010). Such methodologies may differ, but almost all GPS data imputation methods are based on spatial or temporal parameters of the previous points. Literature reviews on the use of GPS in PA studies (Maddison and Ni Mhurchu, 2009; Krenn et al., 2011) found that most imputation methods utilise arbitrary decisions of time and distance to impute missing points without validation of imputation assumptions.

The goals of this study are twofold. The first is to assess the demographic and environmental bias of missing GPS data in PA and SB studies. Are certain people or populations more prone to having missing GPS data, and does movement in specific environmental contexts affect data quality? This is an important question for better understanding which populations GPS and PA/SB studies are best suited for, and whether certain environmental contexts are not viable for assessing environment and PA/SB relationships. Our second goal is to develop and validate a GPS imputation method as a solution to missing GPS data and to test whether there are significant changes of PA and SB time pre- and post-imputation in home and non-home environments.

Figure 1. Examples of geographic positioning system signal lapse.

Materials and Methods

Study sample

Data from eight studies using GPS and accelerometer devices were pooled, representing a range of participants, lifestyles and interactions with built environments. The studies were conducted in San Diego County, CA, USA between the years of 2010 and 2013, and included both baseline interventions and observational studies. Demographic data with reference to age, sex, ethnicity and employment status were collected for each participant. The total sample included 782 participants of an average age of 46 years (min 18, max 102), 12.3% Hispanic and 68.6% women.

Data collection

All data were collected using the same standardised procedures. Participants wore Qstarz (http://www.qstarz.com) GPS devices (BTQ1000XT) and Actigraph (http://www.actigraphcorp.com) accelerometers (GT3X+). The GPS data were collected every 15 seconds, the accelerometer data 30 times a second. The GPS data were processed and joined to the accelerometer data using the Personal Activity and Location Measurement System (PALMS) (Demchak et al., 2012; Carlson et al., 2015). Data were aggregated and then merged at the minute level. Accelerometer data were classified into sedentary behaviour (counts per minute below 100), light activity (counts per minute between 100 and 1040) and everything above light activity (counts per minute above 1040). This relatively low upper cut-point for activity was chosen because the sample included many older (>65) adults. As this analysis is concerned with the utility of GPS data in PA and SB studies, days were only included in the analysis if they met the criteria for a valid accelerometer wear day, defined as a day containing 600 or more accelerometer minutes. Non-wear was defined as 90 or more sequential minutes with an activity count of zero, allowing for 2 minute periods of activity (Heil et al., 2012). In total, participants wore devices for 1-13 days (mean=5.6, standard deviation=1.89).

Data for land use in San Diego County were downloaded in 2014 from San Diego Geographic Information Source - JPA/San Diego Association of Governments (SANDAG, 2014). ESRI ArcGIS, v. 10.2 software (ESRI, Redlands, CA, USA) was used to assess how areas used for residence, transportation, shopping, parks and recreation, health care, office/school, industry, hotel/resort and leisure and others, were related to signal lapse. Additionally, census data on population counts were obtained from American FactFinder (http://factfinder.census.gov) at the census block group level, from which population density was derived. Data were matched on the minute level, where each minute or single GPS point was spatially matched with the intersecting land use and population density.
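The accelerometer cut-points and the valid-day rule described under Data collection can be sketched as follows. This is a minimal Python illustration, not the PALMS processing the authors used; the function names are hypothetical, and the treatment of the boundary values (exactly 100 and exactly 1040 counts per minute counted as light activity) is an assumption read from the wording "between 100 and 1040".

```python
# Cut-points taken from the text; boundary handling is an assumption.
SEDENTARY_BELOW_CPM = 100      # counts per minute below this: sedentary
LIGHT_UP_TO_CPM = 1040         # counts per minute above this: above-light activity
MIN_VALID_WEAR_MINUTES = 600   # accelerometer minutes needed for a valid wear day


def classify_minute(counts_per_minute):
    """Assign one accelerometer minute to an activity category."""
    if counts_per_minute < SEDENTARY_BELOW_CPM:
        return "sedentary"
    if counts_per_minute <= LIGHT_UP_TO_CPM:
        return "light"
    return "above_light"


def is_valid_wear_day(wear_minutes):
    """A day is kept when it holds 600 or more accelerometer wear minutes."""
    return wear_minutes >= MIN_VALID_WEAR_MINUTES


def summarise_day(minute_counts):
    """Count minutes per category for one day of minute-level counts."""
    summary = {"sedentary": 0, "light": 0, "above_light": 0}
    for cpm in minute_counts:
        summary[classify_minute(cpm)] += 1
    return summary
```

For example, `summarise_day([0, 50, 150, 2000])` counts two sedentary minutes, one light minute and one above-light minute.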

Assessing the demographic and environmental bias of missing geographic positioning system data

Missing GPS data were assessed on participant- and day-levels. To compute descriptive statistics for the periods of missing data, minutes of missing data were extracted and collapsed into lapses (defined as a period of missing GPS signal ≥1 minute), yielding multiple lapses per day, per participant. It is important to note that lapses may be caused by either environmental or technological (e.g. lack of battery) issues, and that these cannot be differentiated with the available data. Minutes of sedentary, light, and above light activity were assessed for each lapse, and descriptive statistics were employed to assess the amount of activity data without GPS signal.

Demographic bias was explored to identify whether particular population sub-groups engaged in activities within their environments that would interfere with GPS signal more than other groups. A mixed linear effects model (days nested within person) was used to explore the relationship between the number of daily GPS lapses and the individual characteristics of participants. We tested whether the number of daily GPS lapses was associated with sex, age, Hispanic ethnicity, employment status and number of daily sedentary minutes while controlling for daily GPS minutes collected.

To better understand the environmental biases of GPS signal lapse, a mixed linear effects model (lapse, nested in day, nested in person) was used to test whether the length of GPS lapses was associated with environmental characteristics of the lapse (i.e., the land use category of the last known point before the lapse, population density of the last known point before the lapse). Due to non-normality of lapse length (many short lapses), lapse length was square root transformed and placed into the model as a continuous variable. We report the non-transformed coefficients with the transformed significance because transformed coefficients can be difficult to interpret. This model was controlled for individual level factors found to be significant in the demographic bias model. Land use was placed into the model as a categorical variable to compare the effects of land use categories on signal lapse length. Residential was chosen as a reference category as it had the highest number of signal lapses.
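Collapsing minutes of missing data into lapses can be sketched as a run-length pass over a minute-level series. The Python sketch below is an illustration only (the authors worked in R and their code is not reproduced here); it assumes a hypothetical list of booleans, one per minute, that is true when a GPS fix exists, and it follows the paper's definition that a lapse is a gap of at least one minute after which the signal is re-obtained, so leading and trailing gaps are not counted as lapses.

```python
import math


def find_lapses(has_fix):
    """Return (start_index, length_in_minutes) for each interior run of
    missing minutes, i.e. a gap with a valid fix before and after it."""
    lapses = []
    run_start = None      # index where the current missing run began
    seen_fix = False      # becomes True once any valid fix has been seen
    for i, ok in enumerate(has_fix):
        if ok:
            if run_start is not None and seen_fix:
                lapses.append((run_start, i - run_start))
            run_start = None
            seen_fix = True
        elif run_start is None:
            run_start = i
    # A run still open at the end never regained signal, so it is dropped.
    return lapses


def sqrt_lengths(lapses):
    """Square-root transform of lapse lengths, as used for the model outcome."""
    return [math.sqrt(length) for _, length in lapses]
```

With `has_fix = [False, True, False, False, True, True, False, True, False]`, the function returns two lapses, `(2, 2)` and `(6, 1)`; the missing minutes at the head and tail of the series are left out.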

Imputation algorithm, validation, and comparison

The imputation algorithm was created in the R environment (R Core Team, 2013) using functions found in the plyr (Wickham, 2011) and gmt (Magnusson, 2014) packages (imputation algorithm available upon request). Periods of missing GPS data were imputed where a lapse had at least one valid GPS point preceding it (to allow for points to impute from) and following it (to ensure GPS battery loss or other device malfunctioning was not imputed). The algorithm locates periods of GPS signal lapse, takes the mean centre point of the 20 GPS points that occurred before the lapse (or the number of available points if fewer than 20) and assigns the resulting mean centre latitude and longitude to the minutes comprising the missing lapse period. All data were imputed with no limit with regard to time or distance of lapse. This decision was based on the assumption that an individual was most likely stationary during a signal lapse, and once they started moving out of the building or location, the signal would begin again. The algorithm was designed on the lapse level rather than the daily level. If a lapse began at 11:45 PM on one night and ended at 3:05 AM the next day, the missing data were imputed from one day to the next. The decision to impute over several days was made because our sample consisted of many participants beyond retirement age, who might be spending all day at home for several days. The algorithm does provide the distance between the last known GPS point and the first GPS point of data re-uptake to allow for removal of very large distances if desired.

The results of the imputation algorithm were validated against a dataset of 40 participants who wore a person-worn camera, SenseCam (Vicon Revue v1.0), in addition to GPS devices and accelerometers. SenseCam data have been employed to validate travel episodes, indoor/outdoor time, eating, and physical activity behaviours in previous studies (Ellis et al., 2013; Doherty et al., 2013; Kerr et al., 2013). SenseCam photos were taken at least every 20 seconds. The photos were coded for a person's context (indoor, outdoor, and in-vehicle), and behaviour (walking/running, biking, sitting, standing still, and standing moving within a confined space) (Kerr et al., 2013). Photo classifications were aggregated to the minute level and joined by timestamp to GPS signal lapses. If timestamps did not match, a final category of unmatched data was created. In order to assess whether the imputation assumption of stationarity (the individual had not moved during the imputed lapse) held, lapses were assessed for the amount of time spent in moving and non-moving behaviours, such as in vehicle, walking, and biking, and standing still/sedentary behaviours. These behaviours were further classified as occurring indoors or outdoors.

To better understand the utility of imputation in the context of a PA or SB study, and to explore whether the method might have statistical implications for PA and SB studies, total minutes per day of non-wear, sedentary, light and above-light activity within 800 m of the home and outside of 800 m of the home were compared pre- and post-imputation using mixed linear effects models to account for days within participants.
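The imputation rule described in this section can be sketched as follows. This is a hedged Python illustration of the stated rule, not the authors' R code (which is available from them on request): the track representation (a list with one `(lat, lon)` tuple or `None` per record), the function names, the use of a plain arithmetic mean for the mean centre, and the haversine formula with a 6,371 km Earth radius for the re-uptake distance are all assumptions of the sketch.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def mean_centre(points):
    """Arithmetic mean of latitudes and longitudes of the given points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def impute_lapses(track, window=20):
    """Fill interior lapses in a list of (lat, lon) fixes, with None for missing.

    Returns the imputed track and, for each imputed lapse, the distance in
    metres between the last fix before the lapse and the first fix after it.
    """
    imputed = list(track)
    gap_distances = []
    i, n = 0, len(track)
    while i < n:
        if track[i] is not None:
            i += 1
            continue
        start = i
        while i < n and track[i] is None:
            i += 1
        # Impute only lapses bounded by valid fixes on both sides; leading and
        # trailing gaps (e.g. battery loss) are left missing.
        if start > 0 and i < n:
            # Up to `window` observed fixes before the lapse (fewer if unavailable).
            preceding = [p for p in track[:start] if p is not None][-window:]
            centre = mean_centre(preceding)
            for j in range(start, i):
                imputed[j] = centre
            gap_distances.append(haversine_m(track[start - 1], track[i]))
    return imputed, gap_distances
```

The returned gap distances play the role the text describes for the distance between the last known point and the point of re-uptake: a caller could discard imputed lapses whose distance exceeds a chosen limit, although the study itself imposed no such limit.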

Table 1. Mixed linear effects model results: outcome lapse length.

Variable                 Coefficient   Standard error
Intercept                69.00*        127.93
Hotel/resort/leisure     -22.11        23.74
Industry                 -37.22        18.50
Transportation           -4.44         7.58
Shopping                 -15.17        9.26
Office/school            -27.72        9.12
Health care              -42.31        18.83
Parks and recreation     -21.01        12.65
Other land use           -18.15        18.61
Population density       -0.01**       0.0008
Female                   114.44        58.60
Age                      0.05          1.60
Hispanic                 -26.93        93.54
Employed                 -111.37       77.75
Sedentary minutes        2.03**        0.03

*P<.05; **P<.001.

[Geospatial Health 2016; 11:403]


Results

Missing geographic positioning system data

The dataset comprised 6,171,693 total GPS points, of which 2,007,924 (32.53%) were missing. About half of these missing data (17.39% of all points) were from GPS signal lapses, where a lapse is defined by the GPS signal being re-obtained after being lost. The other half (15.14%) occurred at the beginning of wear time before a GPS signal was obtained, or after the signal was lost and never re-obtained (i.e., at the heads and tails of each participant’s wear). After identifying and isolating GPS lapses from valid accelerometer wear days, 516 participants (64.1%) across 2033 wear days were identified as having at least one signal lapse, with 15,539 total lapses identified across the entire dataset. Lapses averaged 51.85 minutes in length (±329.60, min 1, max 9650). Participants had an average of 30 (±28.80, min 1, max 157) signal lapses over their entire wear time, with an average of 7.6 lapses per day (±6.40, min 1, max 39). The total missing GPS lapse time averaged 396.30 missing minutes daily (±882.41, min 1, max 1440).
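The distinction drawn above between signal lapses (missing data bounded by a fix on both sides) and missing data at the heads and tails of wear can be sketched as follows. This is an illustrative reconstruction, not the authors' code; it assumes one boolean per epoch (taken here to be one minute) indicating whether a GPS fix was recorded, and `find_lapses` is a hypothetical name.

```python
from typing import List, Tuple


def find_lapses(has_fix: List[bool]) -> Tuple[List[Tuple[int, int]], int]:
    """Split missing epochs into lapses and head/tail gaps.

    Returns (lapses, head_tail_missing), where lapses is a list of
    (start_index, length) for runs of missing epochs that are preceded and
    followed by a fix, and head_tail_missing counts missing epochs before
    the first fix or after the last fix.
    """
    fixed = [i for i, ok in enumerate(has_fix) if ok]
    if not fixed:
        # No fix at all: nothing qualifies as a lapse.
        return [], len(has_fix)

    first, last = fixed[0], fixed[-1]
    head_tail_missing = first + (len(has_fix) - 1 - last)

    lapses: List[Tuple[int, int]] = []
    run_start = None
    for i in range(first, last + 1):
        if not has_fix[i]:
            if run_start is None:
                run_start = i  # a new run of missing epochs begins
        elif run_start is not None:
            lapses.append((run_start, i - run_start))  # signal re-obtained
            run_start = None
    return lapses, head_tail_missing
```

Lapse counts and lengths per participant or per day can then be summarised from the returned list.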

Demographic and environmental bias of missing geographic positioning system data

The mixed linear effects model for the number of lapses found age to be a significant factor [coeff(standard error, SE): -0.0384 (0.012), P<0.05]. No other associations were significant; however, being female and being employed were positively associated with the number of lapses. Results of the mixed linear effects model for lapse length are displayed in Table 1, where Residential is the reference category for land use. Although this model was fitted using square-root transformed values of lapse length, Table 1 reports the non-transformed coefficients with the significance from the transformed model. None of the land use categories differed significantly from the Residential category in its association with lapse length. The Hotel/Resort/Leisure category was negatively associated with lapse length [coeff(SE): -22.11 (23.74)], as were the Industry category [coeff(SE): -37.22 (18.50)], Transportation category [coeff(SE): -4.44 (7.58)], Shopping category [coeff(SE): -15.17 (9.26)], Office/School category [coeff(SE): -27.72 (9.12)], Health Care category [coeff(SE): -42.31 (18.83)], Parks and Recreation category [coeff(SE): -21.01 (12.65)], and Other Land Use category [coeff(SE): -18.15 (18.61)]. All land use categories had small associations with lapse length, with Health Care and Industry having the largest difference, and Transportation the smallest difference, from the Residential category. Population density had a significant negative association with lapse length [coeff(SE): -0.01 (0.0008), P<.001]; however, the coefficient and standard error were very small. Female sex was positively associated with lapse length, with a large coefficient [coeff(SE): 114.44 (58.60)], while age also had a positive association with a much smaller coefficient [coeff(SE): 0.05 (1.60)].
Hispanic ethnicity had a negative association with lapse length [coeff(SE): -26.93 (93.54)]. Finally, sedentary time was significantly associated with lapse length [coeff(SE): 2.03 (0.03), P<0.001], with each one-minute increase in sedentary time associated with a 2.03-minute increase in lapse length.
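The lapse-length model can be written out explicitly. The following is a sketch under assumptions: this section does not state the random-effects structure, so a single participant-level random intercept is assumed, and all symbols are introduced here for illustration only.

```latex
\sqrt{y_{ij}} = \beta_0 + \sum_{k=1}^{K} \beta_k x_{kij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ij}$ is the length in minutes of lapse $j$ for participant $i$, and $x_{kij}$ are the covariates listed in Table 1. In this form each $\beta_k$ is a change in the square root of minutes, which may be why the coefficients from the non-transformed model are the ones reported for interpretation.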

Imputation of the full sample

The algorithm imputed 100% of lapse data, or 17.39% of the entire dataset, leaving 934,890 (15.15%) minutes of missing data. On average, 396.30 (±882.41, min 1, max 10,410) minutes of data were imputed per person per day, with an average of 1561.47 (±223,304, min 1, max 11,612) points imputed over a participant’s entire wear. Analysis of the accelerometer data found an average of 822.81 (±198.29) accelerometer minutes per day. Using only matched un-imputed GPS and accelerometer data, the average fell to 560.53±343.76 minutes a day. After imputation of GPS signal lapses, this daily average increased to 717.71±327.91 minutes. The cut-off for the lowest quartile was 110.5 minutes of imputed data (low imputation), 570 minutes in the highest quartile (high imputation), and 570 in the middle two quartiles (medium imputation). Examples of individuals’ days that fell into these quartiles are displayed in Figure 2. The figure demonstrates the variability of both when and where GPS lapses occurred throughout an individual’s day.

Imputation validation

The validation of the imputation algorithm using the subset of 40 participants with SenseCam photography found that 91.5% of the imputed minutes were classified as occurring indoors and as non-moving. Less than 1% of the data was classified as in-vehicle and less than 4% was classified as moving (Table 2). Due to conflicting time-stamps, 3.49% of the SenseCam photo data could not be matched with the GPS data and is therefore unclassified. Table 3 summarises the average daily minutes of activity before and after imputation, in total and broken down into within 800 m of the home and beyond 800 m of the home for each participant. Using a mixed linear effects model to compare pre- and post-imputation, all physical activity categories gained a significant number of minutes after imputation. Sedentary behaviours within the home environment gained the largest amount of time [β (CI): 80.37 (74.04-80.37), P<0.001] at 95% confidence. Both categories of above-light activity gained almost three minutes after imputation, whereas light activity within the home buffer gained 24.02 minutes and light activity outside it gained almost 12 minutes. For all environments, 115.18 minutes of sedentary behaviour, 35.92 minutes of light activity, and 5.55 minutes of above-light activity were gained after imputation.
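The pre- and post-imputation comparison relies on tallying accelerometer minutes by intensity and by location relative to the 800 m home buffer. A minimal sketch of that tally is given below. It assumes non-wear minutes have already been removed, that the buffer boundary is inclusive, and that 1041 counts per minute and above is above-light (the Table 3 footnote leaves the value 1041 itself ambiguous); the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

HOME_BUFFER_M = 800  # home buffer radius used in the text


def intensity(counts_per_min: int) -> str:
    """Classify one accelerometer minute using the Table 3 cut points."""
    if counts_per_min < 100:
        return "sedentary"
    if counts_per_min <= 1040:
        return "light"
    return "above-light"  # assumption: 1041 counts per minute and above


def location(dist_from_home_m: Optional[float]) -> str:
    """Classify one minute by distance from home; None means no GPS fix."""
    if dist_from_home_m is None:
        return "no-gps"
    return "home" if dist_from_home_m <= HOME_BUFFER_M else "away"


def summarise(minutes: Iterable[Tuple[int, Optional[float]]]) -> Counter:
    """Count minutes per (intensity, location) pair for one wear day."""
    return Counter((intensity(c), location(d)) for c, d in minutes)
```

Running `summarise` on the same day before and after imputation shows how minutes move out of the "no-gps" group into the home and away groups, which is the quantity the mixed model then compares.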

Discussion

The identification of periods of missing GPS data within the dataset demonstrated a high number of GPS signal lapses at both the daily and participant levels. Although our sample was not necessarily representative of the population of San Diego County, participants did interact with a variety of environments ranging from urban/suburban to semi-rural while engaging in free-living behaviours. Of valid accelerometer wear days, 17.39% of GPS data were missing, which means the environmental context was unknown for these usable PA and SB data. Although some participants only experienced one signal

Table 2. SenseCam photos analysed for minutes of imputed time (n=12,077).

                   Moving         Non-moving
Matched data
  Indoors          327 (2.71%)    11,043 (91.5%)
  Outdoors         152 (1.26%)    22 (0.19%)
  In-vehicle       -              111 (0.92%)
Unmatched data     422 (3.49%)



lapse, most experienced several over their entire wear with an average of 30 lapses per participant (7.6 per day). This is likely due to habitual interaction with certain environments and/or the reoccurrence of certain behaviours (e.g., working in offices without windows) accumulating over a participant’s full device-wearing period. The variance in the length of lapses, which averaged at 51.85 minutes with a standard deviation of 329.55 minutes and a median of 2, would indicate that a variety of activities and environments may contribute to signal lapses. Testing for demographic and environment bias of the number of lapses as well as length of lapses found little bias for the types of individuals who may engage in activities that make them more prone to GPS signal loss. This is an encouraging finding for other studies using GPS devices in a variety of adult populations. Age had a significant negative association with the number of lapses indicating that older individuals experience fewer signal lapses – a surprising finding given that many elderly individuals spend more time indoors, especially in this sample, which included elder-care communities. This may be due to less movement in and out of buildings, leading to fewer chances for dropped signals. However, age was not associated with the length of the average lapse. We also found that increased sedentary behaviour was associated with longer lapse lengths indicating that movement is an important aspect of obtaining GPS signal, and that prolonged sitting episodes will likely result in higher amounts of missing GPS signal. Participants whose behaviours do not include long bouts of sitting and do include greater movement throughout their wear day are more likely

to gain and then retain a GPS signal by giving more chances for the GPS satellites to lock onto the GPS device. This has implications for those working with populations known to be more sedentary, where GPS might not be the most appropriate tool or post-collection data loss mitigation will likely be needed. Surprisingly, land use categories were not significantly associated with the signal lapse length, where we expected to find that office and industrial land use might be locations more prone to longer signal lapses and parks or outdoor venues might be less prone. Population density was the only environmental characteristic significantly associated with the length of signal lapses. However, it was found to be inversely related with lapse length, demonstrating that lower population density is associated with longer lapses. While this result seems counterintuitive, it is likely a reflection of where the lapses are occurring, which are in residential areas with comparatively lower population density than in urban centres. A large portion of our participants were past retirement age and living in elder-care communities, and it is likely that they spent most of their time in their homes or on the campus of their care facility. Since we did find a strong association between sedentary minutes and increasing lapse length, we hypothesise that most participants are sedentary within their homes, and many lapses are occurring there. Additionally, the very small coefficient indicates that this effect is not strong. While we did not find large demographic and environmental biases in the missing GPS data, we did find that a significant amount of PA

Figure 2. Total day (24 hours) geographic positioning system signal and imputation in the home (within 800 m of the home) and outside of the home (beyond 800 m) for three example participants with low-, medium-, and high-imputation days.

Table 3. Daily averages (standard deviation) of minutes for valid wear days, with significance of the change after imputation.

Activity        Area                     Pre               Post              β (95% CI)
Sedentary°      Total                    361.89 (237.17)   477.09 (238.32)   115.18 (108.55-176.79)***
                Within home buffer       246.80 (217.30)   327.12 (244.58)   80.37 (74.04-80.37)***
                Outside of home buffer   115.10 (156.5)    149.96 (189.58)   34.83 (30.22-39.44)***
Light#          Total                    159.10 (114.60)   195.02 (114.25)   35.92 (32.83-39.02)***
                Within home buffer       104.27 (93.26)    128.30 (97.64)    24.02 (21.42-26.63)***
                Outside of home buffer   54.83 (76.51)     66.73 (88.13)     11.90 (9.72-14.08)***
Above-light§    Total                    38.19 (48.61)     43.74 (51.22)     5.55 (4.35-6.75)***
                Within home buffer       21.26 (28.54)     24.12 (29.40)     2.86 (2.07-2.86)***
                Outside of home buffer   16.93 (34.58)     19.62 (38.34)     2.69 (1.76-3.61)***
Non-wear time   Total                    442.81 (289.20)   503.02 (265.28)
                Within home buffer       412.10 (300.50)   467.60 (285.33)
                Outside of home buffer   30.71 (127.55)    35.42 (135.51)

°0-99 counts per min; #100-1040 counts per min; §more than 1041 counts per min. ***P<.001.


and SB occur when the GPS signal is lost. Table 3 illustrates that missing GPS data are tied to a large proportion of light PA as well as some higher intensity PA. Before imputation, an average of 36 minutes a day of light activity and approximately 5.5 minutes of above-light activity occurred during signal lapses. Added up to a weekly level, that is an average of over 4 hours of light and over half an hour of above-light activity; this represents a substantial amount of PA data that lacks environmental context. In order to regain the spatial context of these accelerometer data, we developed an imputation method based on the assumption that individuals would most likely not be moving during a GPS lapse (movement would greatly increase the chance that the device reacquires a satellite signal). The results of the imputation validation using the SenseCam data subset suggest that this imputation method is accurate and a viable option for managing missing GPS data due to signal lapse. We found that 91.5% of signal lapse minutes were classified as Indoors/Non-moving. It is important to note that San Diego has neither underground transportation nor a highly dense multi-story built environment. Imputation in cities with underground transportation would be possible by accounting for the metro grid, including entry and exit points, in the imputation model (essentially imputing the metro travel path). Imputation for environments with dense high-rise structures may be a more difficult task. This diversity in urban environments does pose a challenge for the adoption of one unified imputation technique across different cities. The imputation procedure utilised in this research was an example of imputing using a general ruleset and mean centre statistic; however, several other forms of imputation have been utilised previously (Stopher et al., 2008; Wiehe et al., 2008; Troped et al., 2010).
For this example of imputation we chose not to restrict the algorithm in any way, in order to demonstrate what imputing the maximum amount of data would look like. Researchers can add their own limitations depending on their population and dataset, such as excluding imputation over a certain distance or preventing imputation over multiple days. As advances continue in machine-learned algorithms to detect behaviour from accelerometry data (Ellis et al., 2013), imputation could also depend on the behaviour (sitting/standing) detected from the accelerometer. For any form of imputation to be fully adopted by GPS and health data analysis practitioners, the decisions and justification for the exact parameters and procedures used need to be reported so that the process can move towards standardisation. Figure 2 illustrates the variety of locations and times throughout the day at which GPS signal loss occurs. The figure particularly highlights that for individuals living in homes with poor signal reception, a large amount of data may be missing in the home environment. The Table 3 results support this observation and demonstrate that a large amount of missing data occurs within 800 m of the home. Results from Table 3 also highlight the importance of imputation. Every category of activity, both in and outside of the home, gained a significant number of minutes through imputation. Of particular importance is the above-light activity category, where 5.5 daily geo-located minutes of activity were added through imputation, evenly distributed between the home and non-home environments. These results suggest that imputation of missing GPS data may add significant PA and SB data to the model, and decrease modelling error associated with missing data.
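The optional restrictions mentioned above (a distance cut-off and no imputation across days) could be expressed as a simple filter applied to each lapse before imputing. The sketch below is hypothetical: the `Lapse` fields, the parameter names, and the reading of "multiple days" as any lapse whose start and end fall on different calendar dates are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Lapse:
    start: datetime        # time of the last fix before the signal was lost
    end: datetime          # time of the first fix after re-uptake
    gap_distance_m: float  # distance between those two fixes, in metres


def should_impute(
    lapse: Lapse,
    max_distance_m: Optional[float] = None,
    allow_multiday: bool = True,
) -> bool:
    """Decide whether a lapse is eligible for imputation.

    With the defaults nothing is excluded, matching the unrestricted
    approach described in the text.
    """
    if max_distance_m is not None and lapse.gap_distance_m > max_distance_m:
        return False  # end points too far apart to assume stationarity
    if not allow_multiday and lapse.start.date() != lapse.end.date():
        return False  # lapse crosses midnight
    return True
```

Reporting the chosen values of such parameters alongside results is one way to meet the call for transparent, standardisable imputation procedures.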

Conclusions

Modelling of demographic and land use characteristics on the presence of signal lapse and lapse length indicated that some demographic and behavioural characteristics are associated with GPS signal lapse even after including environmental factors in the model. These biases are relatively small, but they are coupled with a large amount of data loss for almost all participants. The results of this study indicate that researchers should consider the demographic and behavioural factors of their study population that may make GPS signal lapse a significant issue. Researchers should also strongly consider imputation. We found significant increases in data across all activity categories after imputation, adding up to large amounts of weekly activity. The imputation technique utilised in this study was validated and found to be highly accurate. We advocate general imputation as an effective tool for mitigating data lost through GPS signal lapse, as it can return spatial context and greater utility to the data set, with the caveat that researchers must consider their research site for situations where the assumption of participant stationarity during signal loss may not apply.

References

Carlson J, Jankowska MM, Meseck K, Godbole S, Natarajan L, Raab F, Demchak B, Patrick K, Kerr J, 2015. Validity of PALMS GPS scoring of active and passive travel compared with SenseCam. Med Sci Sport Exer 47:662-7.
Costa E, 2011. Simulation of the effects of different urban environments on GPS performance using digital elevation models and building databases. IEEE T Int Transp Syst 12:819-29.
Demchak B, Kerr J, Raab F, Patrick K, Kruger IH, 2012. PALMS: a modern coevolution of community and computing using policy driven development. Available from: https://www.computer.org/csdl/proceedings/hicss/2012/4525/00/4525c735.pdf
Doherty A, Hodges S, King A, Smeaton AF, Berry E, Moulin CJ, Lindley S, Kelly P, Foster C, 2013. Wearable cameras in health: the state of the art and future possibilities. Am J Prev Med 44:320-3.
Duncan MJ, Badland HM, Mummery WK, 2009. Applying GPS to enhance understanding of transport-related physical activity. Sports Med Aus 12:549-56.
Duncan MJ, Mummery WK, 2007. GIS or GPS? A comparison of two methods for assessing route taken during active transport. Am J Prev Med 33:51-3.
Ellis K, Godbole S, Chen J, Marshall S, Lanckriet G, Kerr J, 2013. Physical activity recognition in free-living from body-worn sensors. Available from: eceweb.ucsd.edu/~gert/papers/par-13.pdf
Heil DP, Brage S, Rothney MP, 2012. Modeling physical activity outcomes from wearable monitors. Med Sci Sport Exer 44(Suppl.1):50-60.
Jankowska MM, Schipperijn J, Kerr J, 2015. A framework for using GPS data in physical activity and sedentary behavior studies. Exercise Sport Sci R 43:48-56.
Kerr J, Duncan S, Schipperijn J, 2011. Using global positioning systems in health research: a practical approach to data collection and processing. Am J Prev Med 41:532-40.
Kerr J, Marshall SJ, Godbole S, Chen J, Legge A, Doherty AR, Kelly P, Oliver M, Badland HM, Foster C, 2013. Using the SenseCam to improve classifications of sedentary behavior in free-living settings. Am J Prev Med 44:290-6.
Krenn PJ, Titze S, Oja P, Jones A, Ogilvie D, 2011. Use of global positioning systems to study physical activity and the environment: a systematic review. Am J Prev Med 41:508-15.
Lachowycz K, Jones AP, Page AS, Wheeler BW, Cooper AR, 2012. What can global positioning systems tell us about the contribution of different types of urban greenspace to children’s physical activity? Health Place 18:586-94.
Lam MS, Godbole S, Chen J, Oliver M, Badland H, Marshall SJ, Kelly P, Foster C, Doherty A, Kerr J, 2013. Measuring time spent outdoors using a wearable camera and GPS. Available from: https://ucsdpalms-project.wikispaces.com/file/view/Lam_2013_Measuring_Time_Spent_Outdoors_Using_Wearable_Camera_GPS.pdf/545841220/Lam_2013_Measuring_Time_Spent_Outdoors_Using_Wearable_Camera_GPS.pdf
Maddison R, Ni Mhurchu C, 2009. Global positioning system: a new opportunity in physical activity measurement. Int J Behav Nutr 6:73.
Magnusson A, 2014. gmt: interface between GMT map-making software and R. Available from: http://cran.r-project.org/package=gmt
Ogle J, Guensler R, 2002. Accuracy of global positioning system for determining driver performance parameters. Transport Res Rec 1818:12-24.
Oliver M, Badland H, Mavoa S, Duncan MJ, Duncan S, 2010. Combining GPS, GIS, and accelerometry: methodological issues in the assessment of location and intensity of travel behaviors. J Phys Act Health 7:102-8.
Quigg R, Gray A, Reeder AI, Holt A, Waters DL, 2010. Using accelerometers and GPS units to identify the proportion of daily physical activity located in parks with playgrounds in New Zealand children. Prev Med 50:235-40.
R Core Team, 2013. R: a language and environment for statistical computing. Available from: http://www.r-project.org/
SANDAG, 2014. SanGIS/SANDAG data warehouse. San Diego Geographic Information Source - JPA/San Diego Association of Governments. Available from: http://www.sandag.org/index.asp?subclassid=100&fuseaction=home.subclasshome
Stopher P, Fitzgerald C, Zhang J, 2008. Search for a global positioning system device to measure person travel. Transportation Res C-Emer 16:350-69.
Troped PJ, Wilson JS, Matthews CE, Cromley EK, Melly SJ, 2010. The built environment and location-based physical activity. Am J Prev Med 38:429-38.
Wheeler BW, Cooper AR, Page AS, Jago R, 2010. Greenspace and children’s physical activity: a GPS/GIS analysis of the PEACH project. Prev Med 51:148-52.
Wickham H, 2011. The split-apply-combine strategy for data analysis. J Stat Soft 40:1-29. Available from: https://www.jstatsoft.org/index.php/jss/article/view/v040i01/v40i01.pdf
Wiehe SE, Hoch SC, Liu GC, Carroll AE, Wilson JS, Fortenberry JD, 2008. Adolescent travel patterns: pilot data indicating distance from home varies by time of day and day of week. J Adolescent Health 42:418-20.


Geospatial Health 2016; volume 11:399

Perceived spatial stigma, body mass index and blood pressure: a global positioning system study among low-income housing residents in New York City

Dustin T. Duncan,1-4 Ryan R. Ruff,2,5 Basile Chaix,6,7 Seann D. Regan,1 James H. Williams,1 Joseph Ravenell,1 Marie A. Bragg,1,2 Gbenga Ogedegbe,1,2 Brian Elbel1-3,8

1Department of Population Health, New York University School of Medicine, New York, NY; 2College of Global Public Health, New York University, New York, NY; 3Population Center, New York University, New York, NY; 4Center for Data Science, New York University, New York, NY; 5Department of Epidemiology and Health Promotion, New York University College of Dentistry, New York, NY, USA; 6Pierre et Marie Curie University, Pierre Louis Institute of Epidemiology and Public Health, Paris; 7National Institute of Health and Medical Research, Pierre Louis Institute of Epidemiology and Public Health, Paris, France; 8Wagner Graduate School of Public Service, New York University, New York, NY, USA

Correspondence: Dustin T. Duncan, Spatial Epidemiology Lab, Department of Population Health, New York University School of Medicine, 227 East 30th Street, 6th Floor, Room 621, New York, NY 10016, USA. Tel: +1.646.5012674. Fax: +1.646.5012706. E-mail: Dustin.Duncan@nyumc.org

Key words: Spatial stigma; Spatial epidemiology; Global positioning system technology; Low-income housing residents; Cardiovascular disease.

Acknowledgements: the NYC Low-income Housing, Neighborhoods and Health Study was supported by the NYU-HHC Clinical and Translational Science Institute (CTSI) Pilot Project Awards Program (Dr. Dustin Duncan, Principal Investigator). The NYU-HHC CTSI is supported in part by grant UL1TR000038 (Dr. Bruce Cronstein, Principal Investigator and Dr. Judith Hochman, co-Principal Investigator) from the National Center for Advancing Translational Sciences of the National Institutes of Health. We thank the research assistants for this project: Maliyhah Al-Bayan; Shilpa Dutta; William Goedel; Brittany Gozlan; Kenneth Pass; James Williams; and Abebayehu Yilma. We thank Jeff Blossom for geocoding the participants’ addresses, calculating neighborhood percent non-Hispanic Black and neighborhood median household income in ArcGIS and creating the figure used in this study, and we thank William Goedel, Samantha Bennett and Jermaine Blakley for their assistance with the preparation of this manuscript. We also thank the participants for engaging in this research.

Received for publication: 17 July 2015. Revision received: 29 January 2016. Accepted for publication: 1 February 2016.

©Copyright D.T. Duncan et al., 2016
Licensee PAGEPress, Italy
Geospatial Health 2016; 11:399
doi:10.4081/gh.2016.399

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.


Abstract

Previous research has highlighted the salience of spatial stigma on the lives of low-income residents, but has been theoretical in nature and/or has predominantly utilised qualitative methods with limited generalisability and ability to draw associations between spatial stigma and measured cardiovascular health outcomes. The primary objective of this study was to evaluate relationships between perceived spatial stigma, body mass index (BMI), and blood pressure among a sample of low-income housing residents in New York City (NYC). Data come from the community-based NYC Low-income Housing, Neighborhoods and Health Study. We completed a cross-sectional analysis with survey data, which included the four items on spatial stigma, as well as objectively measured BMI and blood pressure data (analytic n=116; 96.7% of the total sample). Global positioning systems (GPS) tracking of the sample was conducted for a week. In multivariable models (controlling for individual-level age, gender, race/ethnicity, education level, employment status, total household income, neighborhood percent non-Hispanic Black and neighborhood median household income) we found that participants who reported living in an area with a bad neighborhood reputation had higher BMI (B=4.2, 95%CI: -0.01, 8.3, P=0.051), as well as higher systolic blood pressure (B=13.2, 95%CI: 3.2, 23.1, P=0.01) and diastolic blood pressure (B=8.5, 95%CI: 2.8, 14.3, P=0.004). In addition, participants who reported living in an area with a bad neighborhood reputation had increased risk of obesity/overweight (relative risk, RR=1.32, 95%CI: 1.1, 1.4, P=0.02) and hypertension/pre-hypertension (RR=1.66, 95%CI: 1.2, 2.4, P=0.007). However, we found no differences in spatial mobility (based on GPS data) among participants who reported living in neighborhoods with and without spatial stigma (P>0.05). Further research is needed to investigate how place-based stigma may be associated with impaired cardiovascular health among individuals in stigmatised neighborhoods to inform effective cardiovascular risk reduction interventions.


Introduction

Obesity and hypertension persist as major public health problems because of their high prevalence and associated co-morbidities. Obesity is a risk factor for type 2 diabetes (Nguyen et al., 2008; Guh et al., 2009), cardiovascular disease (Guh et al., 2009), hypertension (Nguyen et al., 2008), certain cancers (Guh et al., 2009), and liver disease (Karlas et al., 2013; Shulman, 2014). Hypertension is a risk factor for cardiovascular disease, kidney disease, and stroke (WHO, 2013). From the National Health and Nutrition Examination Survey (NHANES) 2011-2012, the prevalence of obesity among United States adults was estimated at 34.9% (Ogden et al., 2014), while the prevalence of hypertension was 29.1% (Nwankwo et al., 2013). There are long-standing socio-economic disparities in obesity and hypertension rates in the United States, and they are especially stark among low-income populations. Low-income populations have higher obesity and hypertension rates compared to the general population (Clarke et al., 2009; Rossen and Schoendorf, 2012; Decker et al., 2013; Krieger et al., 2014). NHANES 2007-2010 data indicate that the prevalence of obesity among low-income adults (classified as below 138% of the federal poverty line) was nearly 40% (Decker et al., 2013). In addition to low socioeconomic status being associated with higher mean blood pressure and hypertension prevalence, low-income individuals are less likely to receive treatment for their elevated blood pressure levels and to have their hypertension optimally controlled (Colhoun et al., 1998; Ostchega et al., 2008; Brummett et al., 2011; Egan et al., 2014; Krieger et al., 2014). Research also has documented high rates of obesity and hypertension among low-income housing residents (Digenis-Bury et al., 2008; Duncan et al., 2014c).
For example, one Boston study found that the prevalence of self-reported hypertension among low-income housing residents was more than two times higher than the rate of hypertension among non-low-income housing residents (Digenis-Bury et al., 2008). Additionally, low-income housing residents were nearly two times more likely to be obese than non-low-income housing residents (Digenis-Bury et al., 2008). Research shows that the spatial context, including the neighborhood environment, can influence obesity and hypertension rates (Kawachi and Berkman, 2003; Bennett et al., 2008). For instance, neighborhoods with poor walkability, low neighborhood safety, high fast-food restaurant density, and low supermarket density have been shown to have higher obesity rates (Rundle et al., 2007; Duncan et al., 2009, 2014d; Saelens et al., 2012; Lovasi et al., 2013; Stark et al., 2013; Pham do et al., 2014; Troped et al., 2014). These factors have also been associated with higher rates of hypertension (Chaix et al., 2008; Mujahid et al., 2008; Unger et al., 2014). Other neighborhood characteristics associated with obesity and hypertension include social disorder (e.g. alcohol use, prostitution, drug addiction), physical disorder (e.g. broken windows, vandalism, litter, empty alcohol containers), and neighborhood violence, all of which may be sources of psychosocial stress and thus a potential mechanism connecting these neighborhood characteristics and deleterious cardio-metabolic health outcomes (Bennett et al., 2007; Mujahid et al., 2011; Lovasi et al., 2013). These dimensions of neighborhoods have received substantial attention over the last several decades. Other neighborhood-related factors can also result in psychosocial stress. For example, spatial stigma, or the negative representations of place that are attached to neighborhoods, may have deleterious effects on the health of residents via psychosocial stress, among other potential pathways (Keene and Padilla, 2014).
Spatial stigma has also been defined by the co-occurrence of its components: labeling, stereotyping, separation, status loss and discrimination (Chaix, 2009). Spatial stigma can include overall neighborhood reputation, media image of the respondents’ neighborhood, negative perception of low-income housing residents, and feelings of judgment due to living in subsidised housing. Stigmatised neighborhoods and residents of those neighborhoods may be viewed poorly by the media, individuals living outside the neighborhood, and/or residents themselves. Consequently, these stigmatised neighborhoods or marginalised places can carry negative symbolic meanings that have implications for the health and well-being of their residents. Despite evidence demonstrating the associations between neighborhood factors and cardiovascular health outcomes, the role of spatial stigma in obesity and hypertension disparities remains underexplored (Chaix, 2009), including among low-income populations. Previous research has highlighted the salience of spatial stigma on the lives of low-income residents, but has been theoretical in nature and/or has predominantly utilised qualitative methods with limited generalisability and ability to draw associations between spatial stigma and measured cardiovascular health outcomes (Sampson and Raudenbush, 2004; Thompson et al., 2007; Keene and Padilla, 2010, 2014; Kelaher et al., 2010; Tabuchi et al., 2012). Because very little empirical work has been conducted to examine the potential role of spatial stigma as it relates to cardiovascular health, including obesity and hypertension, in low-income populations, the aim of this study was to examine the relationship between spatial stigma, body mass index and blood pressure among a sample of low-income housing residents in New York City. Based on the previous theoretical and empirical research, we hypothesised that perceived spatial stigma would be associated with impaired cardiovascular health (i.e. elevated body mass index and/or blood pressure) among low-income housing residents in New York City.
In a sub-analysis we incorporate global positioning systems (GPS) data to investigate whether participants with and without spatial stigma have different mobility patterns, which has not been evaluated previously and may help us understand why spatial stigma could matter for cardiovascular health profiles. In addition, we sought to investigate if individuals who spend more time in their residential neighborhoods are more sensitive to spatial stigma. We hypothesised that individuals who report living in neighborhoods with spatial stigma will be more spatially mobile and that individuals who spent more time in their residential neighborhood would be more sensitive to spatial stigma.

Materials and Methods

Study sample
Data used in this study come from the NYC Low-Income Housing, Neighborhoods and Health Study (n=120) (Duncan et al., 2014c; Duncan and Regan, 2015). Recruitment was conducted through community-based outreach, which included handing out flyers outside of public housing developments in four different New York City neighborhoods, as well as through flyers posted and circulated by community-based organisations that work with low-income individuals (especially public housing residents), flyers posted in community locations (e.g. local stores) and through word of mouth (social networks). Adults were considered eligible for participation in the study if they self-reported living in low-income housing (e.g. public housing) in New York City; were 18 years of age or older; could speak and read English; self-reported not being pregnant; self-reported no difficulty in walking or climbing stairs; and were willing to wear a Global Positioning Systems (GPS) device (on their person; e.g. in their pocket) for one week. The vast

[Geospatial Health 2016; 11:399]


majority (80%) of the participants reported living in public housing (versus other low-income housing) and all participants reported being low-income (e.g. 5.8% of participants in the study reported living in Section 8 housing). We collected survey and objectively measured health (height, weight and blood pressure) data, which were collected in our research office. Data were collected between June and July 2014. Informed consent was obtained from all participants prior to data collection. We geocoded participant address data using methods used in our past work (Duncan et al., 2011a, 2011b) in order to determine and map their neighborhood of residence. The participants came from 4 of the 5 New York City boroughs, 28 ZIP codes, 21 community districts, 17 United Hospital Fund (UHF)-defined neighborhoods, 41 census tracts and 50 census block groups, which are various ways to define neighborhoods in New York City (Duncan et al., 2014a). See Figure 1 for the spatial distribution of the overall sample by borough, which shows that the majority of participants come from Manhattan (65.8%). The New York University School of Medicine Institutional Review Board reviewed and approved the research protocol.

Spatial stigma
We assessed spatial stigma using a four-item survey informed by prior work on spatial stigma and health disparities (Sampson and Raudenbush, 2004; Thompson et al., 2007; Keene and Padilla, 2010, 2014; Kelaher et al., 2010; Tabuchi et al., 2012). The first item was "Overall, what is the reputation of your neighborhood?" Response options were: Good; Moderate; Bad; and Don’t know/Not sure. The second item was "Overall, is the image of your neighborhood in the media positive?" Response options were: Yes; No; and Don’t know/Not sure. The third item was "According to you, are people who live in your neighborhood seen negatively outside the neighborhood?" Response options were: Yes; No; and Don’t know/Not sure. The fourth item was "Do you feel that people judge you because you live in low-income or subsidised housing?" Response options were: Yes; No; and Don’t know/Not sure.

Figure 1. Number of participants by borough, New York City Low-income Housing, Neighborhoods, and Health Study.

Body mass index
Following standard protocols, trained research assistants (medical students) measured the participant’s height and weight. Participants were asked to remove their shoes, heavy outer clothing, hats, and any tall hair accessories prior to measurement of height and weight. Height was measured to the nearest tenth of a centimeter using a Seca 213 stadiometer, with each participant’s back to the stadiometer and their head in the Frankfort position (Geeta et al., 2009; Abidin and Adam, 2013; Bacardí-Gascón et al., 2013; McGurk et al., 2013; Prushansky et al., 2013). Weight was measured to the nearest tenth of a kilogram with a Tanita 351 scale (Geeta et al., 2009; Thomas et al., 2010; Yahia et al., 2011; Bacardí-Gascón et al., 2013; Bammann et al., 2013). Body mass index (BMI) was calculated using the standard formula of weight (kg)/[height (cm)/100]². Underweight is classified as a BMI less than 18.50; normal weight is classified as a BMI 18.50 to 24.99; overweight is classified as a BMI 25.00 to 29.99; and obesity is classified as a BMI above 30.00.

Blood pressure
The research assistants were also trained to measure each participant’s blood pressure following standard protocols. Participants sat silently in a chair prior to and during measurement of their blood pressure with their arms outstretched, back supported, legs uncrossed and feet on the floor. After participants had been seated for 15 to 30 seconds, we measured their blood pressure with a Welch Allyn Vital Signs 300 monitor (Hess et al., 2007; Victor et al., 2011; Ravenell et al., 2013). Consistent with the most recent Joint National Committee (JNC 8) guidelines, hypertension is classified as a systolic blood pressure greater than 140 mmHg, a diastolic blood pressure greater than 90 mmHg or self-reported use of blood pressure lowering medications. Pre-hypertension is classified as a systolic blood pressure of 120-139 mmHg, or a diastolic blood pressure of 80-89 mmHg. Normal blood pressure is classified as a systolic blood pressure less than 120 mmHg and a diastolic blood pressure less than 80 mmHg.

Global positioning system data processing
Consistent with other studies (Zenk et al., 2011; Hurvitz and Moudon, 2012; McCluskey et al., 2012; Wiehe et al., 2013; Yan et al., 2014; Clark et al., 2014; Dessing et al., 2014; Harrison et al., 2014; Klinker et al., 2014; Yen et al., 2015), GPS tracking of the sample was conducted for one week. Prior to distribution, we programmed the GPS device to log at 30-second intervals (so a device worn for an hour with no data loss would record 120 GPS points) (Duncan et al., 2014c). During the study orientation and baseline assessment, participants were instructed to place the small Qstarz BT-Q1000XT GPS device on their belt (using the manufacturer-provided case) or in their pocket and to complete a travel diary. Participants were asked to wear the GPS units at all times, except when sleeping, swimming or showering. Consisting of a series of checkboxes, the travel diary asked the participant questions related to GPS protocol compliance: "Did you charge the GPS monitor today?" and "Did you carry the GPS monitor with you today?" This was meant to help the participant remember to charge the unit and carry it with him or her throughout the week. The GPS device was given to participants in a large plastic zipper storage bag, which also contained a mini USB charging cord for the GPS device, a USB wall adapter for charging, a manufacturer-provided GPS belt holder (if requested), a pamphlet containing background information on GPS and the travel diary. Upon completion of the week-long GPS protocol (i.e. carrying the unit for all journeys, charging the unit daily, and completing the travel diary), we went to community locations (e.g. coffee shop, library) in the participant’s neighborhood to obtain the GPS devices or participants returned to the project office to give back the GPS devices, depending on which option was most convenient for the individual. Participant compliance was high, with a GPS return rate of 95.6%, and 114 participants of the overall study population had GPS data (Duncan et al., 2014c). GPS participant data were downloaded using the Qstarz proprietary software and stored as .gpx files. The GPS data were then cleaned using several scripts written in the Python programming language (Python Software Foundation, Python Language Reference, version 2.7; available at: http://www.python.org) and models built in ArcGIS version 10.2 (ESRI, Redlands, CA, USA) to eliminate duplicate data, GPS points likely caused by multipath reflectance, GPS data likely caused by timing errors, and isolated GPS data.
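As an illustration of the logging arithmetic and the duplicate-removal step described above, the following Python sketch computes the number of points expected at a 30-second logging interval and the share actually recorded. It is not the cleaning script used in the study: the function names, the (timestamp, latitude, longitude) record layout and the rule of keeping the first record per timestamp are assumptions made for the example.

```python
from datetime import datetime, timedelta

LOG_INTERVAL_S = 30  # logging interval reported in the text


def expected_fixes(wear_hours, interval_s=LOG_INTERVAL_S):
    """Number of GPS points expected for a given wear time with no data loss."""
    return int(wear_hours * 3600 // interval_s)


def drop_duplicate_fixes(fixes):
    """Keep the first record per timestamp (assumed rule, for illustration).

    `fixes` is a list of (timestamp, latitude, longitude) tuples.
    """
    seen = set()
    unique = []
    for ts, lat, lon in sorted(fixes, key=lambda f: f[0]):
        if ts not in seen:
            seen.add(ts)
            unique.append((ts, lat, lon))
    return unique


def completeness(fixes, wear_hours, interval_s=LOG_INTERVAL_S):
    """Share of expected points actually recorded, after removing duplicates."""
    expected = expected_fixes(wear_hours, interval_s)
    if expected == 0:
        return 0.0
    return len(drop_duplicate_fixes(fixes)) / expected


# Hypothetical example: 30 minutes of data within one hour of wear time.
start = datetime(2014, 6, 1, 9, 0, 0)
example = [(start + timedelta(seconds=30 * i), 40.7, -74.0) for i in range(60)]
example.append(example[0])  # a duplicated record
print(expected_fixes(1), completeness(example, 1))
```

One hour of wear at this interval gives 120 expected points, which matches the figure quoted in the text.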

Global positioning system activity space size calculation and percent time in residential neighborhoods
Global positioning system activity space buffers were created using ArcGIS version 10 (ESRI). There are various ways to define an activity space (e.g. convex hull, one standard deviation ellipse and daily path area) as well as various distance thresholds for these measures (Zenk et al., 2011; Christian, 2012). In this study, we used a daily path area (a buffering zone drawn around the GPS tracks), which is a common method in behavioral geography research to understand where participants spend the majority of their time and exposure to environments (Zenk et al., 2011; Christian, 2012). Consistent with previous research (Zenk et al., 2011; Christian, 2012), we buffered all the pre-processed GPS points at 0.5 mile and dissolved these separate features into a single feature, or space, to create an activity space for each participant. A half-mile was selected to capture the immediate vicinity around activity locations, general area of exposure, and travel routes. While the literature is mixed on a common buffer size, one half-mile was selected because it has been shown to give a good estimate of exposure based on walkability studies, and is a commonly used distance in physical activity studies (Frank et al., 2004; Cohen et al., 2006; Troped et al., 2010). The activity space size was expressed in square miles. In addition, we calculated the percent of GPS points within the residential neighborhood (i.e. percent of time spent in the neighborhood). Neighborhoods were defined as 400- and 800-meter street network buffers around one’s residential address (Rundle et al., 2009; Duncan et al., 2011a, 2012, 2014b, 2014d; Jilcott et al., 2011; Leung et al., 2011; Schwartz et al., 2011; Duncan and Hatzenbuehler, 2014; Reitzel et al., 2014; James et al., 2014; Troped et al., 2014).
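The percent of GPS points inside a residential buffer can be sketched in a few lines of Python. The example is illustrative only: the study built street network buffers in ArcGIS, whereas here the buffer is assumed to be available as a simple polygon of planar (x, y) vertices, and the function names are hypothetical. Because the devices logged at a fixed interval, the share of points inside the buffer approximates the share of recorded time spent there.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def percent_in_buffer(points, polygon):
    """Percent of GPS points, given as (x, y) pairs, that fall inside the polygon."""
    if not points:
        return 0.0
    hits = sum(point_in_polygon(x, y, polygon) for x, y in points)
    return 100.0 * hits / len(points)


# Hypothetical unit-square "buffer" and four GPS points, two of them inside.
buffer_polygon = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
gps_points = [(0.5, 0.5), (0.2, 0.8), (2.0, 2.0), (-1.0, 0.5)]
print(percent_in_buffer(gps_points, buffer_polygon))
```

A production analysis would work with projected coordinates and a GIS library rather than this hand-written test, which ignores points lying exactly on the polygon boundary.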

Other variables
Age categories included 18-24, 25-44, and 45+. Gender categories included male and female. Race/ethnicity categories included Black/African American, Asian/Asian American, Hispanic, White/Caucasian, and Other. Education levels included less than a 12th grade education, high school degree or GED, some college or vocational school, completion of bachelor’s degree, and completion of graduate degree. Employment status groups were defined as full-time, part-time, unemployed, and retired/school. Household income included <$25,000, $25,000-$49,999, $50,000-$74,999, and $75,000+ categories. In addition, neighborhood percent non-Hispanic Black and neighborhood median household income at the census block group level were calculated using geographic information systems (GIS) software using data from the 2010 US Census and the 2009-2013 American Community Survey (United States Census Bureau, 2010).

Statistical analyses
The analytic sample for the primary analyses included only participants who answered all four spatial stigma items (n=116; 96.7%). Using this restricted sample, we first generated descriptive statistics for the sample by participant demographics. Then, we computed descriptive statistics for all spatial stigma items and set all stigma responses of Don’t know/Not sure to missing. In analysing the GPS data to determine whether participants with and without spatial stigma have different mobility patterns, we first used a measure of activity space size. Additionally, we investigated whether individuals with larger proportions of time exposed to their residential neighborhoods were more sensitive to spatial stigma, using GPS percent of time in residential buffers (400- and 800-meter network buffers). Analyses for activity space size used simple linear regression, and those for network buffers used generalised linear modeling with a logit link and a binomial distribution with robust standard errors. After this, we fit a series of multivariable models including each of the four spatial stigma items. In particular, each model included only one spatial stigma item in light of potential multicollinearity. Outcomes for multivariable analyses included BMI, systolic blood pressure, diastolic blood pressure, overweight/obesity status, and hypertension/pre-hypertension status. A series of five regression models was constructed. Three linear regression models included BMI and blood pressure (systolic and diastolic) as continuous dependent variables. Overweight/obesity and pre-hypertension/hypertension status were dichotomised into overweight/obese versus not and pre-hypertensive/hypertensive versus not groups and used in two regression models. The degree of potential clustering due to neighborhood effects was estimated using intercept-only random-effects linear and logistic regression models, clustered by census block group.
Large intraclass correlation coefficients (ICCs) were found for diastolic blood pressure (18.9%) and hypertension/pre-hypertension status (24.0%). For the diastolic and hypertensive/pre-hypertensive models, regression analyses therefore utilised clustered robust standard errors. As overweight/obese and pre-hypertensive/hypertensive status are common in our study sample, odds ratios were likely to overestimate the effect (Thompson et al., 1998; McNutt et al., 2003; Behrens et al., 2004; Schmidt and Kohlmann, 2008). Therefore, relative risks (RRs; i.e., prevalence ratios) were calculated rather than odds ratios. Odds ratios were obtained using logistic regression and converted to relative risks. Modified Poisson regression with clustered robust variances was used to estimate relative risks for the hypertension/pre-hypertension model. Covariates for all models included individual-level age, gender, race/ethnicity, education level, employment status, total household income, neighborhood percent non-Hispanic Black (continuous variable) and neighborhood median household income (continuous variable). Statistical analysis was performed using Stata version 13 (Stata Corp; College Station, TX, USA). All P values reported are two-sided. Statistical significance was evaluated by 95% confidence intervals (CIs) and P values less than 0.05.
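The text does not state which formula was used to convert odds ratios to relative risks. One widely used approximation, shown here purely as an illustration, is RR = OR / [(1 - P0) + P0 × OR], where P0 is the prevalence of the outcome in the reference group. The function name and the example values below are hypothetical and are not taken from the study data.

```python
def odds_ratio_to_relative_risk(odds_ratio, baseline_prevalence):
    """Approximate a relative risk from an odds ratio.

    baseline_prevalence is the outcome prevalence in the reference group (P0).
    Uses RR = OR / ((1 - P0) + P0 * OR); an assumed conversion, not
    necessarily the one applied in the study.
    """
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    if not 0.0 <= baseline_prevalence < 1.0:
        raise ValueError("baseline_prevalence must be in [0, 1)")
    return odds_ratio / ((1.0 - baseline_prevalence) + baseline_prevalence * odds_ratio)


# Hypothetical values: with a common outcome (P0 = 0.5), an odds ratio of 2.0
# corresponds to a noticeably smaller relative risk.
print(round(odds_ratio_to_relative_risk(2.0, 0.5), 3))
```

The example shows why odds ratios overstate the effect when the outcome is common: the gap between OR and RR widens as P0 increases and vanishes as P0 approaches zero.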

Results
Table 1 presents socio-demographic characteristics of the sample of low-income housing residents. Over half (56%) of the sample was female, 68% were Black/African American, and 39% were 45 years or older. A large proportion of the sample was obese and/or hypertensive: 40% were obese and 38% were hypertensive. Slightly over half (52%) reported a moderate neighborhood reputation, while 21% reported their neighborhood reputation as bad (Table 2). Almost 36% reported that the media did not positively view their neighborhood. In addition, 41% reported negative external perception and 58% reported feeling judged for living in low-income housing. We found no differences in spatial mobility (based on GPS data) between participants who reported living in neighborhoods with and without spatial stigma (P>0.05) (Table 2). For example, participants who reported living in a neighborhood with a good reputation had a mean activity space of 13.8 square miles, those in a neighborhood with a moderate reputation 12.7 square miles, and those in a neighborhood with a bad reputation 12.3 square miles (P=0.89). Moreover, individuals who spent a larger fraction of their time in their residential neighborhood were not more sensitive to spatial stigma (P>0.05). For example, participants who reported living in a neighborhood with a good reputation spent 52.6% of their time in their residential neighborhood (400-meter network buffer), compared with 59.9% for those in a neighborhood with a moderate reputation and 43.7% for those in a neighborhood with a bad reputation (P=0.35). Multivariable models of spatial stigma, BMI and blood pressure are shown in Table 3. Participants who reported living in an area that had a bad neighborhood reputation had higher BMI (B=4.2, 95%CI: -0.01, 8.3, P=0.051), as well as higher systolic blood pressure (B=13.2, 95%CI: 3.2, 23.1, P=0.01) and diastolic blood pressure (B=8.5, 95%CI: 2.8, 14.3, P=0.004). In addition, participants who reported living in an area with a bad neighborhood reputation had increased prevalence of obesity/overweight (RR=1.32, 95%CI: 1.1, 1.4, P=0.02) and hypertension/pre-hypertension (RR=1.66, 95%CI: 1.2, 2.4, P=0.007). Finally, participants who reported feeling judged for living in public housing had lower diastolic blood pressure (B=-5.5, 95%CI: -10.2, -0.66, P=0.027).
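For reference, the BMI formula and the BMI and blood pressure categories given in Materials and Methods, which underlie the dichotomised outcomes in Table 3, can be written as a short Python sketch. The function names are hypothetical. Boundary handling is an assumption: values falling on or between the stated cut-offs (for example a BMI of exactly 30.00, or a systolic reading of exactly 140 mmHg) are assigned to the higher category.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) / [height (cm) / 100]^2."""
    return weight_kg / (height_cm / 100.0) ** 2


def bmi_category(value):
    """BMI category using the cut-offs listed in Materials and Methods."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal weight"
    if value < 30.0:
        return "overweight"
    return "obese"  # assumption: exactly 30.00 counts as obese


def bp_category(systolic, diastolic, on_medication=False):
    """Blood pressure category (mmHg) following the thresholds in the text."""
    # Assumption: readings of exactly 140 or 90 mmHg are treated as hypertensive.
    if on_medication or systolic >= 140 or diastolic >= 90:
        return "hypertensive"
    if systolic >= 120 or diastolic >= 80:
        return "pre-hypertensive"
    return "normal"


def is_overweight_or_obese(value):
    """Dichotomised BMI outcome: overweight/obese versus not."""
    return bmi_category(value) in ("overweight", "obese")


print(bmi_category(29.4), bp_category(129.1, 76.6))
```

Applied to the sample means in Table 1 (BMI 29.4; blood pressure 129.1/76.6 mmHg), these rules return overweight and pre-hypertensive.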

Discussion
No study, to the best of our knowledge, has quantitatively assessed relationships between spatial stigma and cardiovascular health, especially among low-income housing residents who traditionally have high rates

Table 1. Socio-demographic characteristics of the sample of New York City low-income housing residents (n=116).

Characteristic                                     Overall, %               95% CI
Individual-level
Gender
  Male                                             43.9                     34.9, 53.2
  Female                                           56.1                     46.8, 65.1
Race/ethnicity
  Black                                            67.5                     58.3, 75.6
  Asian                                            0.9                      0.0, 6.1
  White                                            4.4                      1.8, 10.2
  Hispanic                                         23.7                     16.7, 32.5
  Other                                            3.5                      1.3, 9.1
Age
  18-24                                            25.9                     18.6, 34.7
  25-44                                            35.3                     27.1, 44.6
  45+                                              38.8                     30.3, 48.1
Education
  Less than high-school education                  28.9                     21.3, 38.1
  High school/GED                                  40.4                     31.6, 49.7
  Some college                                     23.7                     16.7, 32.5
  College graduate                                 5.3                      2.4, 11.3
  Graduate degree                                  1.8                      0.0, 6.9
Income
  Less than $25,000                                72.3                     63.2, 79.9
  $25,000-$49,999                                  20.5                     14.0, 29.2
  $50,000-$74,999                                  5.5                      2.4, 11.5
  $75,000+                                         1.8                      0.0, 7.0
Employment
  Full-time                                        14.9                     9.4, 22.8
  Part-time                                        18.4                     12.3, 26.8
  Unemployed                                       54.4                     45.1, 63.4
  Retired                                          5.3                      2.4, 11.4
  School                                           7.0                      3.5, 13.5
BMI
  BMI (SD)                                         29.4 (7.9)               28.0, 30.8
  Underweight                                      1.7                      0.4, 6.8
  Normal weight                                    31.9                     24.0, 41.0
  Overweight                                       26.7                     19.4, 35.6
  Obese                                            39.7                     31.1, 48.9
Blood pressure
  Systolic (SD)°                                   129.1 (18.8)             125.6, 132.6
  Diastolic (SD)°                                  76.6 (13.1)              74.2, 79.0
  Normal                                           31.0                     23.2, 40.1
  Pre-hypertensive                                 31.0                     23.2, 40.1
  Hypertensive                                     37.9                     29.5, 42.2
Neighborhood-level
  Neighborhood percent non-Hispanic Black (SD)     28.4 (20.6)              24.7, 32.2
  Neighborhood median household income (SD)        44,341.8 (27,767.6)      39,235.0, 49,448.6

CI, confidence interval; GED, general educational development; BMI, body mass index; SD, standard deviation. °Values are expressed as mmHg.

of obesity and hypertension (Digenis-Bury et al., 2008; Duncan et al., 2014c). In this study, we primarily sought to evaluate relationships between spatial stigma, BMI, and blood pressure among a sample of low-income housing residents in New York City. We found that a bad neighborhood reputation was associated with increased BMI, as well as with increases in both systolic and diastolic blood pressure. We also found that participants who reported living in an area with a bad neighborhood reputation had increased prevalence of obesity/overweight and hypertension/pre-hypertension. This suggests that neighborhood reputation, as compared with the other dimensions of spatial stigma, may be the most salient aspect related to cardiovascular health, as other dimensions of spatial stigma were not associated with body mass index or blood pressure. While the connection between spatial stigma and health is fairly understudied, our overall results complement past theoretical and qualitative literature on spatial stigma and health disparities. This past research has suggested that spatial stigma acts as a psychosocial stressor and can contribute to a range of physical and mental health outcomes (Sampson and Raudenbush, 2004; Keene and Padilla, 2010, 2014; Kelaher et al., 2010; Tabuchi et al., 2012). Negative place-based identity from neighborhoods can be transferred to the residents of these neighborhoods who incorporate this identity, which negatively affects their behavior. High levels of spatial stigma may induce stress, which is in turn associated with increased BMI and blood pressure. The stress from spatial stigma may be due in part to individuals who reside in stigmatised neighborhoods (e.g. high-poverty neighborhoods) facing daily discrimination when others view them negatively because of where they live, such as discrimination based on negative stereotypes that people hold about certain neighborhoods. Another possible mechanism is through more depressive feelings related to altered individual identity as affected by the negative collective identity (Tabuchi et al., 2012): that is, spatial stigma can produce a negative view of self, which in turn can lead to an absence of efforts to manage one’s body weight or to monitor one’s blood pressure (Chaix, 2009). This discrimination may also limit people’s economic and health-promoting opportunities. Specifically, it has been postulated that spatial stigma

Table 2. Prevalence of spatial stigma in New York City Low-income Housing, Neighborhoods and Health Study.

Item and response               Percent of overall    Mean of activity    Percent in            Percent in
                                (analytic sample)     space size          residential buffer    residential buffer
                                (n=116)               (n=102)             (400-meters)          (800-meters)
Neighborhood reputation (P*: activity space size .89; 400-meter buffer .35; 800-meter buffer .57)
  Good                          19.8                  13.8                52.6                  59.1
  Moderate                      51.7                  12.7                59.9                  65.8
  Bad                           20.7                  12.3                43.7                  54.0
  Don’t know/not sure           7.8                   15.9                48.8                  58.1
Positive media image (P*: 800-meter buffer .73)
  Yes                           27.6                  14.7                56.8                  62.3
  No                            35.3                  13.6                48.5                  57.4
  Don’t know/not sure           37.1                  12.6                57.0                  64.5
Negative external perception (P*: 800-meter buffer .64)
  Yes                           40.5                  13.8                52.2                  62.2
  No                            27.6                  14.5                54.4                  59.5
  Don’t know/not sure           31.9                  11.8                59.7                  66.5
Feel judged from housing (P*: 800-meter buffer .81)
  Yes                           57.8                  14.6                53.4                  61.4
  No                            24.1                  12.2                52.8                  58.9
  Don’t know/not sure           18.1                  11.0                57.7                  64.7

P* for activity space size and the 400-meter buffer for the three remaining items, in the order printed: .69, .77, .32, .87, .37, .69.

*P trend for neighborhood reputation (all hypothesis tests set Don’t know/not sure to missing).

Table 3. Multivariable models of spatial stigma, body mass index and blood pressure (n=116).

                                              Estimate    95% CI          P
Model 1: BMI (β)
  Neighborhood reputation: Good (ref)
  Neighborhood reputation: Moderate           -.07        -3.5, 3.4       .97
  Neighborhood reputation: Bad                4.2         -.01, 8.3       .051
  Positive media image: Yes (ref)
  Positive media image: No                    1.1         -3.3, 5.4       .627
  Negative external perception: No (ref)
  Negative external perception: Yes           1.3         -2.7, 5.2       .518
  Feel judged from housing: No (ref)
  Feel judged from housing: Yes               -.77        -4.3, 2.8       .668
Model 2: Obese/overweight vs Not (RR)
  Neighborhood reputation: Moderate           .92         .55, 1.2        .61
  Neighborhood reputation: Bad                1.32        1.1, 1.4        0.02
  Positive media image: No                    1.19        .70, 1.5        .43
  Negative external perception: Yes           .97         .57, 1.2        .85
  Feel judged from housing: Yes               1.03        .65, 1.3        .87
Model 3: Blood pressure (systolic) (β)
  Neighborhood reputation: Moderate           7.3         -.92, 15.4      .081
  Neighborhood reputation: Bad                13.2        3.2, 23.1       0.01
  Positive media image: No                    -6.4        -16.7, 3.9      .218
  Negative external perception: Yes           -4.96       -14.5, 4.6      .302
  Feel judged from housing: Yes               -6.04       -13.9, 1.9      .132
Model 4: Blood pressure (diastolic) (β)
  Neighborhood reputation: Moderate           .68         -4.3, 5.6       .784
  Neighborhood reputation: Bad                8.5         2.8, 14.3       .004
  Positive media image: No                    -.64        -8.4, 7.1       .867
  Negative external perception: Yes           -1.98       -8.8, 4.9       .561
  Feel judged from housing: Yes               -5.5        -10.2, -.66     .027
Model 5: HTN/Pre-HTN vs Not (RR)
  Neighborhood reputation: Moderate           1.31        .97, 1.8        .075
  Neighborhood reputation: Bad                1.66        1.2, 2.4        .007
  Positive media image: No                    .82         .65, 1.0        .104
  Negative external perception: Yes           .85         .63, 1.1        .252
  Feel judged from housing: Yes               .86         .67, 1.1        .241

BMI, body mass index; HTN, hypertension; CI, confidence interval; RR, relative risk. All models adjusted for individual-level age, gender, race/ethnicity, income, education, employment status, neighborhood percent non-Hispanic Black and neighborhood median household income. For overall spatial stigma, the reference group is Good perceptions of the neighborhood. For positive media image, reference group is Yes, overall media image is positive. For negative external perception, reference group is No, people in my neighborhood are not seen negatively outside the neighborhood. For housing judgment, reference group is No, people do not judge me because I live in low-income/subsidised housing.

restricts residents’ access to health-promoting resources and limits economic opportunities, and thus contributes to persistent health disparities (Thompson et al., 2007; Keene and Padilla, 2014). For example, employers may discriminate in the hiring process against applicants with addresses in disadvantaged neighborhoods (Kirschenman and Neckerman, 1991; Wilson, 1996). In addition, spatial stigma, including neighborhood reputation, can affect the type of resources and opportunities in that neighborhood. It is plausible that certain respected brands might not want their stores to be located in neighborhoods marked by spatial stigma. The finding that feeling judged for living in public housing was associated with lower diastolic blood pressure was unexpected. It may be that participants living in affluent neighborhoods feel more judged, because in poor neighborhoods living in public housing is seen as normal; one would then be judged only when living in public housing within an affluent neighborhood. In addition, in a sub-analysis we incorporated GPS data to investigate whether participants with and without spatial stigma have different spatial mobility patterns and we found no differences. Perhaps spatial mobility is influenced by other macro-social factors such as neighborhood poverty. Moreover, we found that individuals who spent a larger fraction of their time in their residential neighborhood were not more sensitive to spatial stigma. It is unclear why spatial stigma was not more salient to individuals who spend more time in their residential neighborhoods. Spatial stigma is an illuminating but understudied phenomenon that may contribute to persistent health disparities among low-income and marginalised populations. However, further research is needed. For instance, the role of spatial stigma may vary across different geographic contexts and across different population groups, including non-low-income populations.
Our study only evaluated spatial stigma in low-income urban environments, and thus future research should continue to evaluate the role of spatial stigma on cardiovascular health as well as other health outcomes in larger samples and samples across different geographies, including in rural locations. Future research could also assess how robust these findings are across age, gender or racial/ethnic subgroups. In addition, future research can identify mediating mechanisms of a behavioral nature (e.g. perhaps increased food intake) or a psychological nature (e.g. depression) linking spatial stigma to cardiovascular health. Understanding the mediating role of health behaviors and health states can guide intervention development. While spatial stigma can be measured using individual-level survey research methods, as was done in this study, other research methods can be used, including objective methods such as newspaper reports as well as an ecometric approach that produces objective indicators of stigma in each neighborhood by aggregating survey responses from different participants residing in this neighborhood (Gauvin et al., 2005; Fone et al., 2006; Mujahid et al., 2007; De Jong et al., 2011; Corsi et al., 2012). For example, researchers could conduct street-intercept surveys with individuals in neighborhoods, as previous research has shown that this particular survey method can yield more representative samples when compared to traditional sampling methods (e.g. random digit dialing) (Miller et al., 1997; Ompad et al., 2008), and then aggregate responses to a neighborhood unit for analyses. Furthermore, there is an additional research opportunity to psychometrically develop an instrument of spatial stigma that would assist in these types of research investigations. This future research may help us further understand and eventually reduce health disparities experienced among low-income housing residents.
For instance, in addition to structural policy and environmental interventions, psychosocial interventions that address spatial stigma may be needed to improve cardiovascular health among residents in low-income housing. Media outlets could be provided with and provide more constructive and de-stigmatising images of low-income neighborhoods and their residents, which could be a public health intervention.

This study is subject to several limitations that we note here. First, our results may not be generalisable to low-income populations in other non-urban regions of the United States: we had a relatively small convenience sample of low-income housing residents in New York City. Having a small sample size likely reduced power to detect significant effects. However, our sample includes a multi-ethnic sample of low-income housing residents across different New York City neighborhoods and our study is the first quantitative study of spatial stigma on cardio-metabolic health among the studied population. In addition, while 116 participants is a relatively small sample size for a general population study, many GPS studies have fewer than 100 participants, so the sample size used here is on par with, and even exceeds, that of many GPS-based studies. Selection bias might also be a concern: we could assume that high BMI negatively influences participation in the study, and that stigma also negatively influences participation. This would bias the association between spatial stigma and BMI. The selection bias we describe would likely pull the positive association that we document towards the null. In addition, the study was limited to English-speaking low-income housing residents, and consequently our findings may not be generalisable to non-English-speaking low-income housing populations. While self-report bias can be an issue with self-reported blood pressure as well as self-reported height and weight data, we were able to objectively measure both blood pressure and BMI, which is an important strength of our study. However, blood pressure was based on a single measurement. A clinical diagnosis of hypertension usually requires multiple blood pressure measurements and several previous studies have used the average of two or more measurements.
The use of a single measurement may overestimate the prevalence of hypertension. Social desirability bias, with regard to perceived spatial stigma, may also be an issue. Furthermore, residual confounding is a potential limitation. For instance, the survey did not evaluate residential history and thus we were not able to control for that. This study was a cross-sectional analysis. As such, our study does not provide evidence that spatial stigma is causally associated with cardiovascular health outcomes. In addition, reverse causation may be a concern. From an environmental psychology point of view, it might be that obese people, if they have lower self-esteem because of this condition, tend to perceive their neighborhood less favorably. If that were true, this reverse causation would contribute to the positive association that we found between spatial stigma and BMI. Finally, there are caveats about the GPS analysis. It should be noted that while GPS data allow for potentially highly accurate point locations of participants, these data may be limited when the GPS receiver cannot find enough satellites to triangulate its location. In addition, after a long period without satellite communication, GPS receivers may take additional time to acquire a location fix, and these issues are exacerbated in cities with many thick and tall concrete buildings. Additionally, while the GPS analysis by spatial stigma indicators is novel, we must note that the modifiable areal unit problem is a limitation. In particular, the selected buffer size around the GPS points could have influenced the findings.

Conclusions

In conclusion, overall perceived spatial stigma was significantly associated with increased BMI, as well as significantly related to increases in both systolic and diastolic blood pressure. We also found that participants who reported living in an area with a bad neighborhood reputation had increased risk of obesity/overweight and hypertension/pre-hypertension. Further research is needed to investigate how place-based stigma may be associated with impaired cardiovascular health among individuals in stigmatised neighborhoods to inform effective cardiovascular risk reduction and quality-of-life interventions.

References Abidin NZ, Adams MB, 2013. Prediction of vertical jump height from anthropometric factors in male and female martial arts athletes. Malays J Med Sci 20:39. Bammann K, Huybrechts I, Vicente-Rodriguez G, Easton C, De Vriendt T, Marild S, Mesana M, Peeters MW, Reilly JJ, Sioen I, Tubic B, Wawro N, Wells JC, Westerterp K, Pitsiladis Y, Moreno LA, IDEFICS Consortium, 2013. Validation of anthropometry and foot-to-foot bioelectrical resistance against a three-component model to assess total body fat in children: the IDEFICS study. Int J Obesity 37:5206. Barcadí-Gascón M, Jones EG, Jiménez-Cruz A, 2013. Prevalence of obesity and abdominal obesity from four to 16 years old children living in the Mexico-USA border. Nutr Hosp 28:479-85. Behrens T, Taeger D, Wellmann J, Keil U, 2004. Different methods to calculate effect estimates in cross-sectional studies. A comparison between prevalence odds ratios and prevalence ratio. Method Inform Med 43:505-9. Bennett GG, McNeill LH, Wolin KY, Duncan DT, Puleo E, Emmons KM, 2007. Safe to walk? Neighborhood safety and physical activity among public housing residents. PLoS Med 4:1599-606. Bennett GG, Wolin KY, Duncan DT, 2008. Social determinants of obesity. In: Hu F (ed.) Obesity epidemiology: methods and applications. Oxford University Press, Oxford, UK. pp. 342-76. Brummett BH, Babyak MA, Siegler IC, Shanahan M, Harris KM, Elder GH, Williams RB, 2011. Systolic blood pressure, socioeconomic status, and biobehavioral risk factors in a nationally representative US young adult sample. Hypertension 58:161-6. Chaix B, 2009. Geographic life environments and coronary heart disease: a literature review, theoretical contributions, methodological updates, and a research agenda. Annu Rev Publ Health 30:81-105. Chaix B, Bucimetière P, Lang T, Haas B, Montaye M, Ruidavets JB, Arbeiler D, Amouyel P, Ferrières J, Bingham A, Chauvin P, 2008. 
Residential environment and blood pressure in the PRIME study: is the association mediated by body mass index and waist circumference? J Hypertens 26:1078-84. Christian WJ, 2012. Using geospatial technologies to explore activitybased retail food environments. Spat Spatiotemporal Epidemiol 3:287-95. Clark RA, Weragoda N, Paterson K, Telianidis S, Williams G, 2014. A pilot investigation using global positioning systems into the outdoor activity of people with severe traumatic brain injury. J Neuroeng Rehabil 11:37. Clarke P, O’Malley PM, Johnston LD, Schulenberg JE, 2009. Social disparities in BMI trajectories across adulthood by gender, race/ethnicity and lifetime socio-economic position: 1986-2004. Int J Epidemiol 38:499-509. Cohen DA, Ashwood JS, Scott MM, Overton A, Evenson KR, Staten LK, Porter D, McKenzie TL, Catellier D, 2006. Public parks and physical activity among adolescent girls. Pediatrics 118:e1381-9. Colhoun HM, Hemingway H, Poulter NR, 1998. Socio-economic status and blood pressure: an overview analysis. J Hum Hypertens 12:91-110.

Corsi DJ, Subramanian SV, McKee M, Li W, Swaminathan S, LopezJaramillo P, Avezum A, Lear SA, Dagenais G, Rangarajan S, Teo K, Yusuf S, Chow CK, 2012. Environmental profile of a community’s health (EPOCH): an ecometric assessment of measures of the community environment based on individual perception. PLoS One 7:e44410. De Jong K, Albin M, Skärbäck E, Grahn P, Wadbro J, Merlo J, Björk J, 2011. Area-aggregated assessments of perceived environmental attributes may overcome single-source bias in studies of green environments and health: results from a cross-sectional survey in southern Sweden. Environ Health 10:4. Decker SL, Kostova D, Kenney GM, Long SK, 2013. Health status, risk factors, and medical conditions among persons enrolled in Medicaid vs uninsured low-income adults potentially eligible for Medicaid under the Affordable Care Act. J Am Med Assoc 309:257986. Dessing D, De Vries SI, Graham JM, Pierik FH, 2014. Active transport between home and school assessed with GPS: a cross-sectional study among Dutch elementary school children. BMC Public Health 14:227. Digenis-Bury EC, Brooks DR, Chen L, Ostrem M, Horsburgh CR, 2008. Use of a population-based survey to describe the health of Boston public housing residents. Am J Public Health 98:85-91. Duncan DT, Aldstadt J, Whalen J, Melly SJ, Gortmaker SL, 2011a. Validation of walk score for estimating neighborhood walkability: An analysis of four US metropolitan areas. Int J Environ Res Public Health 8:4160-79. Duncan DT, Castro MC, Blossom JC, Bennett GG, Gortmaker SL, 2011b. Evaluation of the positional difference between two common geocoding methods. Geospat Health 5:265-73. Duncan DT, Castro MC, Gortmaker, Aldstadt J, Melly SJ, Bennett GG, 2012. Racial differences in the built environment-body mass index relationship? A geospatial analysis of adolescents in urban neighborhoods. Int J Health Geogr 11:11. Duncan DT, Hatzenbuehler ML, 2014. 
Lesbian, gay, bisexual, and transgender hate crimes and suicidality among a population-based sample of sexual-minority adolescents in Boston. Am J Public Health 104:272-8. Duncan DT, Johnson RM, Molnar BE, Azrael D, 2009. Association between neighborhood safety and overweight status among urban adolescents. BMC Public Health 9:289. Duncan DT, Kapadia F, Halkitis PN, 2014a. Examination of spatial polygamy among young gay, bisexual, and other men who have sex with men in New York City: the p18 cohort study. Int J Environ Res Public Health 11:8962-83. Duncan DT, Kawachi I, Subramanian SV, Alstadt J, Melly SJ, Williams DR, 2014b. Examination of how neighborhood definition influences measurements of youths’ access to tobacco retailers: a methodological note on spatial misclassification. Am J Epidemiol 179:373-81. Duncan DT, Regan SD, 2015. Mapping multi-day GPS data: a cartographic study in NYC. J Maps 12:668-70. Duncan DT, Regan SD, Shelley D, Day K, Ruff RR, Al-Bayan M, Elbel B, 2014c. Application of global positioning system methods for the study of obesity and hypertension risk among low-income housing residents in New York City: a spatial feasibility study. Geospat Health 9:57-70. Duncan DT, Shairifi M, Melly SJ, Marshall R, Sequist TD, Rifas-Shiman SL, Taveras EM, 2014d. Characteristics of walkable built environments and BMI z-scores in children: evidence from a large electronic health record database. Environ Health Persp 122:1359-65.

[Geospatial Health 2016; 11:399]

[page 171]


gh-2016_2.qxp_Hrev_master 31/05/16 11:45 Pagina 172

Article

Egan BM, Li J, Small J, Nietert PJ, Sinopoli A, 2014. The growing gap in hypertension control between insured and uninsured adults: National Health and Nutrition Examination Survey 1988 to 2010. Hypertension 64:997-1004. Fone DL, Farewell DM, Dunstan FD, 2006. An ecometric analysis of neighbourhood cohesion. Popul Health Metr 4:17. Frank LD, Andresen MA, Schmid TL, 2004. Obesity relationships with community design, physical activity, and time spent in cars. Am J Prev Med 27:87-96. Gauvin L, Richard L, Craig CL, Spivock M, Riva M, Forster M, Laforest S, Laberge S, Fournel MC, Gagnon H, Gagne S, Potvin L, 2005. From walkability to active living potential: an ecometric validation study. Am J Prev Med 28:126-33. Geeta A, Jamaiyah H, Safiza M, Khor G, Kee C, Ahmad A, Suzana S, Rahmah R, Faudzi A, 2009. Reliability, technical error of measurements and validity of instruments for nutritional status assessment of adults in Malaysia. Singapore Med J 50:1013. Guh DP, Zhang W, Bansback N, Amarsi Z, Birmingham CL, Anis AH, 2009. The incidence of co-morbidities related to obesity and overweight: a systematic review and meta-analysis. BMC Public Health 9:88. Harrison F, Burgoine T, Corder K, Van Sluijs EM, Jones A, 2014. How well do modelled routes to school record the environments children are exposed to? A cross-sectional comparison of GIS-modelled and GPS-measured routes to school. Int J Health Geogr 13:5. Hess PL, Reingold S, Jones J, Fellman MA, Knowles P, Ravenell JE, Kim S, Raju J, Ruger E, Clark S, 2007. Barbershops as hypertension detection, referral, and follow-up centers for black men. Hypertension 49:1040-6. Hurvitz PM, Moudon AV, 2012. Home versus nonhome neighborhood: quantifying differences in exposure to the built environment. Am J Prev Med 42:411-7. James P, Berrigan D, Hart JE, Hipp JA, Hoehner CM, Kerr J, Major JM, Oka M, Laden F, 2014. Effects of buffer size and shape on associations between the built environment and energy balance. Health Place 27:162-70. 
Jilcott SB, Wade S, McGuirt JT, Wu Q, Lazorick S, Moore JB, 2011. The association between the food environment and weight status among eastern North Carolina youth. Public Health Nutr 14:1610-7. Karlas T, Wiegand J, Berg T, 2013. Gastrointestinal complications of obesity: non-alcoholic fatty liver disease (NAFLD) and its sequelae. Best Pract Res Clin En 27:195-208. Kawachi I, Berkman L, 2003. Neighborhoods and health. Oxford University Press, New York, NY, USA. Keene DE, Padilla MB, 2010. Race, class and the stigma of place: moving to opportunity in Eastern Iowa. Health Place 16:1216-23. Keene DE, Padilla MB, 2014. Spatial stigma and health inequality. Critical Public Health 24:392-404. Kelaher M, Warr DJ, Feldman P, Tacticos T, 2010. Living in ‘Birdsville’: exploring the impact of neighbourhood stigma on health. Health Place 16:381-8. Kirschenman J, Neckerman KM, 1991. ‘We’d love to hire them, but...’: the meaning of race for employers. The Urban Underclass 203:20332. Klinker CD, Schipperijin J, Christian H, Kerr J, Ersboll AK, Troelsen J, 2014. Using accelerometers and global positioning system devices to assess gender and age differences in children’s school, transport, leisure and home based physical activity. Int J Behav Nutr Phys Act 11:8. Krieger N, Kosheleva A, Waterman PD, Chen JT, Beckfield J, Kiang MV, 2014. 50-year trends in US socioeconomic inequalities in health: [page 172]

US-born Black and White Americans, 1959-2008. Int J Epidemiol 43:1294-313. Leung CW, Laraia BA, Kelly M, Nickleach D, Adler NE, Kushi LH, Yen IH, 2011. The influence of neighborhood food stores on change in young girls’ body mass index. Am J Prev Med 41:43-51. Lovasi GS, Schwartz-Soicher O, Quinn JW, Berger DK, Neckerman KM, Jaslow R, Lee KK, Rundle A, 2013. Neighborhood safety and green space as predictors of obesity among preschool children from lowincome families in New York City. Prev Med 57:189-93. McCluskey A, Ada L, Dean CM, Vargas JM, 2012. Feasibility and validity of a wearable GPS device for measuring outings after stroke. ISRN Rehabil 2012:823180. McGurk P, Jackson JM, Elia M, 2013. Rapid and reliable self-screening for nutritional risk in hospital outpatients using an electronic system. Nutrition 29:693-6. McNutt LA, Wu C, Xue X, Hafner JP, 2003. Estimating the relative risk in cohort studies and clinical trials of common outcomes. Am J Epidemiol 157:940-3. Miller KW, Wilder LB, Stillmn FA, Becker DM, 1997. The feasibility of a street-intercept survey method in an African-American community. Am J Public Health 87:655-8. Mujahid MS, Diez-Roux AV, Cooper RC, Shea S, Williams DR, 2011. Neighborhood stressors and race/ethnic differences in hypertension prevalence (the Multi-Ethnic Study of Atherosclerosis). Am J Hypertens 24:187-93. Mujahid MS, Diez-Roux AV, Morenoff JD, Raghunathan T, 2007. Assessing the measurement properties of neighborhood scales: from psychometrics to ecometrics. Am J Epidemiol 165:858-67. Mujahid MS, Diez-Roux AV, Morenoff JD, Raghunathan TE, Cooper RS, Ni H, Shea S, 2008. Neighborhood characteristics and hypertension. Epidemiology 19:590-8. Nguyen NT, Magno CP, Lane KT, Hinojosa MW, Lane JS, 2008. Association of hypertension, diabetes, dyslipidemia, and metabolic syndrome with obesity: findings from the National Health and Nutrition Examination Survey, 1999 to 2004. J Am Coll Surg 207:928-34. Nwankwo T, Yoon S, Burt V, Gu Q, 2013. 
Hypertension among adults in the United States: National Health and Nutrition Examination Survey, 2011-2012. NCHS Data Brief 113:1-8. Ogden CL, Carroll MD, Kit BK, Flegal KM, 2014. Prevalence of childhood and adult obesity in the United States, 2011-2012. J Am Med Assoc 311:806-14. Ompad DC, Galea S, Marshall G, Fuller CM, Weiss L, Beard JR, Chan C, Edwards V, Vlahov D, 2008. Sampling and recruitment in multilevel studies among marginalized urban populations: the IMPACT studies. J Urban Health 85:268-80. Ostchega Y, Hughes JP, Wright JD, McDowell MA, Louis T, 2008. Are demographic characteristics, health care access and utilization, and comorbid conditions associated with hypertension among US adults? Am J Hypertens 21:159-65. Pham Do Q, Ommerborn MJ, Hickson DA, Taylor HA, Clark CR, 2014. Neighborhood safety and adipose tissue distribution in African Americans: the Jackson Heart Study. PLoS One 9:e105251. Prushansky T, Geller S, Avraham A, Furman C, Sela L, 2013. Angular and linear spinal parameters associated with relaxed and erect postures in healthy subjects. Physiother Theory Pract 29:249-57. Ravenell JE, Thompson H, Cole H, Plumhoff J, Cobb G, Afolabi L, Boutin-Foster C, Wells M, Scott M, Ogedegbe G, 2013. A novel community-based study to address disparities in hypertension and colorectal cancer: a study protocol for a randomized control trial. Trials 14:287.

[Geospatial Health 2016; 11:399]


gh-2016_2.qxp_Hrev_master 31/05/16 11:45 Pagina 173

Article

Reitzel LR, Regan SD, Nguyen N, Cromley EK, Strong LL, Wetter DW, McNeill LH, 2014. Density and proximity of fast food restaurants and body mass index among African Americans. Am J Public Health 104:110-6. Rossen LM, Schoendorf KC, 2012. Measuring health disparities: trends in racial-ethnic and socioeconomic disparities in obesity among 2to 18-year old youth in the United States, 2001-2010. Ann Epidemiol 22:698-704. Rundle A, Diez-Roux A, Freel L, Miller D, Necerkman K, Weiss C, 2007. The urban built environment and obesity in New York City: a multilevel analysis. Am J Health Promot 21:326-34. Rundle A, Neckerman KM, Freeman L, Lovasi GS, Purciel M, Quinn J, Richards C, Sircar N, Weiss C, 2009. Neighborhood food environment and walkability predict obesity in New York City. Environ Health Persp 117:442-7. Saelens BE, Sallis JF, Franks LD, Couch SC, Zhou C, Colburn T, Cain KL, Chapman J, Glanz K, 2012. Obesogenic neighborhood environments, child and parent obesity: the Neighborhood Impact on Kids Study. Am J Prev Med 42:e57-e64. Sampson R, Raudenbush S, 2004. Seeing disorder: neighborhood stigma and the social construction of broken windows. Soc Psychol Quart 67:319-42. Schmidt CO, Kohlmann T, 2008. When to use the odds ratio or the relative risk? Int J Public Health 53:165-7. Schwartz BS, Stewart WF, Godby S, Pollak J, Dewalle J, Larson S, Mercer DG, Glass TA, 2011. Body mass index and the built and social environments in children and adolescents using electronic health records. Am J Prev Med 41:e17-e28. Shulman GI, 2014. Ectopic fat in insulin resistance, dyslipidemia, and cardiometabolic disease. New Engl J Med 371:1131-41. Stark JH, Neckerman K, Lovasi GS, Konty K, Quinn J, Arno P, Viola D, Harris TG, Weiss CC, Bader MD, Rundle A, 2013. Neighbourhood food environments and body mass index among New York City adults. J Epidemiol Community H 67:736-42. Tabuchi T, Fukuhara H, Iso H, 2012. 
Geographically-based discrimination is a social determinant of mental health in a deprived or stigmatized area in Japan: a cross-sectional study. Soc Sci Med 75:1015-21. Thomas E, Collins A, McCarthy J, Fitzpatrick J, Durighel G, Goldstone A, Bell J, 2010. Estimation of abdominal fat compartments by bioelectrical impedance: the validity of the ViScan measurement system in comparison with MRI. Eur J Clin Nutr 64:525-33. Thompson L, Pearce J, Barnett J, 2007. Moralising geographies: stigma, smoking islands and responsible subjects. Area 39:508-17.

Thompson ML, Myers JE, Kriebel D, 1998. Prevalence odds ratio or prevalence ratio in the analysis of cross sectional data: what is to be done? Occup Environ Med 55:272-7. Troped PJ, Starnes HA, Puett RC, Tamua K, Cromley EK, James P, BenJoseph E, Melly SJ, Laden F, 2014. Relationships between the built environment and walking and weight status among older women in three U.S. States. J Aging Phys Activ 22:114-25. Troped PJ, Wilson JS, Matthews CE, Cromley EK, Melly SJ, 2010. The built environment and location-based physical activity. Am J Prev Med 38:429-38. Unger E, Diez-Roux AV, Lloyd DM, Mujahid MS, Nettleton JA, Bertoni A, Badon SE, Ning H, Allen NB, 2014. Association of neighborhood characteristics with cardiovascular health in the Multi-Ethnic Study of Atherosclerosis. Circ Cardiovasc Qual Outcomes 7:524-31. United States Census Bureau, 2010. American Community Survey. Available from: https://www.census.gov/programs-surveys/acs/ Victor RG, Ravenell JE, Freeman A, Leonard D, Bhat DG, Shafiq M, Knowles P, Storm JS, Adhikari E, Bibbins-Domingo K, 2011. Effectiveness of a barber-based intervention for improving hypertension control in black men: the BARBER-1 study: a cluster randomized trial. Arch Intern Med 171:342-50. WHO, 2013. A global brief on hypertension. Available from: http://apps.who.int/iris/bitstream/10665/79059/1/WHO_DCO_WHD _2013.2_eng.pdf Wiehe SE, Kwan MP, Wilson J, Fortenberry JD, 2013. Adolescent healthrisk behavior and community disorder. PLoS One 8:e77667. Wilson WJ, 1996. When work disappears: the new world of the urban poor. Alfred Knopf, New York, NY, USA. Yahia N, El-Ghazale H, Achkar A, Rizk S, 2011. Dieting practices and body image perception among Lebanese university students. Asia Pac J Clin Nutr 20:21. Yan K, Tracie B, Marie-Eve M, Melanie H, Jean-Luc B, Benoit T, St-Onge M, Marie L, 2014. Innovation through wearable sensors to collect real-life data among pediatric patients with cardiometabolic risk factors. Int J Pediatr 2014:328076. 
Yen IH, Leung CW, Lan M, Sarrafzadeh M, Kayekjian KC, Duru OK, 2015. A pilot study using global positioning systems (GPS) devices and surveys to ascertain older adultsâ&#x20AC;&#x2122; travel patterns. J Appl Gerontol 34:190-201. Zenk SN, Schulz AJ, Matthews SA, Odoms-Young A, Wilbur J, Wegrzyn L, Gibbs K, Braunschweig C, Stokes C, 2011. Activity space environment and dietary and physical activity behaviors: a pilot study. Health Place 17:1150-61.

[Geospatial Health 2016; 11:399]

[page 173]



Geospatial Health 2016; volume 11:429

Zoning the territory of the Republic of Kazakhstan as to the risk of rabies among various categories of animals

Sarsenbay K. Abdrakhmanov,1 Akhmetzhan A. Sultanov,2 Kanatzhan K. Beisembayev,1 Fedor I. Korennoy,3 Dosym B. Kushubaev,1 Ablaikhan S. Kadyrov1

1S. Seifullin Kazakh Agro-Technical University, Astana; 2Kazakh Research Veterinary Institute, Almaty, Kazakhstan; 3Federal Center for Animal Health (FGBI ARRIAH), Vladimir, Russia

Correspondence: Sarsenbay K. Abdrakhmanov, S. Seifullin Kazakh Agro-Technical University, 62 av. Pobeda, 010011 Astana, Kazakhstan. Tel. +77.013.881467. E-mail: S_abdrakhmanov@mail.ru
Key words: Kazakhstan; GIS; MaxEnt; Rabies; Zoning.
Received for publication: 21 November 2015. Revision received: 19 January 2016. Accepted for publication: 20 January 2016.
©Copyright S.K. Abdrakhmanov et al., 2016. Licensee PAGEPress, Italy. Geospatial Health 2016; 11:429. doi:10.4081/gh.2016.429
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Abstract

This paper presents the zoning of the territory of the Republic of Kazakhstan with respect to the risk of rabies outbreaks in domestic and wild animals, considering environmental and climatic conditions. The national database of rabies outbreaks in Kazakhstan for the period 2003-2014 was accessed in order to find which zones are consistently most exposed to the risk of rabies in animals. The database contains information on cases in demes of farm livestock, domestic animals and wild animals. To identify the areas with the highest risk of outbreaks, we applied the maximum entropy modelling method. Registered outbreaks were used as input presence data, while the bioclim set of ecological and climatic variables, together with some geographic factors, was used as explanatory variables. The model demonstrated a high predictive ability: the area under the curve was 0.782 for farm livestock, 0.859 for domestic animals and 0.809 for wild animals. Based on the model, a map of integral risk was designed with the following four categories: negligible risk (disease-free or favourable zone), low risk (surveillance zone), medium risk (vaccination zone), and high risk (unfavourable zone). The map was produced to support the development of a set of preventive measures and is expected to contribute to a better distribution of supervisory efforts by the veterinary service of the country.

Introduction

According to the World Health Organization, rabies ranks fifth among all infectious diseases in terms of the economic loss it causes. The disease, reported in 113 countries to date, is characterised by an acute course with overt signs of polyencephalomyelitis. Mortality is 100% in the absence of immediate treatment. More than 55,000 people and more than one million animals in the world die annually due to this infection. Direct damage caused by rabies costs society about 4 billion EUR per year (Nouvellet et al., 2013; Robardet et al., 2013). Rabies is one of the especially dangerous zoonotic diseases, and its spread across the world is uneven and sharply delineated. It is registered on every continent except Australia and Antarctica (Zavodskih and Sludov, 2007; Makarov et al., 2008; Smreczak et al., 2009, 2012; Orlowska et al., 2011; Youla et al., 2014). The same pattern of growth of rabies in the world has also been observed in the Republic of Kazakhstan (RK). Currently, the number of rabies cases recorded in animals (foxes, raccoon dogs, wolves, cats and cattle) tends to rise by an average of 7% annually. About 700 head of farm livestock (more than 50% of which are cattle and up to 25% small ungulates) die of rabies every year in the country (Abdrakhmanov et al., 2010). Thus, the rabies epizootic situation has become extremely difficult in most regions of the country. Natural foci of infection have sharply intensified, with the number of cases among different species increasing, including human cases with fatal outcome (Bersagurov, 2002; Zholshorinov and Sansyzbayev, 2004). Current measures taken in RK to control rabies mainly include: i) oral vaccination of wild animals within outbreaks and adjacent areas; and ii) forced and preventive vaccination of susceptible livestock and domestic carnivores. The vaccination of the latter is a necessary measure of urban control aiming to limit the spread of the disease to humans. In addition, strict registration and control of stray and domestic carnivores, as well as awareness-raising activities among the human population, are part of the main control activities enforced. Despite these ongoing efforts, it is still not possible to control the disease and prevent its spread. This is due to the factors mentioned above, in particular the presence of natural foci of infection (Domsky, 2002; Chubirko et al., 2003; Dudnikov, 2003). The disease is constantly under the close attention of the veterinary services in RK, which plan surveillance campaigns and decide which demes of livestock and wild animals to subject to mass vaccination. One of the traditional activities undertaken by the veterinary service is zoning of the country in accordance with the presence of rabies outbreaks in the past and the probability of their occurrence in the future. Generally, four zones are distinguished: i) unfavourable zones, where outbreaks are presently recorded; ii) vaccination zones, where outbreaks have not been recorded for three years and where vaccination of susceptible livestock is being carried out; iii) surveillance zones, which are areas directly adjacent to the vaccination zones; and iv) favourable zones, where outbreaks of the disease have not been recorded. Zoning is only carried out with regard to existing outbreaks and does not involve research. The purpose of zoning is to allocate veterinary service resources for surveillance and vaccination to prevent the development of epizootic rabies and its diffusion into zones designated as favourable. The objective of the present study was to introduce modern analytical methods for zoning that consider not only the presence or absence of outbreaks on a given territory, but also the probability of their occurrence in the future based on the aggregation of ecological and geographical characteristics of the territory in question.

Materials and Methods

Study area

RK, covering an area of 2,724,902 km², is the 9th largest country in the world and the 4th in Eurasia and has a population of 17 million. The country is administratively divided into 14 regions (oblasts), each of which is subdivided into administrative districts (rayons). In total, there are 179 rayons, with areas ranging from 7530 to 7820 km² in the regions of North and South Kazakhstan up to 38,910 km² in Karaganda region, which is situated in the centre of the country. Population density ranges from 5.0 to 5.8 persons per km² in Akmola region, situated just north of Karaganda, to 23.7 persons per km² in the region of South Kazakhstan (Figure 1). Livestock farming, especially cattle breeding, is one of the priority sectors of RK's economy. As of 1 January 2015, livestock numbers (in thousand head) were: cattle 6028.7; small ruminants 17,911.3 (sheep 15,532.4 and goats 2378.9); pigs 844.2; horses 1936.7; camels 165.9; poultry 35,000.7; and domesticated marals and sika deer 750.0.

Figure 1. Administrative map of Kazakhstan.

Data

Data on rabies outbreaks in RK from 2003 to 2014 were provided by the veterinary services of the administrative territories (regions and rayons) during visits. The database includes 762 registered cases of rabies in animals (Table 1). For modelling, the animals were divided into three categories, with cats and dogs counted as domestic animals (170 cases), wolves and foxes as wild animals (173), and horses, cows, sheep and camels as farm livestock (419). Figure 2 shows an administrative map of RK overlaid with cases of rabies in these three categories. The database contains the following records on each outbreak that are relevant to the modelling: geographic coordinates (latitude and longitude); date of outbreak; number and type of infected animals; and name of the rural settlement, rayon and region where infection occurred. All data were converted into ESRI shape-file format for the purpose of cartographic representation.

The following factors were used as explanatory variables. First, the BIO1-19 set of bioclimatic variables (bioclim), derived from remotely sensed data on temperature and precipitation on the Earth's surface (Appendix). The data are available on the worldclim.org website (WorldClim, 2015). We used the current data set for 1950-2000. Second, data on altitude above mean sea level (ALT) in metres (WorldClim, 2015). Third, data on the maximum green vegetation fraction, reflecting the presence and intensity of vegetation cover. The average for 2001-2012 was used (United States Geological Survey; USGS, 2015). Fourth, land cover data for 2001-2010 (USGS, 2015). Categories of land cover are specified in the Appendix. All the variables were clipped to the contour of RK, resampled to a common spatial resolution of 1 × 1 km and converted into the ASCII format, which is required for modelling in the MaxEnt software package.

Table 1. List of animals registered as infected by rabies in the 2003-2014 period.

Species    Infected animals (n)
Camel      11
Cat        13
Cow        342
Dog        157
Fox        166
Horse      28
Sheep      38
Wolf       7
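The category totals quoted in the Data subsection (170 domestic, 173 wild and 419 farm livestock cases, 762 in all) follow from the species counts in Table 1. A minimal sketch of that tally is given below; the variable and function names are illustrative only and are not taken from the original study.

```python
# Counts of infected animals per species, as listed in Table 1.
cases_by_species = {
    "Camel": 11, "Cat": 13, "Cow": 342, "Dog": 157,
    "Fox": 166, "Horse": 28, "Sheep": 38, "Wolf": 7,
}

# Species-to-category assignment as described in the text.
category_of = {
    "Cat": "domestic", "Dog": "domestic",
    "Wolf": "wild", "Fox": "wild",
    "Horse": "farm", "Cow": "farm", "Sheep": "farm", "Camel": "farm",
}


def totals_by_category(cases, categories):
    """Sum the case counts of all species belonging to each category."""
    totals = {}
    for species, n_cases in cases.items():
        category = categories[species]
        totals[category] = totals.get(category, 0) + n_cases
    return totals


category_totals = totals_by_category(cases_by_species, category_of)
total_cases = sum(cases_by_species.values())
```

Running the sketch reproduces the three category totals and the overall number of registered cases reported in the text.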

Software

Cartographic preparation, processing and visualisation of the data were performed using the geographical information system (GIS) ArcGIS, version 10.3.1 (ESRI, Redlands, CA, USA) (ESRI, 2015). The MaxEnt software package (Princeton University, Princeton, NJ, USA; http://www.cs.princeton.edu/~schapire/maxent/) was used for modelling based on the method of maximum entropy.

Methodology

To identify the areas where rabies outbreaks in animals tend to occur under a specific combination of natural and climatic conditions, we used maximum entropy modelling (Phillips et al., 2006; Elith et al., 2011). The principle is a geospatial regression that establishes a relationship between precisely known locations of the phenomenon under study (presence data) and a set of potential risk factors (typically geographical, climatic, socio-economic and miscellaneous) in the territory under study. The essence of the maximum entropy method is to obtain a probability distribution that most closely describes the known pattern of the phenomenon under study, i.e. the distribution that has the maximum information entropy among those consistent with the presence data. The advantage of this method is that it requires presence data only, which are easily available in many cases. The method is typically applied to model the habitat of a species based on i) precisely known locations where specimens of the species can be found, and ii) a set of explanatory environmental variables within the territory (Phillips et al., 2006; Illoldi-Rangel et al., 2012; Pedersen et al., 2014). However, some studies apply the maximum entropy method to model the suitability of an area for the emergence of cases of a particular disease (Stevens and Pfeiffer, 2011; Mischler et al., 2012; Korennoy et al., 2014). In this case, variables describing the socio-economic conditions in the study area can be used along with environmental variables.

Table 2. Risk categorisation and zoning.

Probability value (%)    Risk level    Type of zone
<10                      Negligible    Disease-free (favourable)
10-25                    Low           Surveillance
25-50                    Medium        Vaccination
>50                      High          Unfavourable

Figure 2. Administrative map of Kazakhstan and rabies cases in three categories of animals in the period 2003-2014.

Figure 3. The probability distribution of outbreaks in farm (A), domestic (B), and wild (C) animals.

In our study, we used the maximum entropy method to identify areas at risk of animal rabies outbreaks. Recorded locations of rabies cases were used as presence data. Modelling with MaxEnt was performed separately for each of the three categories of animals: domestic animals, wild animals and farm livestock. For each category, the model was run in 10 iterations using cross-validation to obtain average values and confidence intervals. This means that the input data were randomly split into folds of equal size and that each fold in turn was excluded from modelling, an approach that allows all available data to be used for validation. To compensate for possible bias in the data caused by uneven diagnostics near populated areas, a road density grid for RK was used. This expresses the assumption that cases are more likely to be diagnosed in close proximity to settlements and roads. To build the road density grid, we used data on highways available in the ESRI global database of 2013 (ESRI, 2015). The density grid was built using the Kernel Density procedure from the ArcGIS geoprocessing toolbox with the same spatial resolution (1 × 1 km).
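The cross-validation scheme described in the Methodology can be illustrated with a short sketch. It assumes that the 10 iterations correspond to 10 folds, which the text does not state explicitly, and the function name, seed and fold construction are hypothetical. In the study itself the splitting was done by the MaxEnt software, so this only shows the general idea.

```python
import random


def k_fold_splits(n_records, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_records))
    # Shuffle once so that fold membership is random but reproducible.
    random.Random(seed).shuffle(indices)
    # A stride of k gives k folds whose sizes differ by at most one record.
    folds = [indices[i::k] for i in range(k)]
    for held_out, test_idx in enumerate(folds):
        train_idx = [
            idx
            for fold_no, fold in enumerate(folds)
            if fold_no != held_out
            for idx in fold
        ]
        yield train_idx, test_idx


# Example: the 173 wild-animal cases reported in the Data subsection.
splits = list(k_fold_splits(173, k=10))
```

Each record is held out exactly once, so every presence point contributes to validation, and averaging a metric over the folds yields the mean values and confidence intervals mentioned in the text.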
The continuous risk surfaces obtained with MaxEnt were generalised to the boundaries of the administrative rayons by calculating the average risk value over the territory of each rayon. Risk values were categorised according to the scale presented in Table 2. During the final stage, the three maps representing risk to farm livestock, domestic animals and wild animals were combined into an integral risk map: the largest of the three category values was taken as the integral risk value for each rayon.
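The two aggregation steps (averaging within each rayon, then taking the largest of the three category values) can be sketched as follows. This is an illustrative Python sketch only: the rayon names and risk values are invented, the helper names are hypothetical, and the categorisation by the Table 2 scale is omitted.

```python
from statistics import mean


def rayon_mean_risk(cells):
    """Average the risk of all grid cells falling inside each rayon.

    cells: iterable of (rayon_name, risk_value) pairs for one animal category.
    """
    by_rayon = {}
    for rayon, value in cells:
        by_rayon.setdefault(rayon, []).append(value)
    return {rayon: mean(values) for rayon, values in by_rayon.items()}


def integral_risk(*category_maps):
    """Take the largest of the per-category values for every rayon."""
    rayons = set.intersection(*(set(m) for m in category_maps))
    return {r: max(m[r] for m in category_maps) for r in rayons}


# Invented toy values, not data from the study
farm = rayon_mean_risk([("Rayon A", 0.25), ("Rayon A", 0.5), ("Rayon B", 0.75)])
domestic = rayon_mean_risk([("Rayon A", 0.5), ("Rayon B", 0.125)])
wild = rayon_mean_risk([("Rayon A", 0.125), ("Rayon B", 0.25)])
combined = integral_risk(farm, domestic, wild)
```

Taking the maximum is a conservative choice: a rayon is rated by its worst category, so a high risk in any one animal group is never averaged away.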

The response curves are shown in Figure 5. For the farm livestock category (Figure 5A), the variables that contributed most to the model were BIO19, LANDCOV and BIO1; for the domestic animals category, they were LANDCOV, ALT, BIO12 and BIO19 (Figure 5B); and for the wild animals category, LANDCOV, BIO19, ALT and BIO12 (Figure 5C; see Appendix for explanation of the abbreviated bioclim variables). Following the integration of risk values across the three animal categories, a final regionalisation was obtained (Figure 6). This map shows the zoning of RK into the four risk categories across the various species. For farm species (Figure 3A), a probability of rabies outbreaks above 60% was seen in the West Kazakhstan, Aktobe, Kostanay, East Kazakhstan, Almaty and Zhambyl regions and partially in the Atyrau, Mangistau and South Kazakhstan regions, while


Results

Distributions of the mean probabilities of outbreaks (the suitability surfaces) were obtained for each of the three categories of animal, converted into shapefiles and presented as maps (Figure 3A-C). Receiver operating characteristic (ROC) curves, which reflect the ability of the model to explain the available data, are presented in Figure 4A-C. Here and below, the red line of the graph shows the mean values and the blue field shows the boundaries of the 95% confidence interval obtained from multiple model runs. The predictive accuracy of the model, expressed as the area under the ROC curve (AUC), was 0.782±0.031 for farm livestock, 0.859±0.042 for domestic animals and 0.809±0.045 for wild animals. AUC values around 0.5 are usually considered to indicate no predictive power, while values >0.7 are acceptable and those above 0.8 are sufficiently high to indicate a strong ability of the model to explain the available data (Elith et al., 2011). Thus, the probability distributions obtained allowed us to describe, with a high degree of reliability, the distribution of existing rabies cases in RK as depending on the set of prevailing climatic and geographical factors in the different locations investigated. Variables that contributed 10% or more to the modelling results were treated as the important ones and are emphasised here.
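For readers unfamiliar with the AUC, the following Python sketch computes it as the probability that a randomly chosen presence point receives a higher suitability score than a randomly chosen background point. It is a generic rank-based illustration, not the routine used by MaxEnt; the function name and all scores are invented for the example.

```python
def auc(presence_scores, background_scores):
    """Area under the ROC curve from pairwise score comparisons.

    Counts the share of (presence, background) pairs in which the presence
    point scores higher; ties count as half a win.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Invented scores: three of the four pairs are ranked correctly
example_auc = auc([0.9, 0.4], [0.5, 0.1])
```

A model that cannot separate the two groups yields a value near 0.5, which is why that value is read as having no predictive power.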


Figure 4. Receiver operating characteristic curves for the farm (A), domestic (B), and wild (C) animal categories.


Figure 5. Response curves for the variables BIO19, LANDCOV and BIO1 (left to right) for the farm animals model (A); the variables LANDCOV, ALT, BIO12 and BIO19 (left to right) for the domestic animals model (B); and the variables LANDCOV, BIO19, ALT and BIO12 (left to right) for the wild animals model (C). The red line shows the mean values and the blue field shows the boundaries of the 95% confidence interval obtained from multiple model runs.

Figure 6. The integrated zoning map of the Republic of Kazakhstan in terms of animal rabies risk.


the lowest probability (less than 11%) was mostly observed in North Kazakhstan, Akmola, Karaganda, Kyzylorda and the largest part of the Mangistau and South Kazakhstan regions, and partly in the Aktobe, Pavlodar, East Kazakhstan, Almaty and Zhambyl regions. For domestic animals (Figure 3B), a probability of rabies occurrence above 58% covered almost half of the territory of West Kazakhstan, one third of the Kostanay region and parts of the North Kazakhstan, East Kazakhstan, South Kazakhstan, Zhambyl and Almaty regions. The lowest value (less than 9%) was observed in most parts of the country, namely Mangistau, Kyzylorda and Karaganda, much of Akmola, South Kazakhstan and Zhambyl, and partly the Aktobe, Atyrau, Pavlodar, East Kazakhstan and Almaty regions. In the case of wild animals (Figure 3C), the situation was somewhat different, with the highest probability of rabies occurrence (over 67%) observed in the West Kazakhstan and Kostanay regions, partly in Atyrau, South Kazakhstan and Zhambyl, and to a lesser extent in the East Kazakhstan and Almaty regions. The minimum probability of occurrence (less than 10%) showed an almost similar picture for both livestock and wild animals: it was found throughout most of the Mangistau, Kyzylorda and Karaganda regions, partly in the Atyrau, Aktobe, East Kazakhstan, Almaty, Zhambyl and South Kazakhstan regions, and to a small extent in the Kostanay, North Kazakhstan, Akmola and Pavlodar regions.

Discussion

Many authors (Bersagurov, 2002; Kiryakova et al., 2004; Abdrakhmanov et al., 2010) associate the probability distribution of rabies outbreaks in the country with the combined influence of climatic and socio-economic factors. Although we did not study social and economic influences, we also identified climatic variables as important in influencing the spread of rabies among different categories of susceptible animals (Figure 5A-C). The most important ones were the prevailing type of land cover and the amount of precipitation in the coldest quarter, both of which were found for all three categories, while the average annual rainfall and altitude were also important. These findings, together with the territorial distribution of risk (Figure 2), support the assumption that natural foci of the disease exist and that epizootic processes in these sub-populations interfere strongly with each other. This interference is also supported by a previous correlation analysis by Abdrakhmanov et al. (2010), which found that the correlation coefficient between the incidence of rabies in populations of wild and domestic animals over time amounts to 0.68. This suggests that one infected fox may increase the incidence of rabies in domestic carnivores and, as a consequence, among productive animals by 2-3 heads. The Appendix shows the distribution of administrative rayons in RK by risk area in accordance with Figure 6. It should be noted that the area of highest risk is concentrated mainly at the borders of RK with neighbouring countries, i.e. the Russian Federation, Uzbekistan, Kyrgyzstan and China. This may indicate importation of rabies from the territory of these states. This epidemiologically significant finding requires further science-based mathematical analysis and joint research to improve the epizootic situation.
Similar results were obtained by many researchers who study the nature of the spread of rabies (Shestopalov et al., 2001; Yin et al., 2013). In addition, a high density of settlements and animals (domestic animals, farm livestock and wild animals) was registered in the border regions of RK, where most reported cases of rabies in animals occur.

Further work on geospatial analysis of the rabies epizootic in Kazakhstan may include the introduction of additional variables in the model, in particular the distribution of populations of susceptible animals in the territory and the proximity to sources of infection in the territory of neighbouring states.

Conclusions

Based on the results obtained, guidelines have been developed for organising preventive and anti-epizootic measures against rabies in the regions of RK. Depending on its risk zone, a region is assigned measures ranging from a system of strict veterinary-sanitary actions (including vaccination) to monitoring and verification. The guidelines have been communicated to the national authorities and are expected to become the basis for a national programme for the prevention and control of rabies.

References

Abdrakhmanov SK, Sytnic II, Tursunkulov SZh, 2010. Visualization and analysis of veterinary and geographical rabies spread by using GIS technologies. In: Proceedings of the 5th International Scientific Practical Conference, 2010 March 17-18, Barnaul. AGAU Publ., Barnaul, Russia, pp 283-6.
Bersagurov KA, 2002. Epizootic and epidemiological situation of rabies in the West Kazakhstan region and prevention measures. Official Bulletin of the State Sanitary and Epidemiological Service of the Republic of Kazakhstan 2002:24-30.
Chubirko MI, Chervanev VA, Efanova LI, 2003. Epizootiology of rabies in Voronezh region. Veterinary and medical aspects of zooanthroponosis. Pokrov 2003:102-7.
Domsky IA, 2002. Natural foci of rabies and its principal owners. Vet Pathol 1:119-22.
Dudnikov SA, 2003. Foxes as a marker of risk in rabies: epizootological aspects. Veterinary and medical aspects of zooanthroponosis. Pokrov 2003:69-73.
Elith J, Phillips SJ, Hastie T, Dudik M, Chee YE, Yates CJ, 2011. A statistical explanation of MaxEnt for ecologists. Diversity and Distributions 17:43-57.
ESRI, 2015. GIS mapping software, solutions, map series, apps and data. Available from: http://www.esri.com/
Illoldi-Rangel P, Rivaldi C-L, Sissel B, Fryxell RT, Gordillo-Perez G, Rodriguez-Moreno A, Williamson P, Montiel-Parra G, Sanchez-Cordero V, Sarkar S, 2012. Species distribution models and ecological suitability analysis for potential tick vectors of Lyme disease in Mexico. J Trop Med 2012:959101.
Kiryakova LS, Khaytova AB, Kovalenko IS, 2004. The use of geographical information technologies in the epidemiological diagnosis of highly dangerous infections. Probl Dang Inf 87:24-7.
Korennoy FI, Gulenkin VM, Malone JB, Mores CN, Dudnikov SA, Stevenson MA, 2014. Spatio-temporal modeling of the African swine fever epidemic in the Russian Federation, 2007-2012. Spatial and Spatiotemporal Epidemiol 11:135-41.
Makarov VV, Sukhareva OI, Gulyukin AM, Litvinov OB, 2008. The trend of the rabies spread in Eastern Europe. Vet Med 7:20-2.
Mischler P, Kearney M, McCarroll JC, Scholte RGC, Vounatsou P, Malone JB, 2012. Environmental and socio-economic risk modelling for Chagas disease in Bolivia. Geospat Health 6:59-66.
Nouvellet P, Donnelly CA, De Nardi M, Rhodes CJ, De Benedictis P, Citterio C, Obber F, Lorenzetto M, Pozza MD, Cauchemez S, Cattoli G, 2013. Rabies and canine distemper virus epidemics in the red fox population of northern Italy (2006-2010). PLoS One 8:e61588.
Orłowska A, Smreczak M, Trębas P, Żmudziński JF, 2011. Rabies outbreak in Małopolska region in Poland in 2010. Bull Vet Inst Pulawy 55:555-61.
Pedersen UB, Midzi N, Mduluza T, Soko W, Stensgaard A-S, Vennervald BJ, Mukaratirwa S, Kristensen TK, 2014. Modeling spatial distribution of snails transmitting parasitic worms with importance to human and animal health and analysis of distributional changes in relation to climate. Geospat Health 8:335-43.
Phillips SJ, Anderson RP, Schapire RE, 2006. Maximum entropy modeling of species geographic distributions. Ecol Model 190:231-59.
Robardet E, Ilieva D, Iliev E, Gagnev E, Picard-Meyer E, Cliquet F, 2013. Epidemiology and molecular diversity of rabies viruses in Bulgaria. Epidemiol Infect 5:1-7.
Shestopalov AM, Kissurina MI, Gruzdev KN, 2001. Rabies and its distribution in the world. Probl Virol 2:7-12.
Smreczak M, Orłowska A, Trębas P, Żmudziński JF, 2012. Rabies epidemiological situation in Poland in 2009 and 2010. Bull Vet Inst Pulawy 56:115-266.
Smreczak M, Orłowska A, Żmudziński JF, 2009. Rabies situation in Poland in 2008. Bull Vet Inst Pulawy 53:583-7.
Stevens KB, Pfeiffer DU, 2011. Spatial modeling of disease using data- and knowledge-driven approaches. Spatial Spatiotemporal Epidemiol 2:125-33.
USGS, 2015. United States Geological Survey Land Cover Institute. Available from: http://landcover.usgs.gov/
WorldClim, 2015. Global climate data. Available from: http://worldclim.org/
Yin W, Dong J, Tu C, Edwards J, Guo F, Zhou H, Yu H, Vong S, 2013. Challenges and needs for China to eliminate rabies. Infect Dis Poverty 2:23.
Youla AS, Traore FA, Sako FB, Feda RM, Emeric MA, 2014. Canine and human rabies in Conakry: epidemiology and preventive aspects. Bull Soc Pathol Exot 7:19-21.
Zavodskih AI, Sludov AI, 2007. The behaviour of raccoon dogs infected with the rabies. Vet Med 2:15-6.
Zholshorinov AZ, Sansyzbayev YB, 2004. Surveillance of rabies in wildtype centres of domination: the methodical recommendations. Astana 2004:16.


Geospatial Health 2016; volume 11:420

Spatial analysis of the regional variation of hypertensive disease mortality and its socio-economic correlates in South Korea

Seong-Yong Park,1 Jin-Mi Kwak,2 Eun-Won Seo,2 Kwang-Soo Lee2

1Korea Health Promotion Foundation, Seoul; 2Department of Health Administration, College of Health Sciences, Yonsei University, Seoul, South Korea

Abstract

This paper presents a cross-sectional study based on the 2011 cause of death statistics from all 229 local governments in South Korea. The standardised hypertensive disease mortality rate (SHDMR) was defined as the age- and sex-adjusted mortality from hypertensive diseases as classified by the International Classification of Disease-10 (ICD-10). Variables taken into account were the number of doctors per 100,000 persons, the proportion with higher education (including university students and high school graduates), the number of recipients of basic livelihood support per 100,000 persons, the annual national health insurance premium per capita and the proportion of persons classified as high-risk drinkers. Ordinary least squares (OLS) regression and geographically weighted regression (GWR) were applied to identify the potential associations. The statistical analysis was conducted with SAS ver. 9.3, while ArcGIS ver. 10.0 was utilised for the spatial analysis. The OLS results showed that the number of basic livelihood recipients per 100,000 persons had a significant positive association with the SHDMR, and the proportion with higher education had a significant negative one. GWR coefficients varied depending on the region investigated, and for some regional variables the direction of the association differed between regions. GWR showed a higher adjusted R2 than OLS. It was found that the SHDMR was affected by socio-economic status, but as the effects observed were not consistent in all regions of the country, the development of health policies will need to consider the potential for regional variation.

Correspondence: Kwang-Soo Lee, Department of Health Administration, College of Health Sciences, Yonsei University, 1 Yonseidaegil, Wonju, Gwangwondo 220-710, South Korea. Tel: +82.33.760.2426 - Fax: +82.33.760.2519. E-mail: planters@yonsei.ac.kr

Key words: Hypertensive disease; Geographic Information System; Spatial analysis; Regional variation; Mortality rate.

Acknowledgments: we would like to thank the Department of Population Trend in the Division of Social Statistics, Korea for providing data and information.

Contributions: SYP, collecting data, analysing and drafting the manuscript; JMK, modifying draft and developing the final version of the manuscript; EWS, reviewing manuscript and modifying form; KSL, participating in the coordination of the entire study, interpreting the result and drafting the manuscript. All authors read and approved the final submitted version.

Conflict of interest: the authors declare no potential conflict of interest.

Received for publication: 7 October 2015. Revision received: 29 January 2016. Accepted for publication: 1 February 2016.

©Copyright S-Y. Park et al., 2016. Licensee PAGEPress, Italy. Geospatial Health 2016; 11:420. doi:10.4081/gh.2016.420

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Introduction

South Korea is starting to see a shift from acute diseases to chronic ones (Ministry of Health and Welfare, 2010). Hypertension, a typical chronic disease, is not accompanied by severe pain but requires careful management since, if neglected, it can result in complications such as stroke, renal failure, heart failure and cardiovascular disease (Kim et al., 2005). Studies of community healthcare projects managing hypertensive disease (and the factors affecting its level) have focused on personal health and a wide range of behavioural characteristics, such as age, family and personal history of hypertension, alcohol intake, smoking, body mass index (BMI), blood levels of cholesterol and neutral fat, salt intake, physical activity, mental stress and obesity. These studies analysed the relation between these factors and hypertension (Nicholson et al., 1983; Kam et al., 1991; Moreira et al., 1998; Wannamethee et al., 1998; He et al., 2000; Masahiko et al., 2000; Oh et al., 2000; Ishikawa et al., 2002; Player et al., 2007; Eom et al., 2008; Kim et al., 2009; Eom, 2009; Jang et al., 2009; Wen et al., 2010; Matyas et al., 2011; Centers for Disease Control and Prevention, 2012; Odden et al., 2012), with the majority of them conducted at the individual level. Hypertension is known as the principal cause of death in the area of cardiovascular diseases (Jacqmin-Gadda et al., 2013). Research on hypertension has recently also paid attention to socio-demographic characteristics at the community level. Schlundt et al. (2006) analysed the differences between regions with respect to hypertension by using the geographic clustering method. Lawes et al. (2008) contend that the financial burden of hypertension varies depending on the income level and age distribution in each country, while Egan et al. (2002) also suggest the need to consider demographic characteristics and regional diversity in investigating the burden of this disease.
Kim (2010) has analysed the socio-economic conditions, lifestyle, community health status, physical environment, and healthcare system, as well as the demographic characteristics of the community to identify the factors having an impact on the management of hypertension.


Most previous studies on hypertensive disease correlates have focused on the individual level. Research on hypertensive disease needs to consider inter-regional variation and the factors that influence it, taking into account the demographic and socio-economic characteristics of each community when pursuing effective disease control (Macintyre and Ellaway, 2000). Our study therefore aimed to examine the relations between the variables representing demographic, socio-economic, and health and behavioural characteristics of a community and the mortality caused by hypertension. We analysed the relationship between the standardised hypertensive disease mortality rate (SHDMR) and regional factors using traditional ordinary least squares (OLS) regression and geographically weighted regression (GWR).

Materials and Methods

Study area

South Korea is located in north-eastern Asia. The eastern areas of the Korean Peninsula are mountainous, while the Southwest is characterised by plains. The capital, Seoul, is located in the north-western part of the country. The population is about 50 million, with over 20 million residing in Seoul and surrounding provinces. In this study, all local governments in South Korea as of 2011 were included. The different levels investigated were the Gun, which corresponds to a county-level administrative region and usually consists of rural areas; the city, which denotes large and smaller towns; and the Gu, which is an urban district (a large city can have multiple Gus). There were 251 local governments in 2011. However, if data for a local government, e.g. a Gu of a city, were not available, these data were incorporated into the city of which this Gu is part. For this reason, the final count of local governments analysed amounted to 229.

Study variables

On the basis of a literature review, we selected five variables representing different regional characteristics and analysed the relationship of these variables with the mortality rate caused by hypertension of various kinds.

Medicine and healthcare

Chun et al. (2008) report that medical institutions, doctors' ability and the trust in doctors have an impact on blood pressure control among hypertension patients. Kim (2010) notes that regions at the lower end of healthcare supply are more likely to be exposed to progressing hypertensive disease and emphasises the importance of broadening medical resources to settle this issue.

Education and socio-demography

Many existing studies indicate that educational factors affect hypertension (Kim et al., 2005; Campbell et al., 2006; Kim et al., 2009; Jang et al., 2009). Demaio et al. (2013) suggest that individuals with lower levels of knowledge are more likely to develop hypertension, while Eom et al. (2008) find that high school graduates are less likely to be affected by hypertension.

General economy

Kim et al. (2005) report that financial status might affect hypertension prevalence, which is relatively higher in the economically unprivileged bracket in agricultural regions. Lawes et al. (2008) contend that lower-income countries have a greater burden of hypertensive disease than developed ones.

Health insurance

The annual per capita health insurance premium is known to be closely associated with the income level and has therefore been widely used as a proxy for income. Lee et al. (1989) report that the insurance premium level is positively related to healthcare use and that the lower-income brackets have a relatively greater need to settle the issue of unmet healthcare. Kim et al. (2009) find that people in lower-income brackets are significantly more likely to be affected by hypertension than those in higher-income brackets.

Health-related behavioural issues

Kam et al. (1991) include history of alcohol intake, obesity status and smoking habits as hypertension risk factors, while Moreira et al. (1998) note that alcohol intake might affect blood pressure but that the volume of alcohol intake has a greater impact on the blood pressure level than the frequency of intake.

Mortality rates caused by hypertensive disease

The dependent variable used in this study was the mortality rate due to hypertensive disease per 100,000 persons, with rates standardised in terms of age and sex. The plural, hypertensive diseases, refers to the various organ-specific complications caused by hypertension and includes primarily heart failure, renal failure, stroke and cardiovascular disease. The 2011 statistical data on causes of death, provided by the Department of Population Trends in the Division of Social Statistics, Statistics Korea (http://www.kosis.kr), were used to examine the mortality from hypertensive diseases coded in the major classes I10-I13 according to the International Classification of Disease-10 (ICD-10). In this study, the hypertensive diseases were defined with the following subcodes: benign hypertension (I100); malignant hypertension (I101); unspecified hypertension (I109); hypertensive heart diseases accompanied by (congestive) heart failure (I111); hypertensive heart diseases not accompanied by (congestive) heart failure (I119); hypertensive renal diseases accompanied by renal failure (I120); hypertensive renal diseases not accompanied by renal failure (I129); hypertensive heart and renal diseases accompanied by (congestive) heart failure (I130); hypertensive heart and renal diseases accompanied by renal failure (I131); hypertensive heart and renal diseases accompanied by (congestive) heart and renal failure (I132); unspecified hypertensive heart and renal diseases (I139). The standardised hypertensive disease mortality rate (SHDMR) is estimated by the following equation:

$$\mathrm{SHDMR} = \frac{\sum_{i}\sum_{j} \left(O_{ij}/n_{ij}\right) N_{ij}}{\sum_{i}\sum_{j} N_{ij}} \times 100{,}000$$

where $i$ is the age group (0-4, 5-9, ..., 95-99, ≥100); $j$ the sex; $O_{ij}$ the observed number of hypertensive disease deaths in age group $i$ and sex $j$; $n_{ij}$ the population of age group $i$ and sex $j$; and $N_{ij}$ the standard population of age group $i$ and sex $j$.
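The age- and sex-standardisation can be illustrated with a short Python sketch of direct standardisation, which is the approach the symbol definitions above suggest. This is an illustrative assumption rather than the authors' code: the function name, the two strata and all counts are invented.

```python
def standardised_rate(deaths, population, standard_population, per=100_000):
    """Directly standardised mortality rate.

    All three dicts are keyed by (age_group, sex). Each stratum's regional
    death rate is applied to the standard population of the same stratum;
    strata with no regional population are skipped.
    """
    expected = sum(
        deaths.get(key, 0) / population[key] * standard_population[key]
        for key in standard_population
        if population.get(key, 0) > 0
    )
    return per * expected / sum(standard_population.values())


# Invented toy region with two strata
deaths = {("0-64", "M"): 1, ("65+", "M"): 10}
population = {("0-64", "M"): 1000, ("65+", "M"): 1000}
standard = {("0-64", "M"): 80_000, ("65+", "M"): 20_000}
rate = standardised_rate(deaths, population, standard)
```

In this toy case the crude rate would be 550 per 100,000, while the standardised rate is about 280, because the standard population contains far fewer people in the older, higher-mortality stratum than the region does.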


Regional variables

Medicine and healthcare

The number of doctors was identified in the 2011 statistical data concerning healthcare use for each region, which are supplied by the National Health Insurance Service (NHIS; http://www.nhis.or.kr). As dentists are less likely to be associated with hypertension, they were excluded, and only medical doctors (specialists, residents, general practitioners, and interns) and doctors of traditional Korean medicine (specialists, special interns, general interns, and general practitioners) were included. The counts were converted into the number of doctors per 100,000 persons to control for the effects of population size.

Education and socio-demography

The proportion of the population with higher education (including university students and high school graduates) was used as a socio-demographic variable. The data came from the census by Statistics Korea (http://www.kosis.kr); the 2010 data were analysed since the census is conducted at five-year intervals.

General economy

The economic characteristics were measured on the basis of the number of recipients of basic livelihood security support per 100,000 persons. The basic livelihood security programme aims to assist people with no economic ability to lead life at least at a basic level of living; the number of recipients can therefore serve as an index of the economic status of the region. In general, a region with a larger number of basic livelihood security recipients is likely to have a relatively lower economic level than other regions (Kim, 2010; Seok and Kang, 2013; Park and Lee, 2014). The number of basic livelihood security recipients was identified in the annual statistical report from each municipal, provincial, and metropolitan autonomous government in 2011 and was converted into the number of livelihood security recipients per 100,000 persons.

Health insurance

The insurance premium index was drawn from the 2011 statistical data concerning healthcare use for each region, which were supplied by NHIS (http://www.nhis.or.kr). An annual per capita insurance premium per region was obtained by dividing the sum of the regional National Health Insurance premiums by the number of people in each region.

Health-related behavioural issues

The high-risk alcohol intake rate was used as a proxy variable for health and behavioural characteristics and was based on data from the 2011 community health research provided by the Department of Chronic Disease Management, Disease Control and Prevention Center, Ministry of Health and Welfare (https://www.chs.cdc.go.kr). The rates used in this study refer to the percentage of men drinking ≥7 glasses of strong alcohol a month or that of women drinking ≥5 glasses of alcohol a month as reported by Kim (2010).

Analytic methods

Pearson's correlation coefficients were calculated from correlation analysis. This technique analyses the relationship between two variables and measures the strength of association between them. Most studies on hypertension employ multivariate regression analysis based on OLS for individuals. This study employed GWR, a type of spatial analysis that can reflect spatial relations and the heterogeneity of regions (Fotheringham et al., 2002; Helbich et al., 2013). In the traditional OLS equation, the dependent variable is modelled as a function of a set of independent variables; the equation as used by Brunsdon et al. (1996) is:

$$y_i = a_0 + \sum_{k} a_k x_{ik} + \varepsilon_i$$

where $y_i$ is the $i$th observation of the dependent variable; $x_{ik}$ the $i$th observation of the $k$th independent variable; $\varepsilon_i$ the error term with zero mean; and $a_k$ the coefficients determined from a sample of $n$ observations. GWR is a technique based on the traditional regression model in which the coefficients are specific to each location. The GWR equation is, following Brunsdon et al. (1996):

$$y_i = a_{i0} + \sum_{k} a_{ik} x_{ik} + \varepsilon_i$$

where $a_{ik}$ is the value of the $k$th parameter at location $i$. In contrast to OLS, which infers a global regression coefficient for all the analysis targets (Fotheringham et al., 1996, 2002), the GWR spatial statistical method suggested by Brunsdon et al. (1996) can infer local regression coefficients for each region. In other words, GWR can determine associations between neighbouring regions in terms of variables and analyse how the influence of regional characteristics differs from region to region. It can also combine spatial data and attribute data that contain non-spatial characteristics. It was therefore expected that spatial homogeneity and heterogeneity between regional units, which have been less researched in the field of public health, could be explored in this way. To confirm that GWR had greater explanatory power than OLS, the Akaike Information Criterion (AIC) of the two statistical approaches was compared. AIC shows the goodness-of-fit of a model (Hurvich et al., 1998; Charlton and Fotheringham, 2009), and a difference in AIC between OLS and GWR indicates which statistical approach is preferable. The degree of similarity of a variable with reference to its spatial location, based on location and value, can be evaluated by spatial autocorrelation appraisal (Huo et al., 2012). Global Moran's I analysis was used to assess the global spatial autocorrelation of the model. It assesses whether the patterns were clustered, dispersed, or random; a value close to +1 means a strong positive spatial autocorrelation and thus indicates that the values of attributes have strong similarities (Anselin, 1995). The variables in this study were statistically processed using SAS ver. 9.3 (SAS Institute, Cary, NC, USA), while the spatial analysis tool ArcGIS ver. 10.0 (ESRI, Redlands, CA, USA) was used for mapping and spatial analysis.

Results

Table 1 represents descriptive statistics showing the general characteristics of the regional factors. It should be noted that the average of SHDMR was 14.29, ranging from 1.75 to 52.68. The standard deviation (SD) of annual per capita insurance premiums was larger than the mean. For the reliable comparison of hypertensive disease mortality between regions, hypertensive disease mortality rate for age- and sex-

[page 184]

[Geospatial Health 2016; 11:420]


gh-2016_2.qxp_Hrev_master 31/05/16 11:45 Pagina 185

Article

groups in each region was calculated and applied to the corresponding age- and sex-groups of the entire Korean population. All values for each age and sex group were summed to produce the adjusted rate. Figure 1 shows the distribution of SHDMR in each region. Yeongyang Gun, North Gyeongsang Province (52.68 persons) had the highest mortality rate due to hypertensive diseases followed by Euiseong Gun, North Gyeongsang Province (49.70) and Cheongsong Gun, North Gyeongsang Province (47.75). Seocho Gu, Seoul (1.76) had the lowest mortality rate followed by Hanam City, Gyeonggi Province (2.51) and Jinju City, South Gyeongsang Province (2.55). Table 2 represents the results of the correlation analysis among the regional characteristics. Since no variable had a correlation coefficient exceeding 0.8, no serious problem with multicollinearity should be expected (Chan, 2003). Of the variables with significant correlation coefficient, there was a statistically significant negative correlation between the number of basic livelihood security recipients per 100,000 persons and the proportion of people with higher education (coefficient=-0.696); the same was true between the number of basic livelihood security recipients per 100,000 persons and annual per capita insurance premiums (coefficient=-0.211). There was a statistically significant positive correlation between the number of doctors per 100,000 persons and the proportion of people with higher education (coefficient=0.252); between the number of doctors per 100,000 persons and annual per capita insurance premiums (coefficient=0.350); and between the proportion of people with higher education and annual per capita insurance premiums (coefficient=0.276). Table 3 presents the results from OLS regression and GWR. The OLS analysis found significance in the proportion of people with higher education and in the number of basic livelihood security recipients per 100,000 persons at the 95% significance level. 
The proportion of people with higher education was negatively related to the mortality rate for hypertension. The number of basic livelihood security recipients per 100,000 persons was positively associated with mortality. Since no variable had a variance inflation factor (VIF) exceeding 10, the problem of multicollinearity was not expected.
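
The VIF check mentioned above can be illustrated with a minimal sketch. It assumes the usual definition, VIF = 1/(1 - R²), where R² comes from regressing one explanatory variable on all the others; the function name `vif` and the simulated variables are hypothetical and are not the study data.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Simulated example (hypothetical data): c is nearly a copy of a.
rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + rng.normal(scale=0.1, size=500)

vif_independent = vif(np.column_stack([a, b]))     # both values close to 1
vif_collinear = vif(np.column_stack([a, b, c]))    # a and c far above 10
print(vif_independent, vif_collinear)
```

Under the rule of thumb used in the text, a variable would be flagged only if its VIF exceeded 10, which in this simulated example happens for `a` and `c` once the near-duplicate is included.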

The explanatory power (adjusted R²) of GWR (48.5%) was slightly greater than that of OLS regression (46.6%). The AIC was estimated to be 1559 for OLS regression and 1554 for GWR. Since the difference in AIC between the two models exceeded 4, it was concluded that GWR provides the better fit (Lee and Choi, 2013). In addition, GWR

Figure 1. Age- and sex-adjusted hypertensive disease mortality rate per 100,000 persons.

Table 1. General characteristics of study variables.

Variable                           Average value   Standard deviation   Minimum value   Maximum value
SHDMR°                             14.27           9.83                 1.75            52.68
Doctors (n)°                       194.47          179.72               28.99           1732.38
Educated people (%)                53.88           13.16                26.04           79.32
Basic livelihood recipients (n)°   4047.19         2005.05              667             11,608
Annual NHI premium per capita      626.89          965.66               138             11,055
High-risk drinkers (%)             18.24           3.78                 5.50            29.00

SHDMR, standardised hypertensive disease mortality rate; NHI, National Health Insurance. °Per 100,000 persons.

Table 2. Results of Pearson's correlation analysis.

Variable                           Doctors (n)°   Educated people (%)   Basic livelihood recipients (n)°   Annual NHI premium per capita   High-risk drinkers (%)
Doctors (n)°                       1.000
Educated people (%)                0.252*         1.000
Basic livelihood recipients (n)°   0.088          -0.696*               1.000
Annual NHI premium per capita      0.350*         0.276*                -0.211*                            1.000
High-risk drinkers (%)             -0.021         0.029                 -0.056                             -0.056                          1.000

NHI, National Health Insurance. °Per 100,000 persons. *P<0.05.


had lower levels of Sigma (σ) and residual squares than OLS. The Koenker statistic (Nkeki and Osirike, 2013) had a significant P-value, which implies that the relationship between the dependent variable and the regional characteristics was not fixed but varied by region. Figure 2 shows the distribution of the regression coefficients and local R² for each region as estimated by GWR. Figures 2A to 2E show regional variation in the regression coefficients. The relationships of the number of doctors per 100,000 persons and of annual per capita insurance premiums with the mortality rate from hypertensive diseases varied by region, i.e. they were positive in some regions and negative in others. Figure 2F shows that the coefficient of determination (local R²) ranges from 0.3650 to 0.5439 by region, and thus that the explanatory power of the GWR model varies by region. In general, local R² decreased toward the south-east and increased toward the northern regions.
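
How region-specific coefficients and a local R² arise can be sketched as follows. This is a minimal illustration, not the software or settings used in the study: it assumes a Gaussian distance kernel with a fixed bandwidth, defines local R² as a kernel-weighted R², and uses simulated coordinates and data. The function name `gwr` and all variable names are hypothetical.

```python
import numpy as np


def gwr(coords, X, y, bandwidth):
    """Basic geographically weighted regression with a Gaussian kernel.

    Returns one coefficient vector (intercept first) and one local R2 per location.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])      # add an intercept column
    betas = np.empty((n, design.shape[1]))
    local_r2 = np.empty(n)
    for i in range(n):
        # Weights decay with distance from location i.
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)
        # Weighted least squares: solve (X'WX) beta = X'Wy.
        xtw = design.T * w
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
        resid = y - design @ betas[i]
        y_bar = np.sum(w * y) / np.sum(w)
        local_r2[i] = 1.0 - np.sum(w * resid**2) / np.sum(w * (y - y_bar) ** 2)
    return betas, local_r2


# Simulated example: the true slope grows from west to east.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
y = 2.0 + (0.5 + 0.2 * coords[:, 0]) * x + rng.normal(scale=0.1, size=200)

betas, local_r2 = gwr(coords, x, y, bandwidth=2.0)
print(betas[:, 1].min(), betas[:, 1].max())        # range of local slopes
```

An OLS fit to the same data would return a single slope, whereas the local slopes here track the west-to-east gradient; mapping `betas` and `local_r2` by location gives the kind of display shown in Figure 2.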

Discussion

In this study, the relations between the regional characteristics and the SHDMR per 100,000 persons were analysed in 229 basic local governments of different levels as of 2011. The variations in gender and age among the regions were controlled to reflect the differences in the population structure in each region, and OLS regression and spatial analysis were used for the analysis. Previous studies on the relationship between regional characteristics and disease occurrences have employed multivariate analysis based on OLS regression. However, this approach is limited in its ability to reflect variation in the effect of regional characteristics by location; that is, the OLS analysis assumes an identical relationship regardless of the characteristics of the regions because it estimates one regression coefficient for all the regions together. GWR analysis, on the other hand, fits the model using a geographical weighting function based on the assumption that a region has more similar characteristics to neighbouring regions than to distant ones. On this basis, a regression coefficient and a local R² are estimated for each region (Brunsdon et al., 1996); that is, a GWR model would be expected to be useful for the reflection of geographical characteristics in the field of healthcare, which explores social phenomena that vary by region with regard to the type and number of diseases. GWR can thus be used to compare the relationship between dependent and explanatory variables across regions and in that way identify and classify regions that differ. In this study, GWR was successfully employed to confirm the variation in the size and direction of the relations between the regional characteristics and the mortality rate from hypertensive diseases for each region. Compared with OLS regression, GWR had a smaller residual variance and a greater explanatory power in general (Griffith, 2008). Lastly, the results of the analysis can be presented in the form of maps to allow policy-makers to understand them with ease.

The variable representing the economic level of the region, i.e. the number of basic livelihood security recipients per 100,000 persons, was positively related with death caused by hypertensive diseases. It was expected that the lower the economic level of the region, the higher the likelihood of having problems caused by hypertensive diseases. This result was consistent with the finding that a lower-income region can be expected to have a heavier burden due to hypertensive diseases (Lawes et al., 2008) and that the social and economic burden of hypertension lowers the quality of life (Eom, 2009). Since hypertensive diseases do not generally cause pain or awareness of the condition, they are very likely to be neglected and therefore to result in premature death.

The proportion of people with higher education was negatively related with SHDMR. In previous research, the level of education was often used as an index to reflect the socio-demographic characteristics. The group with a lower level of education was more likely to have little knowledge about hypertension or to believe that the condition of hypertension cannot be cured by medication. They also tended to use folk remedies that can give short-term relief rather than the proper treatment for hypertension (Kim et al., 2003).

The results of the GWR analysis make important suggestions for policy-makers in the field of healthcare aiming at reducing the mortality of hypertensive diseases. However, implementing a uniform policy with no regional characteristic taken into account can be less effective in avoiding death caused by hypertensive diseases. The finding that the relations between regional characteristics and death caused by this condition vary by regional unit suggests the need to differentiate policy compositions and pro-

Table 3. Results of ordinary least square and geographically weighted regression.

Variable                           OLS standardised regression coefficient   OLS VIF   GWR mean (SD)       GWR minimum value   GWR maximum value
Doctors (n)°                       -0.0012                                   1.40      0.0008 (0.0006)     -0.0008             0.0019
Educated people (%)                -0.3714*                                  2.40      -29.06 (5.37)       -40.16              -19.01
Basic livelihood recipients (n)°   0.3823*                                   2.33      0.0017 (0.0002)     0.0011              0.0023
Annual NHI premium per capita      0.0081                                    1.23      0.00002 (0.00005)   -0.00007            0.00009
High-risk drinkers (%)             0.0497                                    1.01      0.1147 (0.0339)     0.0293              0.1773
Adjusted R² (%)                    46.63                                     -         48.49               -                   -
AIC                                1559                                      -         1554                -                   -
σ                                  7.18                                      -         7.05                -                   -
Residual squares                   11,508                                    -         10,876              -                   -
Moran's I                          0.03*                                     -         0.02                -                   -
Koenker statistics                 35.0*                                     -         -                   -                   -

OLS, ordinary least square; GWR, geographically weighted regression; VIF, variance inflation factor; SD, standard deviation; NHI, National Health Insurance; AIC, Akaike Information Criterion. °Per 100,000 persons. *P<0.05.


Figure 2. Coefficients and local R² by region by geographically weighted regression. A) Number of doctors per 100,000 persons; B) proportion of educated people; C) number of basic livelihood recipients per 100,000 persons; D) annual National Health Insurance premium per capita; E) proportion of high-risk drinkers; F) local R².


grammes to meet the regional situations. The results of the analysis presented here imply the need, in implementing a health policy related to hypertension, to give priority to regions with a higher proportion of people with a lower living standard. The number of doctors available, annual per capita insurance premiums and the high-risk alcohol intake rate were not statistically significant in the OLS analysis but were positively or negatively related to mortality, depending on the region, in the GWR analysis. An in-depth analysis of the reasons for this result is needed.

In contrast to existing studies, this investigation employed spatial analysis in examining the regional characteristics related to death caused by hypertensive diseases. It is significant in that it approached the problems at the regional level, whereas previous research was mostly conducted at the individual level. It is therefore expected that this study will contribute to the development of measures against the problems associated with hypertension at the regional level.

This study has some limitations, however. First, we used only a limited number of variables that represent regional characteristics. A number of known factors that affect hypertension, including the availability of medical institutions, the regional deprivation index, smoking, obesity and stress, were not used due to the possibility of multicollinearity or the possibility of statistical errors in the process of developing a GWR model. It is necessary to add and analyse variables that can overcome these limitations and still accurately reflect the regional characteristics. Second, analysis of administrative districts smaller than City, Gun and Gu, i.e. Eup, Myeon and Dong, would be expected to give more detailed results. In reality, however, the variables that represent the regional characteristics are collected in units of City, Gun and Gu; it was therefore not possible to carry out the analysis at this finer level. It is expected that more in-depth regional research could be conducted if data for smaller regional units were available. Third, it is necessary to avoid the ecological fallacy. This study explored the relations with the regional characteristics using the population of basic local governments, instead of individuals, as the unit of analysis, and is therefore limited in what it can say about personal characteristics. Researchers could contribute to making more effective policies for coping with problems caused by hypertensive diseases if they were to secure data at the individual level and compare the results with those at the group level.

Conclusions

This study has shown that the proportion of educated people and the number of basic livelihood security recipients per 100,000 persons are associated with the mortality rate due to hypertensive disease at a statistically significant level, and that there is regional variation in these associations. The findings presented here emphasise the need to take regional differences into account if the burden of hypertensive diseases is to be effectively reduced. It would be desirable to examine the regional situation with respect to hypertension thoroughly before developing and implementing policies and management programmes so that they reflect the regional characteristics. To this end, the regional characteristics should be analysed in depth and the findings used to develop approaches that can be adjusted to fit each region by considering the local conditions. Further research into the inclusion of variables that can reflect the regional characteristics in more detail would complement the regional characteristics used in this study and lead to a better understanding of how to approach the situation. It is expected that such information would support the development of policies that suit each region according to its needs, which would result in more effective management of hypertensive diseases in South Korea.

References

Anselin L, 1995. Local indicators of spatial association-LISA. Geogr Anal 27:93-115.
Brunsdon C, Fotheringham AS, Charlton ME, 1996. Geographically weighted regression: a method for exploring spatial nonstationarity. Geogr Anal 28:281-98.
Campbell NR, Petrella R, Kaczorowski J, 2006. Public education on hypertension: a new initiative to improve the prevention, treatment and control of hypertension in Canada. Can J Cardiol 22:599-603.
Centers for Disease Control and Prevention, 2012. CDC grand rounds: dietary sodium reduction - time for choice. MMWR Morb Mortal Wkly Rep 61:89-91.
Chan YH, 2003. Biostatistics 104: correlational analysis. Singapore Med J 44:614-9.
Charlton M, Fotheringham AS, 2009. Geographically weighted regression: white paper. Available from: www.geos.ed.ac.uk/~gisteac/fspat/gwr/gwr_arcgis/GWR_WhitePaper.pdf
Chun SA, Na BJ, Kim CW, Lee MS, 2008. The effect of re-building of public health facilities on the hypertension control in the rural area. J Agric Med Community Health 33:37-45.
Demaio AR, Otgontuya D, de Courten M, Bygbjerg IC, Enkhtuya P, Meyrowitsch DW, Oyunbileg J, 2013. Hypertension and hypertension-related disease in Mongolia; findings of a national knowledge, attitudes and practices study. BMC Public Health 13:1-10.
Egan BM, Daniel TL, Jan NB, 2002. American society of hypertension regional chapters: leveraging the impact of the clinical hypertension specialist in the local community. Am J Hypertens 15:372-9.
Eom AY, 2009. Influencing factors on health related to quality of life in hypertension patients. J Korean Biol Nurs Sci 11:136-42.
Eom JS, Lee TY, Park SJ, Ahn YJ, Chung YJ, 2008. The risk factors of the pre-hypertension and hypertension of rural inhabitants in Chungnam-do. Korean J Nutr 41:742-53.
Fotheringham AS, Brunsdon C, Charlton M, 2002. Geographically weighted regression: the analysis of spatially varying relationships. John Wiley and Sons, London, UK.
Fotheringham AS, Charlton M, Brunsdon C, 1996. The geography of parameter space: an investigation into spatial non-stationarity. Int J GIS 10:605-27.
Griffith DA, 2008. Spatial-filtering-based contribution to a critique of geographically weighted regression (GWR). Environ Plan 40:2751-69.
He J, Whelton PK, Appel LJ, Charleston J, Klag MJ, 2000. Long-term effects of weight loss and dietary sodium reduction on incidence of hypertension. Hypertension 35:544-9.
Helbich M, Brunauer W, Vaz E, Nijkamp P, 2013. Spatial heterogeneity in hedonic house price models: the case of Austria. Urban Studies 51:390-411.
Huo XN, Li H, Sun DF, Zhou LD, Li BG, 2012. Combining geostatistics with Moran's I analysis for mapping soil heavy metals in Beijing, China. Int J Environ Res Publ H 9:995-1017.
Hurvich CM, Simonoff JS, Tsai CL, 1998. Smoothing parameter selection in nonparametric regression using an improved Akaike Information Criterion. J Roy Stat Soc B 60:271-93.
Ishikawa T, Ohta T, Moritaki K, Gotou T, Inoue S, 2002. Obesity, weight change and risks for hypertension, diabetes and hypercholesterolemia in Japanese men. Eur J Clin Nutr 56:601-7.
Jacqmin-Gadda H, Alperovitch A, Montlahuc C, Commenges D, Leffondre K, Dufouil C, Elbaz A, Tzourio C, Ménard J, Dartigues JF, Joly P, 2013. 20-year prevalence projections for dementia and impact of preventive policy about risk factors. Eur J Epidemiol 28:493-502.
Jang HS, Hyoung HK, Kim KH, 2009. Comparison of health-related characteristics and self-care behavior between a hypertension controlled group and a non-controlled group of hypertension patients in a customized home visiting health service. J Korean Commun Nurs 20:483-92.
Kam S, Yes MH, Lee SK, Chun BY, 1991. A case-control study for risk factor related to hypertension. Korean J Prev Med 24:221-31.
Kim DH, 2010. Analysis of small area variation of health behavior using 2008 Community Health Survey in Korea. Korea Centers for Disease Control and Prevention, Cheongju, Korea.
Kim HS, Kim JY, Park KS, Lee KS, Kam S, 2003. Hypertension management status in rural hypertensives. J Agr Med Commun H 28:93-106.
Kim JG, Lee MS, Na BJ, Kim KY, Cho HW, Hong JY, Kang MY, Kim DK, 2005. A cross-sectional study on the risk factors for the development of hypertension in some rural area. Korean Publ Health Res 31:114-23.
Kim OS, Jeon HO, Kim DH, Kim BH, Kim HJ, 2009. Risk factors of prehypertension in Korean adults: the Korean National Health and Nutrition Examination Survey 2005. Korean J Adult Nurs 21:281-92.
Lawes CM, Vander Hoorn S, Rodgers A, International Society of Hypertension, 2008. Global burden of blood pressure related disease, 2001. Lancet 371:1513-8.
Lee KS, Choi YJ, 2013. Analysis on the relationships between the spatial distribution of primary care organizations and socio-demographic characteristics in a metropolitan city using the geographic weighted regression method. Product Rev 27:193-214.
Lee SI, Choi HY, Ahn HS, Kim YI, Shin YS, 1989. A study on the insurance contribution and health care utilization of the regional medical insurance scheme. Korean J Prev Med 22:578-90.
Macintyre S, Ellaway A, 2000. Ecological approaches: rediscovering the role of the physical and social environment. In: Berkman L, Kawachi I, eds. Social epidemiology. Oxford University Press, Oxford, UK, pp 332-48.
Masahiko T, Saori O, Chiho I, Shogo S, Yasushi H, Takeshi T, Yoshiharu I, Kunitoshi I, Koshiro F, 2000. Multiple risk factor clustering of hypertension in a screened cohort. J Hypertension 18:1375-85.
Matyas E, Jeitler K, Horvath K, Semlitsch T, Hemkens LG, Pignitter N, Siebenhofer A, 2011. Benefit assessment of salt reduction in patients with hypertension: systematic overview. J Hypertension 29:821-8.
Ministry of Health and Welfare, 2010. An introduction to the management of major chronic diseases in Korea in 2010. Ministry of Health and Welfare, Seoul, Korea.
Moreira LB, Fuches FD, Moras RS, 1998. Alcohol intake and blood pressure: the importance of time elapsed since last drink. J Hypertension 16:175-80.
Nicholson JP, Teichman SL, Alderman MH, Sos TA, Pickering TG, Laragh JH, 1983. Cigarette smoking and renovascular hypertension. Lancet 322:765-6.
Nkeki FN, Osirike AB, 2013. GIS-based local spatial statistical model of cholera occurrence: using geographically weighted regression. J Geogr Info Syst 5:6.
Odden MC, Peralta CA, Haan MN, Covinsky KE, 2012. Rethinking the association of high blood pressure with mortality in elderly adults. Arch Int Med 172:1162-8.
Oh HS, Kam S, Yeh MH, Kang YS, Kim KY, Lee YS, Park KS, Son JH, Lee SW, Ahn MY, Chun BY, 2000. The risk factors for the development of hypertension in a rural area: a 1-year prospective cohort study. Korean J Prev Med 33:199-207.
Park SY, Lee KS, 2014. The effect of the regional factors on the variation of suicide rates: Geographic Information System analysis approach. Health Policy Manage 24:143-52.
Player MS, King DE, Mainous AG, Geesey ME, 2007. Psychosocial factors and progression from prehypertension to hypertension or coronary heart disease. Ann Fam Med 5:403-11.
Schlundt DG, Hargreaves MK, McClellan L, 2006. Geographic clustering of obesity, diabetes, and hypertension in Nashville, Tennessee. J Ambul Care Manage 29:125-32.
Seok HS, Kang SH, 2013. A study on the regional variation factor of hypertension prevalence. Health Social Welfare Rev 33:210-36.
Wannamethee SG, Shaper AG, Walker M, Ebrahim S, 1998. Lifestyle and 15-year survival free of heart attack, stroke and diabetes in middle-aged British men. Arch Int Med 158:2433-40.
Wen TH, Chen DR, Tsai MJ, 2010. Identifying geographical variations in poverty-obesity relationships: empirical evidence from Taiwan. Geospat Health 4:257-65.


Geospatial Health 2016; volume 11:428

Making the most of spatial information in health: a tutorial in Bayesian disease mapping for areal data

Su Yun Kang,1,2 Susanna M. Cramb,1,3 Nicole M. White,1,2 Stephen J. Ball,4 Kerrie L. Mengersen1,2

1Mathematical Sciences School, Queensland University of Technology, Brisbane; 2Cooperative Research Centre Programme for Spatial Information, Melbourne; 3Viertel Centre for Research in Cancer Control, Cancer Council Queensland, Brisbane; 4School of Nursing, Midwifery and Paramedicine, Faculty of Health Sciences, Curtin University, Perth, Australia

Abstract

Disease maps are effective tools for explaining and predicting patterns of disease outcomes across geographical space, identifying areas of potentially elevated risk, and formulating and validating aetiological hypotheses for a disease. Bayesian models have become a standard approach to disease mapping in recent decades. This article aims to provide a basic understanding of the key concepts involved in Bayesian disease mapping methods for areal data. It is anticipated that this will help in interpretation of published maps, and provide a useful starting point for anyone interested in running disease mapping methods for areal data. The article provides detailed motivation and descriptions on disease mapping methods by explaining the concepts, defining the technical terms, and illustrating the utility of disease mapping for epidemiological research by demonstrating various ways of visualising model outputs using a case study. The target audience includes spatial scientists in health and other fields, policy or decision makers, health geographers, spatial analysts, public health professionals, and epidemiologists.

Correspondence: Su Yun Kang, Mathematical Sciences School, Queensland University of Technology, GPO Box 2434, Brisbane QLD 4001, Australia. Tel: +61.7.31382603. E-mail: suyun602@gmail.com

Key words: Areal data; Bayesian mapping; Disease mapping; Spatial information; Visualisation.

Acknowledgements: the work was supported by the Cooperative Research Centre for Spatial Information, whose activities are funded by the Australian Commonwealth's Cooperative Research Centres Programme. The authors are grateful to the Editor and two anonymous referees for their very useful comments that improved the quality of the paper.

Received for publication: 12 November 2015.
Revision received: 22 January 2016.
Accepted for publication: 5 February 2016.

©Copyright S.Y. Kang et al., 2016
Licensee PAGEPress, Italy
Geospatial Health 2016; 11:428
doi:10.4081/gh.2016.428

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Introduction

Disease mapping is a flourishing field due to the growing amount of routinely collected health information worldwide (Rytkönen, 2004). Advances in geographic information systems have greatly aided the analytical manipulation and visual representation of spatial data (Burrough and McDonnell, 1998). Spatial information in health is especially useful for identifying where disease occurs, and the onus is on making the best possible use of this information.

Some excellent introductory guides for disease mapping are available in the literature. Nonetheless, many of these are either not intended for non-statistical audiences, or lack specific details. For instance, Elliot et al. (2000) present a comprehensive review of the recent developments in spatial epidemiology, but the statistical methods require a level of background knowledge that may not be suitable for beginners. Marshall (1991) covers a broad range of methods for the analysis of the geographical distribution of disease, rather than upskilling the reader in using particular methods. Lawson and Williams (2001) provide a broad overview of the issues concerning disease mapping but are short on specifics (English, 2001). Banerjee et al. (2014) present a fully model-based approach to all types of spatial data, including point level, areal, and point pattern data. Cramb et al. (2011b) offer insight into the decisions made in generating a health atlas, but their paper is not intended as an entry-level article for a non-statistical audience. This article fills the niche by providing motivation, definition and description at a general level, and illustrating these ideas via a substantive case study.

Although disease mapping has been undertaken in various forms for over 100 years, the opportunity now exists to use model-based maps that acknowledge uncertainty in inputs and outputs (López-Abente et al., 2014; Catelan and Biggeri, 2010), take account of the spatial nature of the data to borrow strength from neighbouring areas in order to improve small area estimates, and can provide probability statements (Goovaerts, 2006b). In this article, we describe Bayesian disease mapping for areal data (Lawson, 2001, 2009) as an approach that addresses these issues. We focus on a running example of mapping cancer, although the methods are applicable to other diseases.

The primary purpose of this article is to provide a basic understanding of the key concepts involved in Bayesian statistical models for disease mapping of areal data. We commence with a discussion of why model-based disease mapping methods are required. Background on Bayesian methods typically used for disease mapping is then provided, followed by a discussion of some of the commonly used cartographic outputs, including methods for indicating statistical uncertainty in the relative risk (Appendix Part D) of disease.

Case study: cancer in Australia

Cancer is now the world's and Australia's biggest killer (IARC, 2014). The number of cases diagnosed continues to increase worldwide due to population growth and aging, with the increasing prevalence (Appendix Part D) of physical inactivity, poor diet and reproductive changes (such as later parity) also contributing (Torre et al., 2015). In Australia, cancer accounts for almost one-fifth (19%) of the total disease burden (AIHW, 2014). Disparities in cancer outcomes across broad socioeconomic status and urban/rural categories have been reported internationally (Wilkinson and Cameron, 2004; Woods et al., 2006; Ernst et al., 2010). Within Australia, there are disparities in cancer outcomes with respect to geographic remoteness and socioeconomic status (AIHW, 2014). Cancers such as cervical and lung had higher incidence (Appendix Part D) and mortality as remoteness or area-level disadvantage increased. Furthermore, the five-year relative survival from all cancers combined decreased with greater remoteness and greater socioeconomic disadvantage.

Understanding disparities in these broad areas, while useful, is unlikely to accurately reflect the heterogeneity in outcomes at the local level. Efforts to monitor and reduce cancer disparities can benefit greatly from quantifying variation across population groups and pertinent, small geographical areas. An understanding of the geographic patterns of cancer enables health decision-making by health service planners, clinicians, epidemiologists and industry groups to be more accurate and effective, for example by targeting policy development and resource allocation at areas of greater need (Mason et al., 1975; Kulldorff et al., 2006).

Cramb et al. (2011a) produced the first Atlas of Cancer in Queensland to describe geographical variation in cancer incidence and survival across small areas in Queensland, using routinely-collected health information from the Queensland Cancer Registry. For the first time, Bayesian model-based cancer incidence and survival maps for Queensland were systematically presented at a comprehensive level. The Atlas significantly contributed to the understanding of geographical variation of cancer incidence and survival across Queensland, and subsequently influenced government policy decisions.

Materials and Methods

Disease maps are a visual representation of disease outcomes. The use of disease maps to aid decision making in epidemiological and medical research is well recognised (Koch, 2011). Disease maps are effective tools for explaining and predicting patterns of disease outcomes across geographical space, identifying areas of potentially elevated risk, and formulating and validating aetiological hypotheses for a disease (Shen and Louis, 2000). They are able to uncover local-level inequalities frequently masked by health estimates from large areas such as states, regions or cities (Borrell et al., 2010), enabling the development of disease reduction and prevention programmes targeting high-risk populations; see, for instance, Mason et al. (1975) and Kulldorff et al. (2006), who have used cancer maps to depict the geographic patterns of cancer outcomes.

Disease mapping encompasses small area studies that use data aggregated over small areas and take into account local spatial correlation; see, for example, Clayton and Kaldor (1987); Cressie and Chan (1989); Besag et al. (1991) and Bernadinelli et al. (1997). Data sparseness is common in small area analyses, especially when working with less common diseases. A small number of observed and expected disease occurrences leads to unstable risk estimates (Ancelet et al., 2012). The problem of potentially unstable risk estimates for sparse spatial data needs to be mitigated to obtain reliable estimates. In practice, this is achieved by implementing spatial smoothing techniques. Spatial smoothing effectively borrows strength across small areas, so that the disease rate estimated for an area with a small population denominator would be weighted towards the estimated disease rate of neighbouring areas that have larger denominators. The estimates obtained by smoothing information from neighbouring areas are more reliable and robust due to the increased precision in the risk estimates in areas with few observations (Ancelet et al., 2012). In the context of disease mapping for small areas, the implementation of spatial smoothing is commonly achieved via the incorporation of a conditional autoregressive prior distribution for the spatial effects (Lee, 2011).

A disease-mapping model is essentially a regression model that links a disease outcome to a set of risk factors. An important concept in disease mapping models, which is common to many other regression models, is the use of random effects (Appendix Part D). In this context, random effects provide a way of estimating variation in disease risk between areas that is not otherwise captured by known risk factors (e.g. age, sex, socioeconomic status, etc.).
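
The instability of raw risk estimates in sparse areas, and the stabilising effect of smoothing, can be seen in a minimal sketch. Note that it deliberately swaps in a simpler technique than the conditional autoregressive prior described above: a non-spatial Poisson-gamma model that shrinks each area towards a common prior mean rather than towards its neighbours. The counts, the prior parameters `a` and `b`, and the function names are all hypothetical.

```python
import numpy as np


def raw_relative_risk(observed, expected):
    """Unsmoothed estimate: observed count divided by expected count."""
    return observed / expected


def smoothed_relative_risk(observed, expected, a=2.0, b=2.0):
    """Posterior mean of theta when O ~ Poisson(E * theta) and theta ~ Gamma(a, rate=b).

    The posterior is Gamma(a + O, rate=b + E), so its mean is (a + O) / (b + E).
    """
    return (observed + a) / (expected + b)


# Hypothetical areas, ordered from very small to large expected counts.
observed = np.array([0, 3, 12, 95])
expected = np.array([0.8, 1.5, 10.0, 100.0])

raw = raw_relative_risk(observed, expected)
smoothed = smoothed_relative_risk(observed, expected)   # prior mean a / b = 1

for e, r, s in zip(expected, raw, smoothed):
    print(f"expected={e:6.1f}  raw={r:5.2f}  smoothed={s:5.2f}")
```

The smoothed value is a weighted average of the raw estimate and the prior mean, with weight E/(E + b) on the raw estimate, so areas with few expected cases are pulled strongly towards the prior while large areas barely move. A spatial model replaces the common prior mean with information from neighbouring areas.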

[Geospatial Health 2016; 11:428]

Why Bayesian?

Bayesian statistics takes its name from the English clergyman Thomas Bayes (1702-1761), although the key concepts were established contemporaneously by Laplace and were embedded in the general view of inverse probability at that time (Bernardo and Smith, 2009). It is an approach to data analysis that relates observed and unknown quantities using conditional probabilities, which measure the probability of an event given that another event has occurred. In a Bayesian model (Appendix Part E, Box 1), an unknown parameter (Appendix Part D) is represented by a distribution rather than a single point estimate (Johnson, 2004). The model parameters are therefore probabilistic [e.g. parameters representing coefficients associated with covariates in a regression model might be given a Normal distribution (Appendix Part E, Box 2)]. These distributions are known as prior distributions and can be considered as representing the uncertainty about the parameter before the data are seen. The parameters of the prior distributions (e.g. the mean and variance of the prior on a regression coefficient) can also have distributions, known as hyperprior distributions, which likewise represent uncertainty about our knowledge of these values. Combining the prior information with the data results in a posterior distribution, which can be thought of as a probability distribution on the values of an unknown parameter that reflects both prior knowledge about the parameter and the observed data. The Bayesian model thus consists of parameters related to one another in the form of a hierarchy. The complex nature of spatial data can be captured using this hierarchical structure (Appendix Part D) (Shen and Louis, 2000; Best et al., 2005).

Random effects are generally included in these models. Typically, a random effect is specified as being Normally distributed, whereby a few areas are allowed to have a disease incidence much lower than expected based on the known risk factors, a few areas much higher, but most are close to expected (following a bell curve). For spatial data, we assume that sites closer to each other are more similar, so we can use information from neighbouring sites to obtain better estimates of disease risk. Hence, when we fit a spatially correlated random effect, the variation at a particular site is Normally distributed relative to the mean of its neighbours. These random effects thus relate disease risk estimates to neighbouring estimates, producing a smoothing effect across the area of interest.

There are many reasons why the Bayesian approach is a useful framework for disease mapping. Firstly, Bayesian smoothing methods produce robust and reliable estimates of health outcomes of interest in small areas, even when based on small sample sizes (Ancelet et al., 2012). Within these small areas, the sample sizes are sometimes too small to yield estimates with adequate precision and reliability; Bayesian smoothing techniques improve the estimation by using information from neighbouring areas. Secondly, the use of prior distributions (usually based on existing knowledge or expert opinion) in disease mapping models helps strengthen inferences (Appendix Part D) about the true value of the parameter and ensures that all relevant information is included (Gurrin et al., 2000). A prior on the effect of a risk factor can be uninformative (e.g. Normally distributed with a mean of zero and a very large variance) or informative if there is other information about the effect of this risk factor (given the other risk factors in the model).
Thirdly, the Bayesian approach allows for quantification of the uncertainty related to the health estimates from the posterior distributions (Ghosh et al., 1999; Wakefield, 2007). Spatial uncertainties added to the resulting risk maps depict local details of the spatial variation of the risk and provide valuable information for policy makers to make decisions about thresholds and public health (ApSimon et al., 2002; Johnson, 2004; Goovaerts, 2006b). Lastly, direct probabilistic statements can be made about the underlying and unobserved parameters of interest using their posterior probability distributions. In disease mapping, it might be of interest to make probability statements about areas of high risk for a disease. For instance, computing and mapping probabilities that the risk in an area exceeds certain thresholds can be done using the posterior probability distributions (Green and Richardson, 2002). This probability of exceedance can then be used to decide whether an area should be classified as having excess risk of a disease (Richardson et al., 2004). It is straightforward to make these kinds of statements in a Bayesian context, since they are directly obtained from the corresponding posterior distribution.
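The way a prior and the data combine into a posterior, and why estimates based on small counts are stabilised, can be illustrated with a deliberately simple, non-spatial sketch. It assumes (this is not the model used in the Atlas) that the observed count in an area is Poisson with mean equal to the expected count multiplied by a relative risk, and that the relative risk has a Gamma prior, in which case the posterior is again a Gamma distribution. The function names and all numbers are hypothetical.

```python
def gamma_poisson_posterior(observed, expected, prior_shape=1.0, prior_rate=1.0):
    """Posterior Gamma(shape, rate) for a relative risk theta.

    Assumes observed ~ Poisson(expected * theta) and
    theta ~ Gamma(prior_shape, prior_rate), so the update is conjugate.
    """
    return prior_shape + observed, prior_rate + expected


def gamma_mean(shape, rate):
    """Mean of a Gamma distribution parameterised by shape and rate."""
    return shape / rate


# Two hypothetical areas with the same raw ratio (observed / expected = 2.0)
# but very different amounts of data. The prior has mean 2.0 / 2.0 = 1.0.
small_area = gamma_poisson_posterior(observed=3, expected=1.5,
                                     prior_shape=2.0, prior_rate=2.0)
large_area = gamma_poisson_posterior(observed=300, expected=150.0,
                                     prior_shape=2.0, prior_rate=2.0)

small_mean = gamma_mean(*small_area)  # pulled strongly towards the prior mean
large_mean = gamma_mean(*large_area)  # stays close to the raw ratio
print(round(small_mean, 3), round(large_mean, 3))
```

With little data the posterior mean sits between the prior mean and the raw ratio; with more data it is dominated by the observations. The spatial models discussed in this article replace the single shared prior with information borrowed from neighbouring areas.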

Data

Often health data are only available with location supplied as a small area (known as areal data), rather than a street address geocoded to a latitude/longitude point. Determining the most appropriate region size to use involves several considerations (Appendix Part E, Box 3). This article focuses on the application of disease mapping methods to spatial data aggregated over small areas and omits discussion of other forms of spatial data, such as geostatistical and point pattern data. As an alternative, health outcome data may also be analysed at the individual level, while incorporating spatial information at any geographical scale such as a point or an area. The data described in the Atlas (Cramb et al., 2011a) focused on Queensland cancer data aggregated to the statistical local area (SLA) level, which was the smallest area with annual population data available. However, consistent with most administrative regions, the areas are of varying sizes, and larger areas tend to dominate the map. An alternative approach is to aggregate disease data with continuous coordinate information to regular grid cells (Li et al., 2012a, 2012b; Kang et al., 2014). Such an approach allows modelling of disease data at a fine spatial scale, independent of administrative boundaries, while preserving patient confidentiality. Using this approach, the spatial scale can be adjusted to one that is practically, geographically and computationally sensible. It does, however, require individual-level geocoded data, which may not be accessible due to confidentiality concerns. Spatial data may also be available at various geographical scales and hence there is a need to combine information from multiple sources (Gotway and Young, 2002).

Cramb et al. (2011a) mapped two health outcome measures in the Atlas, namely the incidence estimates and the relative survival estimates (discussed in the following Section). Incidence is a measure of the risk of developing a disease within a specified period of time. Relative survival is the standard measure of survival from a disease in population-based disease survival studies (Yu, 2013). Each of these outcomes requires specific input data (Appendix Part E, Boxes 4 and 5). Although other estimates of disease, such as prevalence, are beyond the scope of this article, Bayesian approaches to mapping them are described in Congdon (2006).

Bayesian spatial statistical models

A response variable is the outcome studied, which is expected to vary whenever an independent variable is altered. It is also known as a dependent variable. Here we consider two response variables, namely the number of cancers diagnosed (incidence model) and the number of deaths within x years of diagnosis (relative survival model). Because both responses are counts and the disease is relatively uncommon, a Poisson distribution is used to model them (Appendix Part E, Box 6). The resulting estimate for the incidence of a disease is known as the standardised incidence ratio (SIR; Appendix Part D), an estimate of relative risk within each area that compares the observed incidence against the incidence expected given the population size. The SIR indicates whether the observed incidence in a particular area is higher or lower than the average across all areas included, given the age and sex distribution and population size of the area. The relative survival of a disease is modelled using an excess mortality model that contrasts the mortality in the background population with disease mortality. The survival model results in an excess hazard, which is called the relative excess risk (RER). The RER describes the relative survival (Appendix Part D) of a disease within each area by reporting the risk of death within a certain number of years of diagnosis, after adjusting for broad age groups, compared to the average. The SIR and RER are further explained in Appendix Part A.

Small-area disease data typically exhibit spatial correlation due to spatial structure in the unknown risk factors. The presence of spatial correlation can be caused by a combination of socio-demographic clustering and environmental effects (Browning et al., 2003). Traditional regression models assume independence of random effects and so ignore the potential presence of spatial correlation.
This may lead to false conclusions regarding covariate effects and unstable risk estimates (Fahrmeir and Kneib, 2011). The spatial correlation can be accounted for using spatial smoothing techniques, which estimate the effect of interest at a location using the effect values at nearby locations (Wang, 2006). Spatial smoothing approaches based on neighbourhood dependence are widely employed in disease mapping, where areas with a common boundary are treated as neighbours (Paciorek, 2013). By accounting for the spatial correlation, model inference, prediction and estimation can be improved (Haran, 2011). The effect of the arbitrary geographical boundaries can also be reduced via spatial smoothing. Other smoothing techniques include interpolation methods, kernel regression, kriging and partition methods (Lawson et al., 2003; Goovaerts, 2006a).

Two popular ways of defining a neighbourhood structure for the modelling of spatial correlation are the Queen definition and the Rook definition. Under the Rook method, two areas are considered neighbours if they share a common boundary, whereas under the Queen method two areas are neighbours if they share a common boundary or vertex. Following Earnest et al. (2007), these two methods for defining a neighbourhood structure are illustrated in Figure 1. Such information can be used to calculate the average of the spatially correlated random effects of the neighbours of area i.

The following Bayesian spatial models take the spatial correlation into account by incorporating spatially correlated random effects. Both the incidence and relative survival models assume a Poisson distribution for the observed data and contain spatial and unstructured (non-spatial) random effects. The well-known Bayesian spatial model of Besag et al. (1991) is widely used to model disease incidence (Appendix Part E, Box 7) as it has desirable properties for disease mapping, particularly in modelling the geographical dependence between neighbouring areas (Best et al., 2005). The incidence model can also be used to model mortality. With regard to relative survival, the excess mortality can be modelled via a generalised linear model, using exact survival times (Dickman et al., 2004). The excess mortality is the mortality that is attributable to a particular disease.
It is a measure of the deaths that occur over and above those that would be expected for a given population. Such a Bayesian relative survival model (Appendix Part E, Box 8) has been used by Fairley et al. (2008) and Cramb et al. (2011a). See Appendix Part A for the statistical models for incidence and relative survival. In both models, the spatial random effect is the component that accounts for spatial correlation between neighbouring areas. The unstructured or non-spatial random effect accounts for the unexplained variation in the model.

In a Bayesian analysis, it is assumed that all parameters arise from a probability distribution. As such, distributions representing the likely spread of values are placed on each parameter. Commonly, a vague Normal distribution, such as one with mean 0 and variance 1.0×10^6, written Normal(0, 1.0×10^6), is used for the intercept or coefficients of predictor terms (Appendix Part D). Vague priors are distributions with high spread, such as a Normal distribution with extremely large variance. Such a distribution gives similar prior weight to a large range of parameter values. Generally, the unstructured (non-spatial) random effects and the spatial random effects are both assigned a prior distribution with additional hyperparameters (Appendix Part E, Box 9). To allow for spatial correlation, an intrinsic conditional autoregressive (CAR) distribution is commonly used. The CAR prior models the spatial dependence in a study region by borrowing more information from neighbouring areas than from distant areas and smoothing local rates toward local, neighbouring values. The method provides some shrinkage and spatial smoothing of the raw relative risk estimates (Clayton and Kaldor, 1987). This results in a more stable estimate of the pattern of the underlying disease risk than that provided by the raw estimates.
Consequently, the variance in the associated estimates is reduced and the spatial effect of geographical differences can be identified. This prior has been widely employed in disease mapping to study the geographical variation of disease risk (Clayton and Bernardinelli, 1992; Mollié, 1996; Wakefield et al., 2000), and works particularly well to smooth out variability not relevant to the underlying risk (Assunção and Krainski, 2009). Commonly, both of the precision (inverse of the variance) hyperparameters (Appendix Part D) are assigned a Gamma distribution. Alternative hyperprior distributions include placing either a Uniform or half-Normal distribution on the standard deviation (square root of the variance) (Gelman and Hill, 2006).

The prior distributions used for the parameters may influence the results and therefore should be carefully considered and compared. There are two issues to consider when deciding on a prior distribution (Gelman, 2002): i) what information is going into the prior distribution; and ii) the impact on the resulting posterior distribution. A sensitivity analysis (Appendix Part D) (Junaidi et al., 2011) can be used to investigate the dependence of the posterior distribution on the prior distributions by comparing posterior inferences under different reasonable choices of prior distribution. A literature review is usually helpful to determine the prior distributions being used in similar Bayesian models.
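The intrinsic CAR prior described above can be sketched through its conditional distribution: given its neighbours, the spatial random effect of an area is Normally distributed around the mean of its neighbours' effects, with a variance that shrinks as the number of neighbours grows. The sketch below assumes binary (0/1) neighbourhood weights; the function name, the toy neighbour lists and the values are hypothetical and are not taken from the Appendix code.

```python
def icar_conditional(area, effects, neighbours, variance):
    """Conditional mean and variance of the spatial effect for one area.

    Assumes binary neighbourhood weights, so that
      mean     = average of the neighbouring effects
      variance = overall variance / number of neighbours
    """
    nb = neighbours[area]
    if not nb:
        raise ValueError("area has no neighbours")
    mean = sum(effects[j] for j in nb) / len(nb)
    return mean, variance / len(nb)


# Hypothetical four-area study region and current values of the spatial effects
neighbours = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"], "D": ["B"]}
effects = {"A": 0.2, "B": -0.1, "C": 0.4, "D": 0.0}

# Area B has three neighbours (A, C, D): mean = (0.2 + 0.4 + 0.0) / 3
mean_b, var_b = icar_conditional("B", effects, neighbours, variance=0.3)
print(round(mean_b, 3), round(var_b, 3))
```

Areas with many neighbours are therefore tied more tightly to their local average than areas with few, which is one reason the choice of neighbourhood definition matters.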

Figure 1. The representation of neighbourhood structure of area i. Based on the Rook method, neighbours for area i include areas 2, 4, 6 and 8, while the Queen method defines regions 1-8 as neighbours of area i.

Computation

The complexity of these models means they cannot be solved analytically. Instead, some method of approximation is required. One approach is to use Markov chain Monte Carlo (MCMC) methods (Appendix Part D), which sample from the posterior distribution. A variety of software is available to conduct MCMC, including BUGS (Bayesian inference Using Gibbs Sampling), JAGS (Just Another Gibbs Sampler), Stan and BACC (Bayesian Analysis, Computation & Communication). WinBUGS is one of the most popular options (Brooks et al., 2011); it provides great flexibility in Bayesian modelling, has a simple programming language (Crainiceanu et al., 2005) and interfaces with multiple statistical software packages, including R, Matlab, Stata and SAS. See Appendix Part B for the WinBUGS code for the discussed models. Some useful resources to help learn WinBUGS include Lawson et al. (2003), Lunn et al. (2012), Ntzoufras (2009), Lykou and Ntzoufras (2011), and Spiegelhalter et al. (2003).

Bayesian computation for the above models can also be conducted in R (R Core Team, 2012), by calling the INLA programme and adopting the integrated nested Laplace approximation (INLA) approach proposed by Rue et al. (2009). The INLA approach performs Bayesian inference for spatial models and is able to return accurate parameter estimates in a much shorter time than MCMC. The use of R-INLA for statistical analysis has become increasingly common in recent years in various disciplines, including disease mapping. Appendix Part C provides R-INLA code to perform computation for the discussed models. Some useful resources for getting started with R-INLA include Schrödle and Held (2011a, 2011b), Blangiardo et al. (2013), and Rue et al. (2012).

To incorporate neighbourhood dependence into the Bayesian models, a neighbourhood matrix is required. The neighbourhood matrix records, for each area, which other areas are its neighbours. Freely available software programmes that will calculate a neighbourhood matrix include GeoDa (Anselin et al., 2006) and the spdep R package (Bivand et al., 2011); the calculation can also be done within WinBUGS.
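For regular grid cells, the Rook and Queen definitions can be written down directly, which makes the content of a neighbourhood matrix concrete. The following is a minimal sketch assuming a rectangular grid indexed by (row, column); the function name is hypothetical. For irregular areas such as SLAs, neighbours would instead be derived from boundary files using tools such as those named above.

```python
def grid_neighbours(n_rows, n_cols, method="rook"):
    """Return {(row, col): [neighbouring cells]} for a regular grid.

    rook:  cells sharing an edge
    queen: cells sharing an edge or a corner (vertex)
    """
    if method not in ("rook", "queen"):
        raise ValueError("method must be 'rook' or 'queen'")
    edge_steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    corner_steps = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    steps = edge_steps if method == "rook" else edge_steps + corner_steps
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            # Keep only the candidate cells that fall inside the grid
            neighbours[(r, c)] = [
                (r + dr, c + dc)
                for dr, dc in steps
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            ]
    return neighbours


# A 3 x 3 grid, as in Figure 1, with the area of interest in the centre
rook = grid_neighbours(3, 3, "rook")
queen = grid_neighbours(3, 3, "queen")
centre = (1, 1)
print(len(rook[centre]), len(queen[centre]))
```

The centre cell has four neighbours under the Rook definition and eight under the Queen definition, matching Figure 1; cells on the edge of the grid have fewer neighbours under either definition.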

Making decisions

Perhaps the greatest advantage of Bayesian methods is the diversity of options available to assist in the decision-making process. Communicating results in a way that is easily interpretable and accurate enables informed decisions to be made. Here we outline some of the ways modelled estimates can be used and visualised. The SIR and RER estimates produced using the methods described in the previous sections are two commonly seen measures of disease risk. Appendix Parts B and C outline the code required for producing the estimates. The estimates produced by Bayesian models give great flexibility in reporting results, including comparison of the risk estimates against the average, ranking estimates, and/or examining the uncertainty around the estimates. Ranking of disease estimates helps ensure that public health investigations or interventions are prioritised correctly (Shen and Louis, 2000).

In the Bayesian context, the posterior distributions of health outcome measures (such as SIR and RER) allow for the calculation of rank estimates of each area (Clayton and Kaldor, 1987; Lawson et al., 2000). For instance, Athens et al. (2013) use five health outcome measures to obtain county rank estimates for a composite health outcome measure. The five health outcome measures are converted to a score, and then ranked by weighted means. The ranking of health outcomes is useful for representing the health performance of each area, which can then be used to inform health decision making. Moreover, comparison between two areas can be made easily in the Bayesian framework. Outside of Bayesian methods, it may be difficult and problematic to conduct a large number of pairwise comparisons for all areas using post-hoc tests (Jaccard et al., 1984). The problem is that by conducting so many comparisons, the probability of finding some of the differences statistically significant by chance alone increases. The Bayesian framework addresses this issue through pairwise comparisons of the posterior distributions.

Bayesian methods produce measures of uncertainty for each modelled estimate. The uncertainty attached to the spatial distribution of risk values across the study region is known as spatial uncertainty (Goovaerts, 2006b). It is valuable to visualise spatial uncertainty as it provides local details of the spatial variation of the risk, as well as an input to resource allocation, management and policy strategies. Several methods have been proposed to describe the uncertainty attached to the smoothed rates, including mapping the 95% credible interval (Appendix Part D) of the posterior distribution of smoothed rates (Johnson, 2004) and the probability that the risk in each small area exceeds a certain threshold (Richardson et al., 2004).

Under the Bayesian paradigm, there is great flexibility in communicating and visualising results. Options include maps or graphs of the smoothed estimates, their associated uncertainty, or the probabilities of being above/below certain values. Mapping of disease rates or outcomes facilitates comparison of spatial patterns in disease rates between males and females, between age groups, between races and over time, and motivates comparison with patterns of potential causes (Brewer and Pickle, 2002). By comparing disease rates of different areas, clues to possible causation may be found and this serves as a starting point for further investigation.

Figure 2. Bayesian smoothed estimate of relative excess risk (RER). To show the spatial pattern of the underlying risk, the median of the posterior distribution of statistical local area (SLA)-level RER is mapped. An inset of South-East Queensland is provided for greater detail as this region has a large number of SLAs. Thematic categories are based on the fixed breaks method.

Figure 3. Uncertainty of Bayesian smoothed estimate of relative excess risk (RER). This map depicts the uncertainty associated with the estimates of relative risk. The 95% range (97.5th minus the 2.5th percentile) of the 10,000 values sampled from the posterior distribution of RER for each statistical local area (SLA) is mapped here. An inset of South-East Queensland is provided for greater detail as this region has a large number of SLAs. Thematic categories are based on quintiles.

The purpose of this Section is to showcase various visualisations that can be produced using the outputs obtained from Bayesian modelling techniques, together with their interpretation. This is demonstrated on a common cancer with poor survival: male lung cancer in Queensland. Figures 2 to 7 present an array of maps or plots based on the results from modelled survival (RER of death within 5 years of diagnosis) for each SLA that are useful for communicating the results of statistical analysis via the Bayesian paradigm. The RER expresses the risk of cancer patients dying from their cancer within five years of diagnosis in an SLA compared to the Queensland average (RER=1), and therefore should not be directly compared between two SLAs. The figures were produced using the software R, package maptools. Figure 2 maps the median of the posterior distribution of SLA-level RER and provides a picture of the spatial pattern of the underlying risk. Figure 3 depicts the uncertainty associated with the Bayesian estimates of RER by mapping the 95% range of the 10,000 values sampled from the posterior distribution of RER for each SLA. A graph showing the ranked RER with the associated 95% credible interval for each SLA is provided in Figure 4. Horizontal box plots (Appendix Part D) of the RER estimates by socioeconomic status and rurality are provided in Figure 5 to give additional information about the extent of variability across the state of Queensland.
Figure 6 maps the SLAs having at least a 90% probability of RER being higher than the Queensland average (RER=1) (highlighted in red) and the SLAs having at least a 90% probability of RER being lower than the Queensland average (RER=1) (highlighted in blue). Figure 7A depicts the probability of the SLAs having RER exceeding 1 and Figure 7B depicts the probability of the SLAs having RER exceeding 1.2.
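The probabilities mapped in Figures 6 and 7 can be approximated directly from posterior samples as the proportion of samples above (or below) a threshold. The sketch below is an illustration only: the area names and the ten samples per area are made up (the figures use 10,000 samples per SLA), the function names are hypothetical, and the 0.9 cut-off follows the description of Figure 6.

```python
def exceedance_probability(samples, threshold):
    """Proportion of posterior samples strictly above the threshold."""
    return sum(1 for s in samples if s > threshold) / len(samples)


def classify_area(samples, reference=1.0, cutoff=0.9):
    """Label an area 'high', 'low' or 'uncertain' relative to the reference."""
    p_high = exceedance_probability(samples, reference)
    p_low = sum(1 for s in samples if s < reference) / len(samples)
    if p_high >= cutoff:
        return "high"
    if p_low >= cutoff:
        return "low"
    return "uncertain"


# Hypothetical posterior samples of RER for three areas
posterior = {
    "area_1": [1.05, 1.10, 1.20, 1.30, 1.15, 1.25, 1.40, 1.08, 1.22, 0.98],
    "area_2": [0.70, 0.80, 0.85, 0.90, 0.75, 0.95, 0.88, 0.82, 0.78, 0.92],
    "area_3": [0.90, 1.10, 1.00, 0.95, 1.05, 1.20, 0.85, 1.15, 0.98, 1.02],
}

labels = {name: classify_area(s) for name, s in posterior.items()}
# Raising the threshold from 1 to 1.2 lowers the exceedance probability
p_above_1_2 = exceedance_probability(posterior["area_1"], 1.2)
print(labels, p_above_1_2)
```

An area whose samples straddle the reference value is left unclassified rather than forced into a category, which mirrors the unshaded SLAs in a map such as Figure 6.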

Figure 4. Uncertainty of Bayesian smoothed estimate of relative excess risk (RER). The 95% credible interval (97.5th-2.5th percentile) of the 10,000 values sampled from the posterior distribution of RER for each statistical local area (SLA) is plotted here. This plot shows how much reliance can be placed on the estimates. The black line is the median RER for each SLA. The blue vertical lines are the 95% credible intervals, and indicate the amount of uncertainty associated with each estimate. The red line shows the Queensland average (set to 1).
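The summaries behind a plot such as Figure 4 (posterior median, 95% credible interval and ranking of areas) can be computed from posterior samples in a few lines. This is a minimal sketch with made-up samples and hypothetical names; the percentile helper uses simple linear interpolation, which is one of several common conventions and not necessarily the one used for the published figures.

```python
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of pre-sorted values."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def summarise(samples):
    """Posterior median and 95% credible interval (2.5th to 97.5th percentile)."""
    ordered = sorted(samples)
    return {
        "median": statistics.median(ordered),
        "lower": percentile(ordered, 2.5),
        "upper": percentile(ordered, 97.5),
    }


# Hypothetical posterior samples of RER for three areas
posterior = {
    "area_a": [0.8, 0.9, 1.0, 1.1, 1.2],
    "area_b": [1.2, 1.3, 1.4, 1.5, 1.6],
    "area_c": [0.5, 0.6, 0.7, 0.8, 0.9],
}

summaries = {name: summarise(s) for name, s in posterior.items()}
# Rank areas from lowest to highest posterior median, as on the x-axis of Figure 4
ranked = sorted(summaries, key=lambda name: summaries[name]["median"])
for name in ranked:
    s = summaries[name]
    print(name, s["median"], round(s["lower"], 3), round(s["upper"], 3))
```

The difference between the upper and lower limits is the width of the interval, which is the quantity mapped in Figure 3.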

Results and Discussion

In this article we have outlined the benefits of Bayesian models for both analysis and visualisation. The public health arena regularly makes practical decisions affecting people's health. To facilitate decisions, it is vital that the analysis is conducted appropriately, and results are communicated effectively. Bayesian methods are increasingly being used to analyse routinely collected data. The Bayesian framework is now the tool of choice in many applied statistical areas, including disease mapping (Lawson et al., 1999). In small area studies, Bayesian methods often have better model fit than non-Bayesian smoothing methods (Lawson et al., 2000). Greater flexibility in distributional assumptions is possible under Bayesian methods than in traditional regression models (Waller and Gotway, 2004).

Whether to standardise response rates depends on the study objectives. For the cancer atlas, it was desirable to remove the influence of age, so that differences were not due to different age structures between areas. For incidence, we used the SIR, which adjusts for the area-specific age and sex structure. An alternative to standardisation for dealing with confounders is the use of regression models (McNamee, 2005). These can be particularly useful when multiple confounders need to be controlled for simultaneously. For relative survival, we included age in the regression equation to remove its influence on the results. However, if the purpose of a study is to identify where the highest rates of disease are, such as for service provision, then there is no need to standardise (or otherwise adjust) the incidence rates. This is because the cause of the variation (whether sex, age or other factors) is inconsequential.

Visualising disease patterns through maps remains an effective method to convey a large amount of information in an engaging way. Few modern-day visualisations include uncertainty measures, yet including them greatly assists decision-making. Online, interactive visualisations can dynamically link maps (e.g. Figure 2 showing the smoothed Bayesian RER) with plots of the uncertainty (e.g. Figure 3 showing the 95% credible interval for each area). Selecting an area would then highlight the corresponding region in both plots, providing much greater information to the user.

There are limitations associated with using routinely collected data. Determining the direction of causation may not be possible. Often there is a lag time between exposure and disease detection, and patients may move during this time. Bayesian methods also have certain limitations, including greater computational time if using Markov chain Monte Carlo approaches, and requiring sensitivity analyses to ensure priors are not exerting undue effect. With regard to computation using R-INLA, models must be expressible in the linear model format and there are restrictions on the types of prior distributions that can be assumed. However, we believe the advantages outlined in this article outweigh any limitations.

Routinely collected data exist to enable disease monitoring and control. Appropriate analyses convert these data into information, which, once communicated, enables action. Bayesian methods not only enable appropriate analyses to be performed, they also provide greater flexibility in visual communications.

Can descriptive studies really influence government policy? The disparities identified in the cancer atlas resulted in the Queensland government including a specific objective aimed at reducing the geographic disparities in cancer outcomes in their Strategic Directions (Statewide Health Service Strategy and Planning Unit, 2014). Results were also used in lobbying to increase the amount of financial assistance the government provided to remote patients to offset travel and accommodation costs while obtaining treatment away from home, and the amount provided was subsequently increased. Our experience is that routinely collected data, when appropriately analysed and communicated, facilitate appropriate government action.

Figure 5. Distribution of smoothed relative excess risk (RER) estimates according to: socioeconomic status (A) and rurality (B). The distributional plots reflect the general patterns in the smoothed RER estimates across the area-based categories of socioeconomic status and rurality. These plots show the proportion of RER estimates that are above or below the Queensland average (vertical red line) within each of the area-based categories. The plots only present the range of point estimates, and so do not take the amount of uncertainty associated with each statistical local area-specific estimate into account.


Figure 6. In the Bayesian paradigm, the statistical local areas (SLAs) highlighted in red have a 90% probability of relative excess risk (RER) being higher than the Queensland average (RER=1). This means that the lower 10th percentile of the posterior distribution of RER exceeds 1. The SLAs highlighted in blue express at least a 90% probability of RER being lower than the Queensland average (RER=1). This means that the upper 90th percentile of the posterior distribution of RER is less than 1. The density plots show the posterior distribution of RER for four randomly chosen SLAs where the x-axis is the RER values. The two density plots on the left show that there is more than 90% chance for the RER to be higher than 1. The two density plots on the right show that there is more than 90% chance for the RER to be lower than 1. The percentage of low risk or high risk for each SLA is also given in each density plot. An inset of South-East Queensland is provided for greater detail as this region has a large number of SLAs.


Figure 7. Thematic map depicting the probability of relative excess risk exceeding 1 (A) and 1.2 (B). The threshold 1.2 was chosen to reflect high risk as it lies in the fifth quintile. Four statistical local areas (SLAs) are chosen to demonstrate how the probabilities change when the thresholds change. An inset of South-East Queensland is provided for greater detail as this region has a large number of SLAs.


Conclusions

We hope this article will enable greater understanding, and potentially greater uptake, of Bayesian methods in disease mapping, along with the available options for communicating estimates and their uncertainty.

References

AIHW, 2014. Cancer in Australia: an overview, 2014. Australian Institute of Health and Welfare, Canberra, Australia.
Ancelet S, Abellan JJ, Vilas VJDR, Birch C, Richardson S, 2012. Bayesian shared spatial-component models to combine and borrow strength across sparse disease surveillance sources. Biom J 54:385-404.
Anselin L, Syabri I, Kho Y, 2006. GeoDa: an introduction to spatial data analysis. Geogr Anal 38:5-22.
ApSimon HM, Warren RF, Kayin S, 2002. Addressing uncertainty in environmental modelling: a case study of integrated assessment of strategies to combat long-range transboundary air pollution. Atmos Environ 36:5417-26.
Assunção R, Krainski E, 2009. Neighborhood dependence in Bayesian spatial models. Biom J 51:851-69.
Athens JK, Catlin BB, Remington PL, Gangnon RE, 2013. Using empirical Bayes methods to rank counties on population health measures. Available from: http://www.cdc.gov/pcd/issues/2013/13_0028.htm
Banerjee S, Carlin BP, Gelfand AE, 2014. Hierarchical modeling and analysis for spatial data. 2nd ed. Chapman and Hall/CRC, Boca Raton, FL, USA.
Bernardinelli L, Pascutto C, Best NG, Gilks WR, 1997. Disease mapping with errors in covariates. Stat Med 16:741-52.
Bernardo JM, Smith AFM, 2009. Bayesian theory. Vol 405. John Wiley & Sons Ltd, Hoboken, NJ, USA.
Besag J, York J, Mollié A, 1991. Bayesian image restoration, with two applications in spatial statistics. Ann Inst Stat Math 43:1-20.
Best N, Richardson S, Thomson A, 2005. A comparison of Bayesian spatial models for disease mapping. Stat Methods Med Res 14:35-59.
Bivand R, Anselin L, Berke O, Bernat A, Carvalho M, Chun Y, Dormann CF, Dray S, Halbersma R, Lewin-Koh N, 2011. Spdep: spatial dependence: weighting schemes, statistics and models. Available from: https://cran.r-project.org/web/packages/spdep/index.html
Blangiardo M, Cameletti M, Baio G, Rue H, 2013. Spatial and spatiotemporal models with R-INLA. Spat Spatiotemporal Epidemiol 7:39-55.
Borrell C, Marí-Dell’Olmo M, Serral G, Martínez-Beneito M, Gotsens M, 2010. Inequalities in mortality in small areas of eleven Spanish cities (the multicenter MEDEA project). Health Place 16:703-11.
Brewer CA, Pickle L, 2002. Evaluation of methods for classifying epidemiological data on choropleth maps in series. Ann Assoc Am Geogr 92:662-81.
Brooks S, Gelman A, Jones G, Meng XL, 2011. Handbook of Markov Chain Monte Carlo. Chapman & Hall/CRC, Boca Raton, FL, USA.
Browning CR, Cagney KA, Wen M, 2003. Explaining variation in health status across space and time: implications for racial and ethnic disparities in self-rated health. Soc Sci Med 57:1221-35.
Burrough PA, McDonnell R, 1998. Principles of geographical information systems. Oxford University Press, Oxford, UK.
Catelan D, Biggeri A, 2010. Multiple testing in disease mapping and descriptive epidemiology. Geospat Health 4:219-29.
Clayton D, Bernardinelli L, 1992. Bayesian methods for mapping disease risk. In: Elliott P, Cuzick J, English D, Stern R, eds. Geographical and environmental epidemiology: methods for small area studies. Oxford University Press, Oxford, UK, pp 205-20.
Clayton D, Kaldor J, 1987. Empirical Bayes estimates of age-standardized relative risks for use in disease mapping. Biometrics 43:671-81.
Congdon P, 2006. Estimating diabetes prevalence by small area in England. J Public Health 28:71-81.
Crainiceanu CM, Ruppert D, Wand MP, 2005. Bayesian analysis for penalized spline regression using WinBUGS. J Stat Softw 14:1-24.
Cramb SM, Mengersen KL, Baade PD, 2011a. Atlas of cancer in Queensland: geographical variation in incidence and survival, 1998-2007. Viertel Centre for Research in Cancer Control, Cancer Council Queensland, Brisbane, Australia.
Cramb SM, Mengersen KL, Baade PD, 2011b. Developing the atlas of cancer in Queensland: methodological issues. Int J Health Geogr 10:9.
Cressie N, Chan NH, 1989. Spatial modeling of regional variables. J Am Stat Assoc 84:393-401.
Dickman PW, Sloggett A, Hills M, Hakulinen T, 2004. Regression models for relative survival. Stat Med 23:51-64.
Earnest A, Morgan G, Mengersen K, Ryan L, Summerhayes R, Beard J, 2007. Evaluating the effect of neighbourhood weight matrices on smoothing properties of conditional autoregressive (CAR) models. Int J Health Geogr 6:54.
Elliott P, Wakefield JC, Best NG, Briggs DJ, 2000. Spatial epidemiology: methods and applications. Oxford University Press, Oxford, UK.
English PB, 2001. An introductory guide to disease mapping. Am J Epidemiol 154:881-2.
Ernst J, Zenger M, Schmidt R, Schwarz R, Brähler E, 2010. Medical and psychosocial care needs of cancer patients: a systematic review comparing urban and rural provisions. Deut Med Wochenschr 135:1531-7.
Fahrmeir L, Kneib T, 2011. Bayesian smoothing and regression for longitudinal, spatial and event history data. Oxford University Press, Oxford, UK.
Fairley L, Forman D, West R, Manda S, 2008. Spatial variation in prostate cancer survival in the Northern and Yorkshire region of England using Bayesian relative survival smoothing. Brit J Cancer 99:1786-93.
Gelman A, 2002. Prior distribution. In: El-Shaarawi AH, Piegorsch WW, eds. Encyclopedia of environmetrics. John Wiley & Sons Ltd, Chichester, UK, pp 1634-7.
Gelman A, Hill J, 2006. Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, New York, NY, USA.
Ghosh M, Natarajan K, Waller LA, Kim D, 1999. Hierarchical Bayes GLMs for the analysis of spatial data: an application to disease mapping. J Stat Plan Inference 75:305-18.
Goovaerts P, 2006a. Geostatistical analysis of disease data: accounting for spatial support and population density in the isopleth mapping of cancer mortality risk using area-to-point Poisson kriging. Int J Health Geogr 5:52.
Goovaerts P, 2006b. Geostatistical analysis of disease data: visualization and propagation of spatial uncertainty in cancer mortality risk using Poisson kriging and p-field simulation. Int J Health Geogr 5:7.
Gotway CA, Young LJ, 2002. Combining incompatible spatial data. J Am Stat Assoc 97:632-48.

[Geospatial Health 2016; 11:428]

[page 197]


gh-2016_2.qxp_Hrev_master 31/05/16 11:45 Pagina 198

Article

Green PJ, and S Richardson, 2002. Hidden markov models and disease mapping. J Am Stat Assoc 97:1055-70. Gurrin LC, Kurinczuk JJ, Burton PR, 2000. Bayesian statistics in medical research: an intuitive alternative to conventional data analysis. J Eval Clin Pract 6:193-204. Haran M, 2011. Gaussian random field models for spatial data. In: Brooks SP, Gelman A, Jones GL, Meng XL, eds. Handbook of Markov Chain Monte Carlo. CRC Press, Boca Raton, FL, USA, pp 449-78. IARC, 2014. World cancer report 2014. International Agency for Research on Cancer-World Health Organization, Geneva, Switzerland. Jaccard J, Becker MA, Wood G, 1984. Pairwise multiple comparison procedures: a review. Psychol Bull 96:589. Johnson GD, 2004. Small area mapping of prostate cancer incidence in New York State (USA) using fully Bayesian hierarchical modelling. Int J Health Geogr 3:29. Junaidi, Stojanovski E, Nur D, 2011. Prior sensitivity analysis for a hierarchical model. In: Proceedings of the Fourth Annual ASEARC Conference, 17-18 February 2011, University of Western Sydney, Paramatta, Australia. Kang SY, McGree J, Baade P, Mengersen K, 2014. An investigation of the impact of various geographical scales for the specification of spatial dependence. J Appl Stat 41:2515-38. Koch T, 2011. Disease maps: epidemics on the ground. University of Chicago Press, Chicago, USA. Kulldorff M, Song C, Gregorio D, Samociuk H, DeChello L, 2006. Cancer map patterns: are they random or not? Am J Prev Med 30:37-49. Lawson AB, 2001. Statistical methods in spatial epidemiology. Wiley, Chichester, UK. Lawson AB, 2009. Bayesian disease mapping: hierarchical modeling in spatial epidemiology. CRC Press, Boca Raton, FL, USA. Lawson AB, Biggeri AB, Böhning D, Lesaffre E, Viel J-F, Bertollini R, 1999. Disease mapping and risk assessment for public health. John Wiley & Sons, Chichester, UK. Lawson AB, Biggeri AB, Böhning D, Lesaffre E, Viel J-F, Clark A, Schlattmann P, Divino F, 2000. 
Disease mapping models: an empirical evaluation. Stat Med 19:2217-41. Lawson AB, Browne WJ, Rodeiro CV, 2003. Disease mapping with WinBUGS and MLwiN. Vol 11. John Wiley & Sons, Chichester, UK. Lawson AB, Williams FLR, 2001. An introductory guide to disease mapping. John Wiley & Sons, Chichester, UK. Lee D, 2011. A comparison of conditional autoregressive models used in Bayesian disease mapping. Spat Spatiotemporal Epidemiol 2:7989. Li Y, Brown P, Gesink DC, Rue H, 2012a. Log Gaussian Cox processes and spatially aggregated disease incidence data. Stat Methods Med Res 21:479-507. Li Y, Brown P, Rue H, al Maini M, Fortin P, 2012b. Spatial modelling of lupus incidence over 40 years with changes in census areas. J Roy Stat Soc C Appl Stat 61:99-115. López-Abente G, Aragonés N, García-Pérez J, Fernández-Navarro P, 2014. Disease mapping and spatio-temporal analysis: importance of expected-case computation criteria. Geospat Health 9:27-35. Lunn D, Jackson C, Best N, Thomas A, Spiegelhalter D, 2012. The BUGS book: a practical introduction to Bayesian analysis. Chapman and Hall/CRC Press, Boca Raton, FL, USA. Lykou A, Ntzoufras I, 2011. WinBUGS: a tutorial. Wiley Interdiscip Rev Comput Stat 3:385-96. Marshall RJ, 1991. A review of methods for the statistical analysis of spatial patterns of disease. J Roy Stat Soc A Sta 154:421-41.

[page 198]

Mason TJ, McKay FW, Hoover R, Blot WJ, Fraumeni JF, 1975. Atlas of cancer mortality for U.S. counties: 1950-1969. US Govt. Printing Office, Washington, DC, USA. McNamee R, 2005. Regression modelling and other methods to control confounding. Occup Environ Med 62:500-6. Mollié A, 1996. Bayesian mapping of disease. In: Gilks WR, Richardson S, Spiegelhalter DJ, eds. Markov Chain Monte Carlo in practice. Chapman & Hall, London, UK, pp 359-79. Ntzoufras I, 2009. Bayesian modeling using WinBUGS. John Wiley & Sons, Hoboken, NJ, USA. Paciorek CJ, 2013. Spatial models for point and areal data using Markov random fields on a fine grid. Electron J Stat 7:946-72. R Core Team, 2012. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Richardson S, Thomson A, Best N, Elliott P, 2004. Interpreting posterior relative risk estimates in disease-mapping studies. Environ Health Persp 112:1016. Rue H, Martino S, Chopin N, 2009. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J Roy Stat Soc B Met 71:319-92. Rue H, Martino S, Lindgren F, 2012. The R-INLA project. Available from: http://www.r-inla.org Rytkönen MJ, 2004. Not all maps are equal: GIS and spatial analysis in epidemiology. Int J Circumpolar Health 63:9-24. Schrödle B, Held L, 2011a. A primer on disease mapping and ecological regression using INLA. Computation Stat 26:241-58. Schrödle B, Held L, 2011b. Spatiotemporal disease mapping using INLA. Environmetrics 22:725-34. Shen W, Louis TA, 2000. Triple-goal estimates for disease mapping. Stat Med 19:2295-308. Spiegelhalter D, Thomas A, Best N, Lunn D, 2003. WinBUGS user manual. Available from: www.mrc-bsu.cam.ac.uk/wp-content/uploads/ manual14.pdf Statewide Health Service Strategy and Planning Unit, 2014. Cancer care services statewide health service strategy 2014. Statewide Health Service Strategy and Planning Unit, Brisbane, Australia. Thomas DC, 2014. 
Statistical methods in environmental epidemiology. Oxford University Press, Oxford, UK. Torre LA, Bray F, Siegel RL, Ferlay J, Lortet-Tieulent J, Jemal A, 2015. Global cancer statistics, 2012. CA: Cancer J Clin 65:87-108. Wakefield J, 2007. Disease mapping and spatial regression with count data. Biostatistics 8:158-83. Wakefield JC, Best NG, Waller LA, 2000. Bayesian approaches to disease mapping. In: Elliot P, Wakefield JC, Best NG, Briggs DJ, eds. Spatial epidemiology: methods and applications. Oxford University Press, Oxford, UK, pp 104-27. Waller LA, Gotway CA, 2004. Applied spatial statistics for public health data. John Wiley & Sons, Hoboken, NJ, USA. Wang F, 2006. Quantitative methods and applications in GIS. CRC Press, Boca Raton, FL, USA. Wilkinson D, Cameron K, 2004. Cancer and cancer risk in South Australia: what evidence for a rural-urban health differential? Aust J Rural Health 12:61-6. Woods LM, Rachet B, Coleman MP, 2006. Origins of socio-economic inequalities in cancer survival: a review. Ann Oncol 17:5-19. Yu B, 2013. A class of transformation covariate regression models for estimating the excess hazard in relative survival analysis. Am J Epidemiol 177:708-17.

[Geospatial Health 2016; 11:428]


gh-2016_2.qxp_Hrev_master 31/05/16 11:45 Pagina 199

Geospatial Health 2016; volume 11:455

Revealing spatio-temporal patterns of rabies spread among various categories of animals in the Republic of Kazakhstan, 2010-2013

Sarsenbay K. Abdrakhmanov,1 Kanatzhan K. Beisembayev,1 Fedor I. Korennoy,2 Gulzhan N. Yessembekova,1 Dosym B. Kushubaev,1 Ablaikhan S. Kadyrov1

1S. Seifullin Kazakh Agro-Technical University, Astana, Kazakhstan; 2Federal Center for Animal Health (FGBI ARRIAH), Vladimir, Russia

Abstract

This study estimated the basic reproductive ratio (R0) of rabies at the population level in wild animals (foxes), farm animals (cattle, camels, horses, sheep) and what we classified as domestic animals (cats, dogs) in the Republic of Kazakhstan (RK). It also aimed to forecast the possible number of new outbreaks should the disease emerge in new territories. We considered cases of rabies in animals in RK from 2010 to 2013, as recorded by regional veterinary services. Statistically significant space-time clusters of outbreaks in the three sub-populations were detected by means of the Kulldorff scan statistic. Theoretical curves were then fitted to the epidemiological data within each cluster assuming exponential initial growth, and R0 was calculated from the fitted curves. For farm animals, the value of R0 was 1.62 (1.11-2.26) and for wild animals 1.84 (1.08-3.13), while it was close to 1 for domestic animals. Using the values obtained, the initial phase of a possible epidemic was simulated in order to predict the expected number of secondary cases if the disease were introduced into a new area. The possible number of new cases over 20 weeks was estimated at 5 (1-16) for farm animals, 17 (1-113) for wild animals and about 1 in the category of domestic animals. These results have been used to produce a set of recommendations for organising preventive and contra-epizootic measures against rabies, which are expected to be applied by state veterinary services.

Correspondence: Sarsenbay K. Abdrakhmanov, S. Seifullin Kazakh Agro-Technical University, 62 av. Pobeda, 010011 Astana, Kazakhstan. Tel. +77.013.881467. E-mail: S_abdrakhmanov@mail.ru

Key words: Kazakhstan; Rabies; Spatio-temporal patterns; GIS.

Acknowledgements: this work was accomplished under a Budgetary Program of the Ministry of Agriculture of the Republic of Kazakhstan, #0115 RK01952 Scientific support of veterinary well-being.

Received for publication: 23 January 2016.
Revision received: 10 February 2016.
Accepted for publication: 13 February 2016.

©Copyright S.K. Abdrakhmanov et al., 2016
Licensee PAGEPress, Italy
Geospatial Health 2016; 11:455
doi:10.4081/gh.2016.455

Introduction

The epidemic situation with respect to rabies in endemic countries, including the Republic of Kazakhstan (RK), is characterised by an uneven spread of infection. The disease is registered on every continent except Australia and Antarctica (Smreczak et al., 2012; Youla et al., 2014; Eckardt et al., 2015). It has an acute course with overt signs of polyencephalomyelitis, and mortality is 100% in the absence of immediate treatment. Since the accession of RK to the World Trade Organization (WTO), socio-economic stability, including the control of epizootics that may extend to humans, has become increasingly important for each administrative unit of the country. Currently, an ascending trend of rabies, with an average increase of 7% per year among susceptible animals (foxes, raccoon dogs, wolves, cats and cattle), is being observed in RK. About 700 head of farm animals perish annually from rabies in the republic; more than 50% of them are cattle and up to 25% are small ruminants (Makarov et al., 2008; Abdrakhmanov et al., 2010).

Rabies receives the attention of the veterinary services, and mass vaccination campaigns and surveillance are applied in wild animal populations as well as in livestock and other domestic animals. Predicting the size of possible epidemics in the event of new rabies outbreaks is one of the priorities of the veterinary service. The number of recorded cases in different animal categories during 2010-2013 was studied in order to identify the main epidemiological patterns of the disease at the population level. During this study, significant spatio-temporal clusters of the disease were identified, epidemic curves were built for each cluster, and the values of the basic reproductive ratio (R0) were calculated for each sub-population (Iglesias et al., 2011, 2015). The values obtained were then used to simulate the possible number of new outbreaks over a period of 20 weeks in the event that a new focus of infection emerges.

Materials and Methods

Study area

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

The whole territory of RK, covering an area of 2,724,902 km2, was used as the study area. RK is the 9th largest country in the world and the 4th largest in Eurasia, with a total population of more than 17 million. The first-level administrative division is represented by 14 regions


(oblasts), the spatial extent of which ranges from 117,249 to 427,982 km2 (Figure 1). Agriculture, especially cattle breeding, is one of the priority sectors of the economy of RK. As of January 2015, livestock numbers (in thousands of head) were: cattle, 6,028.0; sheep, 15,532.4; goats, 2,378.9; pigs, 844.2; horses, 1,936.7; and camels, 165.9.

Data

The data on rabies outbreaks in RK during 2010-2013 were provided by the national veterinary service and recorded during expedition trips. The database consists of 496 outbreaks in cats, dogs, cows, foxes, camels, sheep and horses (Table 1). To facilitate modelling, the species were divided into three categories: farm animals (cattle, camels, horses, sheep), domestic animals (cats and dogs) and wild animals (foxes only). Of the total number of outbreaks, 60% were in farm animals (296 outbreaks), 13% in domestic animals (69 outbreaks) and 27% in wild animals (131 outbreaks). Each outbreak record has the following attributes relevant to further analysis: geographic coordinates (latitude, longitude); date of the outbreak; and number and type of infected animals. The data were supplied as a Microsoft Excel spreadsheet and converted to ESRI shapefiles (http://www.esri.com/) for cartographic representation. Figure 1 shows a map of RK overlaid with the cases of rabies in the three sub-populations mentioned above.

Table 1. List of animals registered as infected by rabies in the 2010-2013 period.

Species    Infected animals (n)
Camel      10
Cat        1
Cow        237
Dog        68
Fox        127
Horse      24
Sheep      29

Figure 1. Geographical location of the Republic of Kazakhstan and its first-level administrative divisions with reported rabies cases in animals, 2010-2013.

Methods

The first stage of the work was to identify space-time clusters of outbreaks in each of the three categories by means of the Kulldorff scan statistic (Kulldorff et al., 2005), using the space-time permutation type of analysis. A preliminary analysis of the data with the Multi-distance Spatial Cluster Analysis tool (Ripley’s K-function) was performed to determine the maximum distance of spatial outbreak clustering, which gives an idea of the maximum spatial cluster size (Mitchell, 2005). After detection of the spatio-temporal clusters, only those that were statistically significant (P≤0.05) and contained enough outbreaks for the construction of an epidemic curve (at least seven) were selected for further analysis.

At the second stage of the analysis, an epidemic curve was constructed for each of the selected clusters from the available data on the dynamics of rabies cases. The time step of the epidemic curve was set equal to the average incubation period of rabies for the respective category of animal. This is based on the assumption that an animal may already be a source of infection for other susceptible animals during the incubation period, so that the duration of the incubation period can be taken as the duration of infectivity. Literature data on rabies (Wunner and Jackson, 2010; Alabama Department of Public Health, 2015) suggest that the incubation period is usually 2-10 weeks for cattle, but may take up to 6 months, and usually 3-8 weeks for dogs, but can last for up to 8.5 months. The duration of the infectious period for foxes was taken to be similar to that in dogs. Based on these ranges, the durations of the infectious period (D), in weeks, were modelled using the Pert distribution (Van Hauwermeiren and Vose, 2009) with the following parameters: for cattle (farm animals), Pert (0; 7.3; 25); for dogs, cats and foxes (domestic and wild animals), Pert (0; 4.5; 36). This allows the uncertainty in the duration of the infectious period to be accounted for and yields mean values and 95% confidence interval (CI) limits in the further analysis. The means of both distributions amounted to 9 weeks. This value was used as the time step (Δt) for the epidemic curves.

In order to model the possible number of new rabies outbreaks in the event of an epidemic in a new territory, the simplest technique based on the concept of the basic reproductive ratio was applied. This approach relies on analysis of the epidemic curve and assumes exponential growth in the number of new cases from the beginning of the epidemic until its peak (Dietz, 1993; Heffernan et al., 2005). The key concept of this approach is the basic reproductive ratio (R0), the average number of secondary infections caused by one infected individual during its infectious period. The formula for calculating R0 from epidemiological data is as follows:

$$R_0 = 1 + \frac{D}{\Delta t}\,\ln\frac{N_2}{N_1} \qquad \text{(eq. 1)}$$

where D is the duration of the infectious period, i.e. the period of time during which an infected individual (or herd) can be a source of infection for other susceptible animals (herds), and Δt is the time interval between two observations at which N1 and N2 outbreaks were recorded; Δt also acts as the time step of the epidemic curve. Values of R0 greater than 1 usually indicate growth of an epidemic, while values less than 1 indicate attenuation. R0=1 indicates a creeping endemic course of the disease without a sharp rise.

Figure 2. Statistically significant cluster of rabies cases in farm species.

Thus, knowing the value of R0, the possible number of new outbreaks may be estimated by the inverse formula:

$$N_2 = N_1 \exp\!\left[\frac{(R_0 - 1)\,\Delta t}{D}\right] \qquad \text{(eq. 2)}$$

In addition, knowledge of the basic reproductive ratio (R0) enables the calculation of the herd immunity threshold (Ht), i.e. the share of the susceptible livestock that should be vaccinated (or, alternatively, disposed of) to prevent an epidemic. According to Anderson and May (1992), Ht can be calculated using the formula:

$$H_t = 1 - \frac{1}{R_0} \qquad \text{(eq. 3)}$$

After building the epidemic curve for each cluster, approximating exponential curves were fitted to the initial segments of the curves (phases of growth) by the least squares method (Wolberg, 2006), after which R0 values were calculated by equation 1. Accounting for the uncertainty in the duration of the infectious period D, expressed by the distributions above, allowed estimation of the average values of R0 and the limits of the 95% CI. Modelling was performed by Monte Carlo simulation with 10,000 iterations (Vose, 2008). Then, the possible number of new rabies cases over a period of 20 weeks was simulated from the obtained R0 values by means of equation 2. The herd immunity threshold Ht, which in the case of rabies shows the proportion of the susceptible population that must be vaccinated to prevent the development of an epidemic, was calculated using equation 3.

Software

Geospatial analysis, data processing and visualisation were performed using the geographic information system ArcGIS 10.3.1 (http://www.esri.com/). Space-time clusters were identified with the SaTScan 9.1 software package (Kulldorff, 2015). Construction of epidemic curves, fitting of approximating exponential curves and Monte Carlo simulation were performed using the software package @Risk 6.2 (http://www.palisade.com/risk/) based on Microsoft Excel.

Results

Cluster detection

The analysis of rabies cases during 2010-2013 using the Multi-distance Spatial Cluster Analysis (Ripley’s K-function) procedure showed that the maximum distance at which grouping of cases can be observed is about 500 km. Therefore, a maximum search radius of 250 km was adopted to identify potential clusters. With this setting, only one statistically significant cluster that also had a sufficient number of outbreaks for the construction of an epidemic curve was revealed for each sub-population of animals. These clusters are shown on the maps in Figures 2-4.
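The idea behind the multi-distance analysis can be illustrated with a naive estimate of Ripley's K-function. The sketch below is not the ArcGIS implementation used in the study: it ignores edge correction, the coordinates are hypothetical points in km, and the study window area is an assumed value. It only shows how the transformed statistic L(d) - d moves from positive (clustering) to negative as the distance d grows past the scale of the groups.

```python
import math

def ripley_k(points, distances, area):
    """Naive Ripley's K estimate (no edge correction) for planar points."""
    n = len(points)
    k_values = []
    for d in distances:
        # Count ordered pairs (i, j), i != j, lying within distance d.
        pairs = 0
        for i in range(n):
            for j in range(n):
                if i != j and math.dist(points[i], points[j]) <= d:
                    pairs += 1
        k_values.append(area * pairs / (n * (n - 1)))
    return k_values

def l_minus_d(k_values, distances):
    """Besag's L(d) - d; positive values suggest clustering at distance d."""
    return [math.sqrt(k / math.pi) - d for k, d in zip(k_values, distances)]

# Hypothetical coordinates in km (two tight groups), not the study data.
points = [(0, 0), (10, 0), (0, 10), (10, 10),
          (900, 900), (910, 900), (900, 910), (910, 910)]
distances = [20, 250, 500]
area = 1000.0 * 1000.0  # assumed study window, km^2

k = ripley_k(points, distances, area)
l = l_minus_d(k, distances)
```

In practice, the distance at which L(d) - d stops exceeding its simulation envelope is read off as the maximum clustering distance; the envelope step is omitted here for brevity.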

Figure 3. Statistically significant cluster of rabies cases in domestic species.


In the category of farm animals, the cluster was found mainly among cattle, while in the category of domestic animals it was found among dogs. A cluster among foxes (the wild animal category) was also observed. The initial segments of the epidemic curves plotted for the three identified clusters, together with the fitted exponential curves, are shown in Figures 5-7.
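Fitting an exponential curve to the growth phase of an epidemic curve can be sketched as an ordinary least-squares fit on the logarithm of the counts. This is an assumption about the fitting procedure (the study used @Risk, and the exact least-squares variant is not stated), and the counts below are hypothetical values at 9-week steps, not the study data.

```python
import math

def fit_exponential(times, counts):
    """Fit counts ~ a * exp(r * t) by least squares on log(counts).

    All counts must be positive, since the logarithm is taken.
    """
    logs = [math.log(c) for c in counts]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    r = sxy / sxx                      # growth rate per week
    a = math.exp(mean_y - r * mean_t)  # fitted count at t = 0
    return a, r

# Hypothetical cumulative outbreak counts at 9-week steps (not study data).
times = [0, 9, 18, 27]
counts = [2.0, 4.0, 8.0, 16.0]

a, r = fit_exponential(times, counts)
doubling_time = math.log(2) / r  # weeks needed for the count to double
```

The fitted growth rate r plays the role of ln(N2/N1)/Δt in the calculation of R0, so a curve that doubles every time step corresponds to R0 = 1 + D ln(2)/Δt.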

The basic reproductive ratio

Calculation of the basic reproductive ratio gave the following values: R0=1.62 (1.11-2.26) for farm animals; R0=1.84 (1.08-3.13) for foxes (wild animals); and R0 close to 1 for domestic animals, with no significant growth seen in the available data. These values support the hypothesis of the secondary character of epidemics in domestic animals: epidemics in these populations apparently do not tend to develop independently but are sustained by contact with wild animals.
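How uncertainty in the infectious period propagates into R0 can be sketched with a small Monte Carlo simulation. The sketch assumes that R0 follows the exponential-growth form R0 = 1 + D ln(N2/N1)/Δt and that the Pert distribution is the standard four-parameter-free PERT (a rescaled beta distribution with shape constant 4); both are assumptions about the original @Risk model. The outbreak counts n1 and n2 are hypothetical.

```python
import math
import random

def pert_mean(low, mode, high):
    """Mean of the standard PERT distribution."""
    return (low + 4.0 * mode + high) / 6.0

def sample_pert(rng, low, mode, high):
    """Draw one value from a standard PERT distribution (rescaled beta)."""
    span = high - low
    alpha = 1.0 + 4.0 * (mode - low) / span
    beta = 1.0 + 4.0 * (high - mode) / span
    return low + span * rng.betavariate(alpha, beta)

def r0_from_counts(n1, n2, dt, d):
    """R0 under exponential growth: 1 + D * ln(N2 / N1) / dt."""
    return 1.0 + d * math.log(n2 / n1) / dt

rng = random.Random(42)  # fixed seed so the run is repeatable
iterations = 10_000

# n1 and n2 are hypothetical counts one 9-week time step apart.
draws = sorted(
    r0_from_counts(n1=4, n2=7, dt=9.0, d=sample_pert(rng, 0.0, 7.3, 25.0))
    for _ in range(iterations)
)
mean_r0 = sum(draws) / iterations
lower = draws[int(0.025 * iterations)]  # approximate 2.5th percentile
upper = draws[int(0.975 * iterations)]  # approximate 97.5th percentile
```

Under these assumptions, `pert_mean` returns about 9.03 weeks for Pert (0; 7.3; 25) and exactly 9.0 weeks for Pert (0; 4.5; 36), in line with the 9-week mean stated in the Methods.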

Modelling of epidemic size

Using the values obtained for the basic reproductive ratio R0, modelling of the possible number of outbreaks over a period of 20 weeks according to equation 2 gave the following results for the three categories of animals: 5 (1-16) new outbreaks may occur among farm animals; 1 outbreak could occur among domestic animals, although with large uncertainty, because this number is evidently highly dependent on the presence of outbreaks among wild animals; and 17 (1-113) new cases of rabies may occur among wild animals.
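The projection step can be sketched as a direct evaluation of exponential growth, assuming the inverse relation N2 = N1 exp[(R0 - 1) Δt / D], a single initial outbreak (N1 = 1) and a fixed infectious period of 9 weeks. These fixed inputs are simplifying assumptions; the study simulated D from a distribution rather than fixing it.

```python
import math

def projected_outbreaks(r0, weeks=20.0, d=9.0, n1=1.0):
    """Outbreaks after `weeks` of exponential growth: n1 * exp((r0 - 1) * weeks / d)."""
    return n1 * math.exp((r0 - 1.0) * weeks / d)

# Lower limit, central value and upper limit of R0 as reported in the Results.
farm = [projected_outbreaks(r) for r in (1.11, 1.62, 2.26)]
wild = [projected_outbreaks(r) for r in (1.08, 1.84, 3.13)]

for label, values in (("farm", farm), ("wild", wild)):
    print(label, [round(v, 1) for v in values])
```

With these inputs, the values at the interval limits of R0 fall close to the reported limits of the number of outbreaks, whereas the values at the central R0 are lower than the reported averages. This is to be expected if the reported averages are means over a right-skewed simulated distribution rather than projections at the mean R0.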

Calculation of the required proportion of vaccination

Calculation of the herd immunity threshold using equation 3 gave the following proportions of animals that would need to be vaccinated to prevent the development of an epidemic: 36% (10-56%) in the category of farm animals and 41% (7-68%) in the category of wild animals. An adequate calculation of the herd immunity threshold for domestic animals was not possible because R0 was too close to 1.
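The herd immunity threshold follows directly from R0 through the Anderson and May relation Ht = 1 - 1/R0. The sketch below evaluates it at the interval limits and central values of R0 reported above; returning zero for R0 at or below 1 is a convention chosen here for illustration, not something stated in the study.

```python
def herd_immunity_threshold(r0):
    """Share of the susceptible population to vaccinate: 1 - 1/R0."""
    if r0 <= 1.0:
        # Convention for this sketch: no epidemic growth, no threshold.
        return 0.0
    return 1.0 - 1.0 / r0

farm_pct = [round(100 * herd_immunity_threshold(r)) for r in (1.11, 1.62, 2.26)]
wild_pct = [round(100 * herd_immunity_threshold(r)) for r in (1.08, 1.84, 3.13)]
```

The percentages at the interval limits match the reported limits, while those at the central R0 are slightly higher than the reported averages, which again is consistent with the averages being taken over the simulated distribution of R0.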

Figure 4. Statistically significant cluster of rabies cases in wild species.

Discussion

According to the World Health Organization (WHO), rabies ranks fifth among all infectious diseases in terms of economic effects (WHO, 2015). Rabies occupies a special place among zoonotic infectious diseases because the virus affects almost all warm-blooded animals as well as humans. Therefore, the problem of rabies must be studied jointly by medical and veterinary professionals. Indeed, the risk of the disease spreading among animals and humans has not decreased in RK in recent years. Activation of natural foci of rabies is periodically seen in almost all regions of the country, with a growing number of cases among wild carnivores and domestic animals as well as farm animals. Some authors have noted a certain correlation between rabies registration in farm livestock and in domestic carnivores during studies of interspecies transfer between key populations of susceptible animals, suggesting a possible risk not only among wild animals but also of manifestation of an urban type of rabies (Shestopalov et al., 2001; Abdrakhmanov et al., 2010; Abdrakhmanov and Eseneva, 2012).

The analysis shows that rabies epidemics have positive (growing) dynamics in the categories of farm animals and wild animals. Epidemic development, with an increase in the number of infected animals (herds), is possible in these populations in the event of new outbreaks within high-risk areas. Rabies in the population of dogs and cats is apparently sustained only through contact with the wild animal population, and the epidemic would decay in the absence of such contacts. The epidemic curves plotted for the three identified clusters and the fitted exponential curves (Figures 5-7) support the hypothesis of the secondary character of epidemics in domestic animals, since these animals are thought to have been infected by contact with wild animals. The values found agree well with the results of other authors (Hampson et al., 2009; Zinsstag et al., 2009).

Active manifestations of the epidemic process of the same infectious disease in different areas (farms, settlements, administrative areas, etc.) will always vary owing to exposure to various anthropogenic and biogenic factors. Each disease has basic and additional evaluation criteria, the nature and intensity of which determine the growth or fading of any potential epidemic process. Given the basic reproductive ratio of the rabies epidemic process in RK in three of the most significant animal populations, an attempt was made to determine the proportion of animals needing vaccination, which is the main general measure to combat rabies. Our results show that, on average, 36% of the animals should be vaccinated in the farm animal category, while the figure rises to 41% for the category of wild animals. The resulting ratio of vaccinations among productive animals and wild carnivores (the main natural reservoir) corresponds to current vaccination practice in the country.
As for domestic carnivores, the situation is different: with R0 equal or close to 1, the disease has an endemic character without epidemic manifestations. This outcome could be due to the reduction in the number of stray carnivores and successful control of the population within the period analysed (2010-2013). The geographical overlap of the clusters of domestic and wild animals (Figures 3 and 4), as well as the matching time period (Spring-Autumn 2010), indicates the possibility of cross-infection between the two populations. The proximity of the two clusters to the national border should be noted, as it may point to transboundary transfer of rabies in the population of wild animals with further spread into the population of domestic animals.

The only statistically significant spatio-temporal cluster of rabies in farm animals was detected in the East Kazakhstan region, where the highest population of susceptible species was recorded (about 870 thousand head, compared with 300-400 thousand head in other areas, as of 2014) according to the Veterinary Service of RK. This can be seen as an indication that a rabies epidemic is most likely to develop in high-density populations of susceptible livestock. In areas with relatively low population densities, only occasional pockets of rabies usually arise and they do not lead to sharp increases in the number of cases. In this regard, our analysis can be considered a worst-case scenario, and the predicted number of possible outbreaks reflects the highest possible outcome of a potential epidemic.
The continuing unfavourable epidemic situation with regard to rabies in the country is due to the wide spread of feral rabies, poor performance in regulating the number of wild animals and in organising oral immunisation, the increasing number of stray animals in urban and rural areas, gross violations in farm animal housing, poor organisation of accounting and registration, and insufficient public awareness campaigns among the human population. Further development of the model may involve the creation of an integrated model of the rabies epidemic process that includes the interaction between different populations and accounts for data on vaccination. The results obtained have been used to compile a set of methodological recommendations regarding the organisation of preventive and contra-epizootic measures against rabies, which are expected to be applied by state veterinary services.

Figure 5. Epidemic data with fitted exponential curve for statistically significant cluster of rabies cases in farm species.

Figure 6. Epidemic data with fitted exponential curve for statistically significant cluster of rabies cases in domestic species.

Figure 7. Epidemic data with fitted exponential curve for statistically significant cluster of rabies cases in wild species.


Conclusions

Our study allows the estimation of the basic reproductive ratio R0 for rabies in different animals at the population level. The results demonstrate that: i) R0 can be higher than 1, resulting in epidemic growth in farm livestock, if there is a high density of animals in the susceptible population; and ii) R0 can be higher than 1 in the population of wild carnivores, which, together with the geographical overlap of outbreak clusters in wild and domestic species, indicates close interaction among the epidemic processes in these sub-populations, with the possibility of transfer of the disease from wild to domestic animals. Therefore, the forecast of the possible size of a new epidemic should be considered a worst-case scenario, applicable when there is a high density of susceptible animals in the populations.

References

Abdrakhmanov SK, Eseneva SS, 2012. The role of various animal species in the epizootic process of rabies. In: Proceedings of International Conference Problems of control of dangerous, exotic and zooanthroponosis diseases of animals dedicated to 70-years jubilee of professor N.G.Asanov, 1:23-8.
Abdrakhmanov SK, Sytnic II, Tursunkulov SZh, 2010. Visualization and analysis of veterinary and geographical rabies spread by using GIS technologies. In: Proceedings of the 5th International Scientific Practical Conference, 2010 March 17-18, Barnaul. AGAU Publ., Barnaul, Russia, pp 283-6.
Alabama Department of Public Health, 2015. Management of animals that bite humans. Available from: http://www.adph.org/epi/assets/MgmtAnimalBite15.pdf
Anderson RM, May RM, 1992. Infectious diseases of humans. Dynamics and control. Oxford University Press, Oxford, UK.
Dietz K, 1993. The estimation of the basic reproduction number for infectious diseases. Stat Methods Med Res 2:23-41.
Eckardt M, Freuling C, Muller T, Selhorst T, 2015. Spatio-temporal analysis of fox rabies cases in Germany 2005-2006. Geospat Health 10:313.
Hampson K, Dushoff J, Cleaveland S, Haydon DT, Kaare M, Packer C, Dobson A, 2009. Transmission dynamics and prospects for the elimination of canine rabies. PLoS Biology 7:3.
Heffernan JM, Smith RJ, Wahl LM, 2005. Perspectives on the basic reproductive ratio. J Roy Soc Interface 2:281-93.
Iglesias I, Munoz MJ, Montes F, Perez A, Gogin A, Kolbasov D, de la Torre A, 2015. Reproductive ratio for the local spread of African swine fever in wild boars in the Russian Federation. Transboundary Emerg Dis 10.1111/tbed.12337.
Iglesias I, Perez AM, Sanchez-Vizcaino JM, Munoz MJ, Martinez M, De La Torre A, 2011. Reproductive ratio for the local spread of highly pathogenic avian influenza in wild bird populations of Europe, 2005-2008. Epidemiol Infect 139:99-104.
Kulldorff M, 2015. Information management services, Inc. SaTScan™ v.9.1: software for the spatial and space-time scan statistics. Available from: http://www.satscan.org/
Kulldorff M, Heffernan R, Hartman J, Assuncao RM, Mostashari FA, 2005. A space-time permutation scan statistic for the early detection of disease outbreaks. PLoS Medicine 2:216-24.
Makarov VV, Sukhareva OI, Gulyukin AM, Litvinov OB, 2008. The trend of rabies spread in Eastern Europe. Vet Med 7:20-2.
Mitchell A, 2005. The ESRI guide to GIS analysis. Vol. 2. ESRI Press, Redlands, CA, USA.
Shestopalov AM, Kissurina MI, Gruzdev KN, 2001. Rabies and its distribution in the world. Problems of virology. 2:7-12.
Smreczak M, Orłowska A, Trębas P, Żmudziński JF, 2012. Rabies epidemiological situation in Poland in 2009 and 2010. Bull Vet Inst Pulawy 56:115-266.
Van Hauwermeiren M, Vose D, 2009. A compendium of distributions. Vose Software, Ghent, Belgium.
Vose D, 2008. Risk analysis: a quantitative guide. 3rd ed. Wiley, Hoboken, NJ, USA.
WHO, 2015. Rabies. Available from: http://www.who.int/rabies/en/
Wolberg J, 2006. Data analysis using the method of least squares. Springer-Verlag, Berlin-Heidelberg, Germany.
Wunner WH, Jackson AC, 2010. Rabies: scientific basis of the disease and its management. Academic Press, San Diego, CA, USA.
Youla AS, Traore FA, Sako FB, Feda RM, Emeric MA, 2014. Canine and human rabies in Conakry: epidemiology and preventive aspects. Bull Soc Pathol Exot 7:19-21.
Zinsstag J, Durr S, Penny MA, Mindekem R, Roth F, Menendez Gonzalez S, Naissengar S, Hattendorf J, 2009. Transmission dynamics and economics of rabies control in dogs and humans in an African city. P Natl Acad Sci USA 106:35.

[Geospatial Health 2016; 11:455]


Geospatial Health 2016; volume 11:432

Mapping, cluster detection and evaluation of risk factors of ovine toxoplasmosis in Southern Italy

Roberto Condoleo,1 Vincenzo Musella,2 Maria Paola Maurelli,3 Antonio Bosco,3 Giuseppe Cringoli,3 Laura Rinaldi3

1IZSLT, Institute for Experimental Veterinary Medicine of Latium and Tuscany, Rome; 2Department of Health Sciences, Magna Græcia University, Catanzaro; 3Department of Veterinary Medicine and Animal Productions, University of Naples Federico II, Regional Centre for Monitoring of Infectious Diseases, Campania Region, Naples, Italy

Abstract
Toxoplasmosis, an important cause of reproductive failure in sheep, is responsible for significant economic losses to the ovine industry worldwide. Moreover, ovine meat contaminated by the parasite Toxoplasma gondii is considered a common source of infection for humans. The aim of this study was to develop point and risk profiling maps of T. gondii seroprevalence in sheep bred in Campania Region (Southern Italy) and to analyse the associated risk factors at the flock level. We used serological data from a previous survey of 117 sheep flocks, while environmental and farm management information was obtained from an analysis based on geographical information systems and from a questionnaire survey, respectively. A univariate Poisson regression model revealed that the type of farm production (milk and meat vs only meat) was the only independent variable associated with T. gondii positivity (P<0.02); the higher within-flock seroprevalence in milking flocks suggests that milking practices might influence the spread of the infection on the farm. Neither environmental nor other management variables were significant. Since a majority of flocks were seasonally or permanently on pasture, the animals have a high exposure to infectious T. gondii oocysts, so the high within-flock seroprevalence might derive from this management factor. However, further studies are needed to better assess the actual epidemiological situation of toxoplasmosis in sheep and to clarify the factors that influence its presence and distribution.

Correspondence: Laura Rinaldi, Department of Veterinary Medicine and Animal Productions, University of Naples Federico II, via Della Veterinaria 1, 80137 Naples, Italy. Tel: +39.081.2536283 - Fax: +39.081.2536280. E-mail: lrinaldi@unina.it
Key words: Toxoplasma gondii; Sheep; Epidemiology; Geographical information systems; Italy.
Received for publication: 30 November 2015. Revision received: 13 May 2016. Accepted for publication: 16 May 2016.
©Copyright R. Condoleo et al., 2016 Licensee PAGEPress, Italy Geospatial Health 2016; 11:432 doi:10.4081/gh.2016.432
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Introduction
The cosmopolitan protozoan Toxoplasma gondii, able to infect humans and all warm-blooded animals, has a complex life-cycle that involves both intermediate (mammals and birds) and definitive hosts (felids). Hosts are infected through the ingestion of either meat that contains bradyzoites or oocysts present in the soil or water (Dubey, 2009). Infection by T. gondii in sheep is recognised as a major cause of infectious reproductive failure in several countries, causing foetal resorption, abortion at any stage of pregnancy, foetal mummification, stillbirth or weak offspring (Guo et al., 2015). Toxoplasmosis also causes heavy economic losses to the sheep industry worldwide (Tenter et al., 2000; Innes et al., 2009). Furthermore, infected sheep meat is a relevant source of T. gondii infection for humans (Dubey, 2009; Guo et al., 2015). A risk assessment study estimated that consumption of undercooked ovine meat is responsible for 14.0% of meat-related T. gondii infections in the Dutch population (Opsteegh et al., 2011). Due to the high likelihood of infection of small ruminants, mostly by the horizontal route (ingestion of oocysts), sheep could serve as an indicator of T. gondii environmental contamination in a given area. Dubey (2009) reviewed data on T. gondii seroprevalence in sheep since 1988 in different parts of the world, showing high values (up to 95.7%), but the studies were not comparable because different serological tests and different cut-off values had been used. Similarly, the review by Rinaldi and Scala (2008) on toxoplasmosis in livestock in Italy shows high seroprevalence of T. gondii in sheep (up to 88.6%). However, that review did not find a uniform distribution across the country, which could depend on the adoption of different laboratory techniques or different methods of sampling farms and animals, as well as on specific environmental or management factors.
Only a few studies have focused on particular risk factors associated with T. gondii seropositivity in sheep (reviewed in Dubey, 2009). Although spatial investigations have been conducted to study the parasitic infection in domestic animals (Casartelli-Alves et al., 2015; Djokić et al., 2014; Afonso et al., 2013) and wild animal species (Miller et al., 2004; Johnson et al., 2009; Ahlers et al., 2015; Chadwick et al., 2013; VanWormer et al., 2013, 2014), geospatial tools have rarely been used for the detection of clustering and/or the identification of environmental factors that could affect T. gondii seroprevalence in small ruminants. The objectives of this study were to: i) develop T. gondii distribution maps and risk profiles based on seropositivity in sheep bred in an Italian region (Campania) where sheep farming is important; and ii) further explore the environmental and management risk factors for T. gondii infection in sheep. For both aims, geographical information system (GIS) technology was used as the analytical tool (Rinaldi et al., 2015).

Materials and Methods
Data source
The Italian Animal Register (2016) reports that 194,310 animals are currently bred in 6332 ovine farms (on average 30 sheep per flock) in Campania Region (Southern Italy). In this region, sheep are usually raised in an extensive rearing system (animals are not confined but pastured most of the time). Serological data on T. gondii exposure were derived from a regional cross-sectional survey of 117 sheep farms in the Campania Region (Fusco et al., 2007). The study had been designed to test 10 adult sheep (>18 months old) randomly chosen from each farm; 333 of 1170 animals (28.5%) were positive by the immunofluorescent antibody technique (IFAT) at a titre ≥1:200, while 77.8% of the flocks had at least one seropositive sheep (for details, see Fusco et al., 2007). Serum samples were collected using a grid-based approach within a GIS, as reported in Fusco et al. (2007), in order to sample farms uniformly throughout the entire region: a grid of 10x10 km quadrants was overlaid on the regional map within the GIS. As a result, the territory of the Campania Region was divided into equal quadrants, the centroid of each quadrant was identified and the farm closest to this centroid was selected among all the farms present in the GIS database. The number of animals per farm varied between 50 and 1350, with an average size of 237 head per flock. However, no risk factor analysis was performed in the prevalence study by Fusco et al. (2007).
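The grid-based selection described above can be illustrated with a minimal Python sketch. It is not the procedure of Fusco et al. (2007): the function name `select_farms_by_grid`, the use of projected coordinates in kilometres, the grid origin and the restriction of the search to farms lying inside each quadrant are all assumptions made for illustration.

```python
import numpy as np

def select_farms_by_grid(farm_xy, cell_km=10.0, origin=(0.0, 0.0)):
    """For every occupied grid cell, pick the farm closest to the cell centroid.

    farm_xy : (n, 2) projected coordinates in kilometres (assumed units).
    Returns the sorted indices of the selected farms.
    """
    farm_xy = np.asarray(farm_xy, dtype=float)
    origin = np.asarray(origin, dtype=float)
    # Integer (column, row) index of the quadrant containing each farm.
    cells = np.floor((farm_xy - origin) / cell_km).astype(int)
    # Centroid of the quadrant each farm falls in.
    centroids = origin + (cells + 0.5) * cell_km
    dist = np.hypot(*(farm_xy - centroids).T)
    best = {}
    for idx, (cell, d) in enumerate(zip(map(tuple, cells), dist)):
        if cell not in best or d < best[cell][0]:
            best[cell] = (d, idx)
    return sorted(idx for _, idx in best.values())

# Hypothetical coordinates (km): three farms in one quadrant, two in the next.
farms = [[1, 1], [4, 6], [5, 5], [12, 3], [14, 5.5]]
selected = select_farms_by_grid(farms)
```

With these invented coordinates, one farm is retained per occupied 10x10 km quadrant.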

Geographical information system
A GIS was used to integrate the data layers on environmental features including administrative boundaries (at the provincial and municipal levels), land cover, elevation, slope direction (aspect) and degree of steepness. Furthermore, the farms chosen for sampling were geo-referenced by inserting their longitudes and latitudes into the GIS as shown in Figure 1. The buffer generation analysis function of GIS was used to generate circular buffer zones of 1.5 km diameter around each geo-referenced point (Figure 1).

Figure 1. Farms inserted in the geographical information system (geo-referenced), their serological status and generation of circular buffer zones of 1.5-km diameter around each geo-referenced point.

Corine Land Cover
Data on the land cover of the study area were obtained from the Corine Land Cover (CLC) map (version 8/2005; European Environment Agency, Copenhagen K, Denmark), which has a spatial resolution of 100 m. CLC categorises land cover (with some information on land use) hierarchically into three levels and 44 classes. Level 1 includes 5 classes, which correspond to the main categories of land cover/land use (artificial areas, agricultural areas, forests and semi-natural areas, wetlands and water surfaces); level 2 (15 classes) covers physical and physiognomic entities in more detail (e.g., urban zones, types of forest, types of water body, etc.); while level 3 is composed of 44 classes based on even more detailed information. CLC was elaborated based on the visual interpretation of satellite-generated images [e.g., SPOT (Spot LLC, Covington, LA, USA) and the LANDSAT Thematic Mapper (TM) and Multispectral Scanner (MSS) sensors (United States Geological Survey, Reston, VA, USA)]. Ancillary data (i.e., aerial photographs, topographic or vegetation maps, statistics and local knowledge) were used to refine interpretation and the assignment of the territory to the CLC class. For each buffer zone identified with the GIS, the predominant CLC class was considered.
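As an illustration of how a predominant class can be derived for a buffer zone, the following Python sketch counts the class codes of the raster cells whose centres fall inside a circular buffer. It is only a sketch under stated assumptions: the function `predominant_class`, the toy raster and the tie-breaking rule (lowest code wins) are hypothetical and do not reproduce the GIS workflow used in the study. The 750 m radius corresponds to the 1.5 km buffer diameter and the 100 m cell size to the CLC resolution given in the text.

```python
import numpy as np

def predominant_class(raster, row, col, radius_m=750.0, cell_m=100.0):
    """Most frequent class code among the cells whose centres lie in the buffer.

    raster   : 2-D array of integer land-cover codes.
    row, col : raster cell containing the farm (assumed already located).
    Ties are resolved in favour of the lowest code (a simplifying assumption).
    """
    raster = np.asarray(raster)
    rows, cols = np.indices(raster.shape)
    dist_m = np.hypot(rows - row, cols - col) * cell_m
    codes, counts = np.unique(raster[dist_m <= radius_m], return_counts=True)
    return int(codes[np.argmax(counts)])

# Toy raster: arable land (code 211) on the left, forest (code 311) on the right.
toy = np.full((21, 21), 211)
toy[:, 12:] = 311
farm_a = predominant_class(toy, 10, 10)   # buffer lies mostly over arable land
farm_b = predominant_class(toy, 10, 15)   # buffer lies mostly over forest
```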

Elevation, slope and aspect
Data on elevation, slope steepness and aspect of the study area were obtained from a digital elevation model (DEM) having 40 m spatial resolution. Aspect was divided into the following eight classes: North (337.5-360° and 0-22.5°), North-East (22.5-67.5°), East (67.5-112.5°), South-East (112.5-157.5°), South (157.5-202.5°), South-West (202.5-247.5°), West (247.5-292.5°) and North-West (292.5-337.5°). The slope steepness was divided into the following four classes: flat (0°), low (1-15°), medium (16-30°) and high (31-54°).
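The class limits above translate directly into two small lookup functions. The Python sketch below is illustrative only; the function names are hypothetical, and the handling of boundary and non-integer values (for example, assigning exactly 22.5° to North-East, or 15.5° to the medium slope class) is an assumption, since the text does not specify it.

```python
ASPECT_CLASSES = ["North", "North-East", "East", "South-East",
                  "South", "South-West", "West", "North-West"]

def aspect_class(degrees):
    """Map an aspect in degrees to one of the eight 45-degree sectors."""
    # Shifting by half a sector makes North span 337.5-360 and 0-22.5 degrees.
    return ASPECT_CLASSES[int(((degrees % 360.0) + 22.5) // 45.0) % 8]

def slope_class(degrees):
    """Map a slope in degrees to flat / low / medium / high."""
    if degrees < 0 or degrees > 54:
        raise ValueError("slope outside the 0-54 degree range given in the text")
    if degrees == 0:
        return "flat"
    if degrees <= 15:
        return "low"
    if degrees <= 30:
        return "medium"
    return "high"
```

For example, `aspect_class(189.4)` falls in the South sector, and `slope_class(5.5)` falls in the low class.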

Mapping and clustering
In order to display the spatial distribution of T. gondii detected at the sheep farms (here used as epidemiological units), farm distribution maps were drawn within the GIS. The clustering of test-positive farms was investigated based on location determined by exact coordinates and using two software applications: i) the spatial scan statistic (SaTScan) as described by Kulldorff (1997), choosing the analysis approach Purely Spatial Probability Model, Discrete Poisson Scan for Areas with High or Low Rates; and ii) the ArcGIS 9.3 (ESRI, Redlands, CA, USA) tool Average Nearest Neighbor Procedure. The latter approach measures the distance between each feature centroid and its nearest neighbour centroid location, then averages all the nearest neighbour distances. If the average distance is less than the average for a hypothetical random distribution, the distribution of the features being analysed should be considered as clustered. If the average distance is greater than that of a hypothetical random distribution, the features should instead be considered as dispersed.
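The logic of the average nearest neighbour comparison can be sketched in a few lines of Python. This is not the ArcGIS implementation: the function name is hypothetical, the study-area size must be supplied by the user, and the expected distance and standard error use the standard formulas for a random pattern of n points in an area A, namely 0.5/sqrt(n/A) and 0.26136/sqrt(n^2/A).

```python
import numpy as np

def average_nearest_neighbour(points, area):
    """Return (ratio, z_score) for a point pattern.

    ratio < 1 suggests clustering, ratio > 1 suggests dispersion.
    points : (n, 2) projected coordinates; area : study-area size in the same units squared.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Full pairwise distance matrix; the diagonal is masked so a point is not its own neighbour.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    observed = dist.min(axis=1).mean()
    expected = 0.5 / np.sqrt(n / area)
    std_err = 0.26136 / np.sqrt(n ** 2 / area)
    return observed / expected, (observed - expected) / std_err

# Invented patterns: a regular 5x5 lattice and the same lattice shrunk into one corner.
lattice = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
ratio_regular, z_regular = average_nearest_neighbour(lattice, area=25.0)
ratio_tight, z_tight = average_nearest_neighbour(lattice * 0.01, area=25.0)
```

The regular lattice yields a ratio above 1 (dispersed), whereas the shrunken copy yields a ratio below 1 (clustered) within the same study area.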

Questionnaire management data
Different management variables (type of production, number of animals, presence of other domestic animals at sheep farms, frequency of domestic slaughtering, frequency of grazing, transhumance, elevation of grazing area, size of grazing area, water sources in the main grazing area) related to farm and pasture typology were included in the analysis. This information was obtained by distributing questionnaires to all participating farm owners.

Statistical analyses
The results of the IFAT serological tests at the farm level (including the number of positive animals per farm) and the independent variables associated with farms (environmental data as well as farm management data obtained from the questionnaires) were recorded and double-checked in an Excel spreadsheet (Microsoft, Redmond, WA, USA). A univariate Poisson regression model was used to assess the association between the within-flock prevalence and each risk factor, using the number of positive animals in a farm as the dependent variable (Cenci-Goga et al., 2013) and a log-link function. Farms with dubiously reactive animals (serum titre 1:100) or without associated questionnaire data were excluded from the statistical analysis of management data (n=24 farms). The statistical analysis was performed using SPSS (IBM, Armonk, NY, USA).

Table 1. Mean seroprevalence of Toxoplasma gondii in sheep farms (n=93) and environmental factors for the univariate Poisson regression model.

| Environmental factor | Specification | Farms (n) | Mean prevalence (%) | P |
|---|---|---|---|---|
| Land cover class | Artificial surfaces | 4 | 30.0 | 0.22 |
| | Agricultural areas | 60 | 38.0 | |
| | Forest and semi-natural areas | 29 | 31.0 | |
| Aspect | 1 | 19 | 40.5 | 0.19 |
| | 2 | 4 | 25.0 | |
| | 3 | 11 | 31.8 | |
| | 4 | 11 | 48.1 | |
| | 5 | 13 | 31.5 | |
| | 6 | 19 | 31.6 | |
| | 7 | 9 | 31.1 | |
| | 8 | 7 | 37.1 | |
| DEM | 0-500 m amsl | 57 | 35.6 | 0.70 |
| | 501-1000 m amsl | 29 | 36.6 | |
| | Above 1000 m amsl | 7 | 30.0 | |
| Slope | Flat | 14 | 43.5 | 0.12 |
| | Low | 68 | 34.4 | |
| | Medium | 8 | 37.5 | |
| | High | 3 | 16.7 | |

DEM, digital elevation model; amsl, above mean sea level.
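A univariate Poisson regression with a log link, as described in the statistical analyses, can be sketched with plain NumPy. The study itself used SPSS; the code below is only an illustration, the function `fit_poisson` is hypothetical, and the counts of positive animals (out of 10 tested per farm) are invented for the example and are not the study data.

```python
import numpy as np

def fit_poisson(X, y, n_iter=50, tol=1e-10):
    """Fit a log-link Poisson regression by Newton-Raphson.

    X : design matrix whose first column is the intercept (assumed).
    y : non-negative counts. Returns the coefficient vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())              # start from the overall mean count
    for _ in range(n_iter):
        mu = np.exp(X @ beta)               # fitted means under the log link
        gradient = X.T @ (y - mu)
        information = X.T @ (X * mu[:, None])
        step = np.linalg.solve(information, gradient)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Invented counts of seropositive sheep per farm (10 animals tested on each farm).
positives = np.array([2, 3, 1, 2, 3, 4, 3, 5, 4, 3, 3])
mixed = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])   # 1 = milk and meat, 0 = meat only
X = np.column_stack([np.ones_like(mixed), mixed])
beta = fit_poisson(X, positives)
rate_ratio = float(np.exp(beta[1]))
```

With a single binary risk factor, the exponentiated coefficient equals the ratio of the mean counts in the two groups, which gives a simple check on the fit.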

Results
The results of the regression model for the environmental and farm management data are shown in Tables 1 and 2, respectively. Among the farm management factors, the type of production (meat only vs milk and meat) was the only variable statistically associated with the within-flock prevalence; the occurrence of T. gondii was significantly higher in farms with mixed production than in farms where animals are bred only for meat production (Table 2). Regarding the environmental factors, more than half of the farms were located at an altitude of 0-500 m above mean sea level (amsl). The average altitude for all the farms was 476.2 m amsl (min. 12-max. 1594). Almost all of them were distributed within areas characterised by low slope steepness (0-15°), with a median for all the farms of 5.5° (min. 0-max. 33.4). The farms had a uniform southern exposure (189.4°). Elevation of the farm was not related to the seroprevalence, but the seropositive rate of flocks that grazed in mountain pastures was lower than that of flocks that had their main pasture area at lower elevation (although only at P=0.08).

The spatial statistical analysis showed that there were no clusters correlated to the environmental factors, according to either SaTScan (relative risk: 0, log likelihood ratio: 2.363769 and P=0.98) or the Average Nearest Neighbor tool (Z score: 0.83 and P=0.71). The results of our statistical analysis did not show any association between the presence of cats in the farm or grazing area and the serological prevalence, nor was any evidence found that contact with other animal species can be a risk factor for ovine toxoplasmosis (Table 2). We did not find any statistical relationship between the seroprevalence of the parasite and the presence of a water source in the main grazing area (Table 2). Figure 2 compares the serological results with the predominant CLC classes and the DEM data (elevation, aspect and slope steepness).

Table 2. Mean seroprevalence of Toxoplasma gondii in sheep farms (n=93) and management factors for the univariate Poisson regression model.

| Management factor | Specification | Farms (n) | Mean prevalence (%) | P |
|---|---|---|---|---|
| Type of production | Meat | 10 | 22.0 | 0.02* |
| | Milk and meat | 83 | 37.1 | |
| Number of animals | <101 animals | 31 | 36.8 | 0.64 |
| | 101-300 animals | 33 | 33.0 | |
| | >300 animals | 29 | 36.9 | |
| Presence of sheep from other farms | No | 81 | 36.7 | 0.11 |
| | Yes | 12 | 27.5 | |
| Presence of animals of different species | No | 24 | 37.9 | 0.46 |
| | Yes | 69 | 34.6 | |
| Presence of animals from other farms | No | 71 | 36.5 | 0.36 |
| | Yes | 22 | 32.3 | |
| Presence of bovines | No | 62 | 36.8 | 0.35 |
| | Yes | 31 | 32.9 | |
| Presence of goats | No | 43 | 37.7 | 0.29 |
| | Yes | 50 | 33.6 | |
| Presence of cats | No | 68 | 35.3 | 0.87 |
| | Yes | 25 | 36.0 | |
| Presence of pigs | No | 82 | 34.8 | 0.31 |
| | Yes | 11 | 40.9 | |
| Presence of wild animals | No | 28 | 40.0 | 0.13 |
| | Yes | 65 | 33.5 | |
| Presence of dogs | No | 11 | 38.2 | 0.61 |
| | Yes | 82 | 35.1 | |
| Frequency of domestic slaughtering | Sporadically | 58 | 33.3 | 0.42 |
| | Never | 14 | 45.7 | |
| | Often | 8 | 27.5 | |
| Frequency of grazing | Permanent | 68 | 33.7 | 0.34 |
| | Seasonal | 19 | 40.5 | |
| | Sporadic | 3 | 40.0 | |
| Transhumance | No | 74 | 36.8 | 0.19 |
| | Yes | 19 | 30.5 | |
| Elevation of the grazing area | Plain (<300 m asl) | 22 | 33.6 | 0.08 |
| | Hill (300-700 m asl) | 36 | 39.4 | |
| | Mountain (>700 m asl) | 22 | 28.2 | |
| Size of the grazing area | Small (≤10 ha) | 23 | 36.5 | 0.65 |
| | Medium (11-99 ha) | 27 | 37.4 | |
| | Large (≥100 ha) | 34 | 33.2 | |
| Water sources in the main grazing area | Absence | 31 | 36.1 | 0.81 |
| | Presence | 62 | 35.2 | |

asl, above sea level. *Statistically significant.

Discussion
GIS has been used in other studies to investigate risk factors of toxoplasmosis in different domestic species (Afonso et al., 2013; Djokić et al., 2014; Casartelli-Alves et al., 2015). Spatial investigations demonstrated significant variation in seroprevalence of infection in goats between different regions in Serbia (Djokić et al., 2014). In some studies, GIS was used to study the association between prevalence of toxoplasmosis in animals and geographical/climate factors. Afonso et al. (2013) reported that places with a high farm density and cool and moist winters may pose a higher risk of toxoplasmosis in cats. Distance from water sources (>500 m) and proximity to dense vegetation (≤500 m) were found to influence the probability of infection in chickens (Casartelli-Alves et al., 2015). Similar studies have also been conducted on wild animals: sea otters, feral domestic cats and wild felids, foxes and coyotes (Miller et al., 2004; Johnson et al., 2009; Ahlers et al., 2015; Chadwick et al., 2013; VanWormer et al., 2013, 2014). However, this is the first GIS-based spatial analysis covering mapping, cluster detection and risk factors of toxoplasmosis in the ovine species. Although no significant association with elevation of the farm was detected, we have to consider that the sheep-rearing system is not intensive in Campania Region, so sheep are rarely confined to stables. On the contrary, they have access to permanent or frequent grazing (Table 2). As a consequence, the position of the farm may be considered a minor factor compared to the location of the grazing area.

Figure 2. Overlapping of serological results and predominant Corine Land Cover classes, elevation, aspect, and slope steepness in the study area.


CLC permitted the identification of 13 predominant level 3 classes in the study area. They belonged mainly to two level 1 classes: agricultural (64.5%) and forest/semi-natural (31.2%) areas. We did not find any association between seroprevalence and land cover as an independent variable (Table 1); similar results were reported by Djokić et al. (2014) for goats in Serbia, while Kantzoura et al. (2013) found a significantly higher prevalence of the parasite in ovine farms located in forest and urban/crop areas in Greece. Nevertheless, such results are difficult to compare to our study because we mapped the territory at a more detailed level (third level of the CLC land cover nomenclature), so we had a larger number of land classes in our study than they did. Several studies have analysed farm risk factors associated with T. gondii seropositivity in sheep (Klun et al., 2006; Vesco et al., 2007; Romanelli et al., 2007; Tzanidakis et al., 2012; Cenci-Goga et al., 2013; Alvarado-Esquivel et al., 2013; Kantzoura et al., 2013). However, comparisons between the results of these studies are not straightforward due to differences in epidemiological units (e.g. animal vs farm), laboratory analysis (IFAT vs ELISA) and the risk factors considered. An important finding was that the occurrence of T. gondii was significantly higher in farms with mixed production than in farms with only meat production. None of the above-cited studies included this variable in its statistical analysis as a potential risk factor for ovine toxoplasmosis. This finding could be related to the fact that, in a dairy farm, the farmers tend to keep the sheep that produce larger quantities of milk for years and, as a consequence, older animals are present in the flock.
Many studies have shown that the probability of being infected is positively correlated with the age of the sheep (Dubey, 2009); therefore, a higher seroprevalence in farms with mixed production might be a consequence of the higher average age of animals in the flock. In agreement with many other research groups (Cosendey-KezenLeite et al., 2014; Cenci-Goga et al., 2013; García-Bocanegra et al., 2013; Tzanidakis et al., 2012), our analysis did not show any association between the presence of cats in the farm or grazing area and serological prevalence in the sheep. This is, however, a contended issue, as reported by several authors (Mainar et al., 1996; Skjerve et al., 1998; Vesco et al., 2007; Romanelli et al., 2007; Andrade et al., 2013). It must be admitted that the presence of cats in a sheep farm, as well as in grazing pastures, could be a relevant risk factor for sheep because of the risk of contamination of feed and water with oocysts. For instance, Cenci-Goga et al. (2013) reported a higher farm prevalence in farms where stray cats are allowed access to the animals' water, while the presence of resident or stray cats on the farm was not significant. Similarly, Romanelli et al. (2007) found a positive relationship between the number of infected sheep and access of cats to farm food deposits. Only very few studies have evaluated whether the joint presence of sheep and other species of animals (excluding cat species) in the same farm or pasture can affect the within-flock prevalence of T. gondii (Waltner-Toews et al., 1991; Tzanidakis et al., 2012). Using a multivariate model, the former researchers found a higher seroprevalence in ovine farms where pigs were also raised or where flocks shared the pasture with other species. However, neither our study nor any other supports the idea that contact with animal species other than felines can be a risk factor for ovine toxoplasmosis.
Despite the fact that the type or origin of the water supply has frequently been demonstrated to be a potential risk factor (Waltner-Toews et al., 1991; Vesco et al., 2007; Tzanidakis et al., 2012; Andrade et al., 2013), we did not find any statistical relationship between the seroprevalence of the parasite and the presence of a water source in the main grazing area. Separately, we also tested different sources of water (spring, river, stream or lake) but did not find any effect on the within-flock occurrence of T. gondii (data not shown).

In contrast to some authors (Skjerve et al., 1998; Alvarado-Esquivel et al., 2013; Kantzoura et al., 2013), we did not find a clear association between altitude and occurrence of toxoplasmosis. In our results, however, similarly to the study by Kantzoura et al. (2013), the seropositive rate of flocks that grazed in mountain pastures was indeed lower than that of flocks that had their main pasture area at lower elevation, but not strongly so. Mountain pasture might be less contaminated with oocysts because of lower anthropisation and, consequently, a lower domestic cat population density; this could be the reason we found differences in seroprevalence in our study. This hypothesis is supported by the fact that the prevalence rate of T. gondii is lower in transhumance flocks, since this practice consists of moving the sheep to mountain pastures during the summer. Unlike other similar studies, we followed a grid-based approach to select the farms involved in the study. This type of sampling is particularly effective for geospatial investigations because it permits researchers to obtain data from the whole study area and, consequently, allows a more representative picture of the territory. Another strong point is that our study involved a higher number of flocks compared to other similar studies (Kantzoura et al., 2013; Cenci-Goga et al., 2013; Andrade et al., 2013). However, in consideration of the type of regression model we used, the number of animals tested per flock might not have been sufficient to estimate the within-flock prevalence with an adequate level of accuracy.

Conclusions
Population-wise, our study mirrored the rearing conditions in Campania Region, so a vast proportion of flocks were seasonally or permanently on pasture (96.0%). Since this type of farming system implies that animals have many opportunities to be exposed to infectious T. gondii oocysts from many different sources of contamination (Guo et al., 2015), differences in levels of within-flock seroprevalence could be expected. However, with the exception of the type of production and some environmental/management variables, few issues seemed to affect the risk of ovine toxoplasmosis. Despite advances in our understanding of ovine toxoplasmosis, some aspects of the infection in sheep require further research efforts. Therefore, a coordinated national-scale survey on toxoplasmosis in livestock, based on homogeneous sampling and laboratory techniques, is strongly needed in order to better assess the actual epidemiological situation of this under-estimated zoonosis in livestock and to clarify the factors that influence its presence and distribution.

References
Afonso E, Germain E, Poulle ML, Ruette S, Devillard S, Say L, Villena I, Aubert D, Gilot-Fromont E, 2013. Environmental determinants of spatial and temporal variations in the transmission of Toxoplasma gondii in its definitive hosts. Int J Parasitol 2:278-85.
Ahlers AA, Mitchell MA, Dubey JP, Schooley RL, Heske EJ, 2015. Risk factors for Toxoplasma gondii exposure in semiaquatic mammals in a freshwater ecosystem. J Wildlife Dis 51:488-92.
Alvarado-Esquivel C, Silva-Aguilar D, Villena I, Dubey JP, 2013. Seroprevalence and correlates of Toxoplasma gondii infection in domestic sheep in Michoacán State, Mexico. Prev Vet Med 112:433-7.
Andrade MMC, Carneiro M, Medeiros AD, Neto VA, Vitor RWA, 2013. Seroprevalence and risk factors associated with ovine toxoplasmosis in Northeast Brazil. Parasite 20:20.
Casartelli-Alves L, Amendoeira MRR, Boechat VC, Ferreira LC, Carreira JCA, Nicolau JL, Trindade EPF, Peixoto JNB, Magalhães MAFM, Oliveira RVC, Schubach TMP, Menezes RC, 2015. Mapping of the environmental contamination of Toxoplasma gondii by georeferencing isolates from chickens in an endemic area in Southeast Rio de Janeiro State, Brazil. Geospat Health 10:311.
Cenci-Goga B, Ciampelli A, Sechi P, Veronesi F, Moretta I, Cambiotti V, Thompson PN, 2013. Seroprevalence and risk factors for Toxoplasma gondii in sheep in Grosseto district, Tuscany, Italy. BMC Vet Res 9:25.
Chadwick EA, Cable J, Chinchen A, Francis J, Guy E, Kean EF, Paul SC, Perkins SE, Sherrard-Smith E, Wilkinson C, Forman DW, 2013. Seroprevalence of Toxoplasma gondii in the Eurasian otter (Lutra lutra) in England and Wales. Parasite Vector 6:75.
Cosendey-KezenLeite RI, de Oliveira FC, Frazão-Teixeira E, Dubey JP, de Souza GN, Ferreira AM, Lilenbaum W, 2014. Occurrence and risk factors associated to Toxoplasma gondii infection in sheep from Rio de Janeiro, Brazil. Trop Anim Health Pro 46:1463-6.
Djokić V, Klun I, Musella V, Rinaldi L, Cringoli G, Sotiraki S, Djurković-Djaković O, 2014. Spatial epidemiology of Toxoplasma gondii infection in goats in Serbia. Geospat Health 8:479-88.
Dubey JP, 2009. Toxoplasmosis in sheep-the last 20 years. Vet Parasitol 163:1-14.
Fusco G, Rinaldi L, Guarino A, Proroga YT, Pesce A, De Marco G, Cringoli G, 2007. Toxoplasma gondii in sheep from the Campania region (Italy). Vet Parasitol 149:271-4.
García-Bocanegra I, Cabezón O, Hernández E, Martínez-Cruz MS, Martínez-Moreno Á, Martínez-Moreno J, 2013. Toxoplasma gondii in ruminant species (cattle, sheep, and goats) from Southern Spain. J Parasitol 99:438-40.
Guo M, Dubey JP, Hill D, Buchanan RL, Gamble HR, Jones JL, Pradhan AK, 2015. Prevalence and risk factors for Toxoplasma gondii infection in meat animals and meat products destined for human consumption. J Food Protect 8:457-76.
Innes EA, Bartley PM, Buxton D, Katzer F, 2009. Ovine toxoplasmosis. Parasitology 136:1887-94.
Italian Animal Register, 2016. https://www.vetinfo.sanita.it/
Johnson CK, Tinker MT, Estes JA, Conrad PA, Staedler M, Miller MA, Jessup DA, Mazet JA, 2009. Prey choice and habitat use drive sea otter pathogen exposure in a resource-limited coastal system. P Natl Acad Sci 106:2242-7.
Kantzoura V, Diakou A, Kouam MK, Feidas H, Theodoropoulou H, Theodoropoulos G, 2013. Seroprevalence and risk factors associated with zoonotic parasitic infections in small ruminants in the Greek temperate environment. Parasitol Int 62:554-60.
Klun I, Djurkovic-Djakovic O, Katic-Radivojevic S, Nikolic A, 2006. Cross-sectional survey on Toxoplasma gondii infection in cattle, sheep and pigs in Serbia: seroprevalence and risk factors. Vet Parasitol 135:121-31.
Kulldorff M, 1997. A spatial scan statistic. Commun Stat A-Theor 26:1481-96.
Mainar RC, de la Cruz C, Asensio A, Domínguez L, Vázquez-Boland JA, 1996. Prevalence of agglutinating antibodies to Toxoplasma gondii in small ruminants of the Madrid region, Spain, and identification of factors influencing seropositivity by multivariate analysis. Vet Res Commun 20:153-9.
Miller MA, Grigg ME, Kreuder C, James ER, Melli AC, Crosbie PR, Jessup DA, Boothroyd JC, Brownstein D, Conrad PA, 2004. An unusual genotype of Toxoplasma gondii is common in California sea otters (Enhydra lutris nereis) and is a cause of mortality. Int J Parasitol 34:275-84.
Opsteegh M, Prickaerts S, Frankena K, Evers EG, 2011. A quantitative microbial risk assessment for meatborne Toxoplasma gondii infection in The Netherlands. Int J Food Microbiol 150:103-14.
Rinaldi L, Scala A, 2008. Toxoplasmosis in livestock in Italy: an epidemiological update. Parassitologia 50:59-61.
Rinaldi L, Biggeri A, Musella V, Waal T de, Hertzberg H, Mavrot F, Torgerson PR, Selemetas N, Coll T, Bosco A, Grisotto L, Cringoli G, Catelan D, 2015. Sheep and Fasciola hepatica in Europe: the GLOWORM experience. Geospat Health 9:309-17.
Romanelli PR, Freire RL, Vidotto O, Marana ERM, Ogawa L, De Paula VSO, Garcia JL, Navarro IT, 2007. Prevalence of Neospora caninum and Toxoplasma gondii in sheep and dogs from Guarapuava farms, Parana State, Brazil. Braz Res Vet Sci 82:202-7.
Skjerve E, Waldeland H, Nesbakken T, Kapperud G, 1998. Risk factors for the presence of antibodies to Toxoplasma gondii in Norwegian slaughter lambs. Prev Vet Med 35:219-27.
Tenter AM, Heckeroth AR, Weiss LM, 2000. Toxoplasma gondii: from animals to humans. Int J Parasitol 30:1217-58.
Tzanidakis N, Maksimov P, Conraths F, Kiossis E, Brozos C, Sotiraki S, Schares G, 2012. Toxoplasma gondii in sheep and goats: seroprevalence and potential risk factors under dairy husbandry practices. Vet Parasitol 190:340-8.
VanWormer E, Conrad PA, Miller MA, Melli AC, Carpenter TE, Mazet JAK, 2013. Toxoplasma gondii, source to sea: higher contribution of domestic felids to terrestrial parasite loading despite lower infection prevalence. Eco Health 10:277-89.
VanWormer E, Miller MA, Conrad PA, Grigg ME, Rejmanek D, Carpenter TE, Mazet JA, 2014. Using molecular epidemiology to track Toxoplasma gondii from terrestrial carnivores to marine hosts: implications for public health and conservation. PLoS Negl Trop Dis 8:e2852.
Vesco G, Buffolano W, La Chiusa S, Mancuso G, Caracappa S, Chianca A, Villari S, Currò V, Liga F, Petersen E, 2007. Toxoplasma gondii infections in sheep in Sicily, southern Italy. Vet Parasitol 146:3-8.
Waltner-Toews D, Mondesire R, Menzies P, 1991. The seroprevalence of Toxoplasma gondii in Ontario sheep flocks. Can Vet J 32:734-7.

[Geospatial Health 2016; 11:432]



Geospatial Health 2016; volume 11:442

Air quality classification and its temporal trend in Tehran, Iran, 2002-2012

Raheleh Saniei,1 Ali Zangiabadi,1 Mohammad Sharifikia,2 Yousef Ghavidel3

1Department of Urban Planning, Faculty of Geography and Planning, University of Isfahan; 2Department of Remote Sensing, Tarbiat Modares University, Tehran; 3Department of Physical Geography, Tarbiat Modares University, Tehran, Iran

Abstract
Airborne particulate matter with a diameter of 2.5 microns or less (PM2.5), as well as slightly bigger particles (PM10), arrives from the westerly direction and collects in the city centre of Tehran, the capital of Iran. The statistical characteristics and daily trend of the air quality index (AQI) in Tehran were studied over an 11-year period (2002-2012). Various statistical analyses were applied, including descriptive statistics, correlation analysis, trend analysis and the sequential non-parametric Mann-Kendall test. The significance of the series was investigated by regression analysis and Kriging interpolation. It was found that Tehran’s daily AQI increased by 11.8% over the study period, with the frequency distribution of days with good and average air quality showing a strongly declining trend. The AQI of Tehran was shown to be driven largely by PM10 and PM2.5, the latter making the largest contribution (coefficient=0.853).

Correspondence: Raheleh Saniei, Department of Urban Planning, Faculty of Geography and Planning, University of Isfahan, Isfahan, Iran.
Tel: +98.21.77602838 - Fax: +98.21.82883617.
E-mail: crisismanagement2008@gmail.com

Key words: AQI; Air pollution; PM10; PM2.5; Time series analysis.

Received for publication: 14 December 2015.
Accepted for publication: 14 January 2016.

©Copyright R. Saniei et al., 2016
Licensee PAGEPress, Italy
Geospatial Health 2016; 11:442
doi:10.4081/gh.2016.442

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Introduction
Air quality (AQ) and its temporal and spatial variations in a region are largely determined by the nature of anthropogenic activities associated with gaseous and particulate emissions in combination with local prevailing meteorological conditions. Epidemiological studies have established the association between air pollution and increased mortality (Dockery and Pope, 1994) as well as morbidity (Kassomenos et al., 2008). Poor AQ has acute as well as chronic health impacts (Nastos et al., 2010), the severity of which largely depends on the ambient concentration of the air pollutant in question and the time of exposure. While the latter is relatively clear-cut, the former varies depending on local topography, source emission and surrounding meteorological conditions. Of these, the meteorological variables are mostly responsible for the variation in the ambient concentrations of air pollutants (Banerjee et al., 2011). Air pollution in urban regions is receiving increasing attention worldwide (Chattopadhyay et al., 2010). Urban areas can be viewed as dense sources of anthropogenic pollutant emissions that alter the atmospheric composition and chemistry in downwind regions extending several hundred km from the source (Gupta et al., 2008). Petrol and diesel engines of motor vehicles emit a wide variety of pollutants, principally oxides of nitrogen (NOx), which exert an increasing impact on urban AQ (Mage et al., 1996). Urban air pollution in Iran has increased rapidly with the growth of the population, the number of motor vehicles and the use of fuel of poor environmental quality. Inadequate transportation systems, poor land use patterns and, above all, ineffective environmental regulations add to the problems. A better understanding of the nature of the sources of air pollution and the influence of meteorological conditions on AQ profiles is urgently required to guide the development of appropriate strategies for air pollution prevention. As for the health impact of air pollutants, the AQ index (AQI), defined as an indicator of the daily combined effect of the ambient air pollutants (Kumar and Goyal, 2011), can easily be understood by the general public and thus constitutes a useful basis for decision-making with respect to pollution mitigation, thereby supporting improved environmental management.
The AQI not only provides a measure of how clean or polluted the air is, but it also accounts for the associated health effects. Many published studies focus on the air problems of urban areas because of their special importance and challenges (Junk et al., 2003). Some research compares AQ trends in different metropolitan areas (Wise and Comrie, 2005; Chang and Lee, 2007; Firdaus and Ateeque, 2011; Moradi Dashtpagerdi et al., 2014), the spatio-temporal analysis of urban air pollution (Chan et al., 2009; Yao and Lu, 2014) or urban activities, such as traffic and land use (Branis, 2009; Bandeira et al., 2011). Others deal mainly with the various sources of atmospheric pollutants (Shu et al., 2001; Singh et al., 2013) or air pollution as a consequence of rapid urban expansion (Zhao et al., 2006). It is well documented that there is a strong link between air pollution and specific human health outcomes (Hyun Shin et al., 2009; Naddafi et al., 2012; Vanos et al., 2014). Numerous studies show that among air pollutants, particulate matter (PM), especially particles with diameters of 10 microns or less (PM10) and 2.5 microns or less (PM2.5), has the strongest links with human health effects (Lary et al., 2014).

Air pollution in metropolitan areas is one of the major problems of the world at present. In Tehran, it has been created by population growth and industrial development. Tehran, the capital of Iran, has a population of approximately 13 million people. Located at 35° 42’N and 51° 25’E with an area of 2300 km2, the city is situated in a semi-enclosed basin just south of the Alborz Mountain chain (with an average height of 2000 m). Tehran has suffered from poor air quality since the oil boom decade of the 1970s and over the last fifteen years rapid population growth has made matters even worse. On some days, the pollution loading of the atmosphere is so high that the impressive Alborz Mountains become invisible from most vantage points. Tehran’s Clean Air Committee stated recently that 10,000 people die every year from cardio-pulmonary disease due to air pollution. Tehran’s level of air pollution exceeds the world standard for acceptable AQ and the Government of the Islamic Republic of Iran has identified the pollution as a high-priority environmental and health challenge. Important causes of air pollution are the 70,000 industrial units operating within the metropolitan area and the exhaust from about 0.5 million motorcycles and close to one million other kinds of motor vehicle that operate in the extremely congested road space of Tehran (the average vehicle speed is below 18 km/h). Between 65 and 70% of the total emissions are related to urban transport operations (Bohler et al., 2002). The reported average concentrations of pollutants such as carbon monoxide (CO), sulphur dioxide (SO2) and particulate matter (PM10 and PM2.5) recorded in the city centre in 2007 were two to three times above the average levels recommended by the World Health Organization (WHO) and the United States Environmental Protection Agency (USEPA) (Gharagozlou et al., 2014). The growth in the number of vehicles over the last two years has made the situation even more severe.

The present research deals with temporal AQ classification and analysis over the Tehran metropolis based on local standards. Moreover, AQI measurements of the city will be extracted over the different sub-areas. The temporal and spatial AQ patterns over one decade (2002-2012) are expected to provide a basis for analysing the major trends of air pollutants.

Materials and Methods
Data from 36 AQ monitoring stations in Tehran (Figure 1) were selected for the current study on air pollution. The monitoring data comprised hourly measurements of SO2, CO, NOx, O3, PM10 and PM2.5 over the 11-year period 2002-2012. The most polluted days were chosen to show the spatial pattern of the AQI. Because the network of monitoring stations did not cover the whole area of Tehran in some years, three years (2009, 2010 and 2011) were selected for the spatial pattern.

Figure 1. Spatial distribution of monitoring stations in the twenty-two urban regions of Tehran, Iran, 2002-2012.

Air quality index
The AQI is an index for reporting daily air quality. It tells us how clean or polluted the air is and what associated health effects might be of concern. The daily AQI was calculated from the average concentrations of air pollutants at all 36 stations, which were used to obtain percentages of number codes during the desired averaging periods. The AQI is a piece-wise linear function of the pollutant concentration. At the boundary between AQI categories, there is a discontinuous jump of one AQI unit. Equation 1 (Mintz, 2009) was used to convert from concentration to AQI:

$$AQI = \frac{AQI_{Hi} - AQI_{Lo}}{BP_{Hi} - BP_{Lo}}\,(C_p - BP_{Lo}) + AQI_{Lo}$$
eq. 1

where AQI=air quality index for the pollutant; Cp=concentration of the pollutant; BPHi=the breakpoint that is greater than or equal to Cp; BPLo=the breakpoint that is less than or equal to Cp; AQIHi=the AQI value corresponding to BPHi; and AQILo=the AQI value corresponding to BPLo. The AQI varies from 0 to 500. The higher the AQI value, the greater the level of air pollution and the greater the health concerns. An AQI value of 100 generally corresponds to the air quality standard for the pollutant, which is the level that the United States Environmental Protection Agency (USEPA) has set to protect public health (Mintz, 2009). AQI values below 100 are generally thought of as satisfactory (Mohan and Kandya, 2007). When AQI values are above 100, air quality is considered to be unhealthy, at first for certain sensitive groups of people, then for everyone as the AQI values get higher. The breakpoint concentrations (high and low) were adjusted to Iran’s ambient standard for each of the pollutants (Mintz, 2009).

Spatial interpolation
In the most general sense, spatial interpolation provides a valuable tool for better understanding air quality and the extent of a nation’s air pollution problems. Among other applications, spatial interpolation models can be used to define the spatial extent of episodes of unhealthy air quality, to illuminate relationships between different air pollutants and to aid in the design of AQI monitoring systems. However, the degree to which such models can be developed in a successful, applicable manner may strongly depend on the air pollutant in question. Put another way, the chemical properties and the atmospheric fate and transport of certain pollutants may present unique problems with important ramifications for spatial interpolation modelling. This study used several spatial interpolation techniques: inverse distance weighting (IDW), global polynomial interpolation (GPI), local polynomial interpolation (LPI), radial basis functions (RBF), simple Kriging (SK), ordinary Kriging (OK) and universal Kriging (UK). Kriging is a geostatistical spatial (and potentially also temporal) interpolation method that derives predicted values based on the distance between points in space and treats variation between measurements as a function of distance. Ordinary Kriging, a version of Kriging that assumes the mean to be constant but unknown across the spatial domain of interest, has found its way into the environmental sciences and other disciplines. It is an improvement over IDW because prediction estimates tend to be less biased and because predictions are accompanied by prediction standard errors (SEs), which quantify the uncertainty in the predicted value. The semivariogram captures the spatial dependence between samples by plotting semivariance against separation distance and is the basic tool for geostatistics and Kriging. The three basic Kriging methods differ as follows. First, OK assumes absence of drift. It focuses on the spatially correlated component and uses the fitted semivariogram directly for interpolation. Second, UK assumes that the spatial variation in z values has drift or a trend in addition to spatial correlation between sample points. Third, SK assumes that the mean of the dataset is known. The premise of any spatial interpolation is spatial autocorrelation, i.e. close samples tend to be more similar than distant samples, and this property of spatial data is implicitly used in IDW. In Kriging, one must model the spatial autocorrelation using a semivariogram instead of assuming a direct, linear relationship with separation distance. The root mean square error (RMSE) and mean error (ME) validation parameters, used to compare the different interpolation techniques, can be calculated by the following equations:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\hat{Z}(x_i) - z(x_i)\right]^2}$$
eq. 2

$$ME = \frac{1}{N}\sum_{i=1}^{N}\left[\hat{Z}(x_i) - z(x_i)\right]$$
eq. 3

where Ẑ(xi) is the predicted value, z(xi) the observed (known) value, N the number of values in the dataset and σ2 the Kriging variance for location xi (Webster and Oliver, 2001). The smaller these values are in absolute terms, the better the model. These validation parameters were applied to all of the interpolation techniques used in this research.

The sequential non-parametric Mann-Kendall test
The sequential non-parametric Mann-Kendall test (SNMKT) is used to test for trend significance and to detect abrupt changes in time series. By applying this test to a time series, one can detect the existence and significance of any trend in the data. SNMKT is mainly used in environmental science because it is a simple, robust test that can be used to statistically assess whether there is a monotonic upward or downward trend of the variable of interest over time. In other words, the test is typically used for the more specific purpose of determining whether the central value or median changes over time. The spread of the distribution must remain constant, though not necessarily in the original units. If a monotonic transformation such as the ladder of powers is all that is required to produce constant variance, the test statistic will be identical to that of the original units. This statistic is highly recommended for general use by the World Meteorological Organization (WMO), because it detects time series trends without requiring normality or linearity (Wang et al., 2008). The SNMKT has to be done in several steps. First, the data are ranked and ti, i.e. the ratio of the rank of t to the ranks before it, is calculated, followed by the cumulative frequency of ti, i.e. Σti. Then Ei (the mathematical expectation), Vi (the variance) and Ui (the comparative SNMKT index) are calculated by the following equations:

$$E_i = \frac{n_i (n_i - 1)}{4}$$
eq. 4

$$V_i = \frac{n_i (n_i - 1)(2 n_i + 5)}{72}$$
eq. 5

$$U_i = \frac{\sum t_i - E_i}{\sqrt{V_i}}$$
eq. 6

In the equations above, ni is the time-order of the data. The index Ui has a normal distribution, so the normal curve table is consulted to detect and test the significance of the trend. To draw the SNMKT graph and assess the degree of significance of the time series trend, the symmetric statistics (E’i, V’i and U’i) must first be calculated. The equations are as follows:

eq. 7

eq. 8

eq. 9

where N is the statistical duration, or the sample size. The confluence of Ui and U’i within the certainty scope of ±1.96 (the 5% level) indicates significant changes in the time series, and the behaviour of Ui after the confluence determines whether the series is descending or ascending. Graphs in which the Ui and U’i lines do not cross, i.e. lack of confluence, indicate that the series does not have a trend (Ghavidel and Ahmadi, 2015).

Polynomial trend lines
With larger amounts of data, trends become less linear and tend to fluctuate, which is an indication that a polynomial trend has taken over (Ghavidel, 2012). A polynomial trend is useful, for example, when analyzing gains and losses over a large expanse of data. These functions are sometimes expressed as Legendre polynomials, where the order is determined by how many times the direction of the curve changes, i.e. the number of hills and valleys. For example, a second-order Legendre polynomial trend line has only one change of direction, while a third-order one has two. Polynomial regression models are usually fitted using the method of least squares, which minimizes the variance of the unbiased estimators of the coefficients under the conditions of the Gauss-Markov theorem (Baily, 1993). Polynomial trend models are useful in situations where the analyst knows that curvilinear effects are present in the true response function. With a polynomial model type, we can select a degree between 2 and 8, where the higher polynomial degrees exaggerate the differences between the values of the data. When the number of data increases very rapidly, the lower-order terms may therefore have little variation compared to the higher-order terms, a situation that renders the model impossible to estimate accurately. Also, more complicated polynomial models of higher order require more data to estimate properly (Ghavidel, 2012).

To find the relation between the AQI and each pollutant and to quantify the contribution of each to the index, multiple linear regression was applied, with the AQI as the dependent variable (A) and the pollutants as the independent variables.
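A second-order trend of the kind described above can be sketched with an ordinary least-squares fit. The sketch below is only an illustration: it uses the plain monomial basis (1, x, x²) rather than Legendre polynomials, the function names `fit_quadratic` and `r_squared` are hypothetical, and the data in the usage note are synthetic, not values from this study.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2 via the normal equations."""
    # Power sums of x (orders 0..4) form the 3x3 normal-equation matrix.
    s = [sum(x ** k for x in xs) for k in range(5)]
    rhs = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def r_squared(xs, ys, coef):
    """Coefficient of determination of the fitted quadratic."""
    mean_y = sum(ys) / len(ys)
    pred = [coef[0] + coef[1] * x + coef[2] * x * x for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, pred))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

For example, `fit_quadratic(list(range(11)), [2 + 3 * x + 0.5 * x * x for x in range(11)])` recovers coefficients close to 2, 3 and 0.5. For higher orders or long series, an orthogonal basis such as Legendre polynomials is numerically better behaved than raw powers.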

Results
Time series analysis of Tehran’s daily air quality index
After calculating and forming the daily time series of the Tehran AQI, the descriptive statistical parameters of the time series were extracted (Table 1). The most important indicator in Table 1 is the long-term daily average of the AQI (over all 4018 study days), which shows that in the long term Tehran’s air is in fact normal on most days, even if it lies at the upper end of the normal range. However, the arithmetic mean alone is not a suitable summary, since it suggests only that AQ is neither good nor dangerous. Indeed, the long-term average value of 88 is unsuitable for seniors, children and people with heart and/or lung disease and it indicates that Tehran AQ is not as good as desired. The frequency of occurrence of large deviations in a particular direction is measured by skewness (Skew). This indicator is used in distribution analysis as a sign of asymmetry and deviation from a normal distribution; in other words, Skew measures the degree of asymmetry of a distribution around its mean. The positive Skew of Tehran’s AQI data indicates a distribution with an asymmetric tail extending towards more positive values. With an estimated Skew value of 63, most AQI data were concentrated left of the mean, with extreme values to the right (right-skewed distribution). Tehran’s AQI data have a large range (352), calculated as the difference between the minimum (AQI=29 on 4 April 2004, i.e. an acceptable AQ) and the maximum (AQI=381 on 5 July 2009, i.e. a hazardous AQ). The mode of the daily AQI data was 61, which shows that an AQI=61 had the highest frequency of occurrence. The long-term standard deviation (SD) of Tehran’s daily AQI data was 29. The SD measures how concentrated the data are around the mean; the more concentrated, the smaller the SD. The coefficient of variation (CV), another measure of the dispersion of data points around the mean, is the ratio of the SD to the mean. The estimated CV for Tehran’s daily AQI data was 33%.

Table 1. Descriptive statistical parameters of the Tehran long-term daily air quality index, 2002-2012.
Variable          Value
Study days (n)    4018
Mean              88
SD                29
CV (%)            33
Minimum           29
Median            83
Maximum           381
Range             352
Mode              61
Skew              63
SD, standard deviation; CV, coefficient of variation; Skew, skewness.

Tehran’s AQI is a reflection of the temporal variation of air pollutants. Figure 2 shows that Tehran’s AQI followed an increasing trend over the total study period. The normal range of the AQI varied between 59 as the lower limit (mean minus SD) and 117 as the upper limit (mean plus SD), with days above the upper limit taken as polluted and those below the lower limit considered clean. As shown in Figure 2, the line marking the temporal variations of Tehran’s AQI shows a positive temporal variation of about 11.8% with a second-order Legendre polynomial. The fitted second-order Legendre polynomial trend, the positive slope of the trend line and the coefficient of determination (R2) as a measure of goodness of fit mean that the annual maximum AQI will, with high statistical probability, increase in the coming years. The fact that the SNMKT showed a crossing of the Ui and U’i lines within the certainty scope of 95% (±1.96) indicates a significant variation in the series (Figure 3). Ui was seen to be ascending and thus the increasing trend of Tehran’s AQI should continue in this direction in the future with 95% probability. As shown in Figure 4, the maximum AQI followed an annually increasing trend over the whole period following a second-order Legendre polynomial and Figure 5 confirms this trend.

Figure 2. The temporal variation of Tehran’s air quality index, 2002-2012.

Figure 3. The sequential non-parametric Mann-Kendall test for Tehran air quality index, 2002-2012.

From Figure 6, it can be concluded that, in terms of the frequency of occurrence of the various classes of AQ, the air quality was generally poor on 1190 days (1 hazardous, 10 very unhealthy, 176 unhealthy and 1003 unhealthy for sensitive groups). Although the air quality was classified as good on 189 days and healthy on 2639 days, i.e. the remaining 2828 days (70.4% of the period), it must be concluded that it was unacceptable in 29.6% of the study period. The frequency of occurrence of each of the six classes of Tehran’s AQ standards and their percentage contributions were calculated (Table 2). The temporal trends of the annual frequencies of each AQ class were also calculated (Figure 7; the hazardous category is not shown as trend analysis could not be done on the basis of 1 hazardous day only), which shows that 64.8% of the temporal variation of the annual good category and 64.1% of that of the moderate (healthy) category can be explained by the decreasing polynomial trend. In other words, the number of good and healthy days decreased in the study period. The temporal variation of the other categories (unhealthy for sensitive groups, unhealthy and very unhealthy) was thus positive and increasing.
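The class shares quoted in the text can be checked directly from the class totals listed in Table 2. The following is a minimal sketch for that arithmetic; the names `class_days`, `class_shares` and `poor_air_days` are hypothetical and only the day counts come from the article.

```python
# Total number of days per AQ class, 2002-2012, taken from Table 2.
class_days = {
    "good": 189,
    "moderate": 2639,
    "unhealthy for sensitive groups": 1003,
    "unhealthy": 176,
    "very unhealthy": 10,
    "hazardous": 1,
}


def class_shares(days):
    """Percentage of the study period spent in each AQ class."""
    total = sum(days.values())
    return {name: 100.0 * n / total for name, n in days.items()}


def poor_air_days(days):
    """Days in any class worse than 'moderate'."""
    return sum(n for name, n in days.items() if name not in ("good", "moderate"))
```

With these totals, the classes sum to 4018 days, the four classes worse than moderate account for 1190 days, and good plus moderate account for the remaining 2828 days.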

Figure 4. The temporal variation in annual maximum of Tehran’s air quality index, 2002-2012.

Figure 5. Significant annual maximum variation of Tehran’s air quality index using sequential non-parametric Mann-Kendall test.
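The forward and backward curves plotted in sequential Mann-Kendall graphs can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the count for each point is taken to be the number of earlier values smaller than it, the expectation and variance follow the standard sequential form n(n-1)/4 and n(n-1)(2n+5)/72, and the backward curve uses one common formulation (forward statistic of the reversed series, sign-flipped and reversed back). The function names are hypothetical.

```python
import math


def sequential_mk_forward(series):
    """Forward statistic U_i of the sequential Mann-Kendall test."""
    u = []
    cumulative = 0  # running sum of the per-point counts
    for i, value in enumerate(series, start=1):
        # Count how many earlier observations are smaller than this one.
        cumulative += sum(1 for earlier in series[: i - 1] if earlier < value)
        if i == 1:
            u.append(0.0)  # variance is zero for a single point
            continue
        expected = i * (i - 1) / 4.0
        variance = i * (i - 1) * (2 * i + 5) / 72.0
        u.append((cumulative - expected) / math.sqrt(variance))
    return u


def sequential_mk_backward(series):
    """Backward statistic: forward statistic of the reversed series,
    with the sign flipped and the order restored (an assumed convention)."""
    forward_on_reversed = sequential_mk_forward(series[::-1])
    return [-value for value in forward_on_reversed][::-1]
```

A crossing of the two curves, together with the forward curve leaving the ±1.96 band, is what the text interprets as a significant change in the series.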

Figure 6. Classification of Tehran's air quality index for each of the six main standard classes.

Table 2. The Tehran annual air quality classes and frequencies in the period 2002-2012.
Category         AQI values  2002  2003  2004°  2005  2006  2007  2008°  2009  2010  2011  2012°  Total (n)  %
Good             0-50           8    11     20    23    36    23     13    32    14     6      3        189  4.7
Moderate         51-100       187   191    258   253   254   327    293   291   238   140    207       2639  65.67
Unhealthy*       101-150      145   140     11    86    75    15     57    36    88   209    141       1003  24.96
Unhealthy        151-200       24    21     77     3     0     0      2     4    24     7     14        176  4.38
Very unhealthy   201-300        1     2      0     0     0     0      1     1     1     3      1         10  0.25
Hazardous        301-500        0     0      0     0     0     0      0     1     0     0      0          1  0.02
Sum of days                   365   365    366   365   365   365    366   365   365   365    366       4018  100
AQI, air quality index. *Only for sensitive groups; °leap year.
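The piece-wise linear conversion of Equation 1 and the class bounds of Table 2 can be combined in a short sketch. The AQI class bounds below come from Table 2; the concentration breakpoints in `DEMO_BREAKPOINTS` are purely hypothetical placeholders and are not Iran's ambient standard, which the article does not list. The function names are likewise illustrative.

```python
# AQI class bounds as listed in Table 2 (index side only).
AQI_CLASSES = [
    (0, 50, "good"),
    (51, 100, "moderate"),
    (101, 150, "unhealthy for sensitive groups"),
    (151, 200, "unhealthy"),
    (201, 300, "very unhealthy"),
    (301, 500, "hazardous"),
]

# HYPOTHETICAL concentration breakpoints, for illustration only.
# Each tuple is (BP_Lo, BP_Hi, AQI_Lo, AQI_Hi).
DEMO_BREAKPOINTS = [
    (0.0, 10.0, 0, 50),
    (10.1, 20.0, 51, 100),
    (20.1, 40.0, 101, 150),
]


def sub_index(conc, breakpoints):
    """Equation 1: linear interpolation between the enclosing breakpoints."""
    for bp_lo, bp_hi, aqi_lo, aqi_hi in breakpoints:
        if bp_lo <= conc <= bp_hi:
            return (aqi_hi - aqi_lo) / (bp_hi - bp_lo) * (conc - bp_lo) + aqi_lo
    raise ValueError("concentration outside the breakpoint table")


def classify(aqi):
    """Map an AQI value to its class name using the Table 2 bounds."""
    value = round(aqi)
    for lo, hi, name in AQI_CLASSES:
        if lo <= value <= hi:
            return name
    raise ValueError("AQI outside the 0-500 scale")
```

With the placeholder table, a concentration of 5.0 maps to an index of 25. Applied to the values reported in Table 1, `classify(88)` gives the moderate class for the long-term mean and `classify(381)` gives the hazardous class for the maximum.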


Spatial distribution of air pollution in Tehran
The main air pollutant on the most polluted days of the year is PM10. Therefore, 10 daily averages (a composite of 10 days classified as having a very unhealthy AQ) due to pollution (200-300 parts per million) were plotted to show the AQI spatial distribution for the 22 urban regions of Tehran during these days (Figure 8). On all the selected days on which PM10 was responsible for the air pollution, dust storms occurred (Table 3). It can be seen that PM10 pollutants in Tehran dominated in the Centre-East of the city, with less such pollution in the North and the very South. This is shown in Figure 8, with the focus in regions number 6, 7, 9, 10, 11, 12, 14, 17 and 18 as well as parts of regions number 2, 3, 4, 5, 15, 16, 19, 21 and 22. Thus, the PM10 pollutants enter Tehran from the West carried by local winds due to atmospheric barotropic and inversion conditions (an effect of anticyclones) and they remain stable. In recent years, the main reasons for the decreasing trends of the AQI in the good and moderate categories and the increasing trends in the unhealthy and hazardous categories in Tehran are: i) use of substandard gasoline that does not meet international standards (main reason), which is available in fuel stations of Tehran and the rest of Iran; ii) use of vehicles, especially motorcycles, with sub-optimal environmental motor standards; and iii) road dust (with a share of 95.4% of the total outdoor PM10 emission in Tehran) that originates from the friction between tires and asphalt, and aerosols from the interior deserts of Iran, the Middle East and even Africa.

Table 3. The results of interpolation methods used.
Model   RMSE     ME
IDW     81.66    15.69
GPI     90.21    0.73
LPI     109.30   22.30
RBF     80.90    5.68
SK      89.90    4.18
OK      80.87    3.10
UK      91.50    3.22
RMSE, root mean square error; ME, mean error; IDW, inverse distance weighting; GPI, global polynomial interpolation; LPI, local polynomial interpolation; RBF, radial basis function; SK, simple Kriging; OK, ordinary Kriging; UK, universal Kriging.
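The two validation measures reported in Table 3 can be computed from paired predicted and observed values as in the sketch below. The functions `rmse` and `mean_error` are illustrative names; the dictionary repeats the Table 3 results so that the techniques can be ranked, and no station-level data from the study are assumed.

```python
import math


def mean_error(predicted, observed):
    """Mean error (ME): average of predicted minus observed values."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


def rmse(predicted, observed):
    """Root mean square error (RMSE) of the predictions."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


# (RMSE, ME) per interpolation technique, as reported in Table 3.
TABLE_3 = {
    "IDW": (81.66, 15.69),
    "GPI": (90.21, 0.73),
    "LPI": (109.30, 22.30),
    "RBF": (80.90, 5.68),
    "SK": (89.90, 4.18),
    "OK": (80.87, 3.10),
    "UK": (91.50, 3.22),
}

lowest_rmse = min(TABLE_3, key=lambda model: TABLE_3[model][0])
lowest_abs_me = min(TABLE_3, key=lambda model: abs(TABLE_3[model][1]))
```

On the Table 3 values, ordinary Kriging has the lowest RMSE while global polynomial interpolation has the ME closest to zero, so the two criteria need not agree on a single best technique.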

Figure 7. The temporal variation in annual frequencies of each air quality standard category in Tehran.


Calculating the relation between air quality index and pollutants
Here, the R2 was found to be 0.985, which shows that 98.5% of the variation of the dependent variable (A) can be explained by the independent variables. This coefficient is shown in Table 4. The significance of the regression and the linearity of the relationship between the variables were checked by analysis of variance (ANOVA). The obtained P values, which approve or reject the significance of the regression or the linear relationship between variables at the 95% confidence level, are shown in Table 5. Considering these values, it can be concluded that the regression was significant and that there was a linear relationship between the variables at the 95% confidence level. The main results of the regression are displayed by the coefficients in Table 6. Here, the T-test statistic is shown as calculated for the individual regression coefficients, including their significance levels. Given the values of the significance level, the variables with significant effects can be identified. The regression equation can be written based on the B values in the table. Retaining only the significant variables, the equation representing the obtained regression model is as follows:

$$y = 1.218 + 0.051\,(CO) + 0.207\,(PM_{10}) + 0.853\,(PM_{2.5})$$
eq. 10

Nevertheless, the beta values should be used to find the importance and role of the independent variables in the regression equation, as these values are standardized. Thus, they can be applied to judge the relative importance of the variables. A high beta value for a variable represents its relative importance and its role in A. Thus, it can be said that PM2.5 had the highest effect on Tehran’s AQ.
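The way the regression model is read off Table 6 can be sketched in a few lines. The coefficients, standard errors and P values below are those listed in Table 6 (where the CO coefficient is 0.051); the function names and the 0.05 threshold argument are illustrative, and the units of the pollutant inputs are an assumption since the article does not state them.

```python
# (B, SE, P) per term, as listed in Table 6.
COEFFICIENTS = {
    "Constant": (1.218, 0.770, 0.114),
    "CO": (0.051, 0.011, 0.000),
    "O3": (-0.013, 0.009, 0.155),
    "NO2": (-0.012, 0.010, 0.239),
    "SO2": (-0.008, 0.026, 0.752),
    "PM10": (0.207, 0.010, 0.000),
    "PM2.5": (0.853, 0.009, 0.000),
}


def significant_terms(coefs, alpha=0.05):
    """Pollutants whose coefficient is significant at the given level."""
    return [name for name, (_, _, p) in coefs.items()
            if name != "Constant" and p < alpha]


def t_statistic(b, se):
    """T statistic of a coefficient: estimate divided by its standard error."""
    return b / se


def predict_aqi(co, pm10, pm25):
    """Equation 10, keeping only the significant pollutants."""
    return 1.218 + 0.051 * co + 0.207 * pm10 + 0.853 * pm25
```

Filtering at the 5% level leaves CO, PM10 and PM2.5, which are exactly the terms that appear in Equation 10, and dividing each B by its SE reproduces the tabulated T statistics to within rounding.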

Table 4. Summary of the coefficients of determination for the regression model.
R       R2      Adjusted R2   SE      Change statistics
                                      R2 change   F change
0.992   0.985   0.985         3.383   0.985       9856.006
R2, coefficient of determination; SE, standard error; F, F statistics.

Table 5. Results of analysis of variance for fitted multiple regression.
Model        SS            DF    MS            F          P
Regression   676,665.927   6     112,777.655   9856.006   0.000
Residual     10,298.278    900   11.443
Total        686,964.205   906
SS, sum of squares; DF, degree of freedom; MS, mean square; F, F statistics.

Table 6. Coefficients and model estimations for multiple regression.
Model      Non-standardised coefficients   Standardised coefficients   T-statistics   P
           B        SE                     Beta
Constant   1.218    0.770                                              1.582          0.114
CO         0.051    0.011                  0.025                       4.512          0.000
O3         -0.013   0.009                  -0.006                      -1.424         0.155
NO2        -0.012   0.010                  -0.007                      -1.178         0.239
SO2        -0.008   0.026                  -0.002                      -0.317         0.752
PM10       0.207    0.010                  0.157                       19.844         0.000
PM2.5      0.853    0.009                  0.855                       94.921         0.000
SE, standard error.

Figure 8. Map showing the average risk distribution in Tehran during ten days due to PM10 pollutants.

Discussion
There has been no serious study of the AQI trend and the impacts of air pollution in Tehran. The most obvious expected effects are those related to human health. Air pollutants, mainly particulate matter, are a major cause of respiratory and cardiovascular diseases, which are highly prevalent in Tehran, particularly in the central business district. The effects of other pollutants on inner city residents are also expected to be considerable. From the geographical point of view, Tehran is wedged between two mountain ranges that trap the fumes of its bumper-to-bumper traffic. The main reason for the air pollution is emissions from vehicles running on low-quality, non-standard gasoline. However, the issue of air pollution is very complex and needs detailed further studies, as the pollution comes from various sources and the atmosphere tends to transform it. The fact that the pollution in Tehran enters from the West and moves in an easterly direction towards the city centre means that pollution concentrations are lower in the northern and southern parts of the city. The regression analysis showed that PM2.5, PM10 and CO influenced the AQ of Tehran the most, and the largest contribution came from PM2.5 (coefficient of 0.853). It was therefore not surprising that Tehran’s AQ had been reduced by approximately 11.8% between 2002 and 2012, a strongly decreasing trend that is likely to continue in the coming years. The frequency distribution of days with good and average quality reveals that the number of such days declined severely during the study period. With the strong downward sloping trend of the time series, it is highly likely that the number of good (clean) days will be greatly reduced in the future.

Conclusions
According to the classification of AQ standards and the position of each study day in the classification presented in Table 2, it can be said that Tehran’s air quality is generally not good. Statistical analysis of the Tehran AQI shows that air quality was not in a good condition during the 11-year period from 2002 to 2012: of the 4018 days studied, air quality was considered unhealthy on 1190 days (29.6%). On the remaining 2828 days (70.4%), the air quality was healthy or good. The general trend of the daily AQI of Tehran over the 11-year study period was upward, which points to a decrease of about 12% in Tehran’s air quality. This trend is significant and suggests that a further decrease of about 12% in Tehran’s air quality is probable in the coming years.



Geospatial Health 2016; volume 11:436

Topographic distribution of gastritis in heavy pigs investigated by a geographic information system approach

Ernesto Pascotto,¹ Diego Capraro,² Paolo Tomè,² Mauro Spanghero²

¹Local Health Unit, Asolo (TV); ²Department of Agri-Food, Environmental and Animal Sciences, University of Udine, Udine, Italy

Abstract

The aim of this paper was to determine the topographic distribution of gastritis lesions in pigs through an open-source geographic information system (GIS) software analysis. The stomachs of 146 Italian heavy pigs were collected at slaughter and subjected to macroscopic pathological examination of the internal mucosa. A total of 623 lesions were found, 97% of which were classified as hyperplastic or follicular, with the remaining minority categorised as atrophic or simple. The hyperplastic gastritis lesions had an average surface of 77.8 cm² and were mainly located in an oval-shaped area of the fundus region of the stomach near the Curvatura ventriculi major. The follicular gastritis lesions generally had a smaller surface (40.3 cm²) and were concentrated in two distinct small areas of the pyloric region. The GIS analysis provided the opportunity to produce useful maps showing the distribution and characteristics of gastritis in pigs.

Introduction

The ability to create risk maps with respect to the anatomic location of lesions in organs can be an important tool for the identification and diagnosis of several diseases. Geographic information systems (GIS), in the form of computer software tools, constitute a powerful approach to storing, analysing and managing epidemiological data. Although GIS is mainly applied to investigate the localisation and distribution of cases of various diseases in the environment, it can also be used to map the position of pathological lesions in organs of the body, as described by Daraban et al. (2014), Pascotto et al. (2014), and Imanieh et al. (2015). The aim of the present investigation was to determine the topographic distribution of gastritis lesions in pigs through the application of open-source GIS software. The research has a practical impact because pigs are susceptible to damage of the gastric mucosa (due to stress and feeding practices) and are considered a suitable animal model for a similar approach in human medicine (Malfertheiner and Ditschuneit, 2011). In human medicine, the topographic study of gastritis initially used a regional approach, focusing on the involvement of different areas of the stomach (Hebel, 1949). Thereafter, the topographic analysis of gastritis progressively specialised by using a sectors and points approach, eventually developing into the modern systems described as the Sydney System (Dixon et al., 1996) or the Operative Link for Gastritis Assessment (Rugge and Genta, 2005). However, a careful search of the major abstract databases reveals only limited use of the topographic approach to the description and localisation of gastritis lesions, both in human and animal medicine.

Correspondence: Mauro Spanghero, Department of Agri-Food, Environmental and Animal Sciences, University of Udine, via Sondrio 2/A, 33100 Udine, Italy.
Tel: +39.432.558193 - Fax: +39.432.558199.
E-mail: mauro.spanghero@uniud.it

Key words: Pigs; Gastritis; Topographic distribution; Geographic information system; Italy.

Contributions: all authors equally contributed to this article.

Conflict of interest: the authors declare no potential conflict of interest.

Funding: part of this work was supported by AGER project, grant n°20110280.

Received for publication: 2 December 2015.
Revision received: 23 February 2016.
Accepted for publication: 26 February 2016.

©Copyright E. Pascotto et al., 2016
Licensee PAGEPress, Italy
Geospatial Health 2016; 11:436
doi:10.4081/gh.2016.436

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (CC BY-NC 4.0) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Materials and Methods

The stomachs of 146 Italian heavy pigs (Italian Large White×Duroc Italian) were collected at slaughter and subjected to a macroscopic pathological examination with the aim of detecting damage to the internal mucosa. For manipulation and analysis of the stomachs, the methodology described by Mason et al. (2013) was followed, with histopathology used to evaluate questionable lesions. Each pathological finding was recorded and characterised based on its macroscopic aspect and extension, then identified with regard to its topographical limits and reported on an original georeferenced map of the internal mucosa of the stomach. The lesions referred to as gastritis were given an intensity score from 1 (mild lesion) to 5 (severe gastritis) and grouped into four categories (Marcato, 2002): i) hyperplastic: mucosa thickened and covered with mucus; ii) follicular: gray-whitish mucosa with swollen lymphatic follicles; iii) atrophic: mucosa having a flattened epithelium, fewer folds or none at all, plus increased connective tissue; iv) simple: superficial inflammation of the lamina propria, which is sometimes replaced by connective tissue with no presence of lymphoid follicles.

The characteristics and localisation of each lesion were collected into a specific, relational Web-based database, created using open-source software that included the Linux Ubuntu v. 10.10 operating system (http://www.ubuntu.com/); the MySQL AB, v. 5.1 (https://www.mysql.com/) relational management system for creation of the database; a user-friendly graphical interface obtained from a Web application framework (Symfony™ v. 1.2.4; SensioLabs, Clichy, France) written in PHP, a server-side scripting language designed for Web development; an Apache HTTP Web Server v. 2.2 (https://httpd.apache.org/); and a GIS platform integrated into the Web database (GeoServer v. 2.0, http://geoserver.org/; and OpenLayers library v. 2.0, http://openlayers.org/).

In order to identify spatial patterns, hotspots and clustering of lesions, data were processed in open-source GIS software (QGIS© 2015; OSGeo, Beaverton, OR, USA) using the following two algorithms. First, Adjust n point to polygon (Adjust n point to polygon© 2015; Victor Olaya), through which a new point layer was created with a given number of points added inside the area covered by each lesion and distributed over a regular grid. The number of points for each lesion (relative frequency) was determined by multiplying the lesion area (cm²) by the intensity score and a constant (useful to correct possible inaccuracies related to the definition of lesion margins). Second, Kernel density estimation (Anderson, 2009), which was applied to the shapefile containing lesions converted to point format. The kernel bandwidth (search radius), specifying the area around a point where its influence is felt, was 9.7 cm. Based on this value, core areas that included the highest number of lesions were defined and used to calculate: i) the area of lesions; ii) the number of lesions intersecting the core area; and iii) the average weighted intensity inside and outside the core area, which were compared using the Wilcoxon-Mann-Whitney test (R Core Team, 2015).
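The two processing steps described in the Materials and Methods (weighted point generation followed by kernel density estimation) can be illustrated with a minimal sketch. This is not the QGIS workflow used in the study. Lesions are simplified to axis-aligned rectangles, the correction constant `K` is a hypothetical value (the text does not report it), and a quartic kernel is assumed because the kernel shape is not stated. Only the 9.7 cm search radius is taken from the text.

```python
import math

BANDWIDTH_CM = 9.7   # search radius reported in the text
K = 0.1              # hypothetical correction constant (not given in the text)


def n_points(area_cm2, score, k=K):
    """Relative frequency of a lesion: area x intensity score x constant (at least 1)."""
    return max(1, round(area_cm2 * score * k))


def grid_points(xmin, ymin, xmax, ymax, n):
    """Place n points on a regular grid inside a rectangular lesion (a simplification)."""
    cols = max(1, math.ceil(math.sqrt(n)))
    rows = max(1, math.ceil(n / cols))
    dx, dy = (xmax - xmin) / cols, (ymax - ymin) / rows
    pts = [(xmin + (c + 0.5) * dx, ymin + (r + 0.5) * dy)
           for r in range(rows) for c in range(cols)]
    return pts[:n]


def quartic_density(x, y, points, h=BANDWIDTH_CM):
    """Kernel density at (x, y): sum of quartic kernels with search radius h."""
    total = 0.0
    for px, py in points:
        u2 = ((x - px) ** 2 + (y - py) ** 2) / h ** 2
        if u2 < 1.0:  # points beyond the search radius contribute nothing
            total += 3.0 / (math.pi * h ** 2) * (1.0 - u2) ** 2
    return total


if __name__ == "__main__":
    # Hypothetical lesion: 10 cm x 7.78 cm rectangle (77.8 cm2) with intensity score 1.5.
    pts = grid_points(0.0, 0.0, 10.0, 7.78, n_points(77.8, 1.5))
    print(len(pts), quartic_density(5.0, 3.89, pts))
```

With this weighting, large or severe lesions contribute more points and therefore raise the density surface more than small, mild ones, which is the behaviour the relative-frequency maps rely on.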

Results

The pigs under investigation had previously been used in three different feeding trials (Zanfi et al., 2014; Capraro et al., 2014; Galassi et al., 2016) and were fed diets containing fibrous feeds (whole-ear corn silage and corn silage, from 15 to 40% of dry matter) or traditional corn-soya meal based diets during the last fattening period [from 80 to 160-170 kg body weight (BW)]. Animals were slaughtered (Table 1) at the typical weight adopted in Italy for the production of cured ham (prosciutto; 160-170 kg BW). The stomachs represented about 0.40% of the BW at slaughter (Table 1) and the amount of gastric contents varied strongly depending on the diets used and the different periods of pre-slaughter fasting.

The examination of individual stomachs revealed a total of 623 lesions (Table 1 and Figure 1A), which were mainly classified as hyperplastic or follicular (97%), with atrophic and simple gastritis representing a marginal part of the total. The degree of severity of mucosal damage also differed: the hyperplastic and follicular lesions showed a wide range of lesion scores, with average values higher than those of the other gastritis types. Core areas were defined only for hyperplastic and follicular gastritis and intercepted 98.5% (400/406) and 95.5% (190/199) of the lesions, respectively.

Figure 1. Density maps of gastritis in pigs. A) Relative frequency of total gastritis identified (n=623); B) relative frequency and core areas of the hyperplastic gastritis (n=406); C) relative frequency and core areas of the follicular gastritis (n=199).

Table 1. Pig body weight at slaughter, stomach and gastritis characteristics.

Variable                            Mean     SD      Minimum   Maximum
Pig BW at slaughter (kg)            168.0    10.5    130.5     195.5
Stomach weight (g)                  674.3    78.3    527.3     889.0
Stomach weight (% of BW)            0.40     0.04    0.30      0.52
Stomach content
  Weight (g)                        1052     743     82        3929
  Dry matter (%)                    21.9     4.7     8.4       41.3
Gastritis type (number per pig)
  Hyperplastic                      2.79     1.79    0.0       9.0
  Follicular                        1.37     1.31    0.0       7.0
  Atrophic                          0.07     0.38    0.0       2.0
  Simple                            0.05     0.34    0.0       2.0
Gastritis lesion score°
  Hyperplastic                      1.5      0.7     1.0       5.0
  Follicular                        1.6      0.8     1.0       4.0
  Atrophic                          1.0      0.0     1.0       1.0
  Simple                            1.3      0.5     1.0       2.0
Gastritis area (cm²)
  Hyperplastic                      77.8     95.3    3.1       1135.6
  Follicular                        40.3     55.7    1.8       531.3

SD, standard deviation; BW, body weight. °Varying from 1 (mild lesion) to 5 (severe gastritis).

Kernel density and core area analysis showed that hyperplastic gastritis was primarily found within a single oval-shaped area located in the fundus region near the Curvatura ventriculi major (Figure 1B). Some minor parts of this area also extended into the cardiac and the pyloric regions. Hyperplastic gastritis occupied an average area of 77.8 cm² (6.8% of the average area of the examined stomachs). The core area (which occupied 44% of the average area of the examined stomachs) included 61.9% of the damaged areas.

Follicular gastritis was concentrated in the pyloric mucosa, primarily on the edge of the fundus area (Figure 1C). The kernel maps showed two distinct, small areas of high relative frequency, symmetrically localised with respect to the mid-sagittal plane of the organ. Compared to the hyperplastic lesions, the average area of follicular lesions was smaller (40.3 cm²) and covered 3.5% of the average area of the examined stomachs. They were more concentrated, considering that the core area (which was 32.2% of the average area of the examined stomachs) included 64.1% of the damaged areas of the stomachs. Finally, the Wilcoxon-Mann-Whitney test showed that hyperplastic gastritis was more intense inside the core area than outside it (1.70 vs 1.51; P<0.001).
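The comparison of lesion intensity inside and outside the core area relies on the Wilcoxon-Mann-Whitney test, which the authors ran in R. The sketch below shows the mechanics of the test in plain Python, using the normal approximation with a tie correction and no continuity correction. It is an illustration under those assumptions, not the authors' code, and any scores passed to it here are hypothetical rather than study data.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided test; returns (U statistic of x, z score, approximate P value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = rank_with_ties(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0.0:  # every observation is identical: no evidence of a difference
        return u1, 0.0, 1.0
    z = (u1 - n1 * n2 / 2.0) / sigma
    return u1, z, math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    inside = [2, 2, 3, 1, 2, 3, 2, 4]    # hypothetical intensity scores
    outside = [1, 1, 2, 1, 1, 2, 1, 1]   # hypothetical intensity scores
    print(mann_whitney_u(inside, outside))
```

A rank-based test is a natural choice here because the intensity score is an ordinal 1 to 5 scale with many ties, so a comparison of means under a normality assumption would be hard to justify.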

Discussion

The aim of the present work was to describe the localisation and distribution of different gastric mucosal lesions without investigating the aetiology and relationships among different types of lesions. The GIS analysis allowed the production of detailed maps of the internal stomach surface, which highlighted differences in the distribution and concentration of the lesions in most cases of gastritis found. The hyperplastic type of gastritis was widespread over the inner surface of the stomach, with large lesions mainly localised in the fundus and cardiac areas, whereas the follicular type was less widespread and restricted to the pyloric mucosa. The analysis showed a clear clustering of the distribution of gastric lesions, especially for the hyperplastic gastritis. The localisation of the different lesions is thought to be due to mechanical stress exerted by the feed mass and to the distribution of the different glandular types, as well as to other factors not yet investigated.

Conclusions

The use of GIS for the mapping of lesions in internal organs introduces a novel approach, which represents an improvement in comparison with other databases used for the collection of clinical data. The creation of density maps of the relative frequency of gastritis lesions can provide new insights into the possible relationships among the different types of gastritis, their severity, position and aetiology. The results are potentially transferable to human medicine, as the pig is widely used as an animal model. Moreover, the present results may allow targeted sampling of stomach tissue in large-scale abattoir inspections to detect gastric mucosal damage as an indicator of pre-slaughter animal welfare.

References

Anderson TK, 2009. Kernel density estimation and K-means clustering to profile road accident hotspots. Accident Anal Prev 41:359-64.
Capraro D, Zanfi C, Bassi M, Pascotto E, Bovolenta S, Spanghero M, 2014. Effect of physical form of whole ear corn silage (coarse vs wet milled) included at high dietary levels (30 vs 40% dry matter) on performance of heavy finishing pigs. Anim Feed Sci Technol 198:271-8.
Daraban C, Murino C, Marzatico G, Mennonna G, Fatone G, Auletta L, Miceli F, Vulpe V, Meomartino L, 2014. Using geographical information system for spatial evaluation of canine extruded disc herniation. Geospat Health 9:213-20.
Dixon MF, Genta RM, Yardley JH, Correa P, 1996. Classification and grading of gastritis. The updated Sydney system. Am J Surg Pathol 20:1161-81.
Galassi G, Malagutti L, Rapetti L, Crovetto GM, Zanfi C, Capraro D, Spanghero M, 2016. Digestibility, metabolic utilization and effects on growth and slaughter traits of diets containing corn silage for heavy pigs. Ital J Anim Sci (in press).
Hebel RMD, 1949. The topography of chronic gastritis in otherwise normal stomachs. Am J Pathol 25:125-41.
Imanieh MH, Goli A, Imanieh MH, Geramizadeh B, 2015. Spatial modeling of colonic lesions with geographic information systems. Iran Red Crescent Med J 17:e18129.
Malfertheiner P, Ditschuneit H, 2011. Helicobacter pylori, gastritis and peptic ulcer. Springer-Verlag, Berlin, Germany.
Marcato PS, 2002. Sistema gastroenterico e peritoneo. In: Marcato PS, ed. Patologia sistematica veterinaria. Edagricole, Bologna, Italy, pp 611-743.
Mason F, Pascotto E, Zanfi C, Spanghero M, 2013. Effect of dietary inclusion of whole ear corn silage on stomach development and gastric mucosa integrity of heavy pigs at slaughter. Vet J 198:717-9.
Pascotto E, Tomè P, Comazzi S, Marco G, Cornegliani L, 2014. Use of an open-source Geographic Information System-based method for topographic analysis of nodular cutaneous lesions in dogs. Vet Dermatol 25:55-60.
R Core Team, 2015. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Rugge M, Genta RM, 2005. Staging and grading of chronic gastritis. Hum Pathol 36:228-33.
Zanfi C, Colombini S, Mason F, Galassi G, Rapetti L, Malagutti L, Crovetto GM, Spanghero M, 2014. Digestibility and metabolic utilization of diets containing whole ear corn silage and their effects on growth and slaughter traits of heavy pigs. J Anim Sci 92:211-9.


GEOSPATIAL HEALTH

Aims and Scope. Geospatial Health is the official journal of the International Society of Geospatial Health (www.GnosisGIS.org). The journal was founded in 2006 at the University of Naples Federico II by Giuseppe Cringoli, John B. Malone, Robert Bergquist and Laura Rinaldi. The focus of the journal is on all aspects of the application of geographical information systems, remote sensing, global positioning systems, spatial statistics and other geospatial tools in human and veterinary health. The journal publishes two issues per year.

Submission of manuscripts
Submission to Geospatial Health will be done electronically via the Internet through the website: www.geospatialhealth.net
Submission of a manuscript implies that it is original and is not being considered for publication elsewhere. Submission also implies that all authors have approved the paper for release and are in agreement with its content.

Preparation of manuscripts
- Manuscripts should be written in English.
- Manuscripts should have numbered lines, with wide margins and double spacing throughout. Every page of the manuscript, including the title page, should be numbered.
- Manuscripts should be organized in the following order:
  Title
  Name(s) of author(s)
  Complete postal address(es) of affiliations
  Full telephone, Fax no. and E-mail address of the corresponding author
  Abstract (no longer than 250 words)
  Keywords (3-5)
  Introduction
  Materials and Methods (with subheadings if necessary)
  Results (with subheadings if necessary)
  Discussion
  Conclusions
  Acknowledgements
  References
  Titles of tables and figures
  Tables (separate file(s))
  Figures (separate file(s), in JPG or TIFF format)

References
- References cited in the text should be presented in a list in the section "References".
- In the text refer to the author's name (without initial) and year of publication.
- If reference is made in the text to a publication written by more than two authors the name of the first author should be used followed by "et al.". This indication, however, should never be used in the list of references. In this list names of first author and co-authors should be mentioned.
- References cited together in the text should be arranged chronologically.
- The references should be listed in alphabetical order and they should contain: surname and initials of each author, year of publication, title of the paper, abbreviated name and volume of the journal (for books, title and publisher), first and last page of the paper.
- For example:
  for journals
  Cringoli G, Taddei R, Rinaldi L, Veneziano V, Musella V, Cascone C, Sibilio G, Malone JB, 2004. Use of remote sensing and geographical information systems to identify environmental features that influence the distribution of paramphistomosis in sheep from the southern Italian Apennines. Vet Parasitol 122:15-26.
  for books
  Elliott P, Wakefield J, Best N, Briggs D, 2000. Spatial epidemiology. Methods and applications. Oxford University Press, Oxford, UK.

Publisher - Editorial Office
PAGEPress srl - Via Giuseppe Belli 7 - 27100 Pavia - www.pagepress.org

Editorial Staff
Lucia Zoppi, Managing Editor
Cristiana Poggi, Production Editor
Tiziano Taccini, Technical Support

Printed by Press Up srl, Via La Spezia 118/C, 00055 Ladispoli (RM), Italy
Logo on front cover by Vincenzo Musella

