Glacial Flooding & Disaster Risk Management Knowledge Exchange and Field Training July 11-24, 2013 in Huaraz, Peru HighMountains.org/workshop/peru-2013
The Great Himalaya Trail Pilot Study on Plant Species Distribution – A Citizen Science Initiative
Paribesh Pradhan1, Rajan Bajracharya2
1. Annapurna Foundation, email@example.com
2. International Center for Integrated Mountain Development (ICIMOD), firstname.lastname@example.org

Abstract: The Hindu Kush-Himalayan (HKH) region is rich in global biodiversity hotspots, eco-regions, important bird areas, important plant areas, and Ramsar Sites. The conservation of species begins with an understanding of the distribution, abundance, habitat preferences, and movements of organisms across wide geographic areas and over long periods of time, as well as of their association with people's lives, culture and traditions, and the goods and services they provide. We propose a citizen science initiative that combines human observations along the trail with geospatial informatics to inform understanding of environmental change in the HKH region. This paper explores different aspects of developing a crowd-sourcing application for environmental monitoring of the HKH region. In future, the initiative is intended to address not only biodiversity but also other thematic issues such as hazards, vulnerability, and case studies of adaptation and coping.

Keywords: Hindu Kush Himalaya, the Great Himalaya Trail, Biodiversity, Crowd Sourcing, Citizen Science

Introduction: The Great Himalaya Trail
"The Great Himalaya Trail (GHT) – My Climate Initiative" was initiated in 2012 by Paribesh Pradhan with financial support from the Global Programme Climate Change (GPCC) of the Swiss Agency for Development and Cooperation (SDC). The project entailed walking across Nepal from east to west, a distance of 1,555 km, in 98 days along the GHT to document communities' perceptions of change and stories of sustainable adaptation practices, vulnerabilities, and impacts of climate change. The nature of the journey also made it possible to photograph a large number of species and habitats.
More than 500 unique high-resolution geotagged photographs of plant species were taken along the GHT as a voluntary initiative. These photographs are now being used to develop, together with the International Center for Integrated Mountain Development (ICIMOD), a pilot geospatial database application on species distribution along the GHT. Such an application is of critical importance in the Hindu Kush-Himalayan (HKH) region, where biodiversity has not been fully documented.
Importance of Biodiversity in the Hindu Kush Himalaya
Almost one-third of the HKH region is covered by all or part of 4 global biodiversity hotspots, 6 UNESCO Natural World Heritage Sites, 60 eco-regions, 330 important bird areas, 53 important plant areas for medicinal plants, and 29 Ramsar Sites (ICIMOD, 2009). A wide variety of ecosystems support specialized biodiversity, including many globally threatened, endemic, and migratory species. The conservation of species begins with an understanding of the distribution, abundance, habitat preferences, and movements of organisms across wide geographic areas and over long periods of time, as well as of their association with people's lives, culture and traditions, and the goods and services they provide. This pilot citizen science initiative will help clarify the different aspects involved in developing a crowd-sourcing application for environmental monitoring of the HKH region, one that can later address other thematic issues such as hazards, vulnerability, and case studies of adaptation and coping with climate change.

Approach: A Citizen Science Initiative using Crowd Sourcing Tools
Engaging communities and citizens to take photographs on a mass scale for the purpose of understanding science and nature is still a relatively new and evolving approach. However, many such initiatives are under way around the world. Galaxy Zoo has involved more than 200,000 participants, who have contributed more than 100 million galaxy classifications through a web interface (Wood et al., 2011). The game Foldit attempts to predict protein structure by harnessing humans' puzzle-solving abilities (ibid.). Wikipedia, which allows users to add or edit the content of any article, is an example of a successful model at a large scale.
eBird collects about 5,000 checklists and 75,000 observations each day, all of which go into a single standard database; in 2011, eBird contributors volunteered more than 1.3 million hours collecting bird observations (Hardin, 2012). Similarly, the National Biodiversity Network in the UK holds over 31 million records of plant and animal species, largely submitted by amateur naturalists (Stafford et al., 2010). In Australia, large-scale citizen science projects are mapping the distributions of species as diverse as possums, whale sharks, and frogs (ibid.). The National Science Foundation's DataONE, the EURING bird ringing and recovery scheme, the India Biodiversity Portal, Mongabay.com, SavingSpecies.org, Plantwise, and Geo-Wiki are a few more examples. Popular photo-sharing websites such as Flickr and Pinterest, along with more general-purpose sites, also provide categories such as science and nature.

Citizen science projects build on pervasive access to geospatial informatics, comprising remote sensing and Geographical Information Systems (GIS), and to information technology, including the internet and mobile systems. They leverage these technologies for data collection, data management, quality control, data processing, analysis, and serving information and applications, and to develop Human/Computer Learning Networks (HCLN). These networks combine contributions from a broadly recruited pool of human observers with Artificial Intelligence (AI) algorithms that process the contributed data, yielding a total computational power far exceeding the sum of the individual parts.

Biodiversity in the HKH region has not been fully documented, and data availability is limited. Citizen science initiatives using crowd-sourcing techniques could be a cost-effective and efficient way to fill this data gap. The term crowd-sourcing was coined by Jeff Howe in a 2006 article in Wired magazine (Howe, 2006). However, crowd-sourced data are difficult to structure: disorganized crowd-based content such as text, images, and video hinders the management of ecological information systems. In addition, there are few well-established repositories or standard protocols for archiving and retrieving ecological observation data (Madin et al., 2007). A researcher investigating a particular case therefore has to struggle with retrieval from disorganized crowd-based content and must also search heterogeneous repositories to obtain the data needed. The resulting administrative effort for knowledge extraction (data discovery), consolidation of unstructured data, and integration with other data repositories hinders research. An ontology provides a convenient basis for adding detailed semantic annotations to scientific data and can be extended with specialized domain vocabularies, making it both broadly applicable and highly customizable (Madin et al., 2007). A semantic platform can address these problems by turning the data into smart content through an ontology-based knowledge management and retrieval system.
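To illustrate what a semantic annotation of one crowd-sourced observation could look like, the sketch below describes a photo record using an observation, entity, and measurement pattern loosely modeled on the ecological observation ontology of Madin et al. (2007). The class and field names are our own simplification and are hypothetical, not the published ontology's exact terms. The values are taken from the Rosa sericea record shown later in this paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Measurement:
    """A measured characteristic of the observed entity, with its standard (unit)."""
    characteristic: str  # e.g. "Altitude"
    value: float
    standard: str        # e.g. "metre"; keeping the unit explicit aids synthesis


@dataclass
class Observation:
    """An observation of one entity (hypothetical, simplified structure)."""
    entity: str                                    # ideally a taxon concept
    measurements: list[Measurement] = field(default_factory=list)
    context: dict[str, str] = field(default_factory=dict)

    def value_of(self, characteristic: str, standard: str) -> float | None:
        """Return a measured value expressed in `standard`, or None if absent."""
        for m in self.measurements:
            if m.characteristic == characteristic and m.standard == standard:
                return m.value
        return None


# The IMG_4294 photo record expressed as an annotated observation.
obs = Observation(
    entity="Rosa sericea",
    measurements=[
        Measurement("Altitude", 2825.0, "metre"),
        Measurement("Latitude", 27.4832412607, "decimal degree"),
        Measurement("Longitude", 87.9058861, "decimal degree"),
    ],
    context={"photo_id": "IMG_4294", "district": "Taplejung"},
)

print(obs.value_of("Altitude", "metre"))  # 2825.0
print(obs.value_of("Altitude", "foot"))   # None
```

Because each value carries its standard, a query for altitude in metres does not silently pick up a value that another contributor recorded in feet. A fuller system would convert between standards instead of returning None.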
To develop a semantic platform that captures semantically rich crowd-sourced ecological datasets (text, images, and video), the following steps have to be considered:
• A mechanism, or engine, that identifies concepts or 'domain patterns' (e.g. topology, dryness, land cover) in the crowd-sourced data and maps semantics to those concepts.
• Adding value to existing foundational ontologies (domain/ecological patterns) and enriching the semantics in the knowledge schema.
• Development of wrappers that lift data from the original sources to a meaningful, machine-readable level. An example is the GoogleArt wrapper (Guéret, 2011).
• Smart content and an advanced user interface for presenting the ecological knowledge base. This includes merging several ontologies to provide one semantically rich access point for all crowd-sourced domain data and relevant data repositories, thereby integrating the resources.
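The first and third steps can be sketched in a few lines. The vocabulary mapping caption keywords to domain patterns (topology, dryness, land cover) is a hypothetical stand-in for a real ecological ontology, and plain keyword matching is a deliberately naive substitute for the concept-identification engine the first step calls for. The wrapper then lifts a tabular upload into subject-predicate-object triples that share that vocabulary; the column names and the upload itself are assumptions for illustration.

```python
from __future__ import annotations

import csv
import io
import re

# Hypothetical domain vocabulary: keyword -> (domain pattern, concept).
VOCABULARY = {
    "forest": ("landcover", "Forest"),
    "scrub": ("landcover", "Shrubland"),
    "meadow": ("landcover", "Grassland"),
    "dry": ("dryness", "Dry"),
    "moist": ("dryness", "Moist"),
    "ridge": ("topology", "Ridge"),
    "valley": ("topology", "Valley"),
}


def identify_patterns(text: str) -> set[tuple[str, str]]:
    """Step 1 (naive): find domain patterns by keyword lookup in free text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {VOCABULARY[w] for w in words if w in VOCABULARY}


def wrap_row(row: dict[str, str]) -> list[tuple[str, str, str]]:
    """Step 3: lift one tabular record into (subject, predicate, object) triples.

    The column names photo_id, species and caption are assumed, not a real schema.
    """
    subject = f"photo:{row['photo_id']}"
    triples = [(subject, "depicts", f"taxon:{row['species']}")]
    for pattern, concept in sorted(identify_patterns(row["caption"])):
        triples.append((subject, f"has_{pattern}", concept))
    return triples


# A hypothetical contributor upload with a free-text caption.
upload = io.StringIO(
    "photo_id,species,caption\n"
    "IMG_0001,Rosa_sericea,Shrub at a forest margin on a dry ridge\n"
)
triples = [t for row in csv.DictReader(upload) for t in wrap_row(row)]
for t in triples:
    print(t)
```

Real concept identification would need stemming, multi-word terms, and local-language plant names; keyword lookup is used here only because it is easy to follow.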
The following conceptual pillars are necessary to implement this citizen science initiative.
Semantic Platform: The semantic web platform will package data together with its meaning as smart content, using Topic Maps technology. Figure 1 shows the conceptual framework of the semantic web platform. Topic Maps is an international industry standard (ISO/IEC 13250, 2003) for knowledge representation and information integration. It makes it possible to store, together with the data, complex metadata that represents the semantics, i.e. records the meaning of the stored data. Unlike other technologies (e.g. RDF/OWL), Topic Maps represents knowledge in a natural way, close to the way humans grasp knowledge. This approach can be especially powerful when humans must interact with information systems. All the types in a topic map (the topic types, occurrence types, association types, and role types) are themselves defined as topics. These topics provide the conceptual skeleton of the topic map and, together with the scoping topics, are referred to in the Topic Maps community as the Topic Maps ontology. Ontologies are very useful when authoring topic maps because they help identify the borders of the domain of knowledge that the topic map represents. The Topic Maps standard also provides the ability to merge topic maps in order to integrate resources (Bleier et al., 2010). The proposed platform will integrate (federate) crowd-sourced data so that it self-organizes, while wrapper applications wrap other heterogeneous data to extend the ontologies and content from different distributed sources, providing one access point through domain vocabularies. This will transform the data into Smart Content.

Smart Content Layer: To allow the data to present itself according to its meaning and within context, this layer should provide a Topic Maps Application Programming Interface (TMAPI, 2010) for creating and using Semantically Active Components (SACs).
A SAC is a component that enables the presentation or processing of data by the platform. Examples of SACs are a component that presents data in a table, one that presents a graph, or one that sends an email when a certain condition related to the data is met. By implementing the API that the layer defines, any SAC can be configured by the semantics of the data and/or the context in which the data is accessed (the role of the user, their activity, or their objectives). Moreover, it will be possible to nest SACs within other SACs. This allows, for example, creating a web presentation of all the topics of a certain type accessed by a certain user in a certain situation. Because the data is self-explanatory, and because its semantics and the context in which it is accessed shape its presentation and processing, it becomes Smart Content.

Advanced User Interfaces Layer: The semantic platform will include a natural language user interface that provides access to the data by asking questions and conducting dialogs with the system, enabling users to easily perform semantically rich queries. While the natural language interface lets users ask queries, a graphical user interface will allow them to browse the available knowledge and data. This graphical user interface will integrate the Semantically Active Components to visualize different types of data in different ways. In other words, it brings together existing informatics resources through more user-friendly, integrative UIs.

Pilot Study from the Great Himalaya Trail
All the photographs from the GHT were taken in RAW format using a Canon 7D camera. The photographs had to be preprocessed and converted into low-resolution JPEG format suitable for web publishing. The preprocessing also included creating a database and associating the photos with the data recorded by the GPS device. A set of attribute categories was also identified while preparing this database. The second step will involve identification of all plant species in the photos by a taxonomist, adding further value to the database.
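The paper does not state how the photos were associated with the GPS data. A common approach, sketched below as an assumption, is to match each photo's camera timestamp to the nearest point in the GPS track log and to reject matches that are too far apart in time. The track points, the 60-second tolerance, and the assumption that the camera and GPS clocks are synchronized are all hypothetical; in practice a camera clock offset would first need to be measured and corrected.

```python
from __future__ import annotations

import bisect
from datetime import datetime, timedelta
from typing import NamedTuple


class TrackPoint(NamedTuple):
    time: datetime
    lat: float
    lon: float
    alt_m: float


def match_photo(photo_time: datetime, track: list[TrackPoint],
                tolerance: timedelta = timedelta(seconds=60)) -> TrackPoint | None:
    """Return the track point nearest in time to `photo_time`, or None.

    `track` must be sorted by time. None is returned when even the nearest
    point is more than `tolerance` away (e.g. the GPS was switched off).
    """
    times = [p.time for p in track]
    i = bisect.bisect_left(times, photo_time)
    # The nearest fix is either just before or at/after the insertion index.
    candidates = track[max(i - 1, 0):i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: abs(p.time - photo_time))
    return best if abs(best.time - photo_time) <= tolerance else None


# Hypothetical GPS log around the time a photo was taken.
track = [
    TrackPoint(datetime(2012, 5, 7, 13, 11, 40), 27.48310, 87.90570, 2821.0),
    TrackPoint(datetime(2012, 5, 7, 13, 12, 10), 27.48324, 87.90589, 2825.0),
    TrackPoint(datetime(2012, 5, 7, 13, 12, 40), 27.48340, 87.90610, 2830.0),
]
fix = match_photo(datetime(2012, 5, 7, 13, 12, 8), track)
print(fix.alt_m if fix else "no GPS fix")  # 2825.0
```

Binary search keeps each lookup logarithmic in the track length, which matters when a 98-day walk produces a long track log and hundreds of photos.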
As an example, the following is the database record for the Darimpate plant, known scientifically as Rosa sericea.
1. Photo ID: IMG_4294
2. Scientific Name: Rosa sericea
3. Local Name/s: Darimpate
4. Family Name: –
5. Photographed Date: 7 May 2012
6. Time: 1:12:08 PM
7. Altitude: 2825 m
8. Altitudinal Range: 1820–4850 m
9. Longitude: 87.9058861
10. Latitude: 27.4832412607
11. Photograph By: Paribesh Pradhan
12. District: Taplejung
13. Plant Features: Deciduous shrub, 1–2 m tall; stems smooth or bristly or with robust red thorns, sometimes wing-like, paired below leaves or scattered along branches; leaves pinnate, leaflets ovate-obovate; margin entire at base, serrate towards apex; flowers solitary on short side shoots, white to creamy-yellow, 4-petaled; fruit a hip, red, obovoid-globose
14. Common Habitat: Open woods, forest margins, scrub, dry sunny places
15. Regional Distribution: India (Sikkim, Assam), Nepal, Bhutan, Myanmar, Tibet/China
16. Remarks: –
17. Economic Use: Generis/medicinal/…
18. Endemic Value: –
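A record like the one above maps naturally onto a typed structure. The sketch below is a hypothetical schema covering a subset of the fields. It adds simple quality-control checks of our own that the paper does not describe: a record awaiting taxonomist identification is flagged, as is an observation whose altitude falls outside the species' recorded altitudinal range, which could prompt a taxonomist to double-check the identification.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class PlantPhotoRecord:
    """Subset of the GHT photo record fields (hypothetical schema)."""
    photo_id: str
    scientific_name: str | None  # None until a taxonomist identifies the plant
    local_names: list[str]
    taken_at: datetime
    altitude_m: float
    altitudinal_range_m: tuple[float, float] | None
    longitude: float
    latitude: float
    photographer: str
    district: str

    def quality_flags(self) -> list[str]:
        """Return human-readable warnings; an empty list means no issues found."""
        flags = []
        if self.scientific_name is None:
            flags.append("awaiting taxonomist identification")
        if not (-90.0 <= self.latitude <= 90.0 and -180.0 <= self.longitude <= 180.0):
            flags.append("coordinates out of bounds")
        if self.altitudinal_range_m is not None:
            low, high = self.altitudinal_range_m
            if not low <= self.altitude_m <= high:
                flags.append("altitude outside recorded range for species")
        return flags


record = PlantPhotoRecord(
    photo_id="IMG_4294",
    scientific_name="Rosa sericea",
    local_names=["Darimpate"],
    taken_at=datetime(2012, 5, 7, 13, 12, 8),
    altitude_m=2825.0,
    altitudinal_range_m=(1820.0, 4850.0),
    longitude=87.9058861,
    latitude=27.4832412607,
    photographer="Paribesh Pradhan",
    district="Taplejung",
)
print(record.quality_flags())  # []
```

Flags are returned as warnings rather than raised as errors, because an out-of-range altitude may be a genuine new distribution record rather than a mistake.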
The third and most crucial step will be the development of the semantic web platform and the integration of this database into it. The system will be developed following the conceptual framework of the semantic web platform proposed above.
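The integration step could look roughly like the sketch below, a toy in-memory approximation of the Topic Maps and Smart Content ideas described earlier, not an implementation of ISO/IEC 13250 or of TMAPI. Topics are identified by subject identifiers, and two topic maps are merged by treating topics that share a subject identifier as the same subject, in the spirit of the standard's merging rules; associations, roles, and scope are omitted for brevity. A minimal table SAC renders a topic, and a composite SAC nests other SACs and shows each one only to selected user roles. All class names, URIs, and the second repository are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Topic:
    subject_ids: set[str]                                  # identity of the subject
    names: set[str] = field(default_factory=set)
    types: set[str] = field(default_factory=set)           # types are topic ids too
    occurrences: dict[str, str] = field(default_factory=dict)  # occurrence type -> value


class TopicMap:
    def __init__(self) -> None:
        self.topics: list[Topic] = []

    def find(self, subject_id: str) -> Topic | None:
        return next((t for t in self.topics if subject_id in t.subject_ids), None)

    def add(self, topic: Topic) -> None:
        """Add a topic, merging it into an existing topic that shares a subject id.

        Simplified: occurrences are single-valued, and topics that only become
        linked through a later merge are not re-merged with each other.
        """
        for existing in self.topics:
            if existing.subject_ids & topic.subject_ids:
                existing.subject_ids |= topic.subject_ids
                existing.names |= topic.names
                existing.types |= topic.types
                existing.occurrences.update(topic.occurrences)
                return
        self.topics.append(topic)

    def merge(self, other: TopicMap) -> None:
        for t in other.topics:  # copy so the other map is left untouched
            self.add(Topic(set(t.subject_ids), set(t.names),
                           set(t.types), dict(t.occurrences)))


class SAC(Protocol):
    def render(self, topic: Topic, context: dict[str, str]) -> str: ...


class TableSAC:
    """Presents a topic's names and occurrences as plain-text rows."""
    def render(self, topic: Topic, context: dict[str, str]) -> str:
        rows = [f"{k}: {v}" for k, v in sorted(topic.occurrences.items())]
        return "\n".join([" / ".join(sorted(topic.names))] + rows)


class SourcesSAC:
    """Lists subject identifiers so researchers can trace provenance."""
    def render(self, topic: Topic, context: dict[str, str]) -> str:
        return "Sources: " + ", ".join(sorted(topic.subject_ids))


class CompositeSAC:
    """Nests SACs; each child is shown only to the user roles listed for it."""
    def __init__(self, children: list[tuple[SAC, set[str] | None]]) -> None:
        self.children = children  # roles=None means visible to everyone

    def render(self, topic: Topic, context: dict[str, str]) -> str:
        role = context.get("role", "")
        return "\n".join(sac.render(topic, context) for sac, roles in self.children
                         if roles is None or role in roles)


TAXON = "http://example.org/taxon/"  # hypothetical namespace

ght = TopicMap()  # from the GHT photo database
ght.add(Topic({TAXON + "Rosa_sericea"}, {"Darimpate"}, {"PlantSpecies"},
              {"photo": "IMG_4294", "district": "Taplejung"}))

flora = TopicMap()  # from another, hypothetical repository
flora.add(Topic({TAXON + "Rosa_sericea", "http://example.org/flora/rosa-sericea"},
                {"Rosa sericea"}, {"PlantSpecies"},
                {"altitudinal_range_m": "1820-4850"}))

ght.merge(flora)
page = CompositeSAC([(TableSAC(), None), (SourcesSAC(), {"researcher"})])
print(page.render(ght.find(TAXON + "Rosa_sericea"), {"role": "trekker"}))
```

Showing the provenance list only to researchers illustrates how the same data can present itself differently depending on the user's role, one of the context factors the Smart Content Layer is meant to take into account.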
Figure 2. Geotagged photographs from the Great Himalaya Trail. Map courtesy: ICIMOD

Conclusion
An application based on the semantic web platform, for GPS-enabled smartphones and similar devices, to collect photo data of different plant species is under development as part of the pilot project. Technical challenges constraining the current development include automatic pre-annotation of the data by the system before it is fed into the central database framework, and interoperability among systems. Incorporating AI that automatically identifies plant species, thereby eliminating the need for a taxonomist to identify and approve every record, is a further technical challenge. Nevertheless, this project has the potential to bridge the data gap on biodiversity in the Himalayas for researchers and scientists. It could also be replicated in the other thematic areas mentioned previously, and could serve as a smart application for trekkers and hikers to identify plant species instantly, in real time, while they are in the mountains. However, developing it into a fully functional application, in which the crowd can both feed in data and get immediate information about the plants they photograph, will require more research, funding, and time.

References
Bleier, Arnim, Patrick Jähnichen, Uta Schulze, and Lutz Maicher. 2010. "The Praxis of Social Knowledge Federation". Presentation on Topic Maps services at the Second International Workshop on Knowledge Federation, Dubrovnik, Croatia.
Guéret, Christophe. 2011. "GoogleArt: Semantic Data Wrapper (Technical Update)". SemanticWeb.com, March 25, 2011. Accessed 10 June 2013: http://semanticweb.com/googleart-semantic-data-wrapper-technical-update_b18726
Hardin, Steve. 2012. "How to Identify Ducks in Flight: A Crowdsourcing Approach to Biodiversity Research and Conservation". Bulletin of the American Society for Information Science and Technology, February/March 2012. Accessed 14 June 2013: http://www.asis.org/Bulletin/Feb-12/FebMar12_Hardin_Kelling.html
Howe, Jeff. 2006. "The Rise of Crowdsourcing". Wired, Issue 14.06. Accessed 14 June 2013: http://www.wired.com/wired/archive/14.06/crowds.html
ICIMOD. 2009. "Mountain Biodiversity and Climate Change". International Center for Integrated Mountain Development (ICIMOD), Kathmandu, Nepal.
ISO. 2003. "Information Technology: Document Description and Processing Languages: Topic Maps" (ISO/IEC 13250, 2nd edition). International Organization for Standardization (ISO), Geneva, Switzerland. http://www.y12.doe.gov/sgml/sc34/document/0322_files/iso13250-2nd-ed-v2.pdf
Madin, Joshua, Shawn Bowers, Mark Schildhauer, Sergeui Krivov, Deana Pennington, and Ferdinando Villa. 2007. "An ontology for describing and synthesizing ecological observation data". Ecological Informatics, Volume 2, Issue 3, Pages 279-296. ISSN 1574-9541. http://dx.doi.org/10.1016/j.ecoinf.2007.05.004
Published on Sep 25, 2013