Natural Resource/Fish and Wildlife Research Data Management: The Foothills Model Forest Approach
Christian Weik, GIS Coordinator, Foothills Model Forest

FOOTHILLS MODEL FOREST OVERVIEW
The Foothills Model Forest is one of eleven Model Forests across Canada that make up the Model Forest Network. The Network is funded and administered by Natural Resources Canada and the Canadian Forest Service, with other cash and in-kind contributions provided by program partners. The Foothills Model Forest has three principal partners/sponsors who represent the agencies with vested land management authority for the land base covered by the Model Forest. The Foothills Model Forest is located in west-central Alberta and covers an area of approximately 2.75 million hectares. Included in the landbase are Weldwood of Canada Limited's Forest Management Agreement area, Jasper National Park, Willmore Wilderness Park, various provincial Crown Forest Management Units and the Environmental Training Centre's Cache Percotte Training Forest. The area itself lies within the boreal, montane, and sub-alpine forest regions of Canada. The Model Forest is centered in the town of Hinton, Alberta, a resource-based community of approximately 10,000 located 285 kilometres west of Edmonton and 85 kilometres east of Jasper townsite.

The Foothills Model Forest (FMF) conducts research into ecological, economic and other aspects of sustainable forest management. Individual program areas conduct research based on direction given to them by the board of directors and, at a technical level, from activity teams. Some of the on-site projects at the FMF include, but are not limited to:
• Grizzly bear research project
• Natural disturbance research project
• Fish and watershed research project
• Foothills Growth and Yield Association
THE FOOTHILLS MODEL FOREST BUSINESS
Each project area has a life span ranging from one to six years, with some projects, such as the Foothills Growth and Yield Association, having no determinate end date. Most projects are very independent from a business perspective, sharing few resources and rarely encountering overlap in terms of data collection. A high percentage of field staff are seasonal; there is therefore generally high turnover among those who handle data collected at the FMF.
ISSUES WITH TRADITIONAL DATA MANAGEMENT METHODOLOGIES
Traditionally, each project had no planning component for data management. Data collection was driven by immediate research objectives and field sheet formats. Steps were taken to improve quality control, but most were manual and performed well after collection had taken place. Data were stored in many different spreadsheet, database or even ASCII format files. Over time this approach resulted in compounding problems. They included:
• Extensive duplication of data. This occurred most often with the same data residing in multiple copies of the same file in different locations on the network. It also occurred when data files were taken off-site for data loading or updating purposes.
• Frequent data entry errors resulting in illogical data.
• Little or no knowledge of who entered or modified data, or when; that is, little or no metadata was captured.
• Inability to perform analysis due to poor table design. This was a function of a lack of training, but also of little foresight when planning data storage.
• Breakdown of existing databases due to constant design modification to meet additional objectives.
• Incorrect spatial placement of field locations. This was the result of not correctly tying the sample points to a GIS hydrography layer.
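To make the "illogical data" problem concrete, the following is a minimal sketch of the kind of automated logical check that was missing from the traditional workflow. The field names, species codes and valid ranges are hypothetical illustrations, not taken from any actual FMF database.

```python
from datetime import date

def validate_record(rec):
    """Return a list of logical problems found in one field-sheet record.

    Column names and sanity thresholds below are illustrative only.
    """
    problems = []
    if not (0 < rec["dbh_cm"] < 200):                 # tree diameter sanity range
        problems.append("dbh_cm out of range")
    if rec["capture_date"] > date.today():            # measurements cannot be in the future
        problems.append("capture_date is in the future")
    if rec["species"] not in {"PL", "SW", "AW"}:      # hypothetical valid species codes
        problems.append("unknown species code")
    return problems

good = {"dbh_cm": 23.5, "capture_date": date(2002, 7, 14), "species": "PL"}
bad  = {"dbh_cm": 2350, "capture_date": date(2002, 7, 14), "species": "XX"}
```

Run against every incoming record, a check like this catches typographic errors (such as a misplaced decimal point) at entry time rather than years later during analysis.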
These problems existed at varying levels across all project areas. In most cases there was acknowledgement that problems existed, but there was little awareness of how to address them. Generally, the longer a project had been operating, the more entrenched and problematic the issues appeared. Even so, these project researchers were more willing to work with us (the GIS staff) to approach the problems in a more systematic and proactive way.
APPROACH TO BETTER DATA MANAGEMENT
The GIS staff (currently two personnel) at the FMF have expanded their role beyond their initial traditional responsibilities. They have proactively taken on a role of collaborating with research program areas to manage all research data. There were three main factors driving this change:
• After several years of managing data in an ad-hoc manner it became clear to researchers that a more proactive, integrated approach was necessary in order to avoid inefficiencies when analyzing the data and to maximize the correctness of the data.
• The GIS staff consistently spent additional time fixing problems prior to performing further analysis. It was thought that taking a proactive approach would, in the long run, minimize or eliminate this requirement.
• The technological integration of GIS and relational database management systems. Advances in GIS technology enable storage of spatial data directly in RDBMSs. It was thought that developing database skills to complement GIS skills, and ensuring high data quality, would better position the organization to realize the true benefits of these advances in the future.
The expanded role of the GIS staff placed them in a position of working proactively with each research team towards better data management. They applied traditional database system implementation tactics to each individual project; development occurred project by project. The general process for development was as follows:
1. Needs analysis: This involved one or more meetings with individual project researchers. The result was a clear identification of the scope and objectives of the database.
2. Database design: This involved applying feedback from the needs analysis, output reports, historical databases and field forms to design tables and columns in an entity relationship (ER) diagram. The tool used for database design at the FMF is Microsoft Visio Enterprise.
3. Development: This refers to the phase of generating a physical database and developing data entry forms, queries, and reports.
4. Loading: This refers to the loading of historical data (if it exists) into the database.
5. Training: This involved training users on the new database system. Researchers were encouraged to take industry training, supplemented with in-house training using the researchers' own data.
The process was, in practice, iterative, with constant checks to ensure the database system met expected needs. The result is individual Microsoft Access databases that serve the following objectives:
• A mechanism to check field data for logical correctness
• A mechanism to load field data through data entry forms
• A single location to store all research-related data from which all analysis would be initiated
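Part of the "logical correctness" objective can be pushed down into the database itself, so that illogical rows are rejected at load time rather than discovered during analysis. A hedged sketch follows, using SQLite as a stand-in for Access (the table, columns and species codes are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical tree-assessment table; CHECK constraints encode the
# logical rules so the database refuses illogical field data outright.
con.execute("""
    CREATE TABLE tree_assessment (
        tree_id   INTEGER PRIMARY KEY,
        dbh_cm    REAL CHECK (dbh_cm > 0 AND dbh_cm < 200),
        height_m  REAL CHECK (height_m > 0),
        species   TEXT CHECK (species IN ('PL', 'SW', 'AW'))
    )
""")

con.execute("INSERT INTO tree_assessment VALUES (1, 23.5, 14.2, 'PL')")  # accepted

try:
    con.execute("INSERT INTO tree_assessment VALUES (2, -5.0, 14.2, 'PL')")
except sqlite3.IntegrityError:
    pass  # negative diameter fails the CHECK constraint; the row is never stored
```

The design choice here is that constraints live with the data, so every entry route (forms, bulk loads, ad-hoc inserts) is subject to the same rules.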
ROLES AND RESPONSIBILITIES
The roles and responsibilities beyond the completion of the formal development process remain shared between the data facilitators and the users. The users are strongly encouraged to take full responsibility for the content of the database; that is, they are the owners of the database and its contents. They are also responsible for mining the data from the database. In some instances, though, it is necessary for the data facilitators to develop complex queries, most commonly those requiring custom programming. The diagram below illustrates the roles and responsibilities at the FMF.
It is important to note that the GIS staff do not force users to adopt a particular type of software. The intent is to ensure the core data are stored in a system built to store and manage data (i.e. an RDBMS). The users are encouraged to use whatever software they see fit to analyse their data. The training process shows how users can export their data to (or import it from) analysis tools including spreadsheets or statistical packages. To date though, based on the feedback from the
development process and considerations of cost and availability, all databases have been developed using Microsoft Access 2000.
In a traditional database management system implementation these steps would normally close the loop and end the process. It was realized, though, that in a research environment the data facilitators must remain engaged in a maintenance role to meet changes in the research objectives. This key point sets the research business apart from traditional environments.
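The export path from the core database to spreadsheets or statistical packages can be as simple as writing a query result to CSV. A minimal sketch, using SQLite in memory as a stand-in for the Access back end (the table and sample values are hypothetical):

```python
import csv
import io
import sqlite3

# SQLite stands in for the Access database; table name and rows are illustrative.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE gps_location (bear_id INTEGER, lat REAL, lon REAL)")
con.executemany("INSERT INTO gps_location VALUES (?, ?, ?)",
                [(101, 53.41, -117.58), (101, 53.42, -117.60)])

# Export a query result to CSV, readable by spreadsheets or statistical packages.
cur = con.execute("SELECT bear_id, lat, lon FROM gps_location")
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([d[0] for d in cur.description])  # header row from column names
writer.writerows(cur)                             # one CSV row per query row
```

Because analysis always starts from an export of the single core database, every user works from the same authoritative copy of the data regardless of which analysis tool they prefer.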
BENEFITS AND SUCCESSES
To date, two program areas have progressed through the full development, data loading, training and adoption process. The first is the Grizzly Bear project. The main database contents range from capture data to blood and DNA analysis results and GPS collar locations. All users, at several locations, mine data from the same database to perform their analysis. To date, data from more than 30 individual bears and over 30,000 GPS locations are stored in the database. The second is the Foothills Growth and Yield Association (FGYA) Regenerated Lodgepole Pine (RPL) database. The contents include ecological data, site index, competition measurements and photographs. There are more than 130,000 tree assessment records after two years. All models developed at the FMF are available for public distribution should other groups see potential commonalities with projects of their own.
THE FUTURE
The Fish and Watershed program area is currently in the database development stage. It is hoped the database will be ready for data loading by March of 2003. The data facilitators will investigate storing all spatial research data alongside the non-spatial data in the Microsoft Access databases. In the case of the Fish and Watershed initiative the spatial components will be based on a standard water resources spatial data model developed through the Environmental Systems Research Institute (ESRI) in the United States. Work will also be done to better streamline the process of moving data to and from field data logger instruments. To date, the process of loading data collected in a very 'flat' structure into a correctly normalized database has been very time consuming.
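One recurring step in moving logger data into a normalized database is splitting the site attributes, which a flat file repeats on every row, into a parent table with child measurement rows. A minimal sketch of that step; the field names, site codes and stream names are hypothetical:

```python
# Hypothetical flat rows as they might come off a field data logger:
# site attributes are duplicated on every measurement row.
flat_rows = [
    {"site": "S01", "stream": "Hardisty Ck", "temp_c": 8.1},
    {"site": "S01", "stream": "Hardisty Ck", "temp_c": 8.4},
    {"site": "S02", "stream": "Teepee Ck",   "temp_c": 6.9},
]

# Normalize: one parent record per site, plus child measurement
# rows that reference the site by its key.
sites = {}
measurements = []
for row in flat_rows:
    sites.setdefault(row["site"], {"site": row["site"], "stream": row["stream"]})
    measurements.append({"site": row["site"], "temp_c": row["temp_c"]})
```

The payoff is the usual one for normalization: a site's attributes are stored once, so a correction is made in one place instead of on every measurement row.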
Also, there is a high level of territoriality with respect to the research and the publications that result. This point is relevant as it adds a layer of security required from a data management perspective.
EXAMPLES OF POOR DATA MANAGEMENT AND ASSOCIATED PROBLEMS
Another form of data duplication arose from 'flat' table design, and in some cases data were stored in columns intended to represent different parameter measurements. There were several glaring examples of how these compounded problems required huge amounts of additional effort before the data could be used for analysis of any kind. In 2000 the FMF embarked on compiling a Local Level Indicators Report under the auspices of the Federal Criteria and Indicators Initiative. The intent of the report was to compile indicators of sustainability on the FMF landbase and report on them. This seemingly simple exercise required an enormous amount of time to report on relatively simple datasets. The key problems were encountered when it came to combining datasets from several partner organizations that stored their data using very different methods.
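Combining datasets stored using very different methods usually comes down to translating each partner's codes into one shared scheme before merging. A hedged sketch of that idea; the partner code lists and the common species codes are invented for illustration:

```python
# Hypothetical code mappings for two partner organizations onto a
# shared species-code scheme; all codes here are illustrative only.
PARTNER_A = {"Lodgepole pine": "PL", "White spruce": "SW"}
PARTNER_B = {"PINE_LP": "PL", "SPRUCE_WH": "SW"}

def to_common(records, mapping):
    """Translate each record's species code into the shared scheme."""
    return [dict(rec, species=mapping[rec["species"]]) for rec in records]

# Once both datasets speak the same codes, they can simply be concatenated.
combined = (to_common([{"species": "Lodgepole pine", "count": 40}], PARTNER_A)
            + to_common([{"species": "PINE_LP", "count": 12}], PARTNER_B))
```

Agreeing on such mappings up front, rather than reconciling codes at reporting time, is what would have saved most of the effort in the indicators exercise.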