E-proceedings of EFGS 2010 Conference


e-Proceedings of the European Forum for Geostatistics Conference, 5-7 October 2010, Tallinn, Estonia

www.efgs.info

The e-proceedings was produced with support from Eurostat under Contract N 50502.2009.004 – 2009.004 – 2009. 860 for Geostat 1A, Representing Census data in a European population grid, and delivered under WP4, Dissemination, Distribution and Exploitation.



Foreword

In today's hi-tech world, with widespread spatial data infrastructures and extensive use of geodata, many government organisations, research communities and businesses increasingly rely on statistics linked with location to succeed in their particular fields of work. Gridded socioeconomic statistics are a basis for demographic research, environmental planning, sustainable development and infrastructure planning. Moreover, they are a foundation for innovative developments such as location-based services. The need to advance methods for the production of gridded statistics is therefore also recognised by statistical organisations in Europe.

Driven by a need for high-quality gridded statistics, Eurostat launched an ESSnet grant, “Geostat 1A: Representing Census data in a European population grid”, in 2010. The goal of this project is to develop technically sound guidelines for datasets and methods to link 2010/11 Population and Housing Census results to a common harmonised grid. Under the lead of Statistics Norway, the project is carried out by GIS professionals from the NSIs of Austria, Estonia, Finland, France, the Netherlands, Poland, Portugal and Slovenia, and from MD Mapping. Most of the consortium partners have contributed actively to the development of the European Forum for Geostatistics over 2 decades. Since 1998 the EFGS has succeeded in developing a strong expert group on geostatistics and in holding yearly EFGS conferences. This has linked geostatisticians from 30 countries into a strong professional body. On behalf of the European Forum for Geostatistics and the Geostat project, Statistics Estonia hosted the “European Forum for Geostatistics Conference 2010” on 5-7 October 2010 in Tallinn, Estonia.

This conference e-proceedings is intended to share the experiences of the geostatistics community with others. The objective of this publication is twofold: 1. to present the current situation and perspectives of geostatistics in Europe; and 2. to present some of the challenges that the Geostat 1A project deals with.

On behalf of the Conference Programme Committee
Mrs Diana Makarenko-Piirsalu
European Forum for Geostatistics (EFGS)
Tallinn, Estonia, 19 January 2011

Acknowledgments

It is a pleasure to thank all those who made the conference and the e-proceedings possible.

We owe our deepest gratitude to the keynote speakers, Prof. David Martin, Mr Eddie Bright, Mr Alex de Sherbinin and Mr Lars H. Backer, for their inspiring ideas and visions. The EFGS 2010 conference would not have been possible without the great work of the Session Chairs, the passionate speakers and the excellent contributions of all 65 participants from 29 countries on different continents. Contributions of papers to the conference e-proceedings are gratefully acknowledged. The conference took place as part of the ESSnet Geostat 1A grant, Representing Census data in a European population grid, financed mainly by Eurostat, and was hosted by Statistics Estonia. The e-proceedings was produced with support from Eurostat under Contract N 50502.2009.004 – 2009.004 – 2009. 860 for Geostat 1A, Representing Census data in a European population grid. This e-proceedings is one deliverable of Geostat 1A WP4, Dissemination, Distribution and Exploitation. Thank you all for a wonderful and interesting time. It will be our honour to meet you at the EFGS 2011 conference in Portugal.

Conference Programme Committee

Legal Notice

Neither the European Forum for Geostatistics nor any person acting on behalf of the European Forum for Geostatistics is responsible for the use which might be made of the following information. The views expressed in this publication are the sole responsibility of the authors and do not necessarily reflect the views of the European Forum for Geostatistics. A great deal of additional information on the EFGS is available on the Internet. It can be accessed through the server (http://www.efgs.info). Reproduction is authorized provided that the source is acknowledged.

© European Forum for Geostatistics, 2010



Contents

Foreword
Acknowledgments
Legal Notice
Progress report: 24-hour gridded population models by David Martin, Samantha Cockings and Samuel Leung
Characterizing High-Resolution Population Distributions: a LandScan Experience by Eddie Bright and Budhendra Bhaduri
Construction of Gridded Population and Poverty Data Sets from Different Data Sources by Deborah Balk, Gregory Yetman and Alex de Sherbinin
Geospatial data in EEA assessments: status and requirements by Andrus Meiner
A revised urban-rural typology, European Commission by Hugo Poelman
European Forum for Geostatistics as a Market Place by Erik Sommer
A state of the art Side effects from the Geostat project - Part I by Jean-Luc Lipatz
Gridded data, no matter what - Part II by Jean-Luc Lipatz
Gridded Population: new data sets for an improved disaggregation approach by K. Steinnocher, I. Kaminger, M. Köstl and J. Weichselbaum
Accuracy of built-up area mapping in Europe from the perspective of population surface modelling by Pavol Hurbanek, Peter Atkinson, Konstantin Rosina and Robert Pazur
A Population grid for Spain: Experiences in assembling population and cartographic data from publicly available sources by Francisco J. Goerlich* and Ivie. Isidro Cantarino
Rules, techniques and processes to design the enumeration areas to collect and disseminate Italian census data by Grazia Ticca, ISTAT
New spatial dimension and applicability of old census and administrative data by Igor Kuzma
A Proposal for the dissemination of Brazilian 2010 Population Census by Maria do Carmo Dias Bueno
Statistics on small areas in Denmark with special attention to urban areas by Michael Berg Rasmussen
INSPIRE and the process of the development of Data Specifications by Udo Maack



Progress report: 24-hour gridded population models

David Martin*, Samantha Cockings* and Samuel Leung*
* School of Geography, University of Southampton, Southampton, SO17 1BJ, UK

Introduction

In a paper presented at the 2009 EFGS Conference in The Hague (Martin et al., 2009) we set out the early stages of a UK research project focused on the production of time-specific gridded population models. The present paper reviews overall progress with this work and develops in more detail some aspects of the modelling principles and available data sources which make this possible. The paper seeks to demonstrate how readily accessible data are beginning to provide many of the key inputs required to make time-specific representations of population. Time-specific models have the potential to overcome many key weaknesses of current representations, which are generally based on dated population statistics about population numbers at place of residence and which fail to reflect massive cyclical spatial redistributions of population. In many application areas, current analyses persist in using population maps with major deficiencies due to the absence of plausible alternative models.

The remainder of this paper is structured as follows: the following section provides a brief review of the case for more sophisticated time-space population distributions. The third section reviews issues arising from the challenge of data source acquisition and integration and the final section indicates the progress made in the project to date with regard to addressing these challenges.

The need for better space-time population distributions

Conventional population mapping, whether based on regular grids or irregular areal units, relies on ‘night-time’ residential population counts. This is essentially due to the residential basis used by most censuses and population registration systems, whereby an individual’s reported place of residence is taken to represent their geographical location for all purposes. Whether geographical referencing takes place at the address, postal code or small area level, the result is to create a geographical representation of population at residential addresses and for most purposes this therefore generates a map of population at their night-time locations (when most population members are at home). One interpretation of this mapping in time-space is presented in Figure 1(a), in which the complete population remains associated with residential locations through all time periods, while the many other locations associated with human activity remain entirely unpopulated at all times. These distributions are entirely appropriate for many types of use, especially the planning of services and resource allocation which are essentially based on residence assumptions, such as demand for primary school places or community-based health care services.

There is nevertheless widespread demand for population maps which are more temporally appropriate for specific purposes and this can be separated into two elements. Firstly, there is demand for population maps which are as up to date as possible in chronological time: in this respect systems based on continuously-maintained population registration systems are far superior to those based on decennial census data, in which the conventional population map may represent the population distribution at night, up to 11 years ago (for example, with 2011-based small area data not replacing published 2001-based counts until late 2012). The second and more important aspect of temporal validity is demand for maps which realistically represent the distribution of population at specific times, reflecting the massive redistribution of population on daily, weekly, termly and seasonal timecycles through travel to work, school, college, business, leisure and vacation etc.

These considerations are particularly pertinent to any application where population ‘exposure’ is a relevant consideration. This may be interpreted as direct exposure to hazard, such as an incident which occurs at a specific time and place, for example an explosion at an oil refinery requiring the mass evacuation of local populations; or assessment of the long-term exposure of population to hazards such as flooding or atmospheric pollution. A very different interpretation of the same process would be predicting the potential catchment population for a retail outlet, which will be massively affected by population movements during the day and, for the most part, unrelated to residential patterns, particularly in major commercial centres. There has been steady and growing interest in the development of population maps which attempt to adjust night-time populations to reflect ‘day-time’ distributions, particularly for the purposes of emergency planning.

Figure 1(a): interpretation of population time-space in a conventional population map, with all population recorded at place of residence across all times.



Examples include Sleeter and Wood (2006), who propose a method aimed at providing more realistic daytime population estimates for emergency planning, as outlined above. They use US census data for small areas and transfer working populations out of residential areas during the daytime and redistribute these onto schools and workplace locations derived from a business directory, subdividing the population between school and working age groups. McPherson et al. (2006) present a national model for the US, this time based on a gridded model and again using the binary division of daytime and night-time. McPherson and Brown (2004) present static daytime and night-time models by allocating population to night-time and daytime residential locations, and to employment locations in the daytime model. McPherson et al. (2006) recognise the importance of time spent in travel and attempt to model population numbers in the transportation system. Their particular application example is that of planning for population transfer to hospital following a hypothetical airborne release of a hazardous substance in a densely populated area. In the UK, Smith and Fairburn (2008) provide a GIS database approach, which is not an explicit population model, but rather a geographical feature database with associated population capacities and annotation about temporal usage. In this approach, it is anticipated that the analyst will use the database to inform judgements about the population likely to be present in an area under any particular set of circumstances, for example following the commencement of a serious incident or in order to explore specific emergency planning scenarios.

Data source acquisition and integration

A key limitation to all published attempts to produce more sophisticated time-space representations of population distribution has been the relative paucity of non-residential population location data, and particularly of its temporal component. Whereas official statistical sources provide good coverage of residential locations and to a certain extent cover workplace locations (either through census questions about place of employment or by linkage of population and employment registers, depending on country), there has until recently been very little formal recording (and even less publication) of population numbers associated with the numerous other types of population activity. This situation is changing rapidly with the explosion of relevant data on the web, most recently through deliberately constructed national data portals such as http://data.gov.uk and http://data.gov. Additional impetus to the exposure of these data has been the growth of the linked data community (Shadbolt et al., 2006) but there had already been a steadily growing tendency for (especially publicly-funded) organizations to use the web to publish service performance figures, which often take the form of user or visitor numbers. The time profiles of activities such as service opening hours, school term dates, etc. are also widely available on the web.

One approach to the estimation of time-specific population models is to work from remotely sensed data in order to measure population presence at different times, for example using night-time lights, and this is essentially the approach adopted by the LandScan USA project (Bhaduri et al., 2007), a major initiative which recognises the importance of the temporal dimension in population representation. The approach used in our work has been to develop the population grid modelling method originally proposed for static residential populations by Martin (1989) and to extend this to operate on a database of activity locations at which the presence of population is described in both time and space, as outlined in Martin et al. (2009). Our conceptual model has much in common with that of Ahola et al. (2007), which sets out the relationships between geographical objects and their occupation at different times by different population sub-groups. The recent explosion in available data has begun to make such modelling a viable option over large areas.

Our implementation treats all population locations as centroids (single x,y coordinate pairs) with an associated time profile for the presence of population, which may be applied separately to different population sub-groups (such as age groups). Gridded models are then assembled from this centroid database using the specific values associated with a target time. A major focus of our work has therefore been on the assembly of a suitable dataset which could be interpreted in the terms of Figure 1(b) as representing the proportional shift of population over time between the spatial locations of a range of different activity types. The diagram shows only a single day, but weekly and longer timecycles may readily be used.

Figure 1(b): alternative interpretation of population time-space, whereby each non-residential population activity is associated with different time profiles.
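The idea of centroids carrying time profiles, from which a grid is assembled for a chosen target time, can be illustrated with a minimal Python sketch. It is not the authors' gridding algorithm: each centroid is simply counted in the cell that contains it, whereas the method of Martin (1989) spreads population around each centroid. The `Centroid` class, the function names, the coordinates and the hourly values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Centroid:
    """A population location: one x,y point plus a time profile."""
    x: float        # easting in metres (hypothetical coordinates)
    y: float        # northing in metres
    profile: dict   # hour of day (0-23) -> population present (invented values)


def population_at(centroid, hour):
    """Population present at a centroid at the target hour (0 if not listed)."""
    return centroid.profile.get(hour, 0.0)


def grid_at(centroids, hour, cell_size):
    """Sum centroid populations into square grid cells for one target hour.

    Simplification: a centroid contributes only to the cell that contains it.
    """
    grid = defaultdict(float)
    for c in centroids:
        cell = (int(c.x // cell_size), int(c.y // cell_size))
        grid[cell] += population_at(c, hour)
    return dict(grid)


centroids = [
    Centroid(120.0, 80.0, {2: 300.0, 9: 60.0}),   # residential postcode
    Centroid(150.0, 90.0, {2: 0.0, 9: 200.0}),    # school in the same cell
    Centroid(450.0, 90.0, {2: 5.0, 9: 45.0}),     # workplace in another cell
]

night = grid_at(centroids, hour=2, cell_size=200.0)
morning = grid_at(centroids, hour=9, cell_size=200.0)
```

In this toy dataset the same 305 people appear in both time slices; only their allocation between cells changes, which is the behaviour that Figure 1(b) describes.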



Our data sources can be divided into three principal types: those relating to residential, non-residential and transportation locations of population. The decennial census and annual mid-year population estimates provide conventional residential population counts and a clear system of geographical referencing, although not always at the smallest areal unit level. We have used a national directory of georeferenced postal codes as the basis for allocation of these counts to exact locations and their population totals define the total population to be included in the model. Some categories of communal establishments, such as prisons and student halls of residence, are available from independent sources and allow us to remove these sub-populations from the general residential base and directly assign exact counts and locations.


The second type of data are those relating to non-residential locations and in this category are locations of education, employment, health care, retail, leisure, etc. Recent data sources have begun to provide quite comprehensive information on schools, colleges, universities and hospitals, and the national statistical agency runs an annual survey of businesses which provides an employment dataset known as the Annual Business Inquiry (ABI) as a subscription service. The decennial census also provides information on places of employment, but these may be expected to become rapidly out of date as spatial patterns of employment will change much more quickly than residential patterns. Again, some of these sources, such as ABI, are not available for the lowest geographical units and some modelling is required. However, others such as educational and health care locations are identifiable at specific postal codes and can therefore be located with a high degree of precision.

Further, most of these activity types can be assigned ‘standard’ time profiles based on survey data and published opening or service times. For example, working hours can be assigned to categories in the standard industrial classification based on the Quarterly Labour Force Survey (QLFS), while school opening times and university term times are widely published on the web. For the purposes of a general model, single time profiles can be used for each activity type, although we have adopted a data description model whereby unique time profiles can be assigned to individual centroids, if such data were available. Areas of influence, equating to catchment areas, can also be associated with these locations and these may be expressed as a variety of catchment area maps, travel-to-work distances or distance decay curves. Again, standard models may be applied to an entire category of centroids or individual data may be assigned where available. Census information contains a question on distance of travel to work which provides a useful starting point for the characterisation of standard areas of influence.

A current area of deficiency is detailed counts for visitor numbers to residential and leisure facilities. These data are widely collected but considered to be commercially sensitive and our initial implementation therefore relies on modelled counts using employment numbers and reference data.
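The arrangement described above, in which a ‘standard’ time profile applies to a whole activity type unless an individual centroid carries its own, can be sketched as a simple lookup with an override. This is an illustration only: the dictionary layout, the activity names, the capacities and the profile fractions are invented, and the project's actual data description model is not given in this paper.

```python
# Hypothetical standard profiles: fraction of a site's capacity present,
# keyed by activity type and then by hour of day. Values are invented.
STANDARD_PROFILES = {
    "school": {9: 1.0, 12: 1.0, 18: 0.0},
    "office": {9: 0.8, 12: 0.9, 18: 0.3},
}


def profile_for(centroid):
    """Return the centroid's own profile if present, else its type's standard one."""
    own = centroid.get("profile")
    if own is not None:
        return own
    return STANDARD_PROFILES[centroid["activity"]]


def population_present(centroid, hour):
    """Capacity scaled by the profile fraction at the hour (0 if hour not listed)."""
    return centroid["capacity"] * profile_for(centroid).get(hour, 0.0)


# A school using the standard profile for its category.
primary_school = {"activity": "school", "capacity": 400}

# A school with its own profile, because it mainly runs evening classes.
evening_college = {
    "activity": "school",
    "capacity": 250,
    "profile": {9: 0.2, 12: 0.2, 18: 1.0},
}

at_six_pm = [population_present(c, 18) for c in (primary_school, evening_college)]
```

Keeping the override on the centroid itself means that more detailed data can be added site by site without changing the category-level defaults.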


The third type of data are those relating to populations in the transport system. Here, most of the data available are a proxy for the actual population counts and tend to take the form of vehicle counts or timetable data associated with the movement of population through predefined networks (road, rail, etc.). Various data sources from Ordnance Survey, Britain’s national mapping agency, have recently been published as open data, which allows us to identify key aspects of the transportation network and to associate these with typical traffic levels published by the Department for Transport. A very similar approach to UK transportation data has been adopted by Smith and Fairburn (2008).

Population 24/7 project progress

The Population 24/7 project has established a methodology for the redistribution of population from centroid locations into the cells of a regular geographical grid, which is described more fully elsewhere. This specific gridding algorithm could be replaced by alternatives while still retaining the integrity of the overall approach. The time-space implementation is based on the existence of an extensive database of spatial locations with associated time profiles indicating their population capacities. Centroids are divided into residential and non-residential locations and the transportation network is represented as a gridded model of population capacities generated at the resolution of the desired output model. For a specified target time, the time profile of each non-residential centroid in the study area is examined and a series of time-specific population estimates extracted. Populations are then redistributed from the residential centroids across the non-residential and transport locations so as to best meet the estimated populations at each location. Again, a variety of algorithmic approaches would be possible, including complex spatial interaction modelling or microsimulation. In these initial implementations a simple distance-decay weighting is used whereby the total population summed across all residential locations is preserved within the model. The model is run separately on each age group identified in the data, allowing preschool, school, college, working and retired populations to be handled separately. Global adjustments may be made for visitors in or out of the study area.

A substantial database of time-profiled centroid locations for England and Wales has now been prepared for the reference year 2006 and a range of tests and demonstrators prepared for the Solent region on the south coast of England. Figure 2(a-c) shows three time slices covering an area of approximately 80 x 40 km at 200 m resolution, including the cities of Southampton and Portsmouth.

Figure 2(a-c): Weekday time sequence for the Solent Region, UK, showing temporal redistribution of population during the day (three sample time slices: 02:00 ‘night-time’, and 09:00 and 18:00 subdivisions of ‘day-time’).

The next phase of the work involves roll-out of these models to cover England and Wales (Scotland and Northern Ireland have some differently recorded data and are not covered in the current work) and consolidation and evaluation of the current models, using available data. The proposed methods are readily extensible in the light of more detailed data, providing that adjustments are made to any existing data sources to accommodate more specific sources, thereby avoiding double counting of population. Obvious areas for extension include modelling of non-road transportation from digital timetable data (Martin et al., 2008). An important area of future development will be to evaluate the possibility of drawing linked data dynamically from appropriate data services, which could in due course include the introduction of near-real time information on monitored population movements such as traffic flows. It seems probable that although the detailed algorithmic and data-specific details will continue to differ between countries and researchers (as between LandScan USA and the work described here), the most important conceptual issues will be consolidated and become more widely accepted, as the availability of more time-specific population data fosters new and more powerful application examples.
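The redistribution step in this section, a distance-decay weighting that preserves the total residential population, can be sketched as follows. The paper does not give the functional form, so the negative-exponential decay, the `scale` parameter, the sequential handling of sites and all numbers are assumptions made for illustration; in practice the procedure would be run once per age group.

```python
import math


def decay(distance, scale):
    """Negative-exponential distance decay (an assumed functional form)."""
    return math.exp(-distance / scale)


def redistribute(homes, sites, scale=2000.0):
    """Move residents to non-residential sites for one target time.

    homes: list of dicts with x, y, pop (residential population)
    sites: list of dicts with x, y, demand (time-specific population estimate)
    Returns (remaining_at_home, allocated_to_site); together they sum to
    the original residential total, so no population is created or lost.
    """
    remaining = [float(h["pop"]) for h in homes]
    allocated = []
    for s in sites:
        # Nearby and populous homes contribute more people to this site.
        weights = [
            r * decay(math.hypot(h["x"] - s["x"], h["y"] - s["y"]), scale)
            for h, r in zip(homes, remaining)
        ]
        total_w = sum(weights)
        # A site cannot draw more people than are still at home.
        take = min(s["demand"], sum(remaining))
        moved = 0.0
        if total_w > 0:
            for i, w in enumerate(weights):
                share = min(take * w / total_w, remaining[i])
                remaining[i] -= share
                moved += share
        allocated.append(moved)
    return remaining, allocated


homes = [
    {"x": 0.0, "y": 0.0, "pop": 1000},
    {"x": 5000.0, "y": 0.0, "pop": 1000},
]
sites = [{"x": 0.0, "y": 0.0, "demand": 300}]

remaining, allocated = redistribute(homes, sites)
```

In this example most of the 300 people at the site are drawn from the home centroid that coincides with it, and the residents left at home plus those at the site still sum to 2000.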

Acknowledgement and data sources

This research is supported by Economic and Social Research Council Award RES-062-23-0081. Employee data from the Annual Business Inquiry Service, National Online Manpower Information Service, licence NTC/ABI07-P3020. Office for National Statistics 2001 Census: Standard Area Statistics (England and Wales): ESRC Census Programme, Census Dissemination Unit, Mimas (University of Manchester). National Statistics Postcode Directory Data: Office for National Statistics, Postcode Directories: ESRC Census Programme, Census Geography Data Unit (UKBORDERS), EDINA (University of Edinburgh). Quarterly Labour Force Survey, Economic and Social Data Service, usage number 40023. Mastermap ITN layer: © Crown Copyright/database right 2009, an Ordnance Survey/EDINA supplied service.

References

Ahola, T., Virrantaus, K., Krisp, J. M. and Hunter, G. J. (2007) A spatio-temporal population model to support risk assessment and damage analysis for decision-making. International Journal of Geographical Information Science 21, 935-953

Bhaduri, B., Bright, E., Coleman, P. and Urban, M. L. (2007) LandScan USA: a high-resolution geospatial and temporal modeling approach for population distribution and dynamics. GeoJournal 69, 103-117

McPherson, T. N. and Brown, M. J. (2004) Estimating daytime and nighttime population distributions in U.S. cities for emergency response activities. Presented at Symposium on Planning, Nowcasting, and Forecasting in the Urban Zone, Seattle: American Meteorological Society http://ams.confex.com/ams/pdfpapers/74017.pdf

McPherson, T. N., Rush, J. F., Khalsa, H., Ivey, A. and Brown, M. J. (2006) A day-night population exchange model for better exposure and consequence management assessments. Presented at the Sixth Symposium on the Urban Environment, Atlanta: American Meteorological Society http://ams.confex.com/ams/pdfpapers/105209.pdf

Martin, D. (1989) Mapping population data from zone centroid locations. Transactions of the Institute of British Geographers NS 14, 90-97

Martin, D., Jordan, H. and Roderick, P. (2008) Taking the bus: incorporating public transport timetable data into health care accessibility modelling. Environment and Planning A 40, 2510-2525

Martin, D., Cockings, S. and Leung, S. (2009) Population 24/7: building time-specific population grid models. Paper presented at the European Forum for Geostatistics Conference, Statistics Netherlands, The Hague, 5-7 October

Shadbolt, N., Berners-Lee, T. and Hall, W. (2006) The Semantic Web Revisited. IEEE Intelligent Systems 21 (3), 96-101

Sleeter, R. and Wood, N. (2006) Estimating daytime and nighttime population density for coastal communities in Oregon. Proceedings of the Annual Conference of the Urban and Regional Information Systems Association

Smith, G. and Fairburn, J. (2008) Updating and improving the National Population Database to National Population Database 2. Research Report 678, London: Health and Safety Executive



Characterizing High-Resolution Population Distributions: a LandScan Experience

Eddie Bright* and Budhendra Bhaduri*
* Geographical Information Science and Technology Group, Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831

Introduction

Accurate representations of population distribution are critical for a wide variety of research needs including resource management, policy analysis, risk analysis, and emergency preparedness. Multivariate, dasymetric population distribution models developed at Oak Ridge National Laboratory (ORNL) apply GIS and Remote Sensing data and technologies to spatially and temporally disaggregate census counts, producing a nonuniform distribution of the population. The integration of demographic and transportation models with LandScan provides the unique capability of estimating time-variant population dynamics, which can greatly enhance the analytical capability for numerous applications. Scientific and technological advances and lessons learned through Oak Ridge National Laboratory’s LandScan population distribution and dynamics modelling research program address identifying population distribution in space and time, delineating geographic variability of population densities with respect to settlement structures, and demographic and activity characterization through image driven analysis.

LandScan: A Global High-Resolution Population Distribution Database

Using an innovative approach with Geographic Information Systems and Remote Sensing, ORNL’s LandScan is the community standard for global population distribution. At 30 arc-second resolution (~1 km² at the equator), LandScan is the finest resolution global population distribution data available and represents an “ambient population” (average over 24 hours). An ambient population integrates diurnal movements and collective travel habits into a single measure. Since natural or man-made emergencies may occur at any time of the day, the goal of the LandScan model is to develop a population distribution surface in totality, not just the locations of where people sleep. Because of this ambient nature, care should be taken with direct comparisons of LandScan data with other population distribution surfaces.

The LandScan algorithm, an R&D 100 Award Winner, uses spatial data and imagery analysis technologies and a multivariate dasymetric modelling approach to disaggregate census counts within an administrative boundary. The modelling process uses sub-national level census counts for each country and primary geospatial input or ancillary datasets, including land cover, roads, slope, urban areas, village locations, and high resolution imagery analysis, all of which are key indicators of population distribution. Based upon the spatial data and the socioeconomic and cultural understanding and settlement characteristics of an area, cells are preferentially weighted for the possible occurrence of population during a day. Since no single population distribution model can account for the differences in spatial data availability, quality, scale, and accuracy as well as the differences in cultural settlement practices, LandScan population distribution models are tailored to match the data conditions and geographical nature of each individual country and region. LandScan makes annual improvements to the population distribution data using new spatial data, imagery, census information, and algorithm improvements. Analysts must reconcile spatial inconsistencies due to data scale, accuracy and currency for each data layer to coincide with the local settlement characteristics.

High resolution imagery is employed in every phase of the LandScan population distribution modeling process. At the outset, high resolution imagery is used to identify settlement patterns and building characteristics. Imagery is used to evaluate the accuracy and precision of the different spatial data layers used in the models as well as to adapt the weighting factor for each layer in the model algorithms. Preliminary model output is superimposed on high resolution imagery to verify relative population distributions and magnitude. High resolution imagery is used to refine population distributions and correct spatial data errors. Many modifications are made to urban areas and urban extents as derived land cover data often do not reveal urban properties such as building densities or building heights that can be readily inferred with analysis using high resolution imagery.



Imagery analysis includes novel image processing algorithms developed at ORNL to exploit high resolution imagery using high-performance computers.
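The dasymetric disaggregation and per-layer weighting factors described in this section can be illustrated with a minimal sketch. It is not the LandScan algorithm: the weighted linear combination of layers, the layer names, and the weight values are all assumptions chosen for illustration.

```python
def dasymetric_allocate(census_count, cell_layers, layer_weights):
    """Split one administrative unit's census count across its grid cells.

    cell_layers: one dict per cell mapping layer name -> score in [0, 1]
    layer_weights: dict mapping layer name -> weighting factor (hypothetical)
    """
    if not cell_layers:
        return []
    # Combine the ancillary layers into a single likelihood score per cell.
    scores = [
        sum(layer_weights[name] * value for name, value in layers.items())
        for layers in cell_layers
    ]
    total = sum(scores)
    if total == 0:
        # No ancillary evidence at all: fall back to an even split.
        return [census_count / len(scores)] * len(scores)
    # Each cell receives a share proportional to its score, so the cell
    # values add back up to the census count.
    return [census_count * score / total for score in scores]


# Hypothetical weights and three cells inside one administrative unit.
weights = {"urban": 3.0, "roads": 1.0, "flat_slope": 0.5}
cells = [
    {"urban": 1.0, "roads": 1.0, "flat_slope": 1.0},  # built-up cell
    {"urban": 0.0, "roads": 1.0, "flat_slope": 1.0},  # roadside rural cell
    {"urban": 0.0, "roads": 0.0, "flat_slope": 0.0},  # steep, empty cell
]
allocation = dasymetric_allocate(9000, cells, weights)
```

Because the shares are normalised within the unit, the census total is preserved regardless of how the weights are tuned; only the relative distribution among cells changes.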


LandScan USA: Very High-Resolution Population Distribution for the US

LandScan USA is a dasymetric population distribution model developed at Oak Ridge National Laboratory (ORNL) depicting the location of people in both space and time. The model is an extension and enhancement of the methodology used in the LandScan global project, but includes diurnal variations at a much finer spatial resolution. Nighttime (residential) and daytime population data are produced at 3 arc seconds (approximately 90 m). However, enhancing the spatial and temporal resolution of population distribution poses a greater challenge, including the research issues of disparate and misaligned spatial data and modeling to develop a very high-resolution database at a national scale. Human population distribution behaves as a function of both space and time. However, the spatial aspect of population distribution has received the most attention in interpolation methods. Population distribution, as it directly relates to various human activities, can be functionally described by the various demographic groups representing those activities. Mobility of population from their residences results from temporary relocation to places of daytime activities that include places of education (schools, colleges, and universities), employment, businesses (shopping, post offices, restaurants, and others), or recreational areas (parks, museums, and other tourist attractions) during the day.

The population distribution model incorporates census data at the block level and uses demographic attributes (age, sex, race), as well as other socioeconomic data including places and journey to work. Other vital information used to enhance the LandScan USA output includes transportation networks, cultural attractions, academic institutions, prisons, shopping malls, and commercial areas. Spatial precision is enhanced by incorporating local parcel address points and individual building footprints and heights derived from high-resolution Lidar data. Validation and verification are accomplished on a county-by-county basis by assessing the input data for inaccurate, missing, or outdated data. Population model output is compared to high resolution imagery and spatial errors are corrected. LandScan USA, produced for the Department of Homeland Security, is an integral component of numerous geospatial models and analyses. LandScan USA is updated annually by incorporating new census estimates, labor force assessments, population dynamics and mobility appraisals, and an abundance of refined spatial data. LandScan USA is used in a variety of applications including emergency planning and management, rapid risk assessment, evacuation planning, consequence assessment, and mitigation implementations.

LandScan USA Day and Night

Transitional LandScan USA: Extending LandScan USA population distribution model to include mobile populations

LandScan USA, developed at Oak Ridge National Laboratory, seeks to overcome the limitations of spatial and temporal aggregation of census data and to realistically distribute population. High resolution population data have proven useful as inputs to studies relating to environmental, public health, and disaster events. LandScan USA data is considered a baseline population distribution product. That is, it assumes everyone is sleeping in their own bed and going to their normal daytime locations such as schools and workplaces. However, for certain areas or for specific temporal conditions, the baseline population
distributions may inadequately define the distribution and magnitude of local populations. A temporally refined population distribution must also include “transitional populations”: mobile populations such as business/leisure travelers, seasonally migratory populations, and crowds at special events. For example, numerous areas (e.g. Orlando, Las Vegas, and New York) have a significant daily influx of tourists and business travelers that are not accounted for in typical population analysis. Other regions or towns may incur large seasonal variations in local populations (e.g. Southern states and ski resort towns in winter, and coastal areas and tourist attractions in summer). Certain facilities (e.g. hotels, airport terminals, theme parks, and sporting arenas) may experience both extreme diurnal transitional population variations and large seasonal variations in populations. The incorporation of these dynamics enables more detailed analyses of populations. Current efforts include creating geodatabases for identifying populations during holidays, celebrations, and special events, accounting for business/leisure traveler population distributions, and discovering spatial/temporal patterns and trends of business and leisure travelers. These data are used to construct spatial mobility models to better understand population dynamics and to incorporate transitional populations with the baseline population distribution of LandScan USA.

LandScan development references

Cheriyadat, A.M., Vatsavai, R.R., and Bright, E.A. “Modeling Spatial Dependencies in High-Resolution Overhead Imagery.” Proceedings of the 39th IEEE Applied Imagery Pattern Recognition Workshop, AIPR 2010.
Vijayaraj, V., Cheriyadat, A.M., Sallee, P., Colder, B., Vatsavai, R.R., Bright, E.A., and Bhaduri, B.L. “Overhead Image Statistics.” Proceedings of the 37th IEEE Applied Imagery Pattern Recognition Workshop, AIPR 2008.
Potere, D., Feierabend, N., Bright, E., and Strahler, A. “Walmart from Space: A New Source for Land Cover Change Validation.” Photogrammetric Engineering and Remote Sensing. Vol. 74. July 2008.
Vijayaraj, V., Bright, E.A., and Bhaduri, B.L. “High Resolution Urban Feature Extraction for Global Population Mapping using High Performance Computing.” Proceedings of the 2007 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2007.
Bhaduri, B., Bright, E., Coleman, P., and Urban, M. “LandScan USA: A High Resolution Geospatial and Temporal Modeling Approach for Population Distribution and Dynamics.” GeoJournal. 2007. 69: 103-117.
Cheriyadat, A., Bright, E.A., Bhaduri, B., and Potere, D. “Mapping of Settlements in High Resolution Satellite Imagery using High Performance Computing.” GeoJournal. 2007. 69: 119-129.
Patterson, L., et al. “Assessing Spatial and Attribute Errors in Large National Datasets for Population Distribution Models.” GeoJournal. 2007. 69: 93-102.
Cai, Q., Rushton, G., Bhaduri, B., Bright, E., and Coleman, P. “Estimating Small-area Populations by Age and Sex Using Spatial Interpolation and Statistical Inference Methods.” Transactions in GIS. 2006. 10(4): 577-598.

Acknowledgment

Prepared by Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, Tennessee 37831-6285, managed by UT-Battelle, LLC for the U.S. Department of Energy under contract no. DE-AC05-00OR22725.

Copyright

This manuscript has been authored by employees of UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy. Accordingly, the United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.

Tobin, K., Bhaduri, B., Bright, E., Cheriyadat, A., Karnowski, T., Palathingal, P., Potok, T., and Price, J. “Automated Feature Generation in Large-Scale Geospatial Libraries for Content-Based Indexing.” Photogrammetric Engineering & Remote Sensing. Vol. 72. May 2006.
Dobson, J.E., Bright, E.A., Coleman, P.R., and Bhaduri, B.L. “LandScan: a global population database for estimating populations at risk.” Remotely Sensed Cities. Ed. V. Mesev. London: Taylor & Francis. 2003. 267-281.
Bhaduri, B.L., Bright, E.A., Coleman, P.R., and Dobson, J.E. “LandScan: Locating People is What Matters.” Geoinformatics. 2002. Vol. 5, No. 2, pp. 34-37.
Dobson, J.E., Bright, E.A., Coleman, P.R., Durfee, R.C., and Worley, B.A. “A Global Population database for Estimating Populations at Risk.” Photogrammetric Engineering & Remote Sensing. Vol. 66, No. 7, July 2000.


Construction of Gridded Population and Poverty Data Sets from Different Data Sources

Deborah Balk*, Gregory Yetman** and Alex de Sherbinin**
*Baruch College, City University of New York; **CIESIN, The Earth Institute at Columbia University

The Center for International Earth Science Information Network (CIESIN) was, together with partners at the National Center for Geographic Information Analysis (NCGIA), the first group to produce and distribute gridded census products. CIESIN and NCGIA released Gridded Population of the World (GPW) v.1 in 1995. Since that date, CIESIN has produced a large number of gridded socioeconomic data products, including two updates to GPW (versions 2 and 3), a 1 km resolution urban-rural grid (GRUMP), US census grids, global infant mortality and malnutrition grids, and gross domestic product grids. The baseline gridding is done using a proportional allocation algorithm on the highest spatial resolution census or survey data available, and for GRUMP, population is reallocated based on urban-rural boundaries as defined by night-time lights and estimates of settlement population size. This paper describes the products and the methods used to produce them.

Introduction

Global or broad-scale inquiry on the relationship between population and the environment is intrinsically spatial; however, much of the analysis occurs in a spatial vacuum. While notable exceptions exist, especially at the local scale, two key barriers have contributed to the lack of spatially-oriented analysis: (1) the methods of analysis require some knowledge of geographic data and tools for analysis; and (2) population data, at a global scale, tend to be recorded in national units rather than those that would permit cross-national, sub-national analysis. These barriers have been slowly eroding. On the demand side, demographers are becoming more familiar with geographic constructs, data and technology (and the technologies are becoming more relevant, e.g., in terms of spatial analysis, to demographers). On the supply side, data and tools are becoming increasingly available.
This paper describes recent developments in rendering global population and poverty data at the scale and extent required to facilitate broad-scale human-environment inquiry. It first describes the Gridded Population of the World (GPW), then turns to the Global Rural-Urban Mapping Project (GRUMP), US Census Grids, and two poverty mapping products: infant mortality rates and child malnutrition.


Nearly twenty years have passed since the first efforts to render population data, primarily from censuses, on a latitude-longitude grid on a global scale (Tobler et al., 1997; Clark and Rind, 1992). In those years, several key advances have been made: the spatial resolution of administrative boundary data is improving; national statistical offices and spatial data providers and related institutions are becoming more open with their data; population and spatial data providers are increasingly aware of (or collaborate with) one another; and lastly, computing capacity to manage, manipulate, and process increasingly large data sets is continually expanding. This bodes well for the future of gridding. Gridded data are useful for a range of application areas related to human-environment interactions, natural hazards, and environmental health (see for example Balk et al. 2006 and 2005, de Sherbinin 2009, Dilley et al. 2005), and CIESIN has collected more than 400 citations of GPW alone in journal articles. But the focus of this paper is on construction of these data sets.

Gridded Population of the World (GPW)

The basic methods, developed for GPW v1 (Tobler et al., 1997) and modified slightly for GPW v2 (Deichmann et al., 2001), remained more or less the same in the development of GPW v3. Population data are transformed from their native spatial units, which are usually administrative and of varying resolutions, to a global grid of quadrilateral latitude-longitude cells at a resolution of 2.5 arc minutes, which equates to approximately 4.6 km on a side at the equator (Figure 1). Slight modifications have been made to the processing, and the increases in input resolution have meant that GPW v3 relied more heavily on interpolations of population data that use spatial hybrids (e.g., growth rates between states in 1990 and 2000 are applied to the spatial distribution of population in municipalities in the year 2000). The steps used to develop GPW include the following:

1. Find tabular population counts
2. Match these to geographic boundaries (census or administrative units)
3. Estimate the population for target years (e.g. 1990, 1995 and 2000)
4. Transform to grids

Depending on the country, steps 1 and 2 often require substantial effort to reconcile tabular counts with boundaries that may have been produced by a third party. The matching can be especially difficult where new census or administrative units are created from subdivisions of units from earlier time periods. Any assumptions that went into the assignment of population counts into subdivided units are specified on the individual country pages of the GPW web site (go to “Data and Maps” and click on the country name to access these pages).



Figure 1: Transforming census units to a 2.5min grid

As for Step 3, estimating the population to target years, the method is relatively straightforward. Annual rates of change are calculated as follows:

$$ r = \frac{\log_e \left( P_2 / P_1 \right)}{t} $$

Population estimates are adjusted to target years:

$$ P_x = P_2 \, e^{rt} $$

Adjustment factors for matching national estimates to UN estimates are calculated as follows:

$$ a = \frac{P_{un} - P_x}{P_{un}} $$

Adjustment factors are applied at the national level:

$$ P_{adj} = P_x \cdot a $$

where $r$ is the annual rate of growth, $P_1$ and $P_2$ are the census estimates, $t$ is the number of years between census enumerations, $P_x$ is the population estimate for the target year, $P_{un}$ is the UN estimate, $P_{adj}$ is the adjusted estimate, and $a$ is the adjustment factor.

The last step (Step 4) entails a proportional allocation algorithm, as illustrated in Figure 2. Assuming the administrative unit has an average population density of 628.5 persons per sq. km, each grid square is allocated a population count from that unit proportional to the area of the unit located in that grid cell. If the grid cell covers another unit, then the population in proportion to the area covered in that unit is added to the grid cell.
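The growth-rate and target-year formulas of Step 3 can be sketched in a few lines of Python. Two points are assumptions of this sketch rather than statements from the text: in the projection, the elapsed time is counted from the later census to the target year, and the final rescaling to a national total uses a plain ratio instead of the adjustment factor as printed above. Function names and figures are hypothetical.

```python
import math


def annual_growth_rate(p1, p2, years_between):
    """r = ln(P2 / P1) / t for two census counts taken t years apart."""
    return math.log(p2 / p1) / years_between


def project(p2, rate, years_from_p2):
    """Px = P2 * exp(r * t), with t counted from the P2 census (assumption).

    A negative t interpolates backwards towards the earlier census.
    """
    return p2 * math.exp(rate * years_from_p2)


def scale_to_national_total(unit_estimates, national_total):
    """Rescale unit estimates so they sum to a national figure.

    Plain ratio scaling, used here as an illustrative stand-in.
    """
    factor = national_total / sum(unit_estimates)
    return [estimate * factor for estimate in unit_estimates]


# Hypothetical unit: 100,000 people in 1988 and 121,000 in 1998.
r = annual_growth_rate(100_000, 121_000, 10)
p_2000 = project(121_000, r, 2)          # extrapolate two years forward
back_to_1988 = project(121_000, r, -10)  # recovers the earlier census count
scaled = scale_to_national_total([p_2000, 50_000.0], 180_000)
```

Projecting back over the full intercensal interval returns the first census count, which is a quick consistency check on the rate.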

Figure 2: The proportional allocation algorithm
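The proportional allocation of Step 4 can be sketched as follows. To stay self-contained, the sketch assumes planar, axis-aligned rectangles in kilometres for both administrative units and grid cells; real units are irregular polygons on a latitude-longitude grid, so the overlap calculation here is a simplification and the numbers are hypothetical.

```python
def overlap_area(a, b):
    """Area shared by two rectangles given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def allocate(units, cells):
    """Distribute unit populations to cells in proportion to shared area.

    units: list of (rectangle, population); cells: list of rectangles.
    """
    counts = [0.0] * len(cells)
    for rect, population in units:
        unit_area = (rect[2] - rect[0]) * (rect[3] - rect[1])
        density = population / unit_area
        for i, cell in enumerate(cells):
            # A cell that covers several units accumulates a share from each.
            counts[i] += density * overlap_area(rect, cell)
    return counts


# One unit at 628.5 persons per sq. km (2 sq. km, 1,257 people) and a
# smaller neighbour, gridded onto two cells that cut across both.
units = [((0.0, 0.0, 2.0, 1.0), 1257.0), ((2.0, 0.0, 3.0, 1.0), 100.0)]
cells = [(0.0, 0.0, 1.5, 1.0), (1.5, 0.0, 3.0, 1.0)]
counts = allocate(units, cells)
```

As long as the cells tile the units completely, the cell counts sum to the total input population.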

GPW is an effort to amass information on the distribution of human population without modelling. However, there are many good reasons for modelling. For example, census data typically represent a decennial, residential picture of population distribution. They do not indicate daytime or seasonal distribution, non-residential patterns such as transportation zones, or built-up industrial and commercial areas. Another reason for modelling is that GPW’s accuracy is closely tied to the accuracy of the census data. If these data are old (i.e., no new census in many years), coarse (national or coarse-level only), or otherwise believed to be of poor quality, additional information may be very useful in estimating the distribution of human population. Thus, over the past decade, many efforts have focused on modelling population distribution. These have ranged from lightly modelled approaches, using urban areas (addressed below; CIESIN et al., 2004) or roads (UNEP et al., 2001), to heavily modelled approaches that use these and other inputs to reallocate population (e.g., LandScan, see Dobson et al., 2000). We argue that these modelled datasets are complementary to GPW’s heuristic method. Over time, the greatest investment has been made in increasing the number of input units. Table 1 describes the number of input units by continent for GPW v3. In 1994, the first GPW database was developed using about 19,000 units, and rendered at an output resolution of 5 minutes, whereas the second version had nearly 120,000 input units, about half of which were due to the inclusion of tract-level data for the United States. The third version has over 375,000 input units, with no improvement to the resolution of the inputs for the United States (although higher resolution data are available), but substantial improvements for other countries including both geographically large and small entities: South Africa (80,000), Indonesia (60,000), France (36,000), Malawi (9,000) and Brazil (5,500). These, along with the U.S., account for 70% of the units in the database, 17% of the global land area and roughly 13% of the population.


Table 1: Summary information on input units for GPW v3, by continent

Continent        Modal Level*   Total Number of Units   Average Resolution   Average Persons per Unit
Africa           2              109,138                 73                   166
Asia             2              88,782                  53                   276
Europe           2              91,086                  25                   112
North America    2              74,421                  29                   83
Oceania          1              2,153                   25                   27
South America    2              10,919                  68                   49
Global           2              376,499                 46                   144

Earlier versions of GPW had less motivation to gather higher resolution inputs because the output resolution of 2.5 minutes rendered finer input resolution redundant. GPW v3, however, was also used as an input to the Global Rural Urban Mapping Project (GRUMP) population grid (see below), which includes reallocations towards urban areas and whose output resolution is 30 arc seconds; at this resolution, the effort to find higher resolution spatial inputs was justified. Often, these new inputs had to be heads-up digitized, since digital versions of these data were not available. For countries that consist of island chains, the improvements consisted of collecting island-level population data, and then assigning population to existing spatial inputs. GPW v2 had 41 level-0 countries, 31 of which were islands, which had an average resolution of 46. In version 3, fewer than half of these countries remain (with a slightly smaller share of them being islands), with an average resolution of 22. The ideal resolution for GPW administrative units is somewhere close to the size of a few grid cells (i.e., for a 2.5 arc-minute cell at the equator, this would be an administrative unit area of 85 square km). For GRUMP, which has a resolution of 30 arc-seconds, the ideal administrative unit would have an area of only 4 square km (CIESIN et al., 2004). Where high-level boundary data (level 4 or greater) are available, the resolution of administrative units in densely populated areas exceeds the GPW ideal resolution and, in some areas, even that of GRUMP. In low-density areas, even where the highest-level boundary data are available, the administrative units are much larger than these ideal sizes. However, administrative units this detailed over sparsely inhabited regions would be inefficient to process (they would comprise over 2 million units for GPW), would add little or no additional information to the distribution of population, and would be infeasible to maintain.
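The ideal unit areas quoted above can be checked with a short calculation. The sketch below assumes a spherical Earth with a mean radius of 6,371 km and reads “a few grid cells” as four cells; both are assumptions of this example, not figures from the paper. Under them, four 2.5 arc-minute cells at the equator cover about 86 square km, and four 30 arc-second cells about 3.4 square km, in line with the 85 and 4 square km figures in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical Earth is an assumption


def cell_sides_km(arc_seconds, latitude_deg=0.0):
    """Return (north-south, east-west) side lengths of a lat-lon cell in km."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360.0
    north_south = (arc_seconds / 3600.0) * km_per_degree
    # Meridians converge towards the poles, so the east-west side shrinks.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


def cell_area_km2(arc_seconds, latitude_deg=0.0):
    """Approximate cell area, treating the cell as a small rectangle."""
    north_south, east_west = cell_sides_km(arc_seconds, latitude_deg)
    return north_south * east_west


gpw_cell = cell_area_km2(150)    # 2.5 arc-minutes = 150 arc-seconds
grump_cell = cell_area_km2(30)   # 30 arc-seconds
gpw_ideal_unit = 4 * gpw_cell        # "a few" cells taken as four
grump_ideal_unit = 4 * grump_cell
```

The latitude argument shows why a fixed angular resolution does not give a fixed ground area: at 60 degrees latitude the same cell covers about half the area it does at the equator.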


In terms of temporal resolution, GPW v3 provides estimates for 1990, 1995, 2000, 2005, and 2015. Most countries of the world have now experienced two censuses in their recent history and, with the exception of Africa and some parts of the Middle East, West Asia and Eastern Europe, most countries have had a census in or since the year 2000. Figure 3 describes the relative reliability for 1995, with countries in darker colours having more recent censuses and those in lighter colours having older data only.

Figure 3: Recency of data for GPW v3

When higher resolution data become available, often the associated population figures are only available for a single (recent) time period, although in some exceptional cases (e.g., France) population estimates are given for a range of dates. It is not uncommon for the relevant statistical offices to not know how the current thematic population map matches one from a prior time period. Thus, much of the work of preparing this database is to reconcile such differences in geographies resulting from temporal change. Aside from war-torn countries, which often lack current data altogether, countries undergoing periodic and medium to large-scale political or administrative reorganization pose the greatest challenge. This is a more general issue, however, because it is a normal part of geographic and administrative change, and it tends to occur most commonly at a fine scale (i.e., state boundaries change much less frequently than higher-resolution boundaries like municipios or counties). To the extent future efforts to amass data at the current scale are undertaken, it will persist. In terms of methodological advances, all information rests on correspondence between geographic units, which means that where there were large changes in spatial units (e.g., Namibia or the former Soviet Republics), some of the spatial specificity of population change over time may be lost. For example, boundaries in 2001 that differ from most of those in 1991 require construction of artificial regions to generate growth rates to interpolate and extrapolate to the target years. Transformations of this nature are clearly documented on the GPW web site on a country-by-country basis. Although we create a correspondence between the two geographies (where
available) for interpolating population values to target years, we only use one year of boundary data for creating the population grids. In this manner, the best spatial resolution can be retained while incorporating subnational population change information via the correspondence. In cases where the two geographies are at the same level (e.g., Canada and the United States), only the most recent geography is used for gridding. This reduces the labour in preparing the data and the amount of processing time required for gridding. Because countries vary between each other and internally in the size of their administrative areas, analysis of the data may benefit from more information about the administrative area underlying each unit in the output grid. Thus, for GPW version 3 we constructed a population-weighted administrative unit area layer. This layer allows the determination, on a pixel-by-pixel basis, of the mean administrative unit area that was used as an input for the population count and density grids. For grid cells (pixels) that are wholly comprised of one input unit, the output value is the total area of the input unit. Where grid cells are comprised of multiple input units, the output value is the population-weighted mean of all of the inputs. There have also been improvements in production methods. Quality in production has become more standardized, thus allowing for the identification of anomalies and errors introduced in processing. There are several barriers that limited improvements for GPW v3. Most of the former Soviet republics underwent redistricting in the 1990s, but few of them make their spatial data available, either freely or for a fee. Recently war-torn countries take a while to implement new censuses, although they may be the places most susceptible to population movements. In some instances, official population data are available while official boundary information is not.
In such instances, unofficial boundary information, where available (e.g., Bosnia Herzegovina), is incorporated if at all possible. For several countries, census or spatial data were simply too expensive to purchase. Many of the former British colonies sell licenses to use their fine-resolution census data rather than release it freely. This meant that it would have cost thousands of dollars to update Australia and New Zealand at the level that we had undertaken for GPW v2. Because the last reference year for high-resolution population data in version 2 was 1996 for these countries, they were updated at a coarser resolution, for which the data were publicly available, using the hybrid method described above.
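The population-weighted administrative unit area layer constructed for GPW version 3 can be illustrated for a single grid cell. The sketch assumes that the weight of each input unit is the population it contributes to the cell, and that an unpopulated cell falls back to an unweighted mean; both choices, and the function name, are assumptions of this example.

```python
def weighted_unit_area(pieces):
    """Population-weighted mean input-unit area for one grid cell.

    pieces: list of (population contributed to the cell by a unit,
                     total area of that unit in sq. km)
    """
    total_population = sum(population for population, _ in pieces)
    if total_population == 0:
        # Unpopulated cell: use the plain mean of the unit areas (assumption).
        return sum(area for _, area in pieces) / len(pieces)
    weighted_sum = sum(population * area for population, area in pieces)
    return weighted_sum / total_population


# A cell wholly inside one unit reports that unit's total area.
single = weighted_unit_area([(500.0, 120.0)])

# A cell shared by a small, dense unit and a large, sparse one.
mixed = weighted_unit_area([(900.0, 40.0), (100.0, 400.0)])
```

In the shared cell the result (76 sq. km) sits much closer to the small unit's area, because most of the cell's population comes from that unit.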

Global Rural Urban Mapping Project (GRUMP)

There are actually three products included in GRUMP: an urban settlements points data set of more than 70,000 units with populations >1,000 persons, an urban extents “mask” of more than 27,500 urban areas with populations >5,000 persons, and a population grid with urban reallocation at 30 arc-second resolution. It is the latter that is the focus of this paper. The GRUMP population grid is a 30 arc-second population distribution raster dataset that was developed by combining population data from the census administrative units and from the urban extent mask. To create the population surface, we developed a mass-conserving algorithm called GRUMPe (Global Rural Urban Mapping Programme) that reallocates people into urban areas, within each administrative unit. In particular we used data inputs from two vector sources: (1) administrative polygons, containing the total population for each administrative unit; (2) urban areas, containing the urban population for each area. These two data sets are combined in such a way that an intermediate (polygon) data set representing the urban and rural areas, but which does not assign populations into those areas, is produced. This intermediate dataset is then passed to GRUMPe, a stand-alone model written in C, that assigns population to each new polygon and labels it as rural or urban. Typically, the algorithm works on a country-by-country basis and uses the following pieces of information: the size and population of each urban area, denoted by a unique urban area identifier; the size and population of each administrative area, denoted by a unique administrative identifier; the size of the intersect areas where the urban and administrative areas overlap; and the UN national estimates for the percentage of the population in urban and rural areas (UN, 2002). The goal of the algorithm is to reallocate the total population in each administrative unit into rural and urban areas while reflecting the UN national rural-urban percentage estimates as closely as possible. The algorithm was designed to have few constraints and to make the constraints simple and reflect common sense. There are 6 constraints in total:

(1) The total population (urban + rural) within any given administrative unit remains constant;
(2) The urban population density in any given administrative unit must be greater than the rural population density in that administrative unit;
(3) The rural population density in any given administrative unit cannot be lower than a national minimum rural population density threshold;
(4) The rural population density in any given administrative unit cannot be greater than a national maximum rural population density threshold;
(5) The urban population density in any given administrative unit cannot be greater than a national maximum urban population density threshold;
(6) The urban population density in any given administrative unit cannot be lower than a national minimum urban population density threshold.
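The six constraints listed above can be illustrated for the simplest case, one urban area lying wholly inside one administrative unit. This is not the GRUMPe code: the sketch replaces iterative adjustment with a closed-form clamp, treats constraint 2 as “at least” rather than strictly “greater than”, and uses hypothetical density thresholds in persons per sq. km.

```python
def reallocate(total_pop, urban_pop, urban_area, rural_area,
               rural_density=(0.0, 50.0), urban_density=(500.0, 50000.0)):
    """Return (urban, rural) populations for one administrative unit.

    rural_density and urban_density are (minimum, maximum) thresholds;
    the default values are hypothetical.
    """
    # Smallest urban population permitted by each constraint.
    lower = max(
        urban_density[0] * urban_area,                       # constraint 6
        total_pop - rural_density[1] * rural_area,           # constraint 4
        total_pop * urban_area / (urban_area + rural_area),  # constraint 2
        0.0,
    )
    # Largest urban population permitted by each constraint.
    upper = min(
        urban_density[1] * urban_area,                       # constraint 5
        total_pop - rural_density[0] * rural_area,           # constraint 3
        total_pop,
    )
    if lower > upper:
        raise ValueError("thresholds cannot all be met for this unit")
    # Move the initial urban estimate into the permitted range.
    urban = min(max(urban_pop, lower), upper)
    # Constraint 1: whatever is not urban stays rural, so the total is kept.
    return urban, total_pop - urban


# 10,000 people, a 2 sq. km urban area and 98 sq. km of rural land.
# An initial urban estimate of 2,000 leaves the rural area too dense.
urban, rural = reallocate(10_000, 2_000, 2.0, 98.0)
unchanged = reallocate(10_000, 6_000, 2.0, 98.0)
```

In the first call the rural density would be about 82 persons per sq. km, above the assumed maximum of 50, so people are moved into the urban area until the rural density reaches the threshold. The second call already satisfies every constraint and is returned as given.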

The algorithm works on each administrative unit in turn, and checks the urban and rural populations within that administrative unit against constraints 2 to 6. If any of the constraints are not met, then the rural and/or urban populations are adjusted iteratively to meet them while ensuring that constraint 1 is met. These constraints
and the national population density thresholds are controlled by parameters that are passed to the algorithm. If no parameters are specified, the algorithm assigns fixed values that have been empirically determined to be good first estimates. The adjustment in population is trivial when there is no or one urban area per administrative unit, and where the urban area lies wholly within the administrative unit. It becomes increasingly complex, however, when there is more than one urban area, when urban areas overlap more than one administrative area (e.g., Cali, Colombia), and when large urban areas contain more than one administrative area (e.g., Quito, Ecuador). All of these are common situations, and may require successive iterations to meet all the constraints. The algorithm can also be run on a region-by-region basis (such as states or other first-level administrative units), such that the national constraints (3 to 6) become regional constraints and better reflect the state-level variation in rural/urban population percentages in large countries like the USA. This approach was employed for most of the largest countries or countries with very large numbers of administrative units (e.g., South Africa). The resulting map, Figure 4, a close-up of Cali, Colombia, shows the data before and after running GRUMPe. Note how, where urban areas are present in a given administrative unit, the density of the GPW administrative units decreases after GRUMPe because people are reallocated into their respective urban areas. The final results from each country are then compared to the UN urban population estimates. Although the UN totals are useful as a benchmark, they are only that. Not only have recent studies shown the uncertainty associated with UN urban estimates (NRC, 2003), there are many reasons why our estimates may differ considerably from those of the UN. For example, our data stream may have included many more small settlements, including those below the urban threshold either given by the country or implied by the region, in which case we would expect the percentages of the population living in urban areas to be quite different between the two. We estimate that in X% of the countries, we had a priori reasons to expect much different outcomes from the UN estimates (mostly but not always for the better), and in another Y% for them to match rather closely because our data streams matched closely those which they also report. In the remainder of the countries, we had no information either way to predict how closely our results would match those estimates. The final stage is to convert the output coverage from GRUMPe into a grid, at 30 arc-seconds resolution.

US Census Grids

The U.S. Census Grids are created by taking population and housing counts at the block level and proportionally allocating the counts in the census blocks to a latitude-longitude quadrilateral grid (Seirup et al. 2006).


Figure 4: Demonstration of the urban reallocation for an area near Cali, Colombia

Grid cell size for the country is 30 arc-seconds (approximately 1 km), and for metropolitan areas it is 7.5 arc-seconds (approximately 250 m). U.S. Census Grids uses the same proportional allocation algorithm as GPW v.3. If a grid cell contains 40% of the area of one census block and 30% of the area of a second census block, the population count for that grid cell will be the sum of 40% of the population of the first census block and 30% of the population of the second census block.

U.S. Census Grids, 2000 uses the 2000 TIGER/Line files for the census block boundaries and the 2000 SF1 and SF3 tables for the demographic and socioeconomic characteristics of each census block. SF1 data are based on the census short form and therefore include counts for the total population. The SF3 data are based on the census long form, which is sent to approximately one out of every six households.

For the 30 arc-second grids using SF1 data, the relevant fields are extracted from the SF1 tables. Some of the grids contain data from a single field, such as the number of non-Hispanic whites, which is taken from SF1 table P8, field P008003. Other grids use data from several fields. This is true of all the age grids, which are derived from SF1 tables P12 and P14. These tables report the number of people in each age category by gender. The grid for the population under age one uses SF1 table P14, field P14003 (the number of males under age one) plus SF1 table P14, field P14024 (the number of females under age one). The Variable Catalog contains a list of the fields used for each census grid.

The TIGER/Line files are converted to ArcInfo coverages and joined to the SF1 field variables. The density of the variable being gridded is calculated for each census block, for example the number of foreign born residents per square kilometer. A 30 arc-second quadrilateral grid is intersected with the census block coverage. This divides each census block into pieces that fit into the grid cells. The total count for the grid cell is calculated by taking the area of each census block piece within the grid cell, multiplying it by the density of the variable being gridded, and summing these values for all census block pieces in the grid cell.

The lowest level of geography for which SF3 data are released is the census block group. For the SF3 grids, these data are proportionately allocated to census blocks using the distribution of the underlying SF1 population. For instance, if 35% of the block group’s population aged 25 and older lives in a given census block, as reported in the SF1 tables, 35% of the block group’s population aged 25 and older with a high school diploma, as reported in the SF3 tables, is assigned to that census block. Once the SF3 data have been allocated to the census block level, the gridding process is the same as described above.

The metropolitan statistical area (MSA) grids are created by selecting all census blocks within the MSA and gridding those blocks using a 7.5 arc-second quadrilateral grid. An example of a U.S. Census Grids map is shown in Figure 5.

Figure 5: Foreign born population, 2000
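The proportional allocation described above can be illustrated with a minimal Python sketch. It is not the ArcInfo workflow used for the data set: the function names, the tuple layout of the block pieces, and the sample numbers are hypothetical, and the block/grid intersection is assumed to have been computed already.

```python
from collections import defaultdict


def grid_counts(pieces, block_counts, block_areas_km2):
    """Sum area-weighted counts per grid cell.

    pieces: iterable of (block_id, cell_id, piece_area_km2) tuples, one per
    census block fragment left by intersecting blocks with the grid
    (hypothetical input layout).
    """
    totals = defaultdict(float)
    for block_id, cell_id, piece_area_km2 in pieces:
        # Density of the gridded variable within the source block.
        density = block_counts[block_id] / block_areas_km2[block_id]
        totals[cell_id] += piece_area_km2 * density
    return dict(totals)


def allocate_to_blocks(block_group_value, block_sf1_counts):
    """Spread a block-group (SF3) value over its blocks in proportion
    to the matching SF1 count of each block."""
    total = sum(block_sf1_counts.values())
    if total == 0:
        return {block: 0.0 for block in block_sf1_counts}
    return {
        block: block_group_value * count / total
        for block, count in block_sf1_counts.items()
    }


# Illustrative numbers only: the cell holds 40% of block A and 30% of block B.
block_counts = {"A": 100, "B": 50}
block_areas_km2 = {"A": 2.0, "B": 1.0}
pieces = [("A", "cell_1", 0.8), ("B", "cell_1", 0.3)]
cell_totals = grid_counts(pieces, block_counts, block_areas_km2)

# A block holding 35% of the SF1 population receives 35% of the SF3 value.
sf3_by_block = allocate_to_blocks(200, {"b1": 35, "b2": 65})
```

With these sample inputs the cell receives 40% of block A's count plus 30% of block B's, which mirrors the worked example in the text.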

Poverty Mapping

CIESIN’s poverty mapping work was developed in connection with the Millennium Development Project (MDP), a research project led by Jeffrey Sachs at the Earth Institute at Columbia University that was designed to help inform governments on how to best meet the Millennium Development Goals (MDGs). To assist the MDP poverty and hunger task forces, CIESIN developed two datasets: a global map of infant mortality rates (Figure 6; Storeygard et al., 2008), and a global map of child malnutrition (Figure 7). Infant mortality rates (IMRs) were chosen because they serve as a useful proxy for overall poverty levels and are highly correlated with metrics such as income, education levels, and health status of the population (Balk et al., 2006). This metric is particularly good for distinguishing poverty levels at the lower end of the income ladder.

Unlike the Census gridding, the sources of data for infant mortality rates were largely survey data. Sample sizes dictated that the subnational units are generally larger in size, since results can only be reported at the geographic scale at which they are still robust. The sources for the IMR map include Demographic and Health Surveys (DHS) (39 countries), Multiple Indicator Cluster Surveys (MICS) (5 countries), National Human Development Reports (14 countries), and National Statistical Offices (18 countries). There are only 6,494 spatial units in the global database, 82 percent of which are in Brazil and Mexico (5,372 units). There are 74 other countries with subnational data, with an average of 22 subnational units per country. Finally, for 115 countries we only had national level data from UNICEF, and 36 countries had no data.

For each country the subnational IMR values were adjusted to be consistent with national UNICEF 2000 IMR values. The data were gridded using the same proportional allocation algorithm as the population grids. Note that the grids are actually at a higher spatial resolution than the shapefiles because for some countries subnational administrative boundaries could not be distributed. We also converted rates to counts of infant deaths. To do this, for each subnational unit we estimated live births and infant deaths. These were calculated based on gridded population, national fertility data, and subnational IMR data.

For child malnutrition, we used anthropometric data found in household surveys. The metric chosen is percent of children underweight, with underweight defined as being two standard deviations or more below the mean weight for a given age when compared to an international reference population. Although there are alternative measures of malnutrition, such as stunting (low height for age) and wasting (low weight for height), the percentage of children underweight was chosen because the MDG Target is to “halve the prevalence of underweight children by 2015.” DHS and MICS data were aggregated to the spatial units at which the surveys report, based on raw data where available, and on published reports by UNICEF otherwise. There were a total of only 369 spatial units, but it should be mentioned that developed countries were omitted because of very low levels of child malnutrition. These spatial units are typically equivalent to first level administrative regions or aggregations thereof. Geospatial boundary files that match those spatial units were located or created in order to match the reporting regions of the surveys as closely as possible. In many cases, the survey reports contained maps detailing the survey regions. Elsewhere, matches were purely name-based. Note that the dates of the surveys varied and no effort was made to standardize to a specific year (see Table 2).
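Returning to the infant mortality grids, the adjustment to national totals and the conversion from rates to counts can be sketched as follows. The text does not say how the adjustment was carried out, so the single birth-weighted scaling factor used here is an assumption, as are the function names, the use of a crude birth rate to estimate births, and all sample values. Only the relation deaths = IMR × live births / 1000 follows directly from the definition of the infant mortality rate.

```python
def estimate_births(population, crude_birth_rate_per_1000):
    """Hypothetical helper: live births from population and a national
    crude birth rate (births per 1000 population per year)."""
    return population * crude_birth_rate_per_1000 / 1000.0


def rescale_imr(subnational_imr, births, national_imr):
    """Scale subnational IMRs by one common factor so that their
    birth-weighted mean equals the national value (assumed method)."""
    total_births = sum(births[unit] for unit in subnational_imr)
    implied_national = (
        sum(subnational_imr[unit] * births[unit] for unit in subnational_imr)
        / total_births
    )
    factor = national_imr / implied_national
    return {unit: rate * factor for unit, rate in subnational_imr.items()}


def infant_deaths(imr_per_1000, live_births):
    """IMR is deaths under age one per 1000 live births."""
    return imr_per_1000 * live_births / 1000.0


# Illustrative values only.
births = {
    "north": estimate_births(25_000, 40.0),
    "south": estimate_births(25_000, 40.0),
}
survey_imr = {"north": 40.0, "south": 80.0}
adjusted = rescale_imr(survey_imr, births, national_imr=66.0)
deaths = {unit: infant_deaths(adjusted[unit], births[unit]) for unit in adjusted}
```

A single scaling factor preserves the relative differences between subnational units while matching the national figure, which is one simple way to meet the consistency requirement stated in the text.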


Figure 6: Infant mortality rates of the World, 2000

Figure 7: Child malnutrition, circa 2000

Table 2: Source data for child malnutrition map

| Country | Year | Source abbreviation | Units with hunger data |
|---|---|---|---|
| Algeria | 2000 | MICS | 4 |
| Angola | 2001 | MICS | 6 |
| Benin | 2001 | DHS | 6 |
| Botswana | 2000 | MICS | 10 |
| Burkina Faso | 1998-99 | DHS | 4 |
| Burundi | 2000 | MICS | 5 |
| Cameroon | 1998 | DHS | 4 |
| Cape Verde | 1994 | UNICEF | 1 |
| Central African Rep. | 2000 | MICS | 17 |
| Chad | 1996-97 | DHS | 1 |
| Comoros | 2000 | MICS | 3 |
| Congo | 1998-99 | No data | 1 |
| Congo, Dem. Rep. | 2001 | MICS | 11 |
| Côte d'Ivoire | 1994 | DHS | 10 |
| Djibouti | 1996 | No data | 1 |
| Egypt | 2000 | DHS | 26 |
| Equatorial Guinea | 2000 | MICS | 2 |
| Eritrea | 2002 | DHS | 6 |
| Ethiopia | 2000 | DHS | 11 |
| Gabon | 2000 | DHS | 4 |
| Gambia | 2000 | MICS | 8 |
| Ghana | 1998 | DHS | 10 |
| Guinea | 1999 | DHS | 5 |
| Guinea-Bissau | 2000 | MICS | 9 |
| Kenya | 2000 | MICS | 7 |
| Lesotho | 2000 | MICS | 10 |
| Libya | 1995 | ANDI | 7 |
| Madagascar | 1997 | DHS | 6 |
| Malawi | 2000 | DHS | 3 |
| Mali | 2001 | DHS | 6 |
| Mauritania | 2000-01 | DHS | 5 |
| Mauritius | 1995 | UNICEF | 1 |
| Morocco | 1992 | DHS | 7 |
| Mozambique | 1997 | DHS | 11 |
| Namibia | 2000 | DHS | 13 |
| Niger | 2000 | MICS | 6 |
| Nigeria | 1999 | DHS | 5 |
| Rwanda | 2000 | MICS | 12 |
| Sao Tome and Principe | 1996 | UNICEF | 1 |
| Senegal | 2000 | MICS | 10 |
| Seychelles | No data | UNICEF | 1 |
| Sierra Leone | 2000 | MICS | 4 |
| Somalia | 2000 | MICS | 3 |
| South Africa | 1995 | ANDI | 9 |
| Sudan | 2000 | MICS | 16 |
| Swaziland | 2000 | MICS | 4 |
| Togo | 1998 | DHS | 5 |
| Tunisia | 2000 | MICS | 7 |
| Uganda | 2000-01 | DHS | 4 |
| Tanzania | 1996 | DHS | 22 |
| Zambia | 2000-01 | DHS | 9 |
| Zimbabwe | 1999 | DHS | 10 |



Conclusion

Efforts to grid socioeconomic data have progressed substantially since the first efforts at gridding census data by CIESIN and the National Center for Geographic Information and Analysis (NCGIA) in the early 1990s. CIESIN has been at the forefront of these efforts, but it is heartening to see many other groups taking up the cause. The number of gridded census products has increased markedly in the past decade. New thrusts include adding the temporal dimension, both in terms of diurnal and seasonal population movements and longitudinal data on night-time population (as measured by standard censuses). Additional census and survey variables are also being gridded.

Looking to the near future, for GPW v4 we expect to make the following improvements. There will be a continued emphasis on higher resolution inputs. We will collect and grid more census variables, including age and sex distribution and urban/rural distribution. The proposed output resolution will be 30 arc-second grids. CIESIN may also create a time series back to 1980.

Many barriers to data collection and processing have been overcome since the early versions of GPW to enhance our understanding of population distribution. The role of international technical assistance for population census taking and georeferencing enumerator area maps has no doubt played an important part in improving spatial accuracy in geolocating populations. Along with these improvements comes the possibility of new data streams and integrations, such as using satellite information to detect urban areas along with population information from censuses on human settlements. Such new efforts (see Balk et al., 2004) build strongly on GPW’s efforts. Undoubtedly, there will continue to be a need for information at different scales, extents, and resolutions, both simple and modelled. GPW and its underlying data infrastructure are critical foundations for future efforts.

CIESIN is always looking for collaborators and data sharing partners to lighten the work load, so feel free to contact the authors if you feel you have something to offer.

Acknowledgements

This paper draws heavily from earlier papers, including Deichmann, Balk, and Yetman (2001) and Balk et al. (2004), as well as documentation available on the US Census Grids Web site (Seirup et al. 2006). The paper was produced with support from the National Aeronautics and Space Administration under Contract NNG08HZ11C for the Continued Operation of the Socioeconomic Data and Applications Center (SEDAC) at CIESIN at Columbia University. For more information visit http://sedac.ciesin.columbia.edu

References

Balk, Deborah. 2009. “More than a name: Why is Global Urban Population Mapping a GRUMPy proposition?” In P. Gamba and M. Herold (eds.), Global Mapping of Human Settlement: Experiences, Data Sets, and Prospects (Taylor and Francis): 145-161. http://www.crcnetbase.com/doi/abs/10.1201/9781420083408c7

Balk, Deborah, Glenn Deane, Marc Levy, Adam Storeygard, and Sonya Ahamed. 2006. The Biophysical Determinants of Global Poverty: Insights from an Analysis of Spatially Explicit Data. Paper presented at the 2006 Annual Meeting of the Population Association of America, Los Angeles, USA.

Balk, Deborah, Adam Storeygard, Marc Levy, et al. 2005. “Child hunger in the developing world: An analysis of environmental and social correlates,” Food Policy, 30 (5-6): 584-611. Available at: http://www.sciencedirect.com/science/article/B6VCB4HHWWG9-2/2/2f25e9cce26e94fa5b9aff7f6b95db62

Balk, Deborah, Francesca Pozzi, Gregory Yetman, Uwe Deichmann, and Andy Nelson. 2004. “The Distribution of People and the Dimension of Place: Methodologies to Improve the Global Estimation of Urban Extents.” Available at: http://sedac.ciesin.columbia.edu/gpw/docs/UR_paper_webdraft1.pdf

Balk, Deborah and Gregory Yetman. 2004. Transforming Population Data for Interdisciplinary Usages: From census to grid. Available at: http://sedac.ciesin.columbia.edu/gpw/docs/gpw3_documentation_final.pdf

Center for International Earth Science Information Network (CIESIN), Columbia University; and Centro Internacional de Agricultura Tropical (CIAT). 2004. Gridded Population of the World (GPW), Version 3. Palisades, NY: Columbia University. Available at: http://beta.sedac.ciesin.columbia.edu/gpw

Center for International Earth Science Information Network (CIESIN), Columbia University; International Food Policy Research Institute (IFPRI); the World Bank; and Centro Internacional de Agricultura Tropical (CIAT). 2004c. Global Rural-Urban Mapping Project (GRUMP): Gridded Population of the World, version 3, with Urban Reallocation (GPW-UR). Palisades, NY: CIESIN, Columbia University. Available at: http://beta.sedac.ciesin.columbia.edu/gpw

Clark, John and David Rind. 1992. Population Data and Global Environmental Change. The International Social Science Council with the assistance of UNESCO, ISSC/UNESCO Series 5.

de Sherbinin, Alex. 2009. “The Biophysical and Geographical Correlates of Child Malnutrition in Africa,” Population, Space and Place, Vol. 15. Available at: http://dx.doi.org/10.1002/psp.599

Deichmann, Uwe, Deborah Balk and Gregory Yetman. Oct. 2001. “Transforming Population Data for Interdisciplinary Usages: From Census to Grid.” Available at: http://sedac.ciesin.columbia.edu/plue/gpw/GPWdocumentation.pdf

Dilley, Max, Robert Chen, Uwe Deichmann, Arthur L. Lerner-Lam and Margaret Arnold, with Jonathan Agwe, Piet Buys, Oddvar Kjekstad, Bradfield Lyon and Gregory Yetman. 2005. “Natural Disaster Hotspots: A Global Risk Analysis.” Available at: http://sedac.ciesin.columbia.edu/hazards/hotspots/synthesisreport.pdf

Seirup, Lynn, Greg Yetman, and CIESIN at Columbia University. 2006. U.S. Census Grids, 2000. Palisades, NY: Socioeconomic Data and Applications Center (SEDAC), Columbia University.

Storeygard, Adam, Deborah Balk, Marc A. Levy and Glenn Deane. 2008. “The global distribution of infant mortality: A subnational spatial view,” Population, Space and Place, 14 (3): 209-229. http://onlinelibrary.wiley.com/doi/10.1002/psp.484/pdf

Tobler, Waldo, Uwe Deichmann, Jon Gottsegen and Kelly Maloy. 1997. “World Population in a Grid of Spherical Quadrilaterals,” International Journal of Population Geography, 3: 203-225.


Geospatial data in EEA assessments – status and requirements

Andrus Meiner
European Environment Agency

The analysis and views presented in this paper should be taken as the personal perspective of the author and cannot be regarded as the official position of the European Environment Agency.

Introduction

The European Environment Agency (EEA) is an agency of the European Union. Its task is to provide sound, independent information on the environment and to serve as a major information source for those involved in developing, adopting, implementing and evaluating environmental policy, as well as for the general public. Currently, the EEA has 32 member countries.

The EEA has supported the EU Sixth Environment Action Programme (6th EAP) across its four priority areas for action: climate change; biodiversity; environment and health; and sustainable management of resources and wastes (EC, 2001). This support includes the development and analysis of geospatial information related to the priority actions of the EAP.

The EEA strategy for 2009-2013 aims to improve our knowledge base. In particular, the Information and communication technology strategy towards 2013 is relevant to the overall development of spatial data management. The main types of actions are:

• Enhancing the EEA’s capabilities around spatial data, assuring INSPIRE implementation and development of the Shared Environmental Information System;

• Increasing EEA capacity to handle new types of data, such as near-real time data, satellite data, citizen observations (through mobile devices) and models;

• Strengthening the role of the EEA as European Environmental Data Centre and contributing to the European Spatial Data Infrastructure.

Following these principles, the EEA manages its spatial data by applying a wider concept of Spatial Data Infrastructure that distinguishes between four focus areas:

• Institutional framework and organization

• Technical standards and specifications

• Geospatial data sets and metadata

• Spatial information services

Besides producing some geospatial data, the EEA is a user of data produced by other organisations. To get maximum benefit from integration and assimilation of these data sources, the EEA has continuously worked on its user requirements.



Status of geospatial data

Thematic environmental spatial data are needed to describe natural background conditions (e.g. land cover, climatology) and to provide context for specific assessments. The EEA also still needs more data on the spatial distribution of essential environmental variables, despite good progress under the implementation of several environmental directives, e.g. the Water Framework Directive and the Habitats Directive. Spatial analysis often needs analytical or reporting units with environmental meaning, and these are not always available. One example is the work on a harmonized geospatial data set of European surface waters and catchments. The need for data on the variability and state of ecosystems is increasingly challenging.

The EEA defines various non-environmental data as geospatial reference data. These data are needed for integrated assessment of socio-economic drivers and the resulting pressures on natural systems. Most notably, there is a need for geospatial data on human population patterns, several socio-economic variables and the location of infrastructure. Such reference data should gain relevance to regional and local conditions through improved spatial resolution, e.g. for regional statistics.

The EEA also has requirements for key properties of the data. These involve update frequency, spatial representation of features (discrete or continuous) and the origin of the data (Earth observation, in-situ, modelling). These data properties still present a hurdle to sufficient integration of environmental data with the other types of data mentioned above.

EEA user requirements are not driven only by technical considerations. The policy relevance of EEA products makes the EEA interested in thematic geospatial data that are specifically useful for key legislative acts. Spatial data collection on freshwater features, e.g. characterization of river catchments and water bodies (incl. groundwater), is guided by the Water Framework Directive, Nitrates Directive, Floods Directive etc. Data on biodiversity, such as species, habitats and ecosystem assessment, should be relevant to the EU biodiversity action plan and to species and habitats reporting under the Birds and Habitats Directives. The EU White Paper on adaptation to climate change, covering impacts, vulnerability and adaptation, also determines which geospatial thematic data should be collected.

In some cases the EEA takes a regional assessment approach, e.g. to report on mountain or urban areas. Another example is the marine and coastal area, where data and analysis of coast and sea should support policy processes such as the Water Framework Directive (transitional and coastal waters), the Marine Strategy Framework Directive, the EU Recommendation for Integrated Coastal Zone Management, the Common Fisheries Policy and the EU Integrated Maritime Policy, including the roadmap for Maritime Spatial Planning.

Since the 1990s the EEA has developed CORINE Land Cover (CLC) data, a spatial representation of European land features with 44 classes across 36 countries (most recent change data for 2000-2006). The data are freely accessible and present land types as a 100 m grid or as polygons (minimum size 25 ha). A database for change detection has been created by dissecting the seamless land cover data layers with the European reference grid at 1 km² resolution and storing, for each grid cell, the area of each land cover type in hectares. For thematic analysis several European land cover layers are constructed: artificial surfaces, agricultural land, forests, wetlands etc. CLC has its normal hierarchical class nomenclature. Alternative aggregations of land cover types in this system allow construction of targeted thematic products. For example, the green potential background is a composition of all classes of natural land cover and some semi-natural agricultural classes, and is relevant to the European Commission’s initiative on green infrastructure (http://green-infrastructure-europe.org/).

Spatial data analysis requires the availability and good quality of spatial analytical units based on spatial data that represent administrative units (NUTS 3, LAU2), but also sea catchments, river basins, elevation zones, biogeographical regions etc.

Land accounting

The availability of comparable land cover change data from CLC data sets facilitates land and ecosystem accounting (LEAC). First, mapped land types are recorded as stocks. The accounting concept then foresees determining gains and losses of land, which describe the land transitions over time from one stock to another (EEA, 2006). This methodology follows the approach of the Handbook for Integrated environmental and economic accounting (SEEA, 2003).

Land cover change accounts allow moving from maps to statistics. For example, land cover changes for 1990-2000 and/or 2000-2006 are first converted to a grid (e.g. 1x1 km); then individual changes are grouped into land cover flows (LCF) that describe land use processes. Below is the list of first-level (most aggregated) LCF:

• LCF1 Urban land management

• LCF2 Urban residential sprawl

• LCF3 Sprawl of economic sites and infrastructures

• LCF4 Agriculture internal conversions

• LCF5 Conversion from other land cover to agriculture

• LCF6 Withdrawal of farming

• LCF7 Forests creation and management

• LCF8 Water bodies creation and management

• LCF9 Changes due to natural & multiple causes

An example of possible statistics is the main annual conversions between agriculture and forest/dry (semi)natural land (in ha/year) with relevant flows, e.g. withdrawal of farming with woodland creation. LEAC analysis also allows constructing indicators for Europe, for individual countries or for freely defined regions (e.g. a trans-boundary river basin). Examples of such indicators are ‘land cover stock proportions in reporting territory’ and ‘net change in land cover (percent change from initial year)’.

Use of geospatial grids

The EEA does not perform primary data collection on grids, but grids (e.g. EMEP, UTM, long/lat, equal-area and other grids) are used to describe environmental issues and variables. The EEA is looking to use grids for accounting for stocks and flows of natural resources and ecosystem capital. Due to its mandate, the EEA is interested in grid systems with Europe-wide coverage.

EEA practice in using geospatial grids firmly follows the INSPIRE Directive, in particular its Annex I, theme 2 ‘Geographical grid systems’. The data specification from May 2010 for this theme follows the recommendations of the European grid coding system (JRC, 2004) and defines grids as a ‘Harmonized multi-resolution grid with a common point of origin and standardized location and size of grid cells’. The specification sets out the purpose and parameters of the reference grid. The grid is:

• intended for indirect geo-referencing of themes with typically coarse resolution and wide (pan-European) geographical extent;

• two-dimensional and mainly used for spatial analysis or reporting;

• based on the ETRS89 Lambert Azimuthal Equal Area coordinate reference system with the centre of the projection at the point 52° N, 10° E;

• defined as hierarchical, with resolutions of 1 m, 10 m, 100 m, 1 000 m, 10 000 m and 100 000 m.

Conclusion

It can be concluded that the plurality of data sources and modes (formats) of data requires more and more attention to spatial data aggregation and assimilation, i.e. of Earth observation, environmental and socio-economic data. This is where geostatistical grids could serve as a basis for combining imagery, maps, site monitoring, regular frame sampling and socio-economic statistics per administrative units.

The EEA is aiming at land and ecosystem assessments and modelling. To carry out this task the EEA is focusing on needs for monitoring and environmental assessment and on links to EU policies, e.g. 6th EAP Strategies and GMES Fast Track Services such as land monitoring. It also links to sub-European regional governance agreements, such as those for sea regions, mountain conventions and trans-boundary river basins, and seeks the involvement of statistical geospatial grid data on population, economic performance, transport, trade, environmental pressures, agriculture/forestry, territorial development potentials, etc.

References

EC 2001. Communication from the Commission to the Council, the European Parliament, the Economic and Social Committee and the Committee of the Regions on the Sixth Environment Action Programme of the European Community, "Environment 2010: Our future, Our choice" [COM (2001) 31 final - Not published in the Official Journal].

EEA 2006. Land cover accounts for Europe. EEA report 11/2006. European Environment Agency, Copenhagen.

JRC 2004. Short Proceedings of the 1st European Workshop on Reference Grids, Ispra, 27-29 October 2003. JRC-Institute for Environment and Sustainability, Ispra. Accessed 17 Dec 2010. http://eusoils.jrc.ec.europa.eu/esdb_archive/etrs_laea_raster_archive/ref_grid_sh_proc_draft.pdf

SEEA 2003. Handbook of National Accounting. Integrated Environmental and Economic Accounting 2003. United Nations, European Commission, International Monetary Fund, Organisation for Economic Co-operation and Development and World Bank. Accessed 17 Dec 2010. http://unstats.un.org/unsd/envaccounting/seea2003.pdf



A revised urban-rural typology

Hugo Poelman
European Commission, DG Regional Policy, REGIO-GIS

The paper was first published by Eurostat in the Eurostat regional yearbook 2010. http://epp.eurostat.ec.europa.eu/cache/ITY_OFFPUB/KSHA-10-001/EN/KS-HA-10-001-EN.PDF

Introduction

This paper presents a new typology of predominantly rural, intermediate and predominantly urban regions based on a variation of the OECD methodology (see Map 1). The aim of this new typology is to provide a consistent basis for the description of predominantly rural, intermediate and predominantly urban regions in all Commission communications, reports and publications.

This typology has been developed jointly over the past two years by four Directorates-General within the European Commission: the Directorate-General for Agriculture and Rural Development, Eurostat, the Joint Research Centre (JRC) and the Directorate-General for Regional Policy. The authors would like to acknowledge in particular the contribution of Guido Castellano, Josefine Loriz-Hoffmann, Christine Mason, Lorenzo Orlandini, Rob Peters and Thierry Vard from the Agriculture and Rural Development DG, Berthold Feldmann and Oliver Heiden from Eurostat, Javier Gallego from the JRC, and Nicola De Michelis, Lewis Dijkstra and Hugo Poelman from the Regional Policy DG.

Why a new typology?

Using the current OECD methodology to classify NUTS 3 regions in the EU creates two types of distortions that undermine its comparability within the EU. The first distortion is due to the large variation in the area of local administrative units level 2 (LAU2). The second distortion is due to the large variation in the surface area of NUTS 3 regions and the practice in some countries of separating a (small) city centre from the surrounding region. This chapter first briefly describes the OECD methodology. It then shows how the new typology seeks to remedy these two issues with the existing OECD approach.

The OECD methodology

The OECD methodology (see OECD Regional Typology. GOV/TDPC/TI(2007)8. 2007. Paris, OECD.) for defining the typology involves two main steps: (1) identifying rural local administrative units level 2 (LAU2s); and (2) classifying regions based on the population share in rural LAU2s.

Identifying Rural Local Administrative Units Level 2

The OECD methodology classifies LAU2s with a population density below 150 inhabitants per km² as rural. Because LAU2s vary widely in area, some LAU2s will be incorrectly classified.

• Small villages which are very tightly circumscribed by their administrative boundary have a sufficiently high density and will therefore be classified as urban despite having a very small total population. For example, Aldea de Trujillo in Spain is classified as urban despite having a population of only 439 inhabitants.

• Cities or towns in very large LAU2s will be classified as rural due to a low population density, even when the city is fairly large and the vast majority of the population of the LAU2 lives in that city. For example, Badajoz and Cáceres in Spain and Uppsala in Sweden are classified as rural despite all three having a population of 150 000 or more.

Classifying the regional level

The OECD approach classifies regions as predominantly urban, intermediate or predominantly rural based on the percentage of population living in local rural units. A NUTS 3 region is classified as:

• predominantly urban (PU), if the share of population living in rural LAU2s is below 15%;

• intermediate (IN), if the share of population living in rural LAU2s is between 15% and 50%;

• predominantly rural (PR), if the share of population living in rural LAU2s is higher than 50%.

In a third step, the size of the urban centres in the region is considered:

• A region classified as predominantly rural by steps 1 and 2 becomes intermediate if it contains an urban centre of more than 200 000 inhabitants representing at least 25% of the regional population.

• A region classified as intermediate by steps 1 and 2 becomes predominantly urban if it contains an urban centre of more than 500 000 inhabitants representing at least 25% of the regional population.

The result of this approach can be seen on Map 2.

The OECD is also aware of the problems caused by the difference in surface area of NUTS 3 regions. To avoid these issues, the OECD uses NUTS 2 regions for this classification in Belgium, the Netherlands and Greece, spatial planning regions in Germany, and NUTS 3 regions in all other OECD countries in the EU.
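The three OECD steps described above can be sketched in Python as follows. This is an illustrative reading of the rules as stated in the text, not an official implementation: the function names, the input layout (a list of population and area pairs per region) and all sample figures, including the area given for the small village, are hypothetical. Treating shares of exactly 15% and 50% as intermediate is an assumption about the boundary cases.

```python
RURAL_DENSITY_LIMIT = 150.0  # inhabitants per km², from the text


def is_rural_lau2(population, area_km2):
    """Step 1: a LAU2 is rural if its density is below 150 inhabitants/km²."""
    return population / area_km2 < RURAL_DENSITY_LIMIT


def classify_region_oecd(lau2s, largest_centre_pop=0):
    """Steps 2 and 3 for one region.

    lau2s: list of (population, area_km2) tuples for the LAU2s in the region.
    largest_centre_pop: population of the largest urban centre, if any.
    """
    total = sum(pop for pop, _ in lau2s)
    rural = sum(pop for pop, area in lau2s if is_rural_lau2(pop, area))
    rural_share = rural / total

    # Step 2: classify on the rural population share.
    if rural_share > 0.50:
        category = "PR"
    elif rural_share >= 0.15:
        category = "IN"
    else:
        category = "PU"

    # Step 3: a large urban centre moves the region up by one category.
    centre_share = largest_centre_pop / total
    if category == "PR" and largest_centre_pop > 200_000 and centre_share >= 0.25:
        return "IN"
    if category == "IN" and largest_centre_pop > 500_000 and centre_share >= 0.25:
        return "PU"
    return category


# A tightly bounded village: 439 inhabitants on a hypothetical 2 km².
village_is_rural = is_rural_lau2(439, 2.0)

# A large city inside a very large LAU2 (hypothetical 1 500 km²).
big_lau2_is_rural = is_rural_lau2(150_000, 1_500.0)
```

The two sample calls reproduce the distortions named in the text: the small village is counted as urban, while the city in the very large LAU2 is counted as rural.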

Map 1: A new urban-rural typology for NUTS 3 regions (1)

Map 2: The original OECD urban-rural typology applied to NUTS 3 regions (1)


The new typology

Definition based on a population grid

The new typology builds on a simple two-step approach to identify population in urban areas:

(1) a population density threshold (300 inhabitants per km²) applied to grid cells of 1 km²;

(2) a minimum size threshold (5 000 inhabitants) applied to grouped grid cells above the density threshold.

The population living in rural areas is the population living outside the urban areas identified through the method described above. To determine the population size, the grid cells are grouped based on contiguity (including the diagonals); see Figure 1. If the central square in Figure 1 is above the density threshold, it will be grouped with each of the eight surrounding cells that exceed the density threshold.
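The two-step rule above amounts to finding connected groups of dense cells with 8-neighbour contiguity and keeping the groups that are large enough. A minimal sketch follows, assuming the grid is held as a dictionary keyed by (row, column) with the population of each 1 km² cell, so that cell population equals density per km². The function names and sample grid are hypothetical, and treating a cell with exactly 300 inhabitants as dense is an assumption, since the text only says "above the threshold".

```python
from collections import deque


def urban_cells(pop_grid, density_min=300, size_min=5000):
    """Return the set of cells that belong to urban areas.

    pop_grid: dict mapping (row, col) to the population of a 1 km² cell.
    """
    # Step 1: keep cells that meet the density threshold (inclusive: assumed).
    dense = {cell for cell, pop in pop_grid.items() if pop >= density_min}

    urban, seen = set(), set()
    for start in dense:
        if start in seen:
            continue
        # Breadth-first search over the eight neighbours, diagonals included.
        cluster, queue = [], deque([start])
        seen.add(start)
        while queue:
            row, col = queue.popleft()
            cluster.append((row, col))
            for d_row in (-1, 0, 1):
                for d_col in (-1, 0, 1):
                    neighbour = (row + d_row, col + d_col)
                    if neighbour in dense and neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
        # Step 2: keep the group only if it reaches the minimum size.
        if sum(pop_grid[cell] for cell in cluster) >= size_min:
            urban.update(cluster)
    return urban


def rural_population(pop_grid, urban):
    """Population living outside the identified urban areas."""
    return sum(pop for cell, pop in pop_grid.items() if cell not in urban)


# Illustrative grid: two diagonal dense cells form one group of 5 500;
# the isolated cell of 4 000 is dense but too small to count as urban.
grid = {(0, 0): 3000, (1, 1): 2500, (0, 1): 100, (5, 5): 4000}
urban = urban_cells(grid)
rural = rural_population(grid, urban)
```

In the sample grid the diagonal pair is grouped only because diagonals count as contiguous; with 4-neighbour contiguity neither cell would reach the 5 000 threshold.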

Figure 1: Contiguous grid cells (a central cell and its eight surrounding cells, numbered 1 to 8)

The 1 km² grid is already available for Denmark, Sweden, Finland, Austria and the Netherlands (for more information see the European Forum for GeoStatistics (EFGS), http://www.efgs.ssb.no/), and the new typology is based on the real grid in these Member States. For the remaining Member States, the new typology relies on the population disaggregation grid created by the JRC (version 5; see http://ec.europa.eu/dgs/jrc/index.cfm?id=1410&obj_id=5310&dt_code=NWS&lang=en and http://www.eea.europa.eu/data-and-maps/data/population-density-disaggregated-with-corine-land-cover-2000-2), which is based on LAU2 population and CORINE land cover. The 1 km² grid is likely to become the future standard and has the benefit that it can easily be reproduced in countries outside the EU. For example, this typology can also be applied to Switzerland, Norway and Croatia following exactly the same approach.

Because the CORINE land cover map does not cover the four French overseas regions and Madeira and Açores in Portugal, the population disaggregation grid does not cover these regions. Therefore, the OECD classification for these regions remains unchanged.

The approach based on the 1 km² population grid classifies 68% of the EU-27 population as living in urban areas and 32% as living in rural areas (see Table 1). The rural share is 5 percentage points higher than under the original OECD definition. However, the share of population in rural LAU2s (defined as LAU2s with at least 50% of the residents living in rural areas) is 28%, i.e. very similar to that of the OECD. This classification will be further refined in the future.

This approach has the benefit that it creates a more balanced distribution of population. For example, Member States with a very low share of population in rural areas, such as Germany, the Netherlands and Belgium, see an increase in that share. Member States with very high shares of their population in rural areas and very large LAU2s see a reduction of their population in rural areas, particularly Sweden, Finland and Denmark (see Table 1).

Definition at the regional level

How to define the regional level using the share of population in rural grid cells

This new typology uses the same threshold (50%) to define a predominantly rural region, but uses the population share of rural grid cells and not of rural LAU2s. By going straight from the grid to the regional level, the distortion caused by the variable size of the LAU2s is circumvented. To ensure that the population share in predominantly urban regions does not differ too much from the original OECD classification applied to NUTS 3 regions, the threshold distinguishing predominantly urban from intermediate has been adjusted from 15% to 20% (see Table 2 and Figure 2).

Researchers with a rural focus sometimes combine predominantly rural and intermediate regions and call them rural regions, in part because the OECD used the term "significantly rural" before replacing it with "intermediate" in 1997. Researchers with an urban focus sometimes combine predominantly urban and intermediate regions and call them urban regions, based on the argument that in both types of region more than half the population lives in urban LAU2s. Unfortunately, this leads to conflicting statements in which 80% of the EU population lives in an urban region while 55% lives in a rural region, because the intermediate regions are counted in both groups. This chapter proposes to avoid these problems by consistently presenting data for the three groups separately.
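As a rough illustration, the regional thresholds described above can be written in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name and its inputs are hypothetical, and treating a share of exactly 20% or exactly 50% as intermediate is an assumption derived from the wording "more than 50%" and "less than 20%" used in the conclusion. The adjustment for the presence of a city is not included here.

```python
def classify_region(rural_population: float, total_population: float,
                    rural_threshold: float = 0.50,
                    urban_threshold: float = 0.20) -> str:
    """Classify a region from the share of its population in rural grid cells.

    Hypothetical helper: thresholds follow the text (50% and 20%);
    the handling of the exact boundary values is an assumption.
    """
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    rural_share = rural_population / total_population
    if rural_share > rural_threshold:
        return "predominantly rural"
    if rural_share >= urban_threshold:
        return "intermediate"
    return "predominantly urban"


# Example: a region where 35% of residents live in rural grid cells.
print(classify_region(rural_population=35_000, total_population=100_000))
```

Under the original OECD threshold of 15%, the same function could be called with `urban_threshold=0.15`, which shows how the adjustment to 20% moves some regions from intermediate to predominantly urban.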



Table 1: Share of population and land area in rural Local Administrative Units level 2 (LAU2), OECD and new typology (1)

                Share of population (%)                         Share of land area (%)
                  OECD rural       Rural  Difference       Rural  OECD rural       Rural  Difference       Rural
Country                 LAU2        LAU2        LAU2  grid cells        LAU2        LAU2        LAU2  grid cells
Belgium                  8.7        16.3         7.7        21.6        40.7        53.2        12.5        74.3
Bulgaria                36.2        36.2         0.0        40.9        93.3        91.1        -2.2        98.5
Czech Republic          30.0        36.0         5.9        40.9        83.0        85.2         2.2        95.4
Denmark                 41.0        29.8       -11.2        37.5        85.3        69.5       -15.8        95.9
Germany                 19.1        22.4         3.3        28.2        64.8        66.4         1.6        90.2
Estonia                 32.0        40.2         8.2        38.9        98.5        98.7         0.1        99.2
Ireland                 44.2        47.5         3.3        49.2        96.8        96.3        -0.6        98.7
Greece (2)              38.6        38.2        -0.4        39.9        94.9        93.6        -1.4        98.8
Spain                   26.9        26.9        -0.1        31.1        91.9        90.2        -1.7        98.2
France                  29.0        34.3         5.3        37.0        90.3        90.5         0.3        96.5
Italy                   20.8        23.2         2.4        30.2        70.9        69.5        -1.4        93.2
Cyprus                  22.2        25.5         3.3        29.3        91.1        91.5         0.5        96.9
Latvia                  34.3        36.7         2.4        37.8        98.2        97.1        -1.1        99.1
Lithuania               36.2        55.3        19.1        55.4        96.9        98.0         1.1        99.0
Luxembourg              28.0        35.1         7.1        39.4        75.5        79.3         3.8        91.8
Hungary                 43.3        35.1        -8.2        42.5        87.8        76.8       -11.0        96.5
Malta                    0.1         1.7         1.7         5.3         1.6        13.1        11.5        61.0
Netherlands              6.8         9.1         2.3        15.6        29.5        32.9         3.3        85.0
Austria                 41.4        39.5        -1.9        43.0        90.4        85.0        -5.4        96.4
Poland                  40.3        40.1        -0.2        40.6        90.5        87.9        -2.6        96.4
Portugal                26.9        31.7         4.8        34.9        87.1        89.3         2.2        96.0
Romania                 48.3        43.7        -4.6        47.2        93.6        89.0        -4.6        97.9
Slovenia                55.5        44.8       -10.7        51.6        88.1        75.3       -12.8        96.3
Slovakia                40.7        41.9         1.2        47.1        86.2        85.3        -0.9        96.6
Finland                 53.6        34.5       -19.1        41.2        98.3        89.8        -8.6        99.4
Sweden                  69.3        25.7       -43.6        35.7        99.0        69.0       -30.1        99.2
United Kingdom          12.2        14.0         1.7        15.8        81.7        79.9        -1.8        91.5
EU-27                   27.1        27.9         0.8        32.1        87.6        82.8        -4.8        96.2

(1) LAU2 = Local Administrative Unit level 2. (2) Greece is LAU1. Data does not cover Départements d'outre-mer (FR9), Região Autónoma dos Açores (PT20) and Região Autónoma da Madeira (PT30). Source: Eurostat, JRC, EFGS, REGIO-GIS

Figure 2: Share of population by type of region, OECD and the new typology. [Three bar-chart panels: predominantly urban regions, intermediate regions and predominantly rural regions. Each panel shows the share of population in % (0 to 100) by Member State and for the European Union, comparing the OECD classification with the new proposal.]


Table 2: Share of population according to the original OECD classification and the new urban-rural typology (1)

% of population. PU = predominantly urban, IN = intermediate, PR = predominantly rural.

                OECD methodology at NUTS 3    New urban-rural typology      Difference
Country                 PU        IN        PR        PU        IN        PR        PU        IN        PR
Belgium               84.7      10.1       5.2      67.5      23.9       8.6     -17.2      13.7       3.5
Bulgaria              14.9      61.4      23.7      14.9      44.7      40.4       0.0     -16.7      16.7
Czech Republic        11.4      83.6       5.0      22.4      44.0      33.6      11.0     -39.6      28.6
Denmark               29.3      27.7      43.0      21.0      36.0      43.0      -8.3       8.3       0.0
Germany               57.4      29.3      13.3      41.8      40.5      17.6     -15.6      11.2       4.3
Estonia               13.1      76.3      10.6       0.0      51.5      48.5     -13.1     -24.8      37.9
Ireland               29.5       0.0      70.5      29.5       0.0      70.5       0.0       0.0       0.0
Greece                35.7      26.9      37.4      45.5      10.3      44.2       9.9     -16.7       6.8
Spain                 48.2      37.8      13.9      48.2      38.1      13.8      -0.1       0.2      -0.2
France                34.5      48.4      17.0      34.6      36.2      29.3       0.0     -12.3      12.2
Italy                 52.1      38.5       9.4      35.4      43.7      20.9     -16.7       5.2      11.5
Cyprus                 0.0     100.0       0.0       0.0     100.0       0.0       0.0       0.0       0.0
Latvia                32.0      29.7      38.3      47.2      13.5      39.3      15.2     -16.1       1.0
Lithuania             24.4      55.7      20.0      24.4      31.2      44.4       0.0     -24.4      24.4
Luxembourg             0.0     100.0       0.0       0.0     100.0       0.0       0.0       0.0       0.0
Hungary               17.4      41.0      41.6      17.4      34.7      47.9       0.0      -6.3       6.3
Malta                100.0       0.0       0.0     100.0       0.0       0.0       0.0       0.0       0.0
Netherlands           83.1      15.6       1.3      71.1      28.3       0.7     -12.1      12.7      -0.6
Austria               21.2      31.6      47.1      33.0      26.5      40.5      11.8      -5.1      -6.7
Poland                22.7      31.1      46.2      28.3      33.6      38.0       5.6       2.6      -8.2
Portugal              51.7      25.5      22.8      47.7      13.5      38.8      -4.0     -12.0      16.0
Romania                8.5      39.2      52.3       9.9      43.9      46.2       1.4       4.7      -6.1
Slovenia               0.0      42.4      57.6       0.0      71.0      29.0       0.0      28.7     -28.7
Slovakia              11.4      63.1      25.5      11.4      38.3      50.3       0.0     -24.8      24.8
Finland               25.4      12.2      62.4      25.4      30.7      43.9       0.0      18.5     -18.5
Sweden                20.9      29.7      49.4      20.9      56.1      23.0       0.0      26.4     -26.4
United Kingdom        69.6      28.4       2.0      71.3      25.8       2.9       1.7      -2.6       0.9
EU-27                 44.5      35.4      20.1      40.3      35.6      24.1      -4.2       0.2       4.0

(1) Data does not cover Départements d'outre-mer (FR9), Região Autónoma dos Açores (PT20) and Região Autónoma da Madeira (PT30). Source: Eurostat, JRC, EFGS, REGIO-GIS

Table 3: Share of land area according to the original OECD classification and the new urban-rural typology (1)

% of land area. PU = predominantly urban, IN = intermediate, PR = predominantly rural.

                OECD methodology at NUTS 3    New urban-rural typology      Difference
Country                 PU        IN        PR        PU        IN        PR        PU        IN        PR
Belgium               54.9      18.5      26.6      34.7      31.8      33.5     -20.2      13.3       6.9
Bulgaria               1.1      65.5      33.4       1.1      45.1      53.8       0.0     -20.3      20.3
Czech Republic         0.6      90.8       8.6      14.6      37.0      48.4      14.0     -53.7      39.8
Denmark                4.5      23.6      71.9       1.2      26.9      71.9      -3.3       3.3       0.0
Germany               19.4      44.1      36.5      11.7      48.5      39.8      -7.7       4.4       3.3
Estonia                7.7      71.5      20.9       0.0      17.7      82.3      -7.7     -53.8      61.5
Ireland                1.3       0.0      98.7       1.3       0.0      98.7       0.0       0.0       0.0
Greece                 2.9      23.2      73.9       5.7      12.1      82.3       2.8     -11.1       8.3
Spain                 14.4      40.2      45.4      14.4      39.5      46.1       0.0      -0.7       0.7
France                 8.7      50.4      40.8       8.7      31.4      59.8       0.0     -19.0      19.0
Italy                 24.0      49.2      26.8      12.2      42.4      45.5     -11.9      -6.8      18.7
Cyprus                 0.0     100.0       0.0       0.0     100.0       0.0       0.0       0.0       0.0
Latvia                 0.5      43.6      55.9      16.2      21.1      62.8      15.7     -22.5       6.8
Lithuania             15.0      51.9      33.1      15.0      19.8      65.2       0.0     -32.1      32.1
Luxembourg             0.0     100.0       0.0       0.0     100.0       0.0       0.0       0.0       0.0
Hungary                0.6      41.4      58.0       0.6      33.3      66.1       0.0      -8.1       8.1
Malta                100.0       0.0       0.0     100.0       0.0       0.0       0.0       0.0       0.0
Netherlands           61.8      34.9       3.3      46.5      51.3       2.1     -15.3      16.4      -1.2
Austria                1.3      20.2      78.5       8.8      19.0      72.2       7.5      -1.3      -6.3
Poland                 2.5      25.4      72.1       9.3      34.4      56.3       6.9       9.0     -15.9
Portugal               7.9      19.9      72.2       6.5       6.4      87.1      -1.4     -13.5      14.9
Romania                0.1      34.9      65.0       0.8      39.4      59.8       0.7       4.6      -5.2
Slovenia               0.0      29.6      70.4       0.0      52.1      47.9       0.0      22.5     -22.5
Slovakia               4.2      63.6      32.2       4.2      36.8      59.0       0.0     -26.8      26.8
Finland                2.1       5.0      92.9       2.1      14.9      83.0       0.0       9.9      -9.9
Sweden                 1.5       8.3      90.2       1.5      45.6      52.9       0.0      37.2     -37.2
United Kingdom        21.6      54.1      24.4      25.6      46.8      27.7       4.0      -7.3       3.3
EU-27                  9.5      36.1      54.4       9.1      34.9      56.0      -0.4      -1.2       1.6

(1) Data does not cover Départements d'outre-mer (FR9), Região Autónoma dos Açores (PT20) and Região Autónoma da Madeira (PT30). Source: Eurostat, JRC, EFGS, REGIO-GIS

The new typology also changes the distribution of land area across the three types of region (see Table 3), but less so than that of population at the EU level. In a number of countries, for example the Czech Republic, Estonia and Sweden, the shifts between intermediate and predominantly rural are quite significant.

A classification of NUTS 3 regions and groupings of NUTS 3 regions


This methodology proposes a different approach to solve the problem of NUTS 3 regions that are too small. It combines NUTS 3 regions smaller than 500 km² with their neighbouring NUTS 3 regions. (The threshold of 500 km² was selected to ensure that the egregious errors would be eliminated. Reducing the threshold to 400 km² would reduce the number of small NUTS 3 regions by 35, and increasing it to 600 km² would increase the number by 39.) This approach can be applied uniformly to all NUTS 3 regions in the EU.

Of the 1303 NUTS 3 regions, 247 are smaller than 500 km². Some 142 were combined with their neighbours to ensure that the grouped NUTS 3 regions had a size of at least 500 km². The ways in which they were combined fall into the following categories:
(1) Forty-six small NUTS 3 regions were combined with their only neighbour.
(2) Fifty small NUTS 3 regions were combined with the one or two neighbours with whom they shared the longest border, and not with the remaining neighbouring regions.
(3) For 18 small NUTS 3 regions the border length did not allow a clear distinction between neighbours; in this situation they were combined with all neighbours.
(4) Twenty-eight small NUTS 3 regions were combined with other small NUTS 3 regions and a few main neighbours.

Of the 247 small NUTS 3 regions, 105 were not grouped, for the following four reasons:
(1) Nine are island regions and thus have no direct neighbours.
(2) Forty-three NUTS 3 regions have the same classification as all their neighbours, so combining them would not make a difference to their classification.
(3) Forty-one NUTS 3 regions are adjacent to a group of NUTS 3 regions with the same classification.
(4) For 12 Belgian NUTS 3 regions, mostly in West Vlaanderen, there was no obvious way of grouping, as most of the regions fell below the threshold. They were not grouped, in order to maintain diversity in a region with a high overall population density.

Therefore, 142 NUTS 3 regions have been grouped into 114 NUTS 3 groupings. The impact of these groupings on the classifications is shown in Maps 5 and 6. The goal of these groupings is purely to facilitate a more comparable classification within the EU. These groupings are not used for any other purpose and are dissolved as soon as the classification has been done. As a result, the outcome is a classification for each individual NUTS 3 region.
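The core idea of the grouping, merging a region smaller than 500 km² with the neighbours that share its longest borders, can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure actually used: the function name, the data layout (a dictionary of areas and a dictionary of shared border lengths) and the rule of adding neighbours in order of border length until the group reaches 500 km² are all hypothetical, and the special cases listed above (ambiguous borders, regions left ungrouped because their neighbours have the same classification) are ignored.

```python
def group_small_regions(areas, borders, min_area=500.0):
    """Propose a grouping for every region smaller than min_area (km²).

    areas:   {region: area in km²}
    borders: {(region_a, region_b): shared border length in km}
    Returns {small_region: [small_region, neighbour, ...]}.
    Regions without neighbours (islands) are left ungrouped.
    """
    groups = {}
    for region in sorted(areas):
        if areas[region] >= min_area:
            continue
        # Collect the neighbours of this region and the shared border lengths.
        shared = {}
        for (a, b), length in borders.items():
            if a == region:
                shared[b] = length
            elif b == region:
                shared[a] = length
        if not shared:
            continue  # island region: no direct neighbour to merge with
        members = [region]
        total_area = areas[region]
        # Add neighbours by decreasing border length until the group is large enough.
        for neighbour in sorted(shared, key=shared.get, reverse=True):
            members.append(neighbour)
            total_area += areas[neighbour]
            if total_area >= min_area:
                break
        groups[region] = members
    return groups


# Toy data: A and C are small, B is large, D is a small island.
areas = {"A": 120.0, "B": 900.0, "C": 300.0, "D": 80.0}
borders = {("A", "B"): 40.0, ("A", "C"): 15.0, ("C", "B"): 10.0}
print(group_small_regions(areas, borders))
```

In this toy example A is merged with B alone, because B shares its longest border and already brings the group above the threshold, while C needs both of its neighbours.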

Comparing the OECD to the new typology

Maps 3 and 4 show the change in classification between the OECD approach applied to NUTS 3 regions and the new typology applied to the NUTS 3 groupings. Overall, the population share in intermediate regions at the EU level does not change (see Figure 2). However, the share of population in predominantly rural regions increases by 4 percentage points (a relative increase of 20%) and the share in predominantly urban regions drops by 4 percentage points. At the country level, changes follow the changes at the local level, with the Netherlands and Belgium becoming less urban and Sweden and Finland becoming more intermediate and less rural. In the Baltic States, Slovenia, the Czech Republic and Slovakia, between 15% and 25% of the population shifts between categories. In Italy, Greece and Portugal, too, 17% of the population shifts between categories.

Other regional levels

Although in principle this methodology can also be applied at higher geographical levels such as NUTS 2 or NUTS 1 regions, this chapter argues against this. An application at higher geographical levels would in some cases hide significant differences between regions behind the global average for the aggregated level. This effect is not due to the methodology per se, but is a result of the geographical level applied. It may occur for the methodology presented here as well as for the OECD methodology.

The loss of differentiated results can be shown by comparing results at NUTS 2 and NUTS 3 level based on the OECD methodology. The share of population in predominantly rural regions at NUTS 2 level is about one third lower than the share identified at NUTS 3 level. The problem is further illustrated by the fact that, under the OECD methodology, only half of the population in a predominantly rural NUTS 3 region lives in a predominantly rural NUTS 2 region. Moving to a classification of NUTS 2 regions would change the typology so substantially that it would undermine the greater precision of the results obtained through the new approach.

One of the reasons for this mixed use of classification at NUTS 2 and NUTS 3 has been the limited data availability at NUTS 3 level. Fortunately, an increasing number of indicators at NUTS 3 level is available through Eurostat. In addition, for some of the indicators only available at an aggregated geographical level, small area estimation techniques can help to estimate the NUTS 3 values based on NUTS 2 data and auxiliary data at NUTS 3. However, for certain indicators these estimation techniques are not immediately available or have to be further developed.

Presence of cities

This new typology considers the presence of a city in exactly the same way as the OECD methodology. The population figures are based on the census data for the year 2001 for the Urban Audit cities. As a result, seven NUTS 3 groupings move from predominantly rural to intermediate due to the presence of a city of over 200 000 inhabitants: Córdoba in Spain, Maine-et-Loire, Finistère and Ille-et-Vilaine in France, Radomski in Poland, and Bihor and Dolj in Romania. Due to the presence of a city of over 500 000 inhabitants, 16 NUTS 3 regions move from intermediate to predominantly urban. This is the case for Praha and its surrounding region in the Czech Republic, Zaragoza, València, Málaga and Sevilla in Spain, Gironde (with Bordeaux), Haute-Garonne (with Toulouse) and Loire-Atlantique (with the communauté urbaine de Nantes) in France, and Vilnius in Lithuania. In Poland it is also the case for Kraków, Poznań and Wrocław and their surrounding regions.
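The adjustment for the presence of a city ("Presence of cities") can be sketched as a small post-processing step applied to the category obtained from the grid. The sketch below uses only the two city-size thresholds given in the text (200 000 and 500 000 inhabitants). The additional `min_city_share` parameter, requiring the city to hold at least 25% of the regional population, reflects the OECD rule as commonly described; it does not appear in this text and should be read as an assumption. The function name is hypothetical.

```python
def adjust_for_city(category: str, city_population: int, region_population: int,
                    min_city_share: float = 0.25) -> str:
    """Upgrade a region by one class when it contains a sufficiently large city.

    Thresholds of 200 000 and 500 000 inhabitants come from the text;
    min_city_share (25% of the regional population) is an assumption.
    """
    if region_population <= 0:
        raise ValueError("region_population must be positive")
    if city_population / region_population < min_city_share:
        return category
    if category == "predominantly rural" and city_population > 200_000:
        return "intermediate"
    if category == "intermediate" and city_population > 500_000:
        return "predominantly urban"
    return category


# Illustrative figures only, not taken from the Urban Audit.
print(adjust_for_city("predominantly rural", 320_000, 780_000))
print(adjust_for_city("intermediate", 650_000, 1_500_000))
```

The rule moves a region up by one class at most, so a predominantly rural region with a very large city becomes intermediate in this sketch, not predominantly urban.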


Map 3: NUTS 3 regions classified as more urban in comparison to the original OECD typology

Map 4: NUTS 3 regions classified as more rural in comparison to the original OECD typology

Map 5: NUTS 3 regions classified as more urban when grouping regions of less than 500 km²

Map 6: NUTS 3 regions classified as more rural when grouping regions of less than 500 km²


Conclusion

This new typology successfully addresses two main constraints of the OECD methodology applied to NUTS 3 regions in the EU: the variation in surface area of LAU2s and of NUTS 3 regions. It does this in a consistent manner throughout the Union in three main steps:
(1) It creates clusters of urban grid cells with a minimum population density of 300 inhabitants per km² and a minimum population of 5 000. All cells outside these urban clusters are considered rural.
(2) It groups NUTS 3 regions of less than 500 km² with one or more of their neighbours solely for classification purposes, i.e. all the NUTS 3 regions in a grouping are classified in the same way.
(3) It classifies NUTS 3 regions based on the share of population in rural grid cells: more than 50% of the total population in rural grid cells = predominantly rural, between 20% and 50% = intermediate, and less than 20% = predominantly urban. (The change in classification due to the presence of a city is done in an identical manner as for the OECD methodology.)

This new typology will be updated after every NUTS modification and after each major update of the population grid based on new census data and new land cover data. The current and future updates of this classification, as well as information on which NUTS 3 regions have been grouped for classification purposes, can be found here: https://circabc.europa.eu/d/a/workspace/SpacesStore/da816923-58b7-49f6-9dbe-7b8c5bc70284/nuts3_typology.xls
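Step (1) of the conclusion, the creation of urban clusters, amounts to finding connected groups of dense grid cells and keeping those whose total population is large enough. The following is a minimal sketch under stated assumptions: the grid is a hypothetical dictionary mapping (column, row) indices of 1 km² cells to population counts, so density equals population; contiguity is taken to include diagonal neighbours, which is an assumption, since the excerpt does not spell out the contiguity rule; and the function names are illustrative.

```python
from collections import deque


def urban_cluster_cells(grid, min_density=300, min_cluster_pop=5000):
    """Return the set of cells belonging to urban clusters.

    grid: {(col, row): population of a 1 km² cell}
    Contiguity includes diagonals (8 neighbours), which is an assumption.
    """
    dense = {cell for cell, pop in grid.items() if pop >= min_density}
    urban, seen = set(), set()
    for start in dense:
        if start in seen:
            continue
        # Breadth-first search over contiguous dense cells.
        seen.add(start)
        cluster, queue = [start], deque([start])
        while queue:
            x, y = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (x + dx, y + dy)
                    if neighbour in dense and neighbour not in seen:
                        seen.add(neighbour)
                        cluster.append(neighbour)
                        queue.append(neighbour)
        if sum(grid[cell] for cell in cluster) >= min_cluster_pop:
            urban.update(cluster)
    return urban


def rural_population_share(grid, urban_cells):
    """Share of the total population living outside the urban clusters."""
    total = sum(grid.values())
    rural = sum(pop for cell, pop in grid.items() if cell not in urban_cells)
    return rural / total


# Toy grid: one cluster of 5 100 inhabitants, one small dense pair, one sparse cell.
grid = {(0, 0): 2500, (1, 0): 2000, (1, 1): 600, (5, 5): 400, (6, 6): 350, (9, 9): 50}
urban = urban_cluster_cells(grid)
print(sorted(urban), round(rural_population_share(grid, urban), 3))
```

The two dense cells at (5, 5) and (6, 6) are contiguous but together hold only 750 inhabitants, so they remain rural: density alone is not sufficient without the minimum cluster population.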

European Forum for Geostatistics as a Market Place
Erik Sommer
EFGS Expert Group for Business Models

Business Model

What are the best solutions for distributing grid-based statistics to customers? What strategy should the European Forum for Geostatistics (EFGS) have for distributing grid-based statistics to users, both for general public purposes and for customers willing to pay for special deliveries? Now that the EFGS has been established, what role should it have in promoting the use of grid data?

It could very well be that we prefer the National Statistical Agencies to promote the contribution of grid data themselves, on their own websites and at their own discretion. If so, the EFGS should not interfere at all with the activities of the national members. In that case the main focus for the EFGS is to coordinate the general activities regarding the use of grid data, including hosting conferences, providing relevant documentation, establishing expert groups and participating in projects such as those hosted by the EFGS.

The EFGS could still decide to promote common datasets used, for example, in Eurostat projects, such as the population dataset on 1x1 km grids, and provide them for download free of charge. A more involved step would be for the EFGS to take a more active role in promoting more of the datasets available in the member states, including a coordinating role in maintaining up-to-date information about contact persons, documentation etc. A further step would then be to begin selling datasets, which could be unified joint European datasets, cross-border datasets or national datasets. Selling data could of course be done in different ways: it could be a simple reference to the national providers, or it could include the marketing of joint datasets.

At the very least, it would be worthwhile at this conference to consider what role the EFGS should have in regard to the market place. If the EFGS does not take an active role, who should or would take this role in the future?

Customer Case

A major Swedish bank operating in the Nordic market is requesting variables on grids for Sweden, Norway, Finland and Denmark. The request was made in September/October 2010. The Swedish bank wants its approach for Sweden to apply to the other countries. The specification calls for 125x125 meter grid cells in urban areas and 1x1 km grid cells in rural areas. The data are to be used in Microsoft MapPoint, so the request is for a special delivery as a database (SQL Server). The Swedish bank underlines that it is important that data are provided with longitude/latitude coordinates and that WGS84 is used. Contact in the first phase is coordinated by Statistics Sweden in cooperation with the statistical agencies of Denmark, Finland and Norway.

The joint challenge is to clarify whether we want to coordinate our offers and, if so, how. One question to be asked is whether it is important that we offer similar solutions.

The domestic challenge in Denmark would be the following: we use the National Danish Grid with clusters of 100x100 meter grid cells with cell_ID as lower left coordinates (projection UTM32, datum ETRS89). Alternatively, we can choose to deliver data using the European Grid. A challenge but also an opportunity.
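The convention of identifying a grid cell by its lower left corner can be illustrated with a short sketch. It assumes projected coordinates in metres (for example UTM32/ETRS89 eastings and northings) and shows how the same point falls into cells of 100 m, 125 m and 1 km. The function names and the label format are hypothetical and do not reproduce the official cell_ID syntax of any national grid. Conversion to WGS84 longitude/latitude, as requested by the customer, would need a projection library and is not covered here.

```python
def lower_left_corner(easting: float, northing: float, cell_size: int) -> tuple:
    """Snap a projected coordinate (metres) to the lower left corner of its cell."""
    x = int(easting // cell_size) * cell_size
    y = int(northing // cell_size) * cell_size
    return x, y


def cell_label(easting: float, northing: float, cell_size: int) -> str:
    """Build an illustrative cell label; the format is hypothetical."""
    x, y = lower_left_corner(easting, northing, cell_size)
    return f"{cell_size}m_E{x}_N{y}"


# One point (made-up coordinates) located in three different grids.
point = (724563.4, 6175988.9)
for size in (100, 125, 1000):
    print(cell_label(*point, size))
```

Because 125 does not divide 100 or vice versa, cells of the 125 m and 100 m grids do not nest, which is one practical reason why the question of offering similar solutions across countries matters.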



A state of the art
Side effects from the Geostat project - Part I
Jean-Luc Lipatz
Statistics France, INSEE

Introduction - Geolocalized data in France

The history of grids within the French public statistical system started in 2006, mostly driven by the need for such data within the French national institute itself. Unfortunately, the technical environment available to produce gridded data could only give geographically partial solutions. Being a member of the Geostat project since 2009 stressed the need for a possibly totally different approach. This was made concrete during 2010, culminating in the summer with the first map of gridded population covering France. Two major steps were taken: getting access to a data source geolocalized over a broad extent, and completing it with an original estimation process.

France is different

Every country is different, but the French national statistical institute (Insee) has a feature that is possibly not so usual. Together with the usual missions of data production, collection and dissemination, the NSI is used to carrying out studies itself to answer questions of general concern. These analyses, which in most cases result from a partnership with another public body, are done either at the national level or, thanks to Insee's network of regional delegations, at more local ones. In the latter case, this provides a very powerful way to discover the practical needs of users. Applied to the specific case of geolocalized data, such partnerships are not only the source of the kind of available data, but also the source of any geolocalized data in France.

Another mission, more common to all NSIs, is the coordination of the statistical system. In France, this coordination is done through the CNIS ("Commission National pour l'Information Statistique"), in which Insee plays the part of a facilitator and which regularly gathers users of statistical information from various domains, public or not. The commission regularly creates task forces on particular subjects.
In 2009, one of these task forces, led by the director of the town-planning agency for the city of Bordeaux, released a report on the spatial distribution of population that contains many suggestions about localized statistics. Among them appears the first official reference to gridded data, together with a strong suggestion to generalize their production and to make them a standard goal in data dissemination. Added to the experience acquired from the partnerships with local users, the report manages to establish a very standard position for data that are not so standard.

The history of gridded data in France is recent. Several researchers had highlighted the many advantages of such data before this date, but until 2006 there was no regular production of gridded data. As mentioned before, the start of production was strongly linked to a practical need and to a practical partnership with a public body. The need was the design in 2006 of a new set of official neighborhoods dedicated to public policies toward deprived populations. The public body was the ministry for urban affairs, but the whole operation went far beyond this single partner, involving regional governments and local authorities in cities.

Since this starting point in 2006, the production and use of gridded data has been a permanent task at Insee, but the available data have always had a strong limitation: they are available only for municipalities above 10 000 inhabitants. Because of the administrative structure of France, where the very numerous municipalities (36 700) are no longer the actual level of organization of the territory, a restriction on which municipalities have data implies unnatural boundaries within the geographical extent of feasible analyses.

The restriction to municipalities above 10 000 inhabitants results from the conjunction of several features of France. First, for historical reasons dating from WW II, there is no register of population in France and of course no register of buildings, and even dreaming of such registers is not politically correct. So everything that is needed for local statistics has to be built from scratch, mostly by turning the postal address labels contained in administrative files into geographic references.
But such operations need some sort of reference register of existing addresses, and here lies a second difficulty: there is no such reference register, and possibly there will never be a complete one, because in large parts of the French territory postal addresses are fuzzy things such as the bare name of a village. The third problem, which led directly to the 10 000 inhabitants restriction, was the way Insee tried to bypass the lack of a reference register of addresses for the needs of census data collection: by making a register, but a partial one.

French census: a wonderful opportunity and a dead end

The new rolling French census was what made gridded data feasible, but it also put boundaries on the extent where it was feasible. Traditionally the French census has always been organized differently according to the size of municipalities, relative to the 10 000 inhabitants threshold. Above this threshold, Insee plays the main part in the practical organization of data collection, while below the threshold municipalities are requested to take the main role. In the new organization of the rolling census, the methods now also differ strongly. In the larger municipalities, Insee carries out a limited data collection covering about 8 % of the population each year. The interesting feature is that this data collection is based on a sample of addresses, and for sampling purposes Insee maintains a specific register of addresses together with their geographical position. This register plays a decisive part in making any geolocalized data, from the census or from any other source.

On the other side, below the 10 000 threshold, census data collection remains mainly traditional, using census districts designed by the municipalities themselves, with the tools they have or need (many municipalities have fewer than 500 inhabitants, so the needs there are rather small). This means that no standard system exists for referencing census data and that even the geometrical boundaries of the census districts are not available.

The consequences for gridded data from the census are clear: above 10 000 inhabitants, with some additional work to correct the sampling effect, geolocalized data are at hand, but below this threshold nothing can be done, and it must be stressed that there is no plan to improve the situation by changing the way the data are collected.

Administrative sources: followers then leaders

Making geolocalized data from administrative sources is closely linked to the census case, because it requires transforming address labels into coordinates by matching them against some reference base of addresses. Here the census address register offers a good opportunity, although a partial one. During the past several years, some other solutions based on address files from private providers have been tested as possible complements. In each case the conclusion was almost the same: some extension of the census 10 000 inhabitants domain was indeed possible, but only on a partial basis and at some cost to complete manually the match between addresses and the reference register.

As everything sounded like a dead end, a completely different path had to be found. Among the administrative sources there is a special case: the tax registers. They form a separate world that includes geographical references, because the land on which residential buildings stand has to be recorded for tax purposes. So it is possible to link information about people to a geographical reference as known in the cadastral register. Insee took the final step towards a new system of geolocalized statistics at the very end of 2009 with an agreement with the cadastral services, which allows the NSI to work with their register. At that time several uses of the register itself were discussed, and the explicit needs for statistical data at grid level within a pan-European dimension were an argument. Additionally, Insee was about to start delineating urban zones, which in the French definition requires a population count for each candidate area. In this task, being able to easily get geolocalized population is a bonus that cannot be rejected.

Fiscal data geolocalized through the cadastral register do not today cover the whole French territory, even in its European part. The reason is that in some parts the management of the cadastre, which is always a local responsibility, does not involve using GIS tools. But the digitized part covers about two thirds of the municipalities, with a strong tendency to cover the more populated areas: 82 % of the population. Because of this local dimension, full coverage cannot be expected within the next few years. But what is practically needed to get coordinates is not a full digitization of the whole register, and the amount of work really needed to transform the existing resources into something suitable is probably something that Insee could do.

Map 1: The areas where population can be geolocalized. Population distribution obtained by simple aggregation of fiscal data. The grey zones are the places that are still not covered by a digitized cadastral register.

The new environment for geolocalized data: nothing will be the same

As a direct consequence of the impulse given by the Geostat project, the transition from a relatively small number of municipalities where geolocalized data can be produced to a much larger one is a tremendous revolution for the French statistical system concerning cities. It gives the opportunity to stop describing only a small part of the city territory and to start analysing consistent territories: a city as a whole, regardless of the size of the municipalities that compose it.



In some ways, the grid-based statistical system that was set up in 2006 to bypass the limits of a zoning-based approach was still relying on that approach, because the target territory of the analysis had to be defined using administrative boundaries. The limited objectives of the current Geostat project amount to filling the current gaps in the coverage of the geolocalized fiscal data, putting some colour (population distribution) into the grey areas of the previous map. Although the cells of the 1 km² grid are much larger than those needed for urban analysis and in use at Insee (100 m), the complete map allows a narrative that is no longer centred on a specific location, as was the case when only one municipality or another could be analyzed. It actually provides a way to describe how the city fits into its environment, perhaps in a limited way, but in a way that makes it more acceptable to carry out detailed analysis with many other variables on a limited territory, and that is consistent with the non-zonal methods used at this more detailed level.

Map 2 a and b of the same phenomenon: aged dwellings, with the new 1Km² grid data (left) and with old data at municipality level (right): red areas are those with a high presence of aged dwellings. The same process was applied to both datasets, but the left one gives a fully unambiguous message of the spatial distribution of aged dwellings especially around the cities.

Map 3 a, b and c: The first step of an analysis: from the general to the local


Today, not all the consequences of integrating statistical data with a large geographic extent over which it can be geolocalized have been fully explored. But there are possible consequences even for the census itself, because of its prominent place in local statistics: being restricted when other sources are not could become really difficult to explain.

Map 2 a and b: The end of a zoned approach: not only at infra urban level



Map 3 a, b, c: Standard analysis of a city using the old tools and the new data

Map 3 a: Simplified population distribution


Three examples using the standard analysis processes in use at Insee since 2006. The first one (Map 3 a) is just a simplified population distribution. The simplification uses non-parametric probability density estimation, which can be used to produce distributions for various variables; these can in turn be combined in a spatial analysis of some kind of risk, as pictured in the second map (Map 3 b; here the risk of being in the poorer part of the population). Such analyses are used to drive the delineation of the official zones that will be used in public policies. The last map (Map 3 c) shows where the largest numbers of the poorer population live (red boundaries). Comparison with the actual official zones (blue boundaries) provides both an evaluation of their relevance and an alert about areas that public action does not cover.
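The smoothing and the risk map described above can be illustrated with a small sketch. It is not the Insee implementation: the Gaussian kernel, the bandwidth expressed in grid cells and the function names `kernel_smooth` and `risk_surface` are assumptions chosen for illustration. Counts of a variable are spread over neighbouring cells, and a "risk" is then read as the ratio of two smoothed surfaces (for example poorer population over total population).

```python
import math


def kernel_smooth(counts, bandwidth=1.5):
    """Smooth a grid of counts with a Gaussian kernel (bandwidth in cells).

    Each cell's count is redistributed over its neighbourhood with weights
    normalised to 1, so the grid total is preserved.
    """
    rows, cols = len(counts), len(counts[0])
    radius = int(math.ceil(3 * bandwidth))
    smoothed = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            value = counts[r][c]
            if value == 0:
                continue
            weights = {}
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        dist2 = dr * dr + dc * dc
                        weights[(rr, cc)] = math.exp(-dist2 / (2 * bandwidth ** 2))
            total_weight = sum(weights.values())
            for (rr, cc), w in weights.items():
                smoothed[rr][cc] += value * w / total_weight
    return smoothed


def risk_surface(numerator, denominator):
    """Cell-by-cell ratio of two smoothed surfaces (0 where the denominator is 0)."""
    return [
        [n / d if d > 0 else 0.0 for n, d in zip(n_row, d_row)]
        for n_row, d_row in zip(numerator, denominator)
    ]


# Hypothetical 5 x 5 grid: 100 inhabitants, 20 of them poor, in the centre cell.
total = [[0] * 5 for _ in range(5)]
poor = [[0] * 5 for _ in range(5)]
total[2][2], poor[2][2] = 100, 20
risk = risk_surface(kernel_smooth(poor, 1.0), kernel_smooth(total, 1.0))
```

Normalising the weights per source cell is one way of keeping totals consistent near the edges of the study area; other edge corrections are possible.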

Map 3 b: Risk analysis

Gridded data, no matter what
Part II
Jean-Luc Lipatz, Statistics France (INSEE)

The map obtained with the communes for which population can be located at parcel level looks like a picture that has been damaged. This suggests not estimating the whole map but trying to repair it where needed. The idea comes from the techniques used to restore old paintings: the principle is to use the colours surrounding the damaged parts to fill them with some mean colour that will not be noticed when looking at the painting from some distance. These techniques were applied after the flooding of Firenze in 1966, so the name of the city gives the name of the method.

Map 1: The “damaged” map

Map 3 c: Automatic delineation of potential areas for public action


Since we have, by using auxiliary sources, actually more information than just holes in the missing parts of the picture, it would be possible to fill the gaps with something that will not be noticed even at close distance.
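The plain restoration idea (filling a hole with the mean of the surrounding colours) can be sketched as follows. This is only the naive starting point, not the estimation process of this paper, which uses auxiliary sources precisely to do better than a mean value. The function name `repair_grid` and the use of `None` for missing cells are assumptions made for the example.

```python
def repair_grid(grid):
    """Fill missing cells (None) with the mean of their known 8-neighbours.

    Holes are filled from their border inward: each pass only uses values
    that were known at the start of the pass.
    """
    rows, cols = len(grid), len(grid[0])
    result = [row[:] for row in grid]
    while any(value is None for row in result for value in row):
        updates = {}
        for r in range(rows):
            for c in range(cols):
                if result[r][c] is not None:
                    continue
                neighbours = [
                    result[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))
                    if (rr, cc) != (r, c) and result[rr][cc] is not None
                ]
                if neighbours:
                    updates[(r, c)] = sum(neighbours) / len(neighbours)
        if not updates:
            break  # no known value anywhere: nothing can be repaired
        for (r, c), value in updates.items():
            result[r][c] = value
    return result


# Hypothetical damaged map: the centre cell is missing.
damaged = [[10, 10, 10], [10, None, 10], [10, 10, 10]]
repaired = repair_grid(damaged)
```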



Three auxiliary sources

The auxiliary data used to make the required estimations come from three sources: the well-known Corine Land Cover (CLC); a less well-known by-product of CLC, Soil sealing; and a specifically French source, the Référentiel à Grande Echelle (RGE), mainly used for its streets and roads component. Corine Land Cover provides descriptions of the territory that go well beyond the municipality level in geographical detail and is a natural candidate for disaggregating data available only at municipality level. In France it is regularly used by the ministry of ecology to estimate regional numbers of people exposed to flooding hazard. The level of geographical detail in the CLC zones may suggest that estimating population on a 1 km grid is within reach. Nevertheless, the 25 ha threshold below which no specific CLC zone is identified also suggests that some problems can occur. The availability of detailed population data for a large part of the municipalities allows this to be checked. Several problems actually occur:

Map 2 represents the commune of Vannes (Brittany). According to the CLC classification, green zones are agricultural, yellow zones are economic and pink zones are urban areas. Dots are the populated buildings as seen in the fiscal data. The blue boundaries delineate the deprived neighbourhoods, which are mostly zones with few but high buildings housing large numbers of people. Soil sealing (see Map 3) is a by-product of Corine Land Cover that does not give much descriptive information but gives it at a much more detailed geographical level, as it is disseminated on a 100 m wide grid. This provides an opportunity to see inside the rather rough zones of CLC. On the other hand, the information provided consists of a single index that may be interpreted as a level of artificialization but does not say why: houses, plants, roads or dumps are treated the same way. This means that Soil sealing can only be used as a refinement of Corine Land Cover.

Map 3: CLC and Soil sealing

1) Small populated areas are not covered by the adequate classification in CLC: there are clusters of population in agricultural, industrial or commercial areas. Neglecting these zones in the estimation process will underestimate population in rural areas, but including them will probably have the reverse effect, spreading the population too widely.
2) Urban areas cover zones that may have very different densities of buildings. As an additional difficulty, the density of buildings is easy to map but does not reflect the actual density of population, because a building can have a large population on a small ground surface, compensated by a large height. So measuring population with surface-based elements only will necessarily have strong limitations. It is easy to imagine that trying to measure something other than population will only make things worse.

Map 2: CLC and the populated buildings

The RGE (see Map 4) is a project to provide a consistent set of geographical elements for use by the various public bodies. Among them, the map of streets and roads can be useful, because it is unlikely that a house will stand in a place without any means of reaching it. The RGE can be used to drive the estimation of population in sparsely populated areas and avoid overspreading the population. In dense areas it can also give hints, because a high density of small streets is often related to a residential area with individual houses. The RGE can also bring information about the motorways or large roads that may be visible in soil sealing and have to be excluded from the potentially populated areas.



Map 4: CLC and the RGE

The estimation process: full hybrid and four-wheel drive

Usually, making gridded data follows one of two approaches: bottom-up, aggregating individual data that is already geolocalized, or top-down, disaggregating through some estimation process data that is only available for some larger zoning. The estimation process used in France falls into neither category but rather mixes them into a fully hybrid strategy. The second approach, disaggregation, was the source of the estimates made by the French ministry of ecology and of the work done by the JRC for the prototype map of European population in 1 km² grid cells. Leaving aside the accuracy of the estimates, results from this approach must be read with caution, since automatic processes can easily fall into the “ecological fallacy”: making wrong assumptions about the parts when looking at the whole (for example, inferring from communes with both a large population and large industrial zones that industrial zones are populated). In the French case there is an additional reason to use more sophisticated methods: 2/3 of the communes do not need to be estimated, and they also represent a considerable amount of information about how the land is covered and how it is populated; in brief, the lessons that can be learned from the previous pictures. The first approach, aggregation, is obviously not enough because it is not applicable to 1/3 of the municipalities, but also because the final target is the distribution of the population as measured in the census, not according to some administrative concept as in the fiscal data. In the current state of the art, however, census data will not be available at a geolocalized level for as large a share of the municipalities as the fiscal data, so census figures will have to be disaggregated to make the estimates fit the census concepts.
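For readers unfamiliar with the bottom-up approach mentioned above, a minimal sketch of the aggregation step is given here. It assumes point data already geolocalized in a projected coordinate system expressed in metres; the function name `aggregate_to_grid` and the cell identifier (column and row index of the cell) are illustrative choices, not a description of the Insee production chain.

```python
import math
from collections import Counter


def aggregate_to_grid(points, cell_size=1000.0):
    """Sum persons per grid cell.

    points: iterable of (x, y, persons), with x and y in metres.
    Returns {(column, row): persons}; a cell covers
    [column * cell_size, (column + 1) * cell_size) in x, and likewise in y.
    """
    cells = Counter()
    for x, y, persons in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key] += persons
    return dict(cells)


# Hypothetical buildings with their number of residents.
buildings = [(100.0, 200.0, 3), (900.0, 999.0, 2), (1000.0, 0.0, 5)]
grid = aggregate_to_grid(buildings)
```

The same function applied with `cell_size=100.0` would give the 100 m pixels used for urban analysis.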


Purposes for an estimation process of population figures below the municipality level are multiple: gridded data within the frame of the Geostat project, but also estimates for the delineation of urban areas for Insee itself, or estimates for external users (flooding hazard, zones subject to noise etc.). For this reason, the estimation process was designed to produce the raw material that allows estimation on various zones. Also, in its current implementation, it produces a population figure, but it is quite easy to make it produce estimates for anything else that can be measured in the input source. As an experiment, the process was run to estimate a number of dwellings and a number of dwellings with a reference person above 65. For these two variables, which are still related in some way to the morphology of the land, the estimates seem to be usable. The principle of the estimation process is, basically, to use the relationship between the three sources on soil occupation and the population, reading it where it is possible and then using it to extrapolate the population where the population figure is missing. The nature of the relationship may differ according to the place where it is established: in the eastern part of France a municipality consists of a single urban cluster with almost no population elsewhere; in the western part many municipalities have no centre and consist only of scattered small villages. As in Geographically Weighted Regression, the estimation process tries to discover these specificities by establishing the relationship within small clusters of geographically related municipalities. The estimation process runs commune by commune (for those needing estimation) and for each kind of soil occupation that is present. Soil occupation characteristics and population figures (if available) are put together into a database of elementary pixels 100 m wide, each carrying a single set of characteristics.
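The principle stated above (read the relationship where population is known, then extrapolate where it is missing) can be sketched on a toy pixel database. The record layout (`commune`, `soil_class`, `population`), the function names and the use of a simple mean per class are assumptions made for illustration; the actual process is more elaborate.

```python
from collections import defaultdict


def class_densities(pixels, neighbour_communes):
    """Mean population per pixel for each soil class, read from the pixels
    of neighbouring communes whose population is known."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pixel in pixels:
        if pixel["commune"] in neighbour_communes and pixel["population"] is not None:
            totals[pixel["soil_class"]] += pixel["population"]
            counts[pixel["soil_class"]] += 1
    return {cls: totals[cls] / counts[cls] for cls in counts}


def extrapolate(pixels, target_commune, densities):
    """Give each pixel of the target commune the density of its soil class
    (0 when the class was never observed nearby)."""
    return [
        dict(pixel, population=densities.get(pixel["soil_class"], 0.0))
        for pixel in pixels
        if pixel["commune"] == target_commune
    ]


# Hypothetical 100 m pixels: commune A is known, commune B has to be estimated.
pixels = [
    {"commune": "A", "soil_class": "urban", "population": 40.0},
    {"commune": "A", "soil_class": "urban", "population": 60.0},
    {"commune": "A", "soil_class": "agricultural", "population": 2.0},
    {"commune": "B", "soil_class": "urban", "population": None},
    {"commune": "B", "soil_class": "agricultural", "population": None},
    {"commune": "B", "soil_class": "water", "population": None},
]
densities = class_densities(pixels, {"A"})
estimated = extrapolate(pixels, "B", densities)
```

Restricting `neighbour_communes` to a small cluster around the target commune is what makes the relationship local, in the spirit of the comparison with Geographically Weighted Regression.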
The kind of soil occupation results from the analysis described previously and consists of a selection of CLC classification items that can be populated (for example, water zones are assumed not to be), refined by Soil sealing and the RGE, giving rise to an extended CLC classification. Some classes are not supposed to contain population; the process gathers them in a special class for which no density coefficient is estimated. This is also the case for pixels crossed by a motorway: if they are not of the urban class, they are considered empty of people. The soil sealing index is combined with the CLC class to give many more territory types, at least where an increase of the soil sealing index can indicate a higher population density, which was actually observed for the urban (non-industrial) and agricultural CLC classes. For the remaining CLC classes, the soil sealing index is not used. The presence of a road and of a road crossing is only used for pixels not detected as artificial using soil sealing, because these are not artificial enough. In these cases the pixels are considered to have two new special



soil sealing index values and these additional values are used together with the CLC classes as mentioned before. Basically, the estimation process first tries to estimate a local density coefficient for each kind of soil occupation in the extended classification, using neighbouring pixels of the same kind. The actual computation is repeated for different samples of such pixels, and the value retained for the density coefficient is the mean of all the values obtained. This minimizes the effect of incorrect classification in CLC. The dispersion of the values plays a part in the next step but also indicates a possible local weakness of the estimate. In the second step, the set of initial density coefficients is used to compute a predicted population for the municipality. Again, the distance between the predicted population and the actual one gives a hint about the weakness of the estimate. To ensure global consistency of the data at the different geographical levels, the density coefficients are then corrected to make the predicted population fit the actual one. This is done by modifying the values obtained before, but the more accurate the estimates seemed to be, the less they are corrected.

Quality is unknown

The quality of the estimated data is difficult to assess, since some parts of the process are meant to correct the misclassification effects that are detected through the dispersion of the initial coefficients or the disagreement between the predicted population and the actual one. Using this criterion nevertheless, a weakness indicator was computed for each NUTS3 by:

$$\frac{\sum \lvert P - \hat{P} \rvert}{\sum P}$$

where $P$ is the actual population and $\hat{P}$ the predicted one. This indicator ranges from 2 % to 98 % but has most of its values in the range 20 % to 30 %. The problem with this kind of measurement lies mainly in municipalities with small populations, for which an estimation error of 100 persons may represent an extremely large relative error. The extreme value of 98 % is actually obtained when estimating the only 4 small missing municipalities of the NUTS3 “Côte d’Or” (Dijon), and the mean absolute error there is only 100. Among the NUTS3 that require the largest share of estimation, the worst ones belong to the region “Champagne Ardennes”, particularly the one around Reims, where the weakness indicator reaches 65 % with a mean absolute error of 237. This weakness comes from a search for similar pixels that has to extend beyond the region, including zones that probably have very little in common. This situation does not occur in the NUTS3 “Ariège”, although it has to be almost fully estimated: the surrounding NUTS3 are well provided with communes usable for the extrapolation and that,

apparently, have the required characteristics, so the weakness indicator is 25 %. It is difficult to expect a better primary estimation from pure soil type characteristics, given how many things are not available: the height of the buildings, the rate of unoccupied houses or holiday homes, the size of the households, etc. But it is also strongly expected that the steps that fit the estimates to the actual municipality total will introduce the local specificities that are missing in the first step. Another important element for judging the quality of the estimates is the quality of the data they use. Two problems may occur:
1) A problem of localization. The accuracy of the boundaries of the CLC zones is not known, but there seem to have been some “short cuts” that have included in some zones territories belonging to the classification of a neighbouring one. Soil sealing pixels are not used as provided but converted to Lambert II pixels, which slightly shifts the target square. In any case, the soil sealing values are only means that refer to zones wider than the available pixels.
2) A problem of classification. The main problem comes from the very basic CLC specification that areas smaller than 25 ha do not have to be identified. Because of this rule, the populated areas of small rural communes are very often coded as agricultural areas. In the same way, zones of high buildings with a limited geographical extent can be coded as industrial or commercial zones.

Map 5: An example of an incorrect estimation (Strasbourg and neighbouring municipalities)
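The two estimation steps and the weakness indicator discussed in this section can be made concrete with a short sketch. Several choices here are assumptions, since the text does not give the formulas: the density coefficient is taken as the mean of repeated sample means, its dispersion as the standard deviation of those means, the correction towards the municipal total is shared out in proportion to that dispersion (so that the most stable coefficients are corrected least), and the weakness indicator is read as the sum of absolute errors over the sum of actual populations. All function names are hypothetical.

```python
import random
import statistics


def sampled_density(pixel_populations, n_samples=50, sample_size=20, seed=0):
    """Mean and dispersion of the density over repeated random samples
    of neighbouring pixels of the same extended class."""
    if not pixel_populations:
        raise ValueError("no pixel available for this class")
    rng = random.Random(seed)
    size = min(sample_size, len(pixel_populations))
    means = [
        statistics.fmean(rng.sample(pixel_populations, size))
        for _ in range(n_samples)
    ]
    return statistics.fmean(means), statistics.pstdev(means)


def fit_to_total(coeffs, pixel_counts, actual_population):
    """coeffs: {class: (density, dispersion)}; pixel_counts: {class: pixels}.

    Returns the predicted population and corrected densities whose total
    matches the actual municipal population.
    """
    predicted = sum(n * coeffs[c][0] for c, n in pixel_counts.items())
    gap = actual_population - predicted
    weight = sum(n * coeffs[c][1] for c, n in pixel_counts.items())
    if weight == 0:
        # No dispersion information: fall back to a proportional rescaling.
        factor = actual_population / predicted if predicted else 0.0
        return predicted, {c: coeffs[c][0] * factor for c in pixel_counts}
    step = gap / weight
    # Assumed rule: the correction grows with the dispersion of the coefficient.
    return predicted, {c: coeffs[c][0] + step * coeffs[c][1] for c in pixel_counts}


def weakness_indicator(actual, predicted):
    """Sum of absolute errors divided by the sum of actual populations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / sum(actual)


# Hypothetical municipality with two extended classes.
coeffs = {"urban": (4.0, 0.5), "rural": (0.5, 1.0)}
counts = {"urban": 100, "rural": 200}
predicted, corrected = fit_to_total(coeffs, counts, actual_population=650)
```

In this example the predicted population is 500 for an actual 650; the rural coefficient, which has the larger dispersion, absorbs most of the correction. A production version would also have to prevent corrected densities from becoming negative.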

The picture on the right is the result of the estimation process; grey boundaries are those of the municipalities that have to be estimated. The map on the left was obtained through the geolocalizing process (analysis of the address labels), because most of the missing communes actually have more than 10 000 inhabitants. For the main commune (Strasbourg) the two distributions are significantly different. Missing black spots in the left map (in the south and in the west) correspond to deprived neighbourhoods with high buildings caught in the middle of industrial zones. In addition, the estimation process gives a predicted population that is far below the actual figure; the fitting phase is possibly



responsible for the overpopulated areas in the very south of the municipality. In the final dataset, the figures from the geolocalizing process replaced the estimated figures.

Further steps - none!

The actual accuracy of the figures is not known, but they are starting to prove useful, as they are used in the delineation of urban zones at Insee, giving at least an estimate of the rough range of the population of the candidate areas. This is an element of quality. If more specific needs have to be met, additional refinement could be added using other information present in the RGE. Among them is the height of the buildings, which looks like an obvious solution to a recurring problem. But this height information is actually not sufficient, since the same volume can be used for housing or for offices, and direct estimates using the volume of buildings do not give accurate results. The best linear model found for the municipality of Vannes is:

$$\mathrm{pop} = \alpha V + \beta S + \varepsilon$$

with an R² of only 0.85 and a significant spatial autocorrelation as well, which prevents using this model! A classification of the territory the buildings belong to would help, but that would mean going back to the classification issues. Knowing the kind of use of the buildings would certainly be a plus, but at this point it must be stressed that it would not solve some other objections: the characteristics of the population inside would remain out of reach, and even the population count would be subject to the uncertainty coming from the simple fact that the existence of a house does not mean that it is occupied. So the next step will be to stop adding complexity to the estimation process, for at least one simple reason: the

Map 6: The “repaired” map


availability of a digitized cadastre increases every year, so in the distant future estimation will not be needed. Until then, partial digitizing of the cadastre that is not yet digitized could be a more worthwhile investment. Fiscal population at the end of year 2007. Actual data will be disseminated on the web site of Insee (www.insee.fr). They can also be found on www.efgs.info.

Gridded Population – new data sets for an improved disaggregation approach
K. Steinnocher*, I. Kaminger**, M. Köstl* and J. Weichselbaum***
* AIT Austrian Institute of Technology, ** Statistik Austria, *** GeoVille Information Systems

Introduction

There is a demand for population data that are independent of administrative areas. Raster representations meet this demand but are not yet available for all European countries. Spatial disaggregation of population data can close this gap and has been performed on a European scale based on CORINE land cover data (CLC). Gallego & Peedell (2001) applied a probabilistic disaggregation method, estimating population density weights for most CLC classes, while Steinnocher et al. (2006) used only the residential classes of CLC for disaggregation. The drawback of both approaches is the limited spatial resolution of the CLC data set, which leads to over- or underestimation of sparsely populated areas, respectively. With the recently published EEA Fast Track Service Precursor on Land Monitoring, a new data set is now available that provides the degree of soil sealing for EU27+ countries. Applying this data set as a proxy for population density can improve the spatial disaggregation significantly. Since the degree of soil sealing does not correspond directly to residential building density, a number of pre-processing steps are required beforehand. These steps are defined as simple rules and require CLC data as additional input.

Degree of Soil sealing


The EEA Fast Track Service Precursor on Land Monitoring is a raster dataset for built-up areas including a continuous degree of imperviousness ranging from 0 to 100 % at a spatial resolution of 20 x 20 m. The data set is based on orthorectified high resolution satellite imagery (Image2006), acquired primarily in the reference year 2006 (+/- 1 year). Supervised classification techniques were used to map built-up areas automatically, followed by visual improvement of the classification results. The degree of soil sealing for the classified built-up areas was derived from calibrated NDVI (normalised difference vegetation index). The data set covers EU27 and neighbouring countries, in total 38 countries (Kopecky & Kahabka, 2009), and is currently being updated in the frame of geoland2 for the reference year 2009. In the following we will refer to the data set as the HR (high resolution) soil sealing layer.



Population disaggregation

The core method applied in this study is spatial disaggregation. It is based on the assumption that data provided globally for an entire region can be distributed within the region by means of local parameters. The spatial redistribution is normally performed by a weighted sum. A clear dependency between the global and the local parameter is a prerequisite for this approach. We use population data, available for administrative units, and spatial information on housing, derived from remote sensing. In terms of spatial disaggregation, the global parameter is the total population of the region, while the local parameter is the housing density derived from EO (Earth observation). Applying housing density as a proxy for population density allows the local population distribution to be estimated. This approach can be formalised as follows:

$$\mathit{Pdens} = k \cdot \mathit{Hdens} \qquad (1)$$

$$\mathit{Pop} = \sum_i A_i \cdot k \cdot \mathit{Hdens}_i \qquad (2)$$

where Pdens and Hdens are the population and housing density respectively, Pop is the total population of the region and Ai corresponds to the area of the housing density i. The factor k, representing the relationship between population and housing density, can be derived by solving equation (2). The local population density is then calculated from equation (1). The following assumptions were made when applying this approach:



- the population density is proportional to housing density,
- no population resides outside housing areas, and
- dependency between population and housing density is constant within a region.
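Equations (1) and (2) translate directly into a few lines of code. The sketch below assumes, for simplicity, that every housing-density value belongs to one grid cell of equal area (0.25 km² for a 500 m cell); the function name `disaggregate` and its return values are illustrative.

```python
def disaggregate(total_population, housing_density, cell_area=0.25):
    """Distribute a regional population total over cells.

    housing_density: one housing density value per cell of the region.
    cell_area: area of a cell in km² (0.25 for a 500 m grid).
    Returns (k, population densities, population counts per cell).
    """
    weighted_area = sum(cell_area * h for h in housing_density)
    if weighted_area == 0:
        raise ValueError("the region contains no housing area")
    k = total_population / weighted_area            # equation (2) solved for k
    pdens = [k * h for h in housing_density]        # equation (1)
    counts = [density * cell_area for density in pdens]
    return k, pdens, counts


# Hypothetical region of three cells with 1000 inhabitants.
k, pdens, counts = disaggregate(1000, [0.0, 10.0, 30.0])
```

Because k is solved from the regional total, the cell counts always sum back to that total, which is the consistency property the three assumptions above are meant to guarantee.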

Soil sealing to building density The housing density required for the disaggregation approach is derived from the HR soil sealing layer, assuming that the degree of soil sealing is proportional to housing density. Since this assumption does not hold for all cases the soil sealing layer requires further processing. In order to get a representation of housing densities it is necessary to mask out all sealed surface areas with non-residential function. These include the road and rail network, as well as industrial and commercial areas. Masking the transport network is based on linear road and rail data, which are rasterized and slightly expanded, in order to cover associated areas as well. Non-

residential built-up areas are derived from CLC classes 1.2 (industrial, commercial and transport units), 1.3 (mine, dump and construction sites) and 1.4 (parks, sport and leisure facilities). Due to the large minimal mapping unit of CLC masking is limited to larger areas of this kind. Thus smaller non-residential areas are still represented in the soil sealing layer that will cause systematic errors in the disaggregation approach (as will be shown in the validation chapter). 100% sealed surfaces outside urban areas usually indicate industrial or commercial complexes or gravel pits and therefore are masked out as well. The remaining HR soil sealing layer is assumed to represent residential building densities and is used as input to the disaggregation approach.
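The masking rules described above can be summarised as a per-cell decision. The sketch below is an illustration under assumptions: the inputs (`on_transport_mask`, `is_urban`) are hypothetical flags that would come from the rasterized and expanded road and rail data and from some definition of urban areas that the text does not detail, and CLC classes are given as dotted codes such as "1.2.1".

```python
# CLC level-2 classes treated as non-residential built-up areas.
NON_RESIDENTIAL_CLC = ("1.2", "1.3", "1.4")


def residential_density(sealing, clc_class, on_transport_mask, is_urban):
    """Return the degree of sealing (0 to 100) kept as a proxy for
    residential building density, or 0.0 when the cell is masked out."""
    if on_transport_mask:
        return 0.0  # road or rail network, including its expanded buffer
    if clc_class.startswith(NON_RESIDENTIAL_CLC):
        return 0.0  # industrial, commercial, mine, dump, leisure areas
    if sealing >= 100 and not is_urban:
        return 0.0  # fully sealed surface outside urban areas
    return float(sealing)


# Hypothetical cells: (sealing, CLC class, on transport mask, urban).
cells = [
    (60, "1.1.2", False, True),    # residential fabric: kept
    (90, "1.2.1", False, True),    # industrial or commercial unit: masked
    (100, "2.1.1", False, False),  # fully sealed in open country: masked
    (40, "1.1.2", True, True),     # on the road buffer: masked
]
densities = [residential_density(*cell) for cell in cells]
```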

Population grid

The disaggregation was performed for a north-south transect of Europe, covering southern Sweden, Denmark, Germany, Poland, Czech Republic, Slovak Republic, Austria, Hungary and Italy. Population input data are provided at NUTS 3 level and dated 2006, thus corresponding in time with the HR soil sealing layer. Disaggregation is performed per NUTS 3 region at a spatial resolution of 500 m. The resulting grid is shown in Figure 1 and will be referred to as the HR-POP grid.

Evaluation

For evaluating the disaggregation results a reference population grid is used. The reference grid, provided by Statistics Austria, is based on the registration census and aggregated to 500 m cell size in order to correspond with the geometry of the disaggregation results. It will be referred to as the StatAut grid. In addition to the evaluation we also compare our results with the work of Gallego & Peedell (2001) and of Steinnocher et al. (2006). The former estimated population density weights for all potentially populated CLC classes and performed a probabilistic disaggregation resulting in a 1 km grid covering EU 27. As it has been produced by JRC and is now available from EEA, we will refer to it as the EEA-JRC grid. The latter applied the linear disaggregation method used in this study but was limited to the residential classes of CLC, covering several central European countries at 500 m resolution. It will be referred to as the CLC-RES grid. Figure 2 shows the population grids for the central and eastern part of Austria. The upper images show the two CLC-based disaggregation results. While the CLC-RES grid clearly underestimates the sparsely populated areas, the EEA-JRC grid populates almost all areas (except for high alpine areas and water bodies). In terms of spatial distribution the HR-POP grid is by far the closest to the StatAut reference grid.


Figure 1: European transect with population on NUTS 3 level (left) and disaggregated to a 500m grid

Figure 2: Comparison of population grids: CLC-RES (u.l.), EEA-JRC (u.r.), HR-POP (l.l.) and StatAut (l.r.)


For a numeric evaluation the population cell values were grouped into classes, and differences per grid cell from the StatAut reference grid were calculated. Figure 3 shows the relative deviation of EEA-JRC, HR-POP and CLC-RES for the nine classes. There is a general tendency to overestimate sparsely populated grid cells and to underestimate highly populated ones. One reason for this might be that the occurrence of higher buildings increases with building density. This leads to a non-linear relation between housing and population density that is not considered in the disaggregation models. The largest overall deviations occur with the CLC-RES grid, which is due to the missing representation of smaller settlements in the CLC data set.
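A class-based comparison of this kind can be sketched as follows. The class limits, the choice to class each cell by its reference value, and the expression of the deviation as a share of the total reference population are assumptions made for the example; the paper does not spell out these details. The function name `deviation_by_class` is hypothetical.

```python
import bisect


def deviation_by_class(reference, estimate, upper_bounds):
    """Deviation of an estimated grid from a reference grid, per class.

    reference, estimate: population per cell, in the same cell order.
    upper_bounds: ascending inclusive upper limits of classes 1, 2, ...;
    class 0 holds the cells that are unpopulated in the reference grid,
    and the last class holds everything above the highest limit.
    Each deviation is expressed as a share of the total reference population.
    """
    n_classes = len(upper_bounds) + 2
    ref_sum = [0.0] * n_classes
    est_sum = [0.0] * n_classes
    for ref, est in zip(reference, estimate):
        cls = 0 if ref == 0 else 1 + bisect.bisect_left(upper_bounds, ref)
        ref_sum[cls] += ref
        est_sum[cls] += est
    total = sum(ref_sum)
    return [(e - r) / total for e, r in zip(est_sum, ref_sum)]


# Hypothetical four-cell grids and two class limits (10 and 100 persons).
reference = [0, 5, 50, 500]
estimate = [10, 5, 40, 500]
deviations = deviation_by_class(reference, estimate, [10, 100])
```

With this convention the value for class 0 is the share of the total population that a disaggregation places in cells that are actually unpopulated.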

Figure 4: Scatterplots of HR-POP and EEA-JRC grids versus StatAut reference grid (provinces of Salzburg and Upper Austria)

The largest single deviation occurs for non-populated areas (class 0), to which EEA-JRC assigns 8.6 % of the total population. This confirms the visual impression of Figure 2 and represents a systematic error of this data set. For the remaining classes the deviations of EEA-JRC and HR-POP lie in the same range, with HR-POP reaching better results for most classes. However, for the very highly populated classes, which occur only in the centre of urban agglomerations, the EEA-JRC grid yields extremely good results.

Figure 3: Deviation from StatAut reference grid (in population classes)

In addition to the comparison of classes, scatterplots give a good indication of how well numerical grids correlate. Figure 4 shows the scatterplots between the StatAut reference grid and the HR-POP and EEA-JRC grids respectively for the provinces of Salzburg and Upper Austria. While there is a positive correlation between HR-POP and StatAut, the pattern in the right scatterplot indicates no clear correlation between EEA-JRC and StatAut. The latter results from the fact that population densities are constant within one CLC class and similar between neighbouring municipalities. This leads to the flat distribution at a low population level for rural areas. Settlements within these areas are too small for the CLC map and their population is underestimated. The randomly distributed points, on the other hand, represent cities that are represented in the CLC map. Their population is likely to be overestimated as a compensation for the underestimated population of smaller settlements. Besides the comparison on a regional level selected

municipalities were analysed on a local level in order to better understand systematic errors in the HR soil sealing layer. One problem encountered is the already mentioned lack of the third dimension, leading to a systematic underestimation of the population in urban centers. In contrast, town centers tend to be overestimated, as the population density is lower than indicated by the degree of sealing (streets and other impervious areas are not masked out). Masking of industrial and commercial areas is limited to large complexes, whereas smaller areas with industrial or commercial function cannot be identified. As these areas usually have a high degree of soil sealing, a large population number will be assigned to them. A similar effect occurs if gravel pits or quarries appear in the HR soil sealing layer. In rural areas roads are masked out based on linear road network data. If the positional accuracy of these network data is low, masking will be incomplete and roads will be populated in the course of the disaggregation process. On the other hand, single buildings such as farms or scattered settlements are likely to be missed in the HR soil sealing layer, leading to an underestimation of population.



Conclusion

With the recently published EEA Fast Track Service Precursor on Land Monitoring, a new land cover data set is now available that provides the degree of soil sealing for the EEA 38 countries. This data set was applied for spatial disaggregation of population, resulting in a 500 m population grid for a European transect. The results were evaluated against a reference grid of Austria derived from the registration census of Statistics Austria. The results of the evaluation are encouraging, although there are some limitations in the use of the soil sealing layer as a proxy for housing density: (1) improvements are needed in the separation of residential from industrial or commercial land use, (2) the third dimension of built-up areas cannot be considered in urban centers, and (3) road and rail networks cannot be eliminated entirely. In order to overcome these limitations, the use of Urban Atlas data for population disaggregation is currently being analysed. The Urban Atlas provides pan-European comparable land use and land cover data for Large Urban Zones with more than 100,000 inhabitants as defined by the Urban Audit (Urban Atlas, 2010).

So far the evaluation of the HR-POP grid has been limited to the territory of Austria, owing to the availability of the bottom-up reference grid of Statistics Austria. For a European evaluation there is a need for further reference data sets. We would therefore like to close the paper with a proposal for cooperation with statistical offices in European countries that are interested in supporting our effort to produce a Europe-wide population grid.

Acknowledgements

The study was performed within the Core Information Service for Spatial Planning of the project geoland2 (http://www.gmes-geoland.info/service-portfolio/coreinformation-services/spatial-planning.html) in the frame of the GMES (Global Monitoring for Environment and Security) initiative. The project geoland2 is a Collaborative Project (2008-2012) funded by the European Union under the 7th Framework Programme (project number FP-7-218795).

References

Gallego J., Peedell S. (2001): Using CORINE Land Cover to map population density. Towards Agri-environmental indicators, Topic report 6/2001, EEA, Copenhagen, pp. 92-103. http://www.eea.europa.eu/publications/topic_report_2001_06 (last visited 10/09/2010)

Kopecky M., Kahabka H. (2009): Updated Delivery Report - European Mosaic. http://www.eea.europa.eu/data-and-maps/data/eea-fast-track-service-precursor-on-land-monitoring-degree-of-soil-sealing-100m-1 (last visited 10/09/2010)

Steinnocher K., Weichselbaum J., Köstl M. (2006): Linking remote sensing and demographic analysis in urbanised areas. In (P. Hostert, A. Damm, S. Schiefer Eds.): First Workshop of the EARSeL SIG on Urban Remote Sensing “Challenges and Solutions”, March 2-3, 2006, Berlin, CD-ROM.

Urban Atlas (2010): GMES - Mapping Guide for a European Urban Atlas. Document Version 1.1 dated 26/08/2010. http://www.eea.europa.eu/data-and-maps/data/urban-atlas (last visited 21/09/2010)

Accuracy of built-up area mapping in Europe from the perspective of population surface modelling

Pavol Hurbanek*, Peter Atkinson*, Konstantin Rosina**, Robert Pazur**
*University of Southampton, School of Geography
**Slovak Academy of Sciences, Institute of Geography

Built-up area density, the degree of soil sealing or imperviousness represents (at least to a certain extent) the degree of intensity of human activity and can be used as a proxy for the presence of population. Land cover and land use datasets containing information on soil sealing are therefore frequently used as ancillary data in grid-based population surface modelling. Depending on what further information about impervious surfaces is available from the data, an ambient or a residential/registered population can be modelled. Although such data are traditionally referred to as "ancillary", they are in fact crucial to the whole modelling process, and their accuracy largely determines the accuracy of the final population surface model, whether it represents population density or another population characteristic. This observation has been confirmed by several authors, including Martin et al. (2000) and Gallego (2010a), who concludes that "the quality of the land cover map, and more generally, of the proxy variables available, is more important than the choice of the downscaling algorithm."

The "Fast track service precursor on land monitoring degree of soil sealing" (EEA 2009b, 2010a), or soil sealing layer (SSL) for short, is a land cover dataset developed within the Global Monitoring for Environment and Security (GMES) programme and distributed by the European Environment Agency (EEA) specifically to serve as a source of high spatial resolution land cover data for disaggregation of socioeconomic statistics. This layer does not hold information on which of the mapped impervious surfaces are residential and which are not. However, with the same coverage as the Corine Land Cover (CLC) dataset, a much finer spatial resolution (20 x 20 m grid cell as opposed to the 25 ha minimum mapping unit in CLC) and the related increased purity within the mapped units (leading to land cover classes rather than the land cover mix classes of CLC), SSL seems well suited to complement CLC in the further development and improvement of the existing CLC-based population density grid provided by the EEA (EEA 2009c, Gallego 2010a).

For example, in Slovakia in 2006 about 15% of the 2,891 communes have no CLC2006 Category 1 "Artificial Surfaces" grid cells at 100 m spatial resolution, whereas fewer than 1% of communes have no greater-than-zero SSL2006 (version 2) pixels at both 20 m and 100 m spatial resolution. Similarly, Gallego (2009) reports that no CLC2000 urban area is found in 29% of the slightly more than 114,000 communes in Europe, which account for 16.6% of the total area and 2.9% of the total population. This leads to overestimation of population density in nonurban (agricultural, heterogeneous, forest) CLC classes in all methods used by Gallego (2010a). Although Steinnocher et al. (2006) use finer spatial resolution (50 m) land cover data (derived from SPOT5 imagery) that map residential areas and the density of artificial surfaces within them (0-50, 50-80, 80-100%), they report similar findings: major errors in sparsely populated areas due to the minimum mapping unit (in this case reported as underestimations, owing to the different approach). Further, they report underestimations in city centres because building heights were not considered.

Before the potential of SSL is harnessed, it is important to assess the accuracy of SSL and its possible influence on the accuracy of population surface modelling, especially with respect to the above-mentioned issues in areas with dispersed settlement.
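The commune-level comparison above (the share of communes in which a land cover layer shows no built-up cell at all) can be computed along the following lines. This is a minimal sketch that assumes each grid cell has already been assigned to a commune; the function name `share_without_builtup` and the sample cells are hypothetical.

```python
from collections import defaultdict


def share_without_builtup(cells):
    """Return the fraction of communes in which no cell has a value above zero.

    cells: iterable of (commune_id, value) pairs, one pair per grid cell,
           where value is e.g. the degree of soil sealing or a 0/1 urban flag.
    """
    has_builtup = defaultdict(bool)
    for commune, value in cells:
        # A commune counts as covered as soon as one of its cells is positive.
        has_builtup[commune] = has_builtup[commune] or value > 0
    if not has_builtup:
        raise ValueError("no cells given")
    empty = sum(1 for flag in has_builtup.values() if not flag)
    return empty / len(has_builtup)


# Hypothetical sample: communes B and D have no built-up cell.
sample = [("A", 0), ("A", 35), ("B", 0), ("B", 0), ("C", 100), ("D", 0)]
share = share_without_builtup(sample)
```

Running the same function on a coarse layer and on a fine layer for the same communes gives the kind of contrast reported for CLC2006 and SSL2006.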
Preliminary assessment of SSL suggests that in some cases it forces values of soil sealing out to the extremes (0% and 100%) by overpredicting (usually, but not only) medium and larger soil sealing values and – more importantly – underpredicting (usually, but not only) medium and smaller soil sealing values. This potentially results in overprediction of the share of impervious surfaces in areas with more compact settlement pattern (usually urban areas) and in underrepresentation or complete omission of small and dispersed rural settlements, which, however, account for a substantial part of the total area of impervious surfaces in Europe. The detailed results drawing on the samples from Slovakia (Hurbanek et al. 2010) and Austria (Banko 2008) collected according to the recommendations of Maucha and Büttner (2008) and briefly also on the accuracy assessment of SSL by Gallego (2010b) in the group of eleven European countries (including Belgium, Czech Republic, France, Germany, Hungary, Italy, Luxembourg, Netherlands, Poland, Slovakia, Spain) using Land Use/Cover Area-frame Survey (LUCAS) 2006 data are illustrated in the presentation. This research was supported by a Marie Curie Intra European Fellowship within the 7th European Community Framework Programme (FP7/2007-2013 under grant agreement no 220832).

References

Banko, G. (2008): Austria: verification of high resolution soil sealing layer – Qualitative assessment. Umweltbundesamt, Vienna, Austria. 19 p. Available online: http://www.eea.europa.eu/about-us/tenders/eea-ses-09-003/reports-qualitative-or-quantitative-assessment

Caetano, M., Araújo, A., Nunes, V., Carrao, H. (2008): Accuracy Assessment of the High Resolution Built-up Map for Continental Portugal. Portuguese Geographic Institute, Lisbon, 26 p. Available online: www.igeo.pt/gdr/pdf/CLC2006_HRSoilseeling_Validation_Report.pdf

Czaplewski, R., Foody, G., Stehmann, S. (2010): Fundamentals of Accuracy Assessment (workshop). The Ninth International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, University of Leicester, July 2010.

Dorelon, P. (2009): Accuracy assessment of the high resolution soil sealing map for France. MEEDDAT/SOeS, Paris, France. 8 p. Available online: http://www.eea.europa.eu/about-us/tenders/eea-ses-09-003/reports-qualitative-or-quantitative-assessment

EEA (2007): Tender Specifications – GMES Fast Track Service on Land Monitoring (EEA/IDS/07/001). Copenhagen, 2007, 16 p. Available online: http://www.eea.europa.eu/about-us/tenders/EEAIDS07001/tender_specifications.pdf

EEA (2009a): Tender Specifications – GMES Fast Track Service on Land Monitoring (EEA/SES/09/003). Copenhagen, 2009, 13 p. Available online: http://www.eea.europa.eu/about-us/tenders/eea-ses-09-003

EEA (2009b): EEA Fast Track Service Precursor on Land Monitoring – Degree of soil sealing 100m. Available online: http://www.eea.europa.eu/data-and-maps/data/eea-fast-track-service-precursor-on-land-monitoring-degree-of-soil-sealing-100m/

EEA (2009c): Population density disaggregated with Corine land cover 2000. Available online: http://www.eea.europa.eu/data-and-maps/data/ds_resolveuid/F6907877-C585-45DE-B93F-E7FC0975DE2A

EEA (2010a): EEA Fast Track Service Precursor on Land Monitoring – Degree of soil sealing 100m. Available online: http://www.eea.europa.eu/data-and-maps/data/eea-fast-track-service-precursor-on-land-monitoring-degree-of-soil-sealing-100m-1

EEA (2010b): Corine Land Cover 2006 raster data version 13 (02/2010). Available online: http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2006-raster/

Gallego, F. J. (2009): A Downscaled Population Density Map of the EU from Commune Data and Land Cover Information. JRC, Ispra, Italy. 10 p. Available online: http://epp.eurostat.ec.europa.eu/portal/page/portal/research_methodology/documents/S14P3_JAVIER_GALLEGO_DOWNSCALED_POPULATION_DENSITY.pdf

Gallego, F. J. (2010a): A population density grid of the European Union. Population and Environment, Vol. 31, No. 6, pp. 460-473. Available online: http://www.springerlink.com/content/h22617v812p51014/fulltext.pdf

Gallego, F. J. (2010b): Validation of GIS Layers in the EU: Getting Adapted to Available Reference Data (final draft). IPSC, JRC Ispra. 19 p.

Gallego, J., Peedell, S. (2001): Using CORINE Land Cover to map population density. Towards Agri-environmental indicators, Topic report 6/2001, European Environment Agency, Copenhagen, pp. 92-103. Available online: http://mars.jrc.ec.europa.eu/mars/content/download/757/5096/file/disagg_pop_EEAreport2001.pdf

Hurbanek, P., Atkinson, P. M., Pazur, R., Rosina, K., Chockalingam, J. (2010): Accuracy of built-up area mapping in Europe at varying scales and thresholds. The Ninth International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, 20-23 July 2010, University of Leicester, International Spatial Accuracy Research Association (ISARA).

Martin, D., Tate, N. J., Langford, M. (2000): Refining population surface models: Experiments with Northern Ireland Census data. Transactions in GIS, 4(4), 343–360. Available online: http://onlinelibrary.wiley.com/doi/10.1111/1467-9671.00060/pdf

Maucha, G., Büttner, G. (2008): Recommendations for Quantitative assessment of High resolution soil sealing layer, v. 2.1. EEA and European Topic Centre for Land Use and Spatial Information, 31 p. Available online: http://etc-lusi.eionet.europa.eu/CLC2006/FTSP/builtup_areas/Recommendations_GMES.doc

Maucha, G., Büttner, G. (2010): Validating EUROLAND products. GEOLAND Forum 6, March 2010, Toulouse, 47 p. Available online: http://www.fp6.gmes-geoland.info/events/download/04_Maucha_Validating_EUROLAND_Products_final.pdf

Maucha, G., Büttner, G., Kosztra, B. (2010): European Validation of GMES FTS Soil Sealing Enhancement Data (final draft). FÖMI Budapest. 35 p.


Maucha, G., Petrik, O. (2008): Accuracy assessment of the high resolution built-up map for Hungary. Institute of Geodesy, Cartography and Remote Sensing – FÖMI, Budapest, Hungary. 11 p. Available online: http://www.eea.europa.eu/about-us/tenders/eea-ses-09-003/reports-qualitative-or-quantitative-assessment

Müller, R., Krauß, T., Lehner, M., Reinartz, P. (2007): Automatic Production Of A European Orthoimage Coverage Within The GMES Land Fast Track Service Using Spot 4/5 And Irs-P6 Liss III Data. ISPRS Conference Proceedings, Volume XXXVI, ISPRS Workshop Hannover, Germany, May 2007, 6 p. Available online: http://www.ipi.uni-hannover.de/fileadmin/institut/pdf/Mueller_krauss_lehner_reinartz.pdf

Piepponen, H. (2008): Assessment of HR Soil Sealing Layer. Finnish Environment Institute, Helsinki. Available online: http://www.eea.europa.eu/about-us/tenders/eea-ses-09-003/reports-qualitative-or-quantitative-assessment

Steinnocher, K., Weichselbaum, J., Köstl, M. (2006): Linking remote sensing and demographic analysis in urbanised areas. In: Hostert, P., Damm, A., Schiefer, S. (eds.): First Workshop of the European Association of Remote Sensing Laboratories (EARSeL) Special Interest Group (SIG) on Urban Remote Sensing, Challenges and Solutions, 2-3 March 2006, Berlin. Available online: http://www.earsel.org/workshops/SIG-URS-2006/PDF/Session1_steinnocher.pdf

A population grid for Spain: Experiences in assembling population and cartographic data from publicly available sources

Francisco J. Goerlich* and Isidro Cantarino**
*University of Valencia and Ivie
**Polytechnic University of Valencia

Introduction

This paper presents a binary dasymetric method (Langford and Unwin 1994; Langford 2007) to construct a population density grid for Spain that assigns population only to urban polygons. We argue that this binary method is likely to commit less error in representing population density for Spain than other downscaling methods currently in use, in particular the ones used by Gallego (2010) in constructing a population grid for the whole of Europe from municipality (LAU2) data and Corine Land Cover (CLC) information. Given that urban boundaries, as such, do not exist, we construct them from digital cartography, in a similar way to how other authors have used digital street maps (Chen et al. 2004). We use population data at infra‐municipal level, which is quite detailed and publicly available from the Spanish National Statistical Institute (Instituto Nacional de Estadística, INE), and represents an improvement over municipality data without the need to introduce further assumptions about the location of the population. All our data are public, so our results can be replicated by any researcher.

The next section argues that downscaling from municipal population data and Corine Land Cover is likely to produce very poor results for Spain. Given this, we explore the population data available at infra‐municipal level, as well as the cartographic sources needed to map population in space. We then explain our methods and show our results for a NUTS2 region of Spain, comparing our grid with the one produced by the European Environment Agency (EEA) at 1 ha resolution and used by the European Forum for Geostatistics (EFGS) at a resolution of 1 km2. In view of our methods and results, we admit that they can bias the grid towards population concentration, given that population is only allocated to the urban boundaries previously constructed and we disregard, almost by construction, scattered population. Given the Spanish landscape, however, we think this is closer to the real world than results obtained using dasymetric methods with a low resolution layer, like CLC, as auxiliary information.

The European Environment Agency population grid and the Spanish landscape

If we compare the EEA population grid for Spain at 1 km2 resolution with the map obtained by drawing population density at municipality scale under a uniform distribution assumption, we can see some differences, but the visual impression is much the same, as can be seen in Figure 1.

This confirms well-known facts about population grids:
1. Top‐down methods tend to (i) underestimate urban population and (ii) overestimate rural population (Gallego 2010).
2. The quality and resolution of the land cover layer are more important than the choice of the downscaling algorithm (Martin, Tate and Langford 2000).
3. Dasymetric mapping is poorer in areas where municipalities are large and heterogeneous in size (Gallego 2010).

Moreover, Figure 1 is at odds with direct observation of the Spanish landscape. Given that the Minimum Mapping Unit (MMU) in CLC is 25 ha, CLC misses most of the urban nuclei in rural areas and spreads the population in these areas in a way similar to the uniform distribution assumption commonly used by social scientists.

Figure 1: Mapping population density for Spain

A simple example from a NUTS3 region is highly illustrative of the problem. Figure 2 shows urban fabric, as identified by CLC 2006 (classes 111 and 112), within the LAU2 boundaries of the inland region of Burgos. CLC 2006 reports urban fabric in only 34 out of 371 municipalities (9.2%), and any artificial surface (class 1) in only 74 municipalities (19.9%). However, direct observation of the landscape shows population concentrated in small and clearly defined urban nuclei.

Even if our experience is probably not unique, Spain is an example of extreme heterogeneity at LAU2 level. Municipalities are highly heterogeneous in surface area, ranging from tiny ones, as small as 30 ha (Emperador, 46117), to huge ones covering more than 1,500 km2 (Cáceres, 10037; 175,003 ha). They are also heterogeneous in population: more than 60% of the 8,000 municipalities have fewer than 1,000 inhabitants, and less than 1% have more than 100,000 inhabitants. While the average municipal population is about 5,000 inhabitants, the median is lower than 600 people. In the last census, 2001, the minimum size was only 7 inhabitants (Salcedillo, 44203, and Illán de Vacas, 45080), whereas the maximum was Madrid (28079), the capital, with almost 3 million people. As a result, we observe extreme densities that range from a minimum of 0.33 inhab./km2 (Viacamp y Litera, 22247) to a maximum of about 20,000 inhab./km2. With such heterogeneity it is not difficult to understand that statistical modelling at the country level is a difficult task.

In addition, the physical structure of Spanish territory is quite complex at LAU2 level. There are some territories that do not belong to a single municipality but are shared among many; they do not appear when statistics are distributed at municipality level and, to our knowledge, a complete and exhaustive reference list of these territories and their boundaries is still lacking. Many municipalities have their territory split into several small polygons; the extreme case is Xátiva (46145), with 30 isolated polygons, some of them tiny. In most cases split polygons only touch at a single point. Other municipalities are completely surrounded by the land area of one municipality, forming “donut” polygons. The most curious case is Albarracín (44145), which completely surrounds 18 small municipalities. There is even a municipality located completely inside France (Llívia, 17094).
Data and methods for a population grid for Spain

Given such huge municipal heterogeneity, dasymetric mapping is likely to perform poorly for Spain as a whole, especially if the resolution of the auxiliary information layer is not high, as is the case with CLC.

Figure 2: NUTS3, LAU2 and CLC 2006 for Burgos

It is generally agreed that: “The best way to produce a gridded map of population density is collecting individual data with coordinates of the dwellings and counting the number of people in each cell of the grid (bottom‐up approach)” (Gallego 2010). However, this requires an enormous amount of information that simply may not exist or, if it exists, is not generally available to the research community for gridding purposes. For this reason we focus on a more limited question: can we improve the existing EEA population grid for Spain, with relatively little effort, if we lack such a detailed data set? We think that the answer to this question is positive, and for this purpose we focus on:
1. “Night population” or residential population: locating each person in his/her dwelling.
2. An (almost) automatic procedure (ideally no iterative corrections), so that it can be updated regularly at little cost.
3. Combining public population data beyond LAU2 level with public cartographic data.

Given sufficiently detailed population data, and because scattered population is scarce, we can try to locate (most of) the population at their residence in urban nuclei. So we use a binary method (Langford and Unwin 1994; Langford 2007): with a given criterion (on which agreement is still lacking), we can draw urban boundaries around settlements and locate the population within them. Unfortunately, information on urban boundaries is not readily available, but a very rich and detailed population data set at infra‐municipal level is.
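For comparison, the bottom‐up approach quoted above reduces to counting people per cell once dwelling coordinates are known. The following minimal sketch assumes projected coordinates in metres and a square grid anchored at the origin; the function name `grid_population` and the sample dwellings are hypothetical.

```python
import math
from collections import Counter


def grid_population(dwellings, cell_size=1000.0):
    """Count residents per grid cell (bottom-up gridding).

    dwellings: iterable of (x, y, persons) with x, y in projected metres
    cell_size: edge length of a square cell in metres (1000 m gives 1 km2 cells)
    Returns a Counter mapping (column, row) cell indices to persons.
    """
    grid = Counter()
    for x, y, persons in dwellings:
        # Floor division assigns each dwelling to exactly one cell,
        # including for negative coordinates.
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        grid[cell] += persons
    return grid


# Hypothetical dwellings: two fall into cell (0, 0), one each into (1, 0) and (-1, 0).
sample = [(100.0, 200.0, 3), (900.0, 999.0, 2), (1000.0, 0.0, 4), (-1.0, 0.0, 1)]
grid = grid_population(sample)
```

The difficulty discussed in the text lies in obtaining the dwelling coordinates, not in the counting step itself.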



Figure 3 shows the public population data beyond LAU2 level published by INE. It comes in two alternative formats: (i) Census tracts, used for managing the electoral census and for survey design structure, and (ii) a full and comprehensive list of settlements, with their urban nucleus and scattered population, for the whole country; the “nomenclátor”.

Figure 4: Urban boundaries

Figure 3: Population data and administrative structure

In principle, the advantage of the census tracts is that they have a clearly defined surface (a polygon shapefile exists for them). However, for small villages we have only one census tract, and for medium villages their shape is quite arbitrary, so in these cases they offer no clear advantage over municipal data. For these reasons, after some experimentation, we decided not to base our grid on census tracts, although we recognize that they can be useful to increase accuracy in big cities and metropolitan areas.

While municipalities are administrative divisions with a historical background, the nomenclátor is intended for studying the distribution of population over space. There are a little more than 8,000 municipalities but more than 70,000 settlements, so this data set seems ideal for gridding purposes. The main disadvantage is that settlements have no surface assigned to them over which to spread their population; these are precisely the urban boundaries we have to construct. A coordinate is available, however, for each settlement (so a point shapefile exists for them).

Urban boundaries can be constructed from digital topographic maps available from the National Geographical Institute (IGN), but we eventually found it easier and more convenient to extract them from navigator databases. They offer high resolution, with an MMU of only 35 m2. Figure 4 describes the process, which essentially consists of:
1. extracting constructions and artificial structures, differentiating dwellings from industrial buildings and commercial and transport units;
2. aggregating houses and urban buildings into larger polygons (using buffers and threshold distances) and disregarding polygons smaller than 0.25 ha;
3. intersecting the previous layer with municipal boundaries (pycnophylactic constraint).

Once these polygons have been constructed, we match them with the settlements using their coordinates. Conflicting results are resolved by proximity GIS algorithms, unmatched small polygons are disregarded, and the population is spread uniformly within these urban boundaries with an added buffer, as can be seen in Figure 5. Finally, the gridding is performed. The whole process is programmed in ArcView using Model Builder.
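The matching and allocation step can be illustrated with a minimal sketch. It is not the authors' ArcView model: it assumes that each urban polygon is reduced to a centroid and an area, that a polygon is matched to the nearest settlement point, and that polygons farther away than a cut-off are dropped. The function name `allocate`, the `max_dist` parameter and all sample values are hypothetical.

```python
import math


def allocate(settlements, polygons, max_dist=2000.0):
    """Match urban polygons to settlements and spread population uniformly.

    settlements: {name: (x, y, population)} settlement points
    polygons:    {polygon_id: (cx, cy, area)} centroid and area of each polygon
    max_dist:    assumed cut-off in metres; polygons farther from every
                 settlement stay unmatched and are disregarded
    Returns {polygon_id: population assigned to that polygon}.
    """
    matched = {}
    for pid, (cx, cy, area) in polygons.items():
        # Proximity rule: pick the closest settlement point.
        name, dist = min(
            ((n, math.hypot(cx - x, cy - y)) for n, (x, y, _) in settlements.items()),
            key=lambda item: item[1],
        )
        if dist <= max_dist:
            matched.setdefault(name, []).append((pid, area))

    result = {}
    for name, parts in matched.items():
        population = settlements[name][2]
        total_area = sum(area for _, area in parts)
        for pid, area in parts:
            # Uniform density within a settlement: share proportional to area.
            result[pid] = population * area / total_area
    return result


settlements = {"A": (0.0, 0.0, 300), "B": (5000.0, 0.0, 50)}
polygons = {1: (100.0, 0.0, 2.0), 2: (-200.0, 100.0, 1.0),
            3: (5100.0, 0.0, 0.5), 4: (20000.0, 0.0, 0.3)}
allocation = allocate(settlements, polygons)
```

In this sketch a settlement without any matched polygon contributes no population to the grid, so a real workflow would need a fallback for such cases.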



Figure 5: Matching urban polygons with settlements

Results for a NUTS2 region of Spain

The final grid at 1 km2 resolution for Comunidad Valenciana (NUTS2) can be seen in Figure 6. The grid clearly shows a higher degree of population concentration than the one obtained by downscaling municipal data using CLC. In fact, the visual impression is of a set of scattered points at the settlement locations, with some exceptions in the metropolitan areas. Only 11.4% of the grid cells are inhabited and, while population density is 210 inhabitants per km2 of land area, the number of inhabitants per inhabited cell (km2) is about nine times higher, 1,846.

We think our grid is closer to the real world than the one produced by the EEA, even if it seems that we concentrate the population too much. Validation is still in progress, but a superior and more up-to-date cartographic product, as well as further disaggregation in big cities, will likely improve the grid a great deal. (While writing this, the IGN has just released a street map in GIS vector format for almost the whole of Spain. This information will be very useful to improve and validate our results.)

Figure 6: 1 km2 population grid for Comunidad Valenciana

Using a higher resolution enlarges the differences. Hence, we find striking dissimilarities when we compare our results with the EEA population grid at 1 ha resolution, as can be seen in Figure 7, which shows both grids for a NUTS3 region, Valencia. While most of the interior is empty in our grid, except at the small urban nuclei, where tiny points can be observed, the EEA grid offers a picture corresponding to a much more dispersed population.

Figure 7: Population data and administrative structure
Note: The same colour intensity does not represent the same number of people per cell in each map.

In the near future, for the next census, the INE is georeferencing all the buildings of the country under the Census 2011 project, with particular emphasis on houses, so gridding population will be a trivial exercise using a bottom‐up approach. This will provide invaluable information to validate our methods and to see whether they can be applied and extended to more general problems with other economic statistics, so that we can move beyond gridding population.

References

Chen, K.; McAneney, J.; Blong, R.; Leigh, R.; Hunter, L. and Magill, C. (2004) “Defining area at risk and its effect in catastrophe loss estimation: A dasymetric mapping approach”. Applied Geography, 24, 2, (April), 97‐117.

Gallego, J. (2010) “A population density grid of the European Union”. Population & Environment, 31, 3, (July), 460‐473.

Langford, M. (2007) “Rapid facilitation of dasymetric‐based population interpolation by means of raster pixel maps”. Computers, Environment and Urban Systems, 31, 1, (January), 19‐32.

Langford, M. and Unwin, D. J. (1994) “Generating and mapping population density surfaces within a geographical information system”. Cartographic Journal, 31, 1, 21‐26.

Martin, D.; Tate, N. J. and Langford, M. (2000) “Refining population surface models: Experiments with Northern Ireland Census data”. Transactions in GIS, 4, 4, 343‐360.



Rules, techniques and processes to design the enumeration areas to collect and disseminate Italian census data

Grazia Ticca
ISTAT

The presentation given at the European Forum for GeoStatistics, Tallinn, 5-7 October 2010, illustrates the rules and techniques used to design and update the enumeration areas (EAs) used to collect the census data. The main goal is to provide a description of the EAs that could be useful for identifying rules to disaggregate the data, in order to produce the 1 km2 grid for the 2011 census population, as aimed for by the GEOSTAT project.

ISTAT, the Italian NSI, is in the process of updating the census cartography. The process is based on updating the previous census EAs on top of images and local cartography, and focuses on urban rather than rural areas. The EAs are built on administrative boundaries, and the updating rules aim to maintain comparability with previous censuses, where possible, and to follow the international recommendations on urban areas. The dissemination of the census results is based on the same enumeration areas and on the administrative boundaries, depending on the variables. The updating process will be completed within 2010 and is carried out in cooperation with the Italian municipalities.

The census cartography is strictly linked to the administrative boundaries and is designed to identify and describe urban and productive areas, rather than rural areas; this is done inside each municipality (or commune, LAU2). ISTAT provides the rules to design the cartography, supervises more than 8,000 municipalities, and builds and harmonizes the geographic census database; the municipalities are the main census executors, as stated by law. The national administrative subdivision is hierarchical: regions (20) are composed of provinces (110), which are composed of municipalities (8,100).

Inside each municipality, the census cartography identifies mainly the following entities:
1. “urban localities”, classified into:

- “big localities” (centri abitati): groups of houses no more than 70 meters from each other, connected by roads. The locality should have public services (schools, stations, pharmacies) and should be an “aggregation center” for the living community.
- “small localities” (nuclei abitati): groups of houses no more than 30 meters from each other, with a minimum number of buildings and households (5, for the 2001 census), but without the public services.
2. “productive localities” (industries): a locality in a non-urban area, with at least 10 firms or 200 employees, with an area of at least 5 hectares.
3. the remaining part of the municipality is delimited, but not classified. It represents the non-urban areas.

Each locality is then divided into census enumeration areas (sezioni di censimento); these are the minimum enumeration and dissemination census areas. Each EA is composed of groups of buildings such that the number of enumeration units does not exceed 400 (a rule from the 2001 census; this limit had to be verified against the census results), so the final version of EAs and localities could be available only after the census revision. There are no special coding rules for the IDs of the EAs, other than that they should be unique.

For the 2001 census, ISTAT went through a long and complex process to update the census cartography (CENSUS2000), in order to enhance the geometrical precision and the territorial subdivision. Morphological attributes such as rivers, lakes, desert mountains and islands were also coded, although this does not amount to a land use classification. The result was a national geographic database with a reference scale of 1:10,000 for the urban areas and 1:25,000 for the non-urban areas. Because of the different reference cartography used (aerial photos, digital street maps, local cartography, etc.) the scale could not be uniformly certified, but for statistical purposes this was a very good result. The figures of the CENSUS2000 project, also based on the 2001 census results, are given in Table 1.
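The distance rule for localities described above can be illustrated with a minimal sketch. It assumes that houses are reduced to points and grouped by single-linkage chaining; the requirements on roads, public services and households are not modelled. The function name `group_localities` and the sample coordinates are hypothetical, and this is not ISTAT's actual procedure.

```python
import math


def group_localities(houses, max_gap, min_buildings):
    """Group houses into candidate localities by a distance threshold.

    houses:        list of (x, y) points in metres
    max_gap:       largest allowed distance between neighbouring houses
    min_buildings: smallest group size kept as a locality
    Returns a sorted list of groups, each a list of house indices.
    """
    parent = list(range(len(houses)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of houses closer than the threshold (single linkage).
    for i in range(len(houses)):
        for j in range(i + 1, len(houses)):
            if math.dist(houses[i], houses[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(houses)):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) >= min_buildings)


# Five houses spaced 25 m apart and an isolated pair 20 m apart.
houses = [(0, 0), (25, 0), (50, 0), (75, 0), (100, 0), (500, 500), (520, 500)]
small_localities = group_localities(houses, max_gap=30, min_buildings=5)
```

With the 30 m threshold and a minimum of five buildings the chain of five houses forms one group, while the isolated pair is dropped and would remain part of the non-urban area.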
Table 1: Numbers of localities and figures for enumeration areas

| Indicator | Value |
|---|---|
| Total number of EAs (sezioni) | 382.534 |
| Total population 2001 | 56.995.744 |

| EAs by type of locality | No of EAs | Population | Percent of population |
|---|---|---|---|
| Big Localities (Centri) | 258.656 | 51.858.988 | 91% |
| Small Localities (Nuclei) | 39.449 | 1.725.470 | 3% |
| Productive Localities | 2.589 | 53.385 | 0,1% |
| Non-urban areas | 81.840 | 3.357.901 | 5,9% |

| Population per EA | Value |
|---|---|
| Minimum population per EA | 1 inhabitant |
| Number of EAs with 1 inhabitant | 3.289 |
| Maximum population per EA | 3.386 |
| Mean population per EA | 49 |
| Number of EAs with population | 336.788 |
| Empty EAs | 45.746 |

| EAs by area | Value |
|---|---|
| Minimum | 100 m2 |
| Maximum | 50.000.000 m2 (50 km2) |
| Mean | 800.000 m2 (0,8 km2) |

| Localities | Number |
|---|---|
| Total number of localities | 75.495 |
| Big Localities (Centri) | 21.672 |
| Small Localities (Nuclei) | 36.577 |
| Productive Localities | 2.233 |
| Non-urban areas | 15.013 |

In updating the cartography for the 2011 census, there are some changes, related both to maintaining comparability with the 2001 cartography (minimal changes to the existing EAs, in both shape and numbering) and to adopting the international recommendations on the requirements for identifying new localities. The updating process, still in progress but close to completion, is especially oriented to the design of the EAs in newly built-up areas. The definition rule for localities has not changed, except that a constraint has been added to follow international recommendations: a new locality should contain about 50 people (or 15 houses and 15 households). This rule is not applied retroactively to existing localities.

Special care is dedicated to the analysis of the borders between municipalities, so that the concept of a “cross-border locality” is adopted; this is crucial to overcome the limitations of a cartography cut along administrative boundaries. (For example, a group of houses that does not meet the requirements to be a locality can be promoted to a locality because it is close to a locality in the adjacent municipality.)

In updating the cartography, ISTAT adopted a mixed strategy, depending on the population size of the municipalities. There are essentially two groups: municipalities with more than 20.000 inhabitants (about 180 out of 8100) and the others. For the smaller ones, ISTAT updated the census cartography and then shared it with the municipalities for their final approval, while the bigger ones had the option to update the cartography themselves and send it back to ISTAT. ISTAT then builds the national geodatabase, harmonizing all the data so that they are ready for the 2011 census. Some provisional figures of this CENSUS2011 project are shown in Table 2. For the bigger municipalities, there is also an ongoing project to create lists of addresses geocoded to the EAs, because in the upcoming census round Italy will adopt a register-supported sample strategy (long and short forms in municipalities with 20.000 inhabitants or more) and a coverage survey to estimate and correct register undercoverage.

Table 2: CENSUS2011 provisional figures for localities

| EAs by type of locality | No of EAs |
|---|---|
| Total number of EAs (sezioni) | more than 400 000 |
| Bigger Localities (Centri) | ca 320 000 |
| Smaller Localities (Nuclei) | ca 41 000 |
| Productive Localities | ca 3 000 |
| Non-urban | ca 40 000 |

In municipalities with more than 20.000 inhabitants, ISTAT will produce a basic set of data (the variables in the short form) at the level of enumeration areas, while for social and demographic variables that appear only in the long form, estimates will be produced at the level of Census Areas (newly defined sub-LAU2 areas with about 15.000 inhabitants). In municipalities with fewer than 20.000 inhabitants, only the long form will be used and data will be produced at EA level. With the census cartography described here, we cannot yet say when a grid-based dissemination system will be possible; however, given the granularity (small size) of the EAs, some tests have been done and the results will be illustrated during the presentation. We hope that the presentation will be useful to the GEOSTAT project in defining the disaggregation rules for building the European 1 km2 grid map of the 2011 census population.


New spatial dimension and applicability of old census and administrative data

Igor Kuzma
Statistical Office of the Republic of Slovenia

The wide applicability of spatial statistical data for managing and planning human activities in the environment, or for monitoring trends of different phenomena in space and time, requires an adequate response from data providers. Long time series and register-oriented databases managed by the Statistical Office of the Republic of Slovenia (SORS) or other authorities were recognised as valuable support for these tasks, and many of them can be managed and disseminated on grids, which adds a new dimension to the existing administrative territorial division. Grid-based statistics at SORS are derived both from polygons (e.g. enumeration areas) and from point-located data.

As expected, register-oriented statistics in Slovenia offered a good foundation for creating high-resolution grid statistics. The Register of Spatial Units, initiated by SORS and now managed by the Surveying and Mapping Authority of the Republic of Slovenia, was the first step towards a sound territorial division, and it enabled the first point location of statistical data in Slovenia (1971 Population and Housing Census). These 1971 Census data were used to establish the Central Population Register (CPR), and for the very first time personal identification numbers were assigned to the people residing in Slovenia (Oblak Flander, 2007), which later made it easier to join data from different registers. Although these data could be stored only in tables and not managed graphically as they can be today with GIS, it was decided to permanently preserve the spatial references at the highest possible (or acceptable) positional accuracy. This far-sighted decision became very relevant when the graphical part of the Register of Spatial Units was completed in 1995.
The data stored in tables did have a spatial reference, but before 1995 it was very difficult or even impossible to analyse them with GIS for the entire national territory. In practice this means that from 1995 on, for example, population data captured in the 1971 Census could be presented on a map for each person, as accurately as the house of permanent residence or the corresponding enumeration area. When SORS started to handle spatial statistical data on grids, mostly the point-located data from various registers were considered applicable, but later it was decided to georeference the census and register data captured before 1995 as well.


CPR records from before 1995 include a complete set of address IDs for each person, and these IDs were used to connect CPR data from 1994 and earlier to the co-ordinates of buildings from the Register of Spatial Units (Figure 1). The set of IDs for an individual person includes the IDs of the municipality, settlement, street, house number and, optionally, house number addition. At the moment about 90% of CPR data can be linked to co-ordinates, and 95% is expected by the end of the project. CPR data since 1981 can thus be aggregated into grid cells of any chosen size.

Data from the 1971, 1981 and 1991 Censuses, on the other hand, include only the attributes of spatial districts and higher units of the administrative territorial division, without addresses. The spatial district (enumeration areas consist of spatial districts) is the smallest spatial unit in Slovenia; 11% of them have an area smaller than 100 m2 and 66% smaller than 1 km2. To handle polygon data on grids, either the polygons have to be disaggregated into grid cells, or the polygon data have to be point located and then aggregated into grids. Since spatial districts are relatively small, it was decided to apply the aggregation method.

Transforming polygon data into point-located data in practice means defining a centre of gravity (centroid), in the form of a co-ordinate, for a particular phenomenon in a particular spatial district. Centroids of spatial districts in Slovenia already meet this requirement, since they mostly coincide with the area of highest population density in the spatial district. The centroids are defined by the location of significant objects, e.g. schools. Spatial districts without significant objects obtain their centroids from one of the following: 1) the centre of gravity of the densely built-up area of the spatial district; 2) the centre of gravity of all buildings in the spatial district when buildings are scattered; 3) the centre of gravity of the spatial district when there are no buildings in it. Any territorial change of a spatial district consequently means a change of its centroid.
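The linking and aggregation steps described above can be sketched as follows. This is a minimal illustration, not the SORS implementation: the record layout, the `address_id` tuple (municipality, settlement, street, house number, addition), the sample co-ordinates and the convention of identifying a cell by its lower-left corner are all assumptions made for the example.

```python
from collections import Counter
from math import floor


def link_to_coordinates(records, building_xy):
    """Attach building co-ordinates to person records via the address ID."""
    linked, unlinked = [], []
    for rec in records:
        xy = building_xy.get(rec["address_id"])
        if xy is None:
            unlinked.append(rec)      # address ID not found in the register
        else:
            linked.append(xy)
    return linked, unlinked


def aggregate_to_grid(points, cell_size_m=1000):
    """Count points per grid cell; a cell is keyed by its lower-left corner."""
    counts = Counter()
    for x, y in points:
        cell = (floor(x / cell_size_m) * cell_size_m,
                floor(y / cell_size_m) * cell_size_m)
        counts[cell] += 1
    return dict(counts)


# Hypothetical register extract: address ID -> building co-ordinates (metres).
buildings = {
    (61, 5, 120, 7, ""): (462100.0, 101250.0),
    (61, 5, 120, 9, "a"): (462900.0, 101999.0),
    (61, 5, 300, 1, ""): (463000.0, 101000.0),
}
persons = [
    {"person": 1, "address_id": (61, 5, 120, 7, "")},
    {"person": 2, "address_id": (61, 5, 120, 7, "")},
    {"person": 3, "address_id": (61, 5, 120, 9, "a")},
    {"person": 4, "address_id": (61, 5, 300, 1, "")},
    {"person": 5, "address_id": (61, 5, 999, 1, "")},  # no matching building
]

linked, unlinked = link_to_coordinates(persons, buildings)
grid = aggregate_to_grid(linked, cell_size_m=1000)
link_rate = len(linked) / len(persons)
```

Because the cell is derived directly from the co-ordinates, the same linked points can be re-aggregated to any other cell size by changing `cell_size_m`.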
Despite this, the centroids of spatial districts were additionally examined and corrected where necessary, since the population distribution in some areas has changed significantly over the past decades. The correction was based on the present centroids of residential buildings, using the construction year of each building to select only those buildings that existed and were populated in a particular census period. In addition, the positional accuracy of the 1981 and 1991 census population data aggregated to grids was checked against the georeferenced CPR data already available for those periods. Point-located historical data can easily be handled with GIS, and cartographic presentation of various spatial analyses offers a unique overview of trends of various phenomena in time and space (Figure 2).
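The centroid correction described above could look roughly like the sketch below. It is a simplified reading of the text: buildings are filtered only by construction year (the "populated" condition is not modelled), the centre of gravity is taken as an unweighted mean of building co-ordinates, and the field names and fallback behaviour are assumptions.

```python
def corrected_centroid(buildings, census_year, fallback):
    """Centre of gravity of the buildings that already existed in census_year.

    Returns `fallback` (for example the current district centroid) when no
    building in the district existed at that time.
    """
    existing = [(b["x"], b["y"]) for b in buildings
                if b["built_year"] <= census_year]
    if not existing:
        return fallback
    n = len(existing)
    return (sum(x for x, _ in existing) / n,
            sum(y for _, y in existing) / n)


# Hypothetical residential buildings in one spatial district.
district_buildings = [
    {"x": 0.0, "y": 0.0, "built_year": 1960},
    {"x": 10.0, "y": 0.0, "built_year": 1975},
    {"x": 100.0, "y": 100.0, "built_year": 2001},
]
centroid_1981 = corrected_centroid(district_buildings, 1981, (50.0, 50.0))
centroid_1950 = corrected_centroid(district_buildings, 1950, (50.0, 50.0))
```

In this example the building from 2001 is ignored for the 1981 census, so the 1981 centroid stays with the two older buildings.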



Figure 1: Process of assigning coordinates to the CPR data

Figure 2: Change of ageing index, Osrednjeslovenska statistical region (NUTS3), 1 km grid, 1981 – 2010

References

Oblak Flander, A., 2007. Opportunities and Challenges of a Register-Based Census of Population and Housing – the Case in Slovenia. Seminar on Registers in Statistics – methodology and quality, Helsinki.


A Proposal for the dissemination of Brazilian 2010 Population Census

Maria do Carmo Dias Bueno
IBGE/CDDI/COPES

Introduction

The data used in social studies are largely derived from household surveys such as the population census. These data are spatially distributed in areal units that are assumed to be internally homogeneous, i.e., each area is composed of groups of individuals who tend to be similar to one another when compared with other areas. Of course, this assumption is not always true, and there is no guarantee that the data distribution is homogeneous within these units (Dias et al., 2002).

These survey units are commonly defined by operational criteria (census tracts) or political-administrative boundaries (municipalities). Unfortunately, these criteria do not meet the needs of users, who would like to have data disaggregated to the smallest possible spatial unit and, especially, units that do not change over time. These needs arise from studies and analyses conducted by researchers from different fields, many of a social and/or environmental nature, whose spatial extent cannot be represented by any of these territorial units.

Although there are several methods to transfer data from one area to another, these techniques are limited by the fact that data are not available at the individual or household level. Individual data, whether for persons or households, are withheld to protect confidentiality, but from the users' point of view this is a limitation. Another difficulty in using these data is temporal comparability: since the boundaries of the territorial units change over time, a census tract or a political-administrative region is not necessarily the same as in previous periods.
Some possible alternatives that statistical offices can implement to allow the use of more detailed data are: providing an internal and/or external service where users can work with the micro database without accessing it directly; implementing a confidentiality engine to prevent the identification of respondents (United Nations, 2001); or disseminating data based on regular grids (Tammilehto-Luode, 2003) or on small output areas drawn artificially based on similarity criteria (Vickers and Rees, 2007).

The use of geospatial data in statistical offices


A major goal of statistical offices is to help users understand data better, which in this context means readily and easily. The data produced by these institutions are the knowledge base that governments, companies and individuals need to understand the constant changes that occur in society. Integrating these data with data from other areas, such as economics or the environment, increases the value and usefulness of the result. The results of statistical surveys need to be disseminated to a diverse audience so that everyone can benefit from their use. These data are essential for most public policy decisions, so they must be available quickly and easily.

The traditional role of maps in the production of statistics is to support data collection and to display the results in cartographic form. Likewise, spatial technologies such as Geographic Information Systems (GIS), the Global Positioning System (GPS) and remote sensing imagery support the activities of statistical offices comprehensively, across all stages of operation: data collection, processing and dissemination (United Nations, 2009). Thus, one can say that there has always been a relationship between geography and statistics, and this integration brings great benefits, such as reducing the money and time needed to collect, compile and distribute information. The emergence of geospatial technologies has strengthened this relationship by enabling better management of information, faster data retrieval and better reporting and analysis. The data integration offered by GIS, which allows different types of information to be related, has led to a wider use of statistical information. This, in turn, has increased the need for statistical offices to produce geospatial information with quality and precision.

The use of geospatial technologies in IBGE

Emerging geospatial technologies began to appear in IBGE's work processes for household surveys around the year 2000 and initially resulted in two projects: the creation of an updated digital census mapping and the creation of a digital address list.
Address List

Statistical offices make extensive use of address information in conducting their surveys, whether during data collection, when the interviews are carried out, during the monitoring of this task, or when sending postal questionnaires and tracking their receipt. Addresses therefore have an essential role in carrying out household surveys, although this is often not very evident (IBGE, 2008). For IBGE, the creation of an address list has contributed to increasing the efficiency of the planning and supervision of data collection. The benefit of this list to society is that it makes geocoding possible, i.e., locating an address in space. Associating addresses with maps allows data from many sources, such as health and education, to be related to the information produced by IBGE (IBGE, 2008).

The address list was created from paper registers of the units surveyed in the 2000 Census. It was updated in 2007 during the Agricultural Census and Population Count. In rural areas, the geographic coordinates of farms, hospitals, schools and residential establishments were captured. This was possible only because of the use of handheld computers (Personal Digital Assistants, PDAs) equipped with GPS (IBGE, 2008).

Although it deals essentially with spatial entities (addresses), the address list was initially created as a simple database, or digital list. Its spatial nature inevitably led to its integration with the census maps, a task that began with the preparations for the 2010 Census. To accomplish this integration, the vector features that represent the street faces were assigned geographic codes that allow each part of the street to be identified correctly. The numbers of the buildings at the corners of each street were also stored in the database. Thus, through database operations, the vector lines can be related to the address list, allowing all units located in each street segment to be identified. This tool will be used in the data collection application for the 2010 Census, making the operation better and faster as a result of the enumerator's better knowledge of the territory.
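The relationship between street faces and the address list described above amounts to a join on a shared geographic code. The following sketch illustrates that idea only; the field names `face_code` and `unit_id` and the sample values are hypothetical and do not reflect IBGE's actual database schema.

```python
from collections import defaultdict


def units_by_street_face(face_codes, address_list):
    """Group address-list units by the code of the street face they lie on.

    Every known face appears in the result, with an empty list when no
    address has been linked to it yet.
    """
    units = defaultdict(list)
    for entry in address_list:
        units[entry["face_code"]].append(entry["unit_id"])
    return {code: units.get(code, []) for code in face_codes}


# Hypothetical face codes taken from the vector map, and address-list entries.
face_codes = ["F001", "F002", "F003"]
address_list = [
    {"unit_id": "U1", "face_code": "F001"},
    {"unit_id": "U2", "face_code": "F001"},
    {"unit_id": "U3", "face_code": "F003"},
]

face_units = units_by_street_face(face_codes, address_list)
```

Faces that come back with an empty list are a natural starting point for checking whether the map or the address list needs updating.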
Digital Census Mapping

The Census Mapping consists of a set of maps and databases that enables the management of the territory, splitting it into small pieces in order to organize data collection for household surveys, with particular emphasis on the censuses. These small pieces of land are the census tracts. IBGE produces the topographic mapping used as the basic input for rural maps; however, the urban mapping, which uses maps at cadastral scales, is not produced by the Institute but by state and local public agencies and private companies. Because of the characteristics of these inputs, the rural and urban parts of the census mapping were initially handled separately (Barbuda, 2004). IBGE began to work with digital census mapping during the preparations for the 2000 Census, continuing this effort for the 2007 Censuses and incorporating new techniques for the 2010 Census, as discussed below.

2000 Population Census and 2007 Censuses (Agriculture Census and Population Counting)

In 2000, the production of the Rural Census Mapping began with the preparation of municipal maps representing physical elements, such as rivers and roads, and administrative boundaries, such as municipal and district boundaries. Statistical maps were derived from these maps by adding the census tracts. The maps were produced using the Semi-Automatic System to Prepare Municipality Maps (SisCart), developed specifically for IBGE on the MicroStation/MGE software from Bentley/Intergraph. This system greatly sped up the decentralized construction of the municipality maps. Among other tasks, it handled the homogenization of the cartographic projection and scale, the georeferencing of raster topographic sheets, the validation and processing of geometric features at the sheet borders, the clipping of the municipal perimeter, and the composition of the frame and marginal data (Barbuda, 2004). The final digital map had a hybrid format: a raster layer overlaid with vector information for the rural census tracts and the data from the update stage (IBGE, 2008).

The urban census mapping is supported by detailed maps produced by government agencies (municipal governments and others), utility services (water, sewer, electricity, telecommunications) and other producers of mapping at a compatible scale. These maps differed in geometry, degree of updating and digital format, and they were the source used to map all cities, towns and villages in Brazil. The maps were produced with a system based on the MicroStation platform, enriched with many features to speed up specific tasks such as editing, reviewing and querying (IBGE, 2008).
For the dissemination stage, the file with the census tracts was processed in a GIS environment and provided in vector format (shapefile) (IBGE, 2008). The process of updating urban and rural maps continued for the 2007 Censuses, when a more comprehensive update was performed.

2010 Population Census

The main goal of the 2010 Census Mapping was the editing of the urban maps (their linkage to the address list and better geometry and georeferencing) through the use of geotechnologies. Another objective was the integration of the urban maps with the rural ones, creating a fully integrated and continuous census mapping (IBGE, 2009). The implementation of the project was divided into two modules: one for municipalities with up to 20 000 inhabitants and one for those with more than 20 000 inhabitants.


The first module was developed by IBGE staff using Geobase, software developed by the institution itself with Map Objects and Delphi technology. The function of this application was to allow the adjustment of the graphic files and the association of vector features with the address list. The main inputs for this stage were satellite images, maps and information available at IBGE, and the address list. A survey was conducted with local governments (municipalities, public agencies, etc.) and utility services (electricity, water, etc.) to verify the availability of cartographic products that could support this task. For locations where no current input was available that could be used as a reference for the vector editing, the streets were drawn from field surveys using GPS tracking (paths of collected GPS points) and the boundaries of census tracts were drawn from their textual descriptions. The second module was carried out by companies operating in the geotechnology market, which integrated their databases with the census tracts and the address list from IBGE.

The output of both modules was a continuous spatial map, managed and operated by a new application developed specifically for IBGE, the Mapping System (SISMAP), built with Intergraph products.

Dissemination trends

Understanding the spatial distribution of phenomena is a great challenge for the comprehension and management of issues in different areas of knowledge, such as health, environment, education, agriculture and many others. Such studies are becoming increasingly common due to the democratization of information, technological advances, the falling costs of these technologies, and the spread of geographic information systems, which are becoming more user-friendly and interactive. In this context of geospatial technologies, IBGE is making a major change in its working methodology for census mapping, as pointed out in the previous sections. One of the rewards of this effort will be the improvement of the products available for dissemination and the creation of new products, especially those related to statistical data associated with maps.

A common complaint from users of census data concerns the aggregation of results. Although data are collected for each household unit, the results are released aggregated by territorial units such as administrative regions and census tracts. It is well known that census tracts are units of data collection defined for operational purposes; they were not developed for dissemination, despite being used for it (Martin, 2000). This "deviation" in their use is largely responsible for the problems faced by users.

Offering new possibilities for using data is part of the mission of statistical offices, especially when they have technologies and data in a proper format that help to solve, or at least minimize, these issues.

The proposal for the dissemination of the 2010 Census is the creation of a query service over the Internet, where a properly identified user can define a study area using a graphic interface and choose the variables to retrieve. The request will be processed, and the result will be the sum of the values of each variable over the set of street faces that make up the area selected by the user (Figure 3). Thus, the user can query the census database using a group of street faces as the minimum spatial unit, which gives a result that follows the desired area more closely. If the minimum spatial unit is the census tract, the result of the intersect operation will be too coarse compared with the desired area (Figure 2).

Figure 1: Area of Interest

Figure 2: Intersection of area of interest with census tract

Figure 3: Intersection of area of interest with street face

There are several ways to delimit the user's area of interest, such as a point location and a radius, the identification of streets by their names, the drawing of lines and/or polygons, the loading of vector files, and the use of areas corresponding to census tracts in previous years. It will also be possible to delimit study areas covering both rural and urban parts, because census data cover the entire territory.
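One of the delimitation options above, a point location and a radius, can be sketched together with the summation over street faces. This is an illustrative sketch under stated assumptions, not the proposed service itself: each street face is reduced to a single representative point in projected co-ordinates (metres), and the field names and values are hypothetical.

```python
from math import hypot


def faces_in_radius(faces, cx, cy, radius_m):
    """Select the street faces whose representative point lies in the circle."""
    return [f for f in faces
            if hypot(f["x"] - cx, f["y"] - cy) <= radius_m]


def sum_variables(selected_faces, variables):
    """Sum each requested variable over the selected street faces."""
    return {v: sum(f[v] for f in selected_faces) for v in variables}


# Hypothetical street faces with pre-aggregated census variables.
faces = [
    {"face": "A", "x": 0.0, "y": 0.0, "population": 40, "households": 12},
    {"face": "B", "x": 300.0, "y": 300.0, "population": 25, "households": 9},
    {"face": "C", "x": 900.0, "y": 0.0, "population": 60, "households": 20},
]

selected = faces_in_radius(faces, cx=0.0, cy=0.0, radius_m=500.0)
totals = sum_variables(selected, ["population", "households"])
```

A real service would also need the confidentiality restrictions that the paper notes are still to be defined, for example refusing to return totals for very small selections.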

As the query is complex and requires a long processing time, the user will receive the data via email.

The restriction criteria that should be imposed to preserve confidentiality have not yet been established, since the query uses the micro database.

Future research

Many opportunities and many challenges arise from the creation of a census mapping geocoded at the address level, made possible by the use of emerging technologies. One approach that IBGE is beginning to investigate is the dissemination of aggregate data on a regular grid. Despite the many advantages presented in the literature (Tammilehto-Luode, 2003) that suggest the feasibility of its use, some questions need to be carefully analyzed so that the solutions can effectively become reality. Is it feasible to use regular grids in Brazil, considering the dimensions of the grid cell (at least 1 x 1 km) and the area of the country (8.5 million km2)? What are the advantages of using such grids, from the point of view of the user and of the data producer? How can smaller data dissemination areas be used while preserving statistical confidentiality?

References

Barbuda, M. M. S., 2004. A Atualização Cartográfica na Base Territorial Rural Visando a Contagem da População 2005 e o Censo Agropecuário 2006. 6º Congresso Brasileiro de Cadastro Técnico Multifinalitário e Gestão Territorial - COBRAC 2004. Florianópolis, SC, 10-14 de outubro.

Dias, T.L., Oliveira, M.P.G., Câmara, G., Carvalho, M.S., 2002. Problemas de escala e a relação área-indivíduo em análise espacial de dados censitários. Informática Pública, 4:89-104.

IBGE, 2008. Censos 2007 – Inovações e impactos nos sistemas de informações estatísticas e geográficas do Brasil. http://censos2007.ibge.gov.br/Censos2007_Inovacoes_web.pdf

IBGE, 2009. Censo 2010 – Os primeiros passos. http://www.ibge.gov.br/censo2010/censo2010_primeiros_passos_ago08.pdf

Martin, D., 2000. Census 2001: making the best of zonal geographies. The Census of Population: 2000 and Beyond. University of Manchester, UK, 22-23 June.

Tammilehto-Luode, M., 2003. GIS for Dissemination of Census Data. EuroGrid Workshop. Italy, 27-29 October.

United Nations, 2001. Guidelines on the Application of new information technology to population data dissemination. Economic and Social Commission for Asia and the Pacific.

United Nations, 2009. Handbook on geospatial infrastructure in support of census activities. Studies in Methods, Series F, n. 103.

Vickers, D., Rees, P., 2007. Creating the UK National Statistics 2001 output area classification. Journal of the Royal Statistical Society, A (Statistics in Society). Vol. 170, Number 2, pp. 379-403.

Statistics on small areas in Denmark with special attention to urban areas

Michael Berg Rasmussen
Statistics Denmark

Census on urban areas then and now

Counting people at the urban level is not a new phenomenon in Denmark. A census for certain towns already appears in the first statistical yearbook of Denmark, in 1896. However, the early censuses only included the urban areas that seemed interesting at the time. From 1901 the censuses of urban areas became more systematic, and from that year a census of urban areas was made every fifth year. This five-year interval was kept until 1986, after which the interval became shorter. Since 1996 the census of urban areas has been made annually.

New methods in 2005

Until 2004, the demarcation of the urban areas was carried out by Statistics Denmark itself. It was done using paper maps, pencils and rulers and was a rather time-consuming affair. In 2006 new GIS technology was introduced and cooperation with the National Map and Cadastre commenced. The development of the new method took a long time, so no census was made in 2005.


Statistics Denmark and the National Map and Cadastre work together

The purpose of the cooperation is to produce a more reliable census of urban areas, free of the errors that affected earlier censuses. In broad outline, the process is as follows:

1. The National Map and Cadastre demarcates the built-up areas and delivers the result to Statistics Denmark as built-up polygons in ArcGIS shapefile format.

2. Statistics Denmark counts how many persons live in each built-up area.

3. Based on this count, Statistics Denmark decides whether a built-up area is classified as an urban or a rural area.

Definition of an urban area

Statistics Denmark defines an urban area as a coherent built-up area where the distance between buildings is less than 200 meters, unless the interruption is due to public facilities, parks, cemeteries, railways and similar features of urban character. This definition is based on the UN definition of urban areas.

The contribution of the National Map and Cadastre to the census

The National Map and Cadastre includes the following topographical features and areas in its urban demarcation (see Figure 1 for an example):

• Town cores
• Close high built-up areas
• Close low built-up areas
• Camping sites (depending on the location of the entrance)
• Recreational areas and sports centres
• Industrial areas
• Technical areas
• Harbours (if they naturally form part of a built-up area)
• Forests, groves and lakes/village ponds (if they are totally surrounded by built-up areas).

Holiday cottage areas make up a special class and are therefore in many cases separated out as their own built-up areas. The list of urban elements above meets the requirements of Statistics Denmark for an urban demarcation. The result of the demarcation is stored in an ArcGIS shapefile.

Figure 1: The demarcation of Elsinore (Helsingør) according to the National Map and Cadastre (Kort- og Matrikelstyrelsen). The old town core is situated down by the harbour and the castle of Kronborg. Here and there are close high built-up areas and some close low built-up areas. There are also some industrial and technical areas together with parks, cemeteries and sports centres, all of which are characteristic of a town. All these types of areas are surrounded by a red line that closes into a polygon. This is the built-up polygon that demarcates the town of Elsinore.

Figure 2: Just west of the town of Elsinore lie five built-up areas (Munkerup, Dronningmølle, Villingebæk, Kildekrog and Hornbæk). They are merged into one single urban area (under the name Hornbæk-Dronningmølle) because they lie close up against each other. Another example can be found in the south-western part of the map, where the two small settlements of Esbønderup and Esbønderup Kohave are combined into one single unit called Esbønderup.

The contribution of Statistics Denmark to the census

An extract of data is made from the Central Population Register. This contains all the inhabited unit

[Figure 1 map. Legend, type of area: town centre, market garden, heath/moor, high buildings, industrial area, graveyard/cemeteries, low buildings, recreational area, sand/dune, forest/wood, sports centre, technical area, wetland, open land; delineation of built-up areas. © Kort- og Matrikelstyrelsen]

[Figure 2 map. Legend: inhabited addresses on January 1, municipality border, built-up areas, residence all the year round, holiday cottage area, non built-up areas. © Kort- og Matrikelstyrelsen, Krak og ZAP]


addresses on January 1 of a given year. In the Danish system these unit-addresses consist of a 17-digit address code, which holds information on the municipality, road, house number, floor and side. The addresses are subsequently cut down to an 11-digit access-address (which now only contains information on municipality, road and house number). In this process the number of persons at each access-address is summed. Finally, geographical x, y coordinates are retrieved from the Danish Public Information Server (OIS). These coordinates are joined to the access-addresses in order to place them on the map in relation to the built-up polygons of the National Map and Cadastre.

Nearly all addresses can be placed

Locating addresses by means of geographical coordinates is a very effective method: 99.95% of all inhabited addresses can be retrieved with coordinates from OIS and placed on the map. A few addresses, however, cannot be placed directly. Some of these (0.03%) can be placed after a small adjustment of the address (typically by removing a house-number letter).

For the remainder it is almost impossible to find x, y coordinates. In some cases it is possible to place them in either a town or a rural district thanks to other information associated with the address (e.g. parish affiliation, postal code and the like). Finally, there is a residual group of 0.17 ‰ of all residential addresses that we are not able to locate. These are often fictitious addresses, created in order to register e.g. homeless people or people who regularly work abroad. In a fully built-up municipality such as Copenhagen, we can place these fictitious addresses in the city. In practice, however, this is only possible in the metropolitan area. Elsewhere we have to put these people in the category "non-placeable". At 1 January 2010 there were a total of 13,881 persons with a fictitious address, of which 5,790 could be placed in the metropolitan area. This leaves 13,881 - 5,790 = 8,091 people (1.46 ‰ of the population) who cannot be located in a rural or urban area.

Procedure for making a census on urban and rural areas

Once the join between addresses and built-up polygons is done, a very large proportion of the residential addresses are connected to either a built-up area or a rural area. An enumeration is then made to count how many inhabitants live in each of the built-up areas. This is done by counting the number of people who belong to the addresses placed inside each built-up area. If the number of residents in a built-up area reaches at least 200, the area gets the status of an urban area. If it has fewer than 200 inhabitants, it is given the status of a village and is thereby included in the collective category designated as the rural areas of the municipality. Finally, Statistics Denmark carries out some amalgamations of the built-up areas. This is typically done where two or more built-up areas are nestled close to each other. In such cases the built-up areas are merged into one single area and their populations are added up (see Figure 2 for an example).

Other statistics on urban and rural areas

The census on rural and urban areas is the official census concerning such areas. Statistics Denmark does, however, also produce other kinds of statistics on rural and urban areas (e.g. a register of workplace addresses and a register of holiday cottage addresses by rural and urban area). These are, however, all for service purposes only.
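The Statistics Denmark procedure described in this paper (cutting unit-addresses down to access-addresses, summing residents, and applying the 200-inhabitant threshold) can be sketched roughly as follows. This is only an illustration under stated assumptions: that the access-address is the first 11 digits of the 17-digit code, and that the spatial join has already produced a lookup from access-address to built-up area. The function names, the lookup `area_of_address` and the sample codes are hypothetical and not taken from the paper.

```python
from collections import defaultdict

URBAN_THRESHOLD = 200      # minimum residents for an urban area (from the text)
ACCESS_CODE_LENGTH = 11    # assumption: access-address = first 11 of 17 digits


def sum_by_access_address(unit_residents):
    """Sum residents per access-address from {17-digit unit code: persons}."""
    totals = defaultdict(int)
    for unit_code, persons in unit_residents.items():
        totals[unit_code[:ACCESS_CODE_LENGTH]] += persons
    return dict(totals)


def count_residents_per_area(access_totals, area_of_address):
    """Add up residents per built-up area; addresses without a join are unplaced."""
    population = defaultdict(int)
    unplaced = 0
    for access_code, persons in access_totals.items():
        area = area_of_address.get(access_code)
        if area is None:
            unplaced += persons
        else:
            population[area] += persons
    return dict(population), unplaced


def classify_areas(population):
    """Label each built-up area 'urban' (>= 200 residents) or 'rural'."""
    return {
        area: "urban" if residents >= URBAN_THRESHOLD else "rural"
        for area, residents in population.items()
    }
```

A real implementation would replace the lookup with a point-in-polygon join against the built-up polygons and would merge neighbouring built-up areas before classification; both steps are left out here to keep the sketch short.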

INSPIRE and the process of the development of Data Specifications

Udo Maack, KOSIS-Verbund, Member of the TWG SU-PD

Introduction

This paper summarises a presentation given at the European Forum for Geostatistics (EFGS) 2010 conference in Tallinn, Estonia. As a member of an INSPIRE working group, I would like to report on the INSPIRE directive with a focus on technical issues. I will talk about the aims, the technical components, the development process and the expected results. The presentation covers important issues related to the work of statisticians, describes the contributions expected from statisticians and points out possibilities to test and influence the INSPIRE development.



Aims

What is INSPIRE, the abbreviation for INfrastructure for SPatial InfoRmation in the European Community? It is an EU directive (2007/2/EC) which aims at the harmonisation of spatial data across Europe. It entered into force on 15 May 2007. All member states are obliged to transpose the directive into national law and to provide data according to specific implementing rules. The development of these rules is part of the implementation of the directive and is currently ongoing. Such rules are necessary to manage a harmonised geodata IT infrastructure. They apply to spatial data as listed in "Annex I" to "Annex III" of the directive (see Figure 3). Of the 34 data themes defined, some relate exclusively to the environment, but the majority are of general significance for applications in administration, the economy and the public sphere.

The driving force of INSPIRE is the political goal of developing modern public governance. More specifically, the INSPIRE aims are:

• Access to and use of geodata has to be simplified and harmonised for the benefit of administrations, the economy and the public.

• According to the PSI Directive (2003/98/EC), the potential of geodata for adding value has to be activated by the administration.

• Transparency, participation and awareness of the environment have to be promoted by the administration according to the EI Directive (Environmental Information Directive, 2003/4/EC).

Last but not least, INSPIRE should reduce reporting obligations for the benefit of the administration.

Components

There are three aspects to the implementation approach: the legal basis, the organisational framework and the technical components. I will focus my contribution on the technical components (see Figure 1). The user will have access to geodata via geoportals or via applications, using standardised web services. The first service is the catalogue service, where a user searches for data and receives a Uniform Resource Locator (URL), an internet address where the data can be found. At this address the user can ask for a map preview, using a view service. If the preview is acceptable, the data can be accessed using a download service. An additional service may be used to transform the coordinates into the coordinate reference system needed. All these services follow the OGC / ISO standards. Other services, such as accounting, may be used, but these are not part of the INSPIRE standardisation. More detailed information can be found in the INSPIRE Architecture diagram (see Figure 2).

Figure 1: Technical components of INSPIRE.

To ensure that the spatial data infrastructures of the Member States are compatible and usable in a Community and transboundary context, the Directive requires that common Implementing Rules (IRs) are adopted in a number of specific areas (Metadata, Network Services, Data Specifications, Data and Service Sharing, and Monitoring and Reporting). To develop the IRs, drafting teams are established for each of these areas. Each technically focused team is in charge of the components according to Figure 2.

Themes

The drafting team for data specifications (DT-DS) deals with 34 very specific themes and is supported by corresponding Thematic Working Groups (TWGs). Because some themes are closely related, 28 TWGs are working on the specifications (see Figure 3).

Data Specification

The objectives of the data specification are taken from the directive. We find the following guidelines. In Article 7(4): "Implementing rules referred to in paragraph 1 shall cover the definition and classification of spatial objects relevant to spatial data sets related to themes listed in Annex I, II and III and the way those spatial data are geo-referenced."


Figure 2: The INSPIRE Architecture.
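The service chain described under Components (catalogue, then view, then download, optionally in a different coordinate reference system) can be illustrated by building the corresponding key-value requests. This is a minimal sketch, assuming services that accept the OGC key-value encodings of CSW 2.0.2, WMS 1.3.0 and WFS 2.0.0; the endpoint URLs, the layer name and the feature type name are hypothetical, and the parameter sets should be checked against the capabilities document of the actual service.

```python
from urllib.parse import urlencode


def build_url(endpoint, **params):
    """Append key-value parameters to a service endpoint."""
    return f"{endpoint}?{urlencode(params)}"


def catalogue_request(endpoint, search_text):
    """Discovery step: CSW GetRecords with a free-text filter."""
    return build_url(
        endpoint,
        service="CSW", version="2.0.2", request="GetRecords",
        typeNames="csw:Record", resultType="results",
        elementSetName="summary",
        constraintLanguage="CQL_TEXT",
        constraint_language_version="1.1.0",
        constraint=f"AnyText LIKE '%{search_text}%'",
    )


def view_request(endpoint, layer, bbox, crs="EPSG:4258", size=(800, 600)):
    """View step: WMS GetMap for a map preview (bbox axis order depends on the CRS)."""
    return build_url(
        endpoint,
        SERVICE="WMS", VERSION="1.3.0", REQUEST="GetMap",
        LAYERS=layer, STYLES="", CRS=crs,
        BBOX=",".join(str(v) for v in bbox),
        WIDTH=size[0], HEIGHT=size[1], FORMAT="image/png",
    )


def download_request(endpoint, feature_type, crs="EPSG:3035"):
    """Download step: WFS GetFeature, asking for the target CRS via SRSNAME."""
    return build_url(
        endpoint,
        SERVICE="WFS", VERSION="2.0.0", REQUEST="GetFeature",
        TYPENAMES=feature_type, SRSNAME=crs,
    )
```

A dedicated coordinate transformation service is not modelled here; the sketch simply requests the target reference system in the download call.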



Figure 3: The INSPIRE Themes and corresponding Thematic Working Groups

And in Article 8(2): "The implementing rules shall address the following aspects of spatial data:

(a) a common framework for the unique identification of spatial objects, to which identifiers under national systems can be mapped in order to ensure interoperability between them;

(b) the relationship between spatial objects;

(c) the key attributes and corresponding multilingual thesauri commonly required for policies which may have an impact on the environment;

(d) information on the temporal dimension of the data;

(e) updates of the data."

As a result of the harmonised definition, the Data Specifications shall contain:

• an application schema in UML;

• spatial object types, attributes, attribute values and relationships between spatial objects;

• rules for unique identifiers for spatial objects, derived from national identifiers;

• theme-specific metadata (including data quality);

• simple portrayal rules for the INSPIRE Viewing Service.

The Data Specifications shall NOT define:

• collection criteria,

• scale and resolution,

• minimum quality requirements.

The reason for this is that INSPIRE aims at interoperability of data as it exists, and must not mandate the collection of new data. Hopefully the data specifications will influence the further development of the underlying systems towards greater harmonisation. To support harmonised specification development across all themes, the DT-DS established some framework documents.

Definition Process

The data specification procedure follows a step-wise approach as shown in Figure 4. The first step is to find and describe use cases which serve as a source of requirements. Providing these cases is the responsibility of the Consolidation Team (CT). Major sources are:

• European environmental policies, e.g. the Water Framework Directive, and monitoring and reporting obligations;

• a user requirements survey of members of the SDIC/LMO list;

• reference material provided by interested institutions from the SDIC/LMO list;

• EU-funded initiatives and projects, like the ESSnet projects.


Figure 4: The INSPIRE data specification development process

In step 2 the TWG has to identify, from the use cases, the requirements on data content, metadata, data quality, portrayal and the other elements of the data specification. The results are a list of spatial object types and a first draft of an application schema.

In step 3 the current situation regarding spatial data sets for the theme has to be analysed, based on the reference material submitted by SDICs and LMOs, existing internationally standardised data specifications and the expertise of the TWG members (as-is analysis).

Step 4 is a comparison between the data sources identified in the member states and the user requirements described in the draft application schema (gap analysis).

In step 5 a first version of the specification document has to be elaborated, based on the results of steps 3 and 4. The specifications must be designed to ensure easy mapping between existing data and the harmonised data specification. Recital 16 ("No excessive costs to Member States") and Article 4(2) ("No collection of new data") have to be considered. The result of this step is Version 1.0 of the data specification for each theme.

The next round (step 6) is the integration of the first comments from an internal review and the refinement of the data product specification with a refined application schema. This Version 2.0 will be published for review by SDICs and LMOs.

In step 7 a first implementation, test and validation is performed by interested SDICs and LMOs, and a cost-benefit analysis is carried out under the responsibility of the CT. Later the TWG has to consider the comments from the review and the results of a final round of harmonisation between themes. The result of this step is Version 3.0 of the data specification according to ISO 19131.

The production of Version 3.0 by the TWG is not the final step. This document will be split into a regulation and a guidance document by the Commission services. The regulation contains the binding part (the legal text of the IR) and includes the spatial object types and associated data types, enumerations and code lists, the identifier management, encoding, metadata elements and rules for updates and portrayal. The guidance document, which is not binding, contains the full data specification document.
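The gap analysis of step 4 compares what the draft application schema requires with what existing data sources actually offer. A minimal sketch of such a comparison is given below, assuming that both sides can be reduced to sets of attribute names; the function name, the attribute names and the data source labels are purely illustrative and not part of the INSPIRE methodology documents.

```python
def gap_analysis(required_attributes, existing_datasets):
    """Report, per data source, which required attributes are missing.

    required_attributes: attribute names from the draft application schema.
    existing_datasets: {source name: attribute names the source provides}.
    """
    required = set(required_attributes)
    if not required:
        raise ValueError("the draft schema must require at least one attribute")
    report = {}
    for source, attributes in existing_datasets.items():
        missing = sorted(required - set(attributes))
        covered = len(required) - len(missing)
        report[source] = {
            "missing": missing,
            "coverage": covered / len(required),  # share of requirements met
        }
    return report
```

In practice the comparison also has to cover definitions, code lists and geometry types, which a plain set difference cannot capture; the sketch only shows the basic idea of matching requirements against the as-is situation.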

The IRs are then processed further within the EU institutions. First, the draft IR V1 is reviewed by the Drafting Teams and the relevant TWG. The result is incorporated into Version 2. After that, the representatives of the member states are asked to deliver an opinion. This feedback is incorporated into IR V3, which is reviewed by the DGs of the European Commission in the so-called "Inter-Service Consultation". The comments from this consultation are integrated to form Version 4 of the IRs. This mutually agreed version is presented to the INSPIRE Committee, the highest arbitration authority in the INSPIRE process. An overview of this comprehensive process and its timeframe is shown in Figure 7. IR V4 is then translated into 7 languages and submitted to the European Parliament for final approval.

The TWG Statistical Units–Population Distribution / Demography

The development of the data specifications for the themes "Statistical Units" and "Population Distribution / Demography" is in the hands of a multinational group of experts (see Table 1) proposed by SDICs and LMOs. Nine of the eleven members actively work on the data specification documents. The work of the TWG is guided by several norms and standards, such as ISO norms and OGC standards, and by the INSPIRE framework documents. The framework documents that guide the TWGs are:

• D2.3 Scope of Themes

• D2.5 Generic Conceptual Model: rules for Application Schema elements that are common to several themes



Figure 5: Detailed application schema in UML (class overview)

«enumeration» TechnicalStatusType: edge-matched = 1, not-edge-matched = 2

«enumeration» LegalStatusType: agreed = 1, not-agreed = 2

«codeList» AdministrativeHierarchyLevel: 1st order (country level) = 1, 2nd order = 2, 3rd order = 3, 4th order = 4, 5th order = 5, 6th order = 6

«featureType» AdministrativeBoundary
+ geometry: GM_Curve
+ inspireId: Identifier
+ country: CountryCode
+ nationalLevel: AdministrativeHierarchyLevel
«voidable»
+ legalStatus: LegalStatusType = agreed
+ technicalStatus: TechnicalStatusType = edge-matched
+ stability: boolean
«voidable, lifeCycleInfo»
+ beginLifespanVersion: DateTime
+ endLifespanVersion: DateTime [0..1]

«featureType» AdministrativeUnit
+ nationalCode: CharacterString
+ inspireId: Identifier
+ nationalLevel: AdministrativeHierarchyLevel
+ country: CountryCode
+ name: GeographicalName [1..*]
+ geometry: GM_MultiSurface
«voidable»
+ nationalLevelName: CharacterString [1..*]
+ residenceOfAuthority: NamedPlace [0..*]
«voidable, lifeCycleInfo»
+ beginLifespanVersion: DateTime
+ endLifespanVersion: DateTime [0..1]

«featureType» Condominium
+ geometry: GM_MultiSurface
+ inspireId: Identifier
«voidable»
+ name: GeographicalName [0..*]

«placeholder, featureType» NUTSRegion
+ geometry: GM_MultiSurface
+ inspireId: Identifier
+ NUTSCode: CharacterString

Association roles (all «voidable»): +boundary (1..*), +admUnit (1..*), +upperLevelUnit (0..1), +lowerLevelUnit (1..*), +coAdminister (0..*), +administeredBy (0..*), +administer (0..*), +NUTS (1..3).

Constraint: the area of a Condominium cannot be part of the geometry representing the spatial extent of an AdministrativeUnit.
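To make the UML of Figure 5 more concrete, part of the AdministrativeUnit feature type can be sketched as plain data classes. This is a simplified illustration, not the INSPIRE encoding: geometry and the voidable attributes are left out, the internal structure of `Identifier` (a namespace plus a local id) is an assumption, and the rule used to derive an identifier from a national code, like the sample codes in the usage below, is hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class AdministrativeHierarchyLevel(IntEnum):
    """Code list from Figure 5: 1st order (country level) to 6th order."""
    FIRST_ORDER = 1
    SECOND_ORDER = 2
    THIRD_ORDER = 3
    FOURTH_ORDER = 4
    FIFTH_ORDER = 5
    SIXTH_ORDER = 6


@dataclass(frozen=True)
class Identifier:
    """Assumed shape of the Identifier type: namespace plus local id."""
    namespace: str
    local_id: str


@dataclass
class AdministrativeUnit:
    inspire_id: Identifier
    national_code: str
    national_level: AdministrativeHierarchyLevel
    country: str
    names: List[str]                                         # name [1..*]
    upper_level_unit: Optional["AdministrativeUnit"] = None  # voidable, 0..1

    def __post_init__(self):
        if not self.names:
            raise ValueError("at least one name is required (multiplicity 1..*)")


def identifier_from_national_code(country, national_code):
    """Derive an identifier from a national code (hypothetical naming rule)."""
    return Identifier(namespace=f"{country}.AU", local_id=national_code)


def make_unit(country, national_code, level, name, parent=None):
    """Build a unit; an invalid hierarchy level raises ValueError."""
    return AdministrativeUnit(
        inspire_id=identifier_from_national_code(country, national_code),
        national_code=national_code,
        national_level=AdministrativeHierarchyLevel(level),
        country=country,
        names=[name],
        upper_level_unit=parent,
    )
```

For example, `make_unit("DK", "217", 4, "Helsingør", parent=make_unit("DK", "DK", 1, "Danmark"))` links a lower-level unit to its upper-level unit, mirroring the upperLevelUnit role in the diagram; the codes and the level are illustrative values only.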

Figure 6: INSPIRE data specification road map until finalization of version 3.0


Figure 7: INSPIRE data specification road map phase 4

• D2.6 Methodology for the development of data specifications

• D2.7 Encoding (GML)

• D2.8 Specification of the data themes of Annex I

Your contribution

Some hints on how you can contribute, institutionally or personally. First, be aware that you are affected by INSPIRE. The implementing rules bind all NSOs in how they provide data, at first for European institutions, but over time the specifications will become a general framework (the normative power of facts). Second, get and stay in contact with the national institutions responsible for establishing the national spatial data infrastructure. Try to influence the development for your own and the overall benefit. Third, register as an SDIC / LMO on the INSPIRE website to get first-hand information about the European perspective, and use the opportunity to comment on the specifications. Finally, be prepared to take part in the commenting phase in spring / summer 2012. This is the moment when you can directly influence the specification.

Conclusion

The INSPIRE process can be compared to a big container vessel entering a harbour: a cumbersome process involving many elements, as you can see in Figure 8. Due to the complexity and breadth of the themes SU and PD, it is a big challenge to reach a useful harmonised result. The group as well as its individual members need support from outside, e.g. from projects like GEOSTAT and Census 2011 or comparable national approaches. Please do not hesitate to contact your national representative in the TWG or the group at inspire-twg-supd@jrc.ec.europa.eu.

Acknowledgements

I wish to thank Hartmut Streuff (Federal Ministry of Environment, DE), Andreas Illert (Member of the Drafting Team Data Specification) and Roland Mordhorst (SDI Hamburg, DE) for contributions taken from their presentations.



Figure 8: The cumbersome INSPIRE process

Table 1: Expert group of the TWG Statistical Units–Population Distribution / Demography

No. | Surname        | Name           | Ctry | Organisation
1   | Bresters       | Pieter Wrister | NL   | Statistics Netherlands (CBS), LMO
2   | Coady          | Ian            | UK   | Office for National Statistics (ONS), LMO
3   | Gaffuri        | Jullian        |      | European Commission, JRC
4   | Haldorson      | Marie          | SW   | Statistics Sweden, LMO
5   | Lipatz         | Jean-Luc       | FR   | INSEE, LMO
6   | Kmiecik        | Alina          | PL   | Intergraph Poland Sp. z o.o., SDIC
7   | Maack          | Udo            | DE   | KOSIS-Verbund, SDIC
8   | Schnorr-Bäcker | Susanne        | DE   | Federal Statistical Office (destatis), LMO
9   | Migacz         | Miroslaw       | PL   | Central Statistical Office (CSO), LMO
10  | Placeholder    |                |      | EuroStat
11  | Bianchini      | Roberto        | IT   | Interuniversity Research Centre for Sustainable Development of Sapienza University of Rome, SDIC