
Journal of Geographic Information and Decision Analysis, vol. 4, no. 2, pp. 37-51

Spatial Patterns of Demand for Education: A Case Study

Ronald N. Buliung
School of Geography and Geology, McMaster University, Hamilton, Ontario, L8S 4K, Canada
buliungr@mcmaster.ca

Patrick F. De Luca
School of Geography and Geology, McMaster University, Hamilton, Ontario, L8S 4K, Canada
delucapf@mcmaster.ca

Contents
1. Introduction
2. Student Records
3. Exploring Spatial Clustering
3.1 First Order Effects
3.2 Second Order Effects
4. Controlling for Spatial Variation in the Underlying Population
5. Conclusions and Research Directions
References

ABSTRACT

Recognizing that numerous challenges currently face Canadian post-secondary institutions, the Office of the Registrar at McMaster University expressed interest in conducting a study to examine student records data. McMaster University is located in the City of Hamilton, Ontario, Canada and is an example of a medium-sized Canadian university with an average undergraduate student population of approximately 10,000 students. Dialogue with University administrators charged with managing enrollment revealed considerable concern over the potential impacts of the double-cohort phenomenon. The double-cohort refers to the Province's program to reduce secondary education from five to four years by 2003 and the subsequent increase in demand for post-secondary education. These discussions helped provide direction for this research, which came to focus on a geographical analysis of the demand for full and part-time education at the university. Event distributions for both full and part-time students were compiled from the student records database. These event distributions have been analysed, using spatial statistical techniques, with the purpose of geographically delimiting the field of demand for education at McMaster. The results show a local focus of demand with spatial clustering. In practical terms, the research serves as an impetus for focused planning that addresses the synergy between the university and secondary institutions across the region.

KEYWORDS: spatial point patterns, demand for university education, double-cohort.


R. N. Buliung and P. F. De Luca

1. Introduction

Post-secondary institutions in Canada and the U.S. currently face fiscal challenges and increasingly compete with one another for enrollment (Advisory Panel, 1996; Marble et al., 1997). Universities in the province of Ontario, and to a certain extent across Canada, could also face new challenges due to the double-cohort phenomenon. The Ontario Ministry of Education and Training intends to reduce the secondary program from five to four years. This means that by the year 2003 Ontario high schools will graduate the last group of students of the five-year program alongside the first group of students from the new four-year program. Implied by this is an expectation of increased demand for enrollment in undergraduate programs in Ontario universities. Universities are beginning to prepare for a potential increase in demand for education.

In the fall of 1998 the Office of the Registrar at McMaster University expressed interest in conducting a study to examine student records data. McMaster University is located in the City of Hamilton, Ontario, Canada (Figure 1). The institution is an example of a medium-sized Canadian post-secondary institution with approximately 10,000 undergraduate students registered on an annual basis. Through the use of geographical information systems (GIS) and spatial statistical techniques, the process of university selection, as it pertains to McMaster University, is explored in this paper. Discussions with university administrators responsible for enrollment issues helped focus the research on questions concerning the geographical extent and focus of demand for enrollment. Evidence of the presence of demand is revealed through the study of existing spatial distributions of students registered during the 1998 fiscal year.

Applications of GIS within university settings have included the analysis of student recruitment operations and the admissions process (see for example Marble et al., 1997; Marble et al., 1990) and space-use or facilities management (Gondeck-Becker, 1999). In contrast to previous studies, descriptive and exploratory techniques for point-pattern analysis are used here to study student records data in an effort to determine the geographical scope of a university's demand surface. Specific techniques are applied to describe the spatial distribution of registered full and part-time students.

The remainder of the paper is organized as follows. Data sources are introduced and the general spatial distribution of the student records is explored. This is followed by an examination of spatial clustering for both full and part-time students in Ontario and in a sub-region of the province known as the Greater Toronto Area (GTA). Building on this, an attempt is made to assess the demand for education while simultaneously controlling for spatial variation in the underlying population. Finally, a general discussion of the results is provided along with directions for future research.

2. Student Records

The Office of the Registrar extracted student records from their database for the 1998 fiscal year. The fiscal year includes the summer (May - August), fall (September - December) and winter (January - April) semesters. An individual is recorded in the student records database if he/she is a registered student at McMaster. The total number of student records extracted is 18,266. This dataset includes all full and part-time students, graduate students, foreign students and Canadian students for all faculties. Students in continuing education programs have been excluded from this analysis.

Full and part-time students have been extracted from the dataset for separate analysis. The rationale behind this is that full and part-time students have different fee structures and are managed by different administrative units. There is an initial expectation that spatial patterns of demand will differ between these two groups because individuals in each group may use different criteria in determining where to pursue an education. The full-time student subset contains 13,472 records and includes all graduate, foreign and Canadian students. There are 4,794 part-time student records and these also contain all graduate, foreign and Canadian students. The student records database contains numerous fields including gender, marital status, faculty of study, status (full or part-time), citizenship and the postal code of permanent mailing address.

The process of university selection has led to the creation of student events. In this case an event is the occurrence of a full or part-time student that selected McMaster University. To facilitate an examination of spatial clustering it is necessary to transform standard addressing to geographical coordinates that will approximate the absolute location of student events.

Student events have been geocoded to the postal code level using the postal code field from the student records database and matching this with a corresponding single link indicator record in Statistics Canada's Postal Code Conversion File. This file tags a postal code with the geographic coordinate pair (stored in latitude and longitude) of the centroid location of the most representative enumeration area. An enumeration area is the smallest zone of census geography and represents the area canvassed by an individual census enumerator (Statistics Canada, 1997). Table 1 summarizes the results of the geocoding process. The reasons for data loss include the presence of foreign students in the database, missing postal codes in the student records database, and non-existent or improperly entered postal codes in the student records database.

Table 1: Summary Statistics for Geocoding

Enrollment Status   Total Records   Geocoded Records   Success Rate (%)
Full-time           13,472          12,969             96.3
Part-time           4,794           4,618              96.3
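The postal code match described in this section can be sketched as a simple lookup. This is a minimal illustration rather than the procedure the authors ran: the field names (`postal_code`, `sli`, `lat`, `lon`), the helper names and the toy postal codes are all assumptions made for the example. Only conversion-file rows flagged as the single link record are kept, and the success rate is the share of records that find a match.

```python
def normalise(code):
    """Strip whitespace and upper-case a postal code; None or '' become ''."""
    return "".join(code.split()).upper() if code else ""


def geocode(students, conversion_rows):
    """Attach (lat, lon) to each student whose postal code has a single link record.

    Returns (geocoded, unmatched). All field names are hypothetical.
    """
    lookup = {
        normalise(row["postal_code"]): (row["lat"], row["lon"])
        for row in conversion_rows
        if row["sli"] == 1  # keep only the single link indicator record
    }
    geocoded, unmatched = [], []
    for student in students:
        coords = lookup.get(normalise(student.get("postal_code")))
        if coords is None:
            unmatched.append(student)  # foreign, missing or mistyped code
        else:
            geocoded.append({**student, "lat": coords[0], "lon": coords[1]})
    return geocoded, unmatched


def success_rate(geocoded_count, total_count):
    """Percentage of records geocoded, rounded to one decimal place."""
    return round(100.0 * geocoded_count / total_count, 1)


# Toy data, invented for illustration only.
conversion_rows = [
    {"postal_code": "A1A 1A1", "sli": 1, "lat": 43.26, "lon": -79.92},
    {"postal_code": "A1A 1A1", "sli": 0, "lat": 43.30, "lon": -79.90},
    {"postal_code": "B2B 2B2", "sli": 1, "lat": 43.65, "lon": -79.38},
]
students = [
    {"id": 1, "postal_code": "a1a1a1"},
    {"id": 2, "postal_code": "B2B 2B2"},
    {"id": 3, "postal_code": None},
    {"id": 4, "postal_code": "Z9Z 9Z9"},
]
geocoded, unmatched = geocode(students, conversion_rows)
```

Applying `success_rate` to the counts in Table 1 (12,969 of 13,472 and 4,618 of 4,794) reproduces the reported 96.3% for both groups.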

Student events have been plotted in Figure 2 below. The province of Ontario contains 12,670 (97.7%) of the geocoded full-time students and 4,525 (98.0%) of the geocoded part-time students. The process used to generate the map involved a point-in-polygon operation that resulted in some data loss, which is why the maximum values in the map classifications do not equal the values for Ontario stated above. Upon examination of the data at the national level it was decided that the focus of the analysis should be the student population of Ontario. Closer examination of the data reveals heavy concentrations of full and part-time students within the GTA. There are 9,507 full-time students (73.3% of the national total) and 3,638 part-time students (78.8% of the national total) within the boundary of the GTA. The GTA includes the City of Toronto, the Regional Municipalities of Durham, York, Peel and Halton, and the City of Hamilton (Figure 1) (Data Management Group, 1994). Hamilton is included because it is the host city of McMaster University.
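A point-in-polygon count of the kind mentioned above can be sketched with the standard ray-casting test. The paper does not say which GIS routine was used, so the function names, the two toy zones and the sample points below are assumptions for illustration. Events that fall in no zone are tallied separately, which is one way data loss of the kind described can arise.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order; the last vertex is
    joined back to the first. Points exactly on an edge may go either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_zone(points, zones):
    """Count points per named zone; points in no zone are counted as unassigned."""
    counts = {name: 0 for name in zones}
    unassigned = 0
    for x, y in points:
        for name, polygon in zones.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break
        else:
            unassigned += 1
    return counts, unassigned


# Two hypothetical adjacent square zones and four toy events.
zones = {
    "west": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "east": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
points = [(1, 1), (3, 1), (3, 1), (5, 1)]
counts, unassigned = count_by_zone(points, zones)
```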

3. Exploring Spatial Clustering

The extent to which student events are clustered or regularly distributed in space is examined at two spatial scales: first the province of Ontario, and second the GTA. The rationale for this becomes clear when the data are examined: the heaviest concentration of students is in the GTA. The analysis is divided into two sections. Initially, the presence of first order effects in the respective event distributions is examined. First order effects concern global or area-wide variation in the expected number of events per unit area (Bailey and Gatrell, 1995). Local maxima of the event distributions can be readily identified through the study of first order effects. In practical terms, this means that peaks in demand for education at McMaster can be identified at regional and provincial scales. This is followed by a direct examination of second order effects. These concern the underlying covariance structure of the event distributions and directly test for spatial dependence in the process under study (Bailey and Gatrell, 1995). That is, the hypothesis that the process of selecting a university reveals itself in a specific spatial pattern of student events is explored.

3.1 First Order Effects

Initial exploratory analysis includes a characterization of the spatial distribution of student events in Ontario and the GTA using dot maps. The results for full and part-time students in Ontario can be seen in Figure 3. All spatial data presented for Ontario have been projected using an equidistant conic projection centred on the region of greatest concentration of student events. This helps to ensure that measures of relative distance used in the analysis of spatial clustering are as accurate as possible. Qualitatively, it appears as though there are clusters of both full and part-time students in the southeastern portion of the province, the region previously defined as the GTA. This is not necessarily the case, because the geocoding process used here allocates many student events to the same absolute location, so a single dot on the map may represent many students. An additional exploratory technique is required to control for this problem.


Kernel estimation is used to further explore the first order properties of the distribution of student events. The kernel estimate provides a global measure of the intensity of a spatial point pattern (Bailey and Gatrell, 1995). Intensity can be thought of as a summary estimate of the number of events per unit area across a plane. The general functional form of this intensity is given by:

$$\hat{\lambda}_{\tau}(s) = \sum_{i=1}^{n} \frac{1}{\tau^{2}} \, k\!\left(\frac{s - s_i}{\tau}\right)$$

where $s - s_i$ is the distance of an event, $i$, from an estimation point, $s$, $k(\cdot)$ is the kernel function, $n$ is the number of events and $\tau$ is the bandwidth. The value of $\tau$ is defined by the analyst and is generally dependent upon the scale of analysis. An adapted form is used here to control for edge effects, to ensure that underestimation of intensity does not occur at the boundary of the study area.

The kernel estimate for both full and part-time students in Ontario is given in Figure 4. Experimentation with bandwidth and resolution failed to yield better results. There is definite spatial variability in the intensity of events within the province. Intensity for both distributions peaks in the GTA. The region of greatest intensity in the full-time student case stretches toward the City of Toronto, whereas the part-time student distribution remains focused in Hamilton. It is clear that the large number of student events in the GTA overwhelms the rest of the distribution.

Changing the scale of analysis to the GTA yields more detailed information in terms of the first order effects. The geocoded full and part-time students are plotted in Figure 5. A quick look at this plot might yield the conclusion that most full-time students originate in the City of Toronto and the immediately adjacent regional municipalities. Kernel estimation, with edge correction, indicates that this is not the case (Figure 6). The intensity of the distribution of full-time student events is greatest in Hamilton. There appear to be two local peaks in intensity within Hamilton for both full and part-time students. Initial examination of the dot map for part-time students might lead to the conclusion that the density of student events is greatest in Hamilton. This is confirmed through the kernel estimate for part-time students. A visual comparison of the two plots suggests that intensity is much greater for full-time students in the area surrounding the City of Toronto than is the case for part-time students. In practical terms these results suggest that the demand for education at McMaster has a local focus. Efforts designed to mitigate potential double-cohort effects should be focused locally.
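The kernel estimate used in this section can be illustrated with a short sketch. The paper does not state which kernel function was used, so the quartic kernel below is an assumption, as are the function name and the toy coordinates. The edge correction mentioned in the text is left out to keep the example small.

```python
import math


def quartic_intensity(s, events, tau):
    """Kernel intensity estimate at point s with a quartic kernel (assumed).

    Each event within distance tau of s contributes
    3 / (pi * tau^2) * (1 - h^2 / tau^2)^2, where h is its distance from s.
    No edge correction is applied in this sketch.
    """
    sx, sy = s
    total = 0.0
    for ex, ey in events:
        h2 = (sx - ex) ** 2 + (sy - ey) ** 2
        if h2 <= tau * tau:
            total += 3.0 / (math.pi * tau * tau) * (1.0 - h2 / (tau * tau)) ** 2
    return total


# Toy example: three students geocoded to the same centroid plot as one dot,
# but the kernel estimate counts all three.
single = quartic_intensity((0.0, 0.0), [(0.0, 0.0)], tau=1.0)
stacked = quartic_intensity((0.0, 0.0), [(0.0, 0.0)] * 3, tau=1.0)
```

Because coincident events add up in the sum, the estimate reflects how many students share a centroid, which a dot map cannot show.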

3.2 Second Order Effects

In this section the spatial dependence between student events is analysed using a variety of exploratory techniques. If no second order effects are present then student events will exhibit no spatial dependence. That is, each student event would have an equal likelihood or probability of occurrence within the study area. One way to assess the potential existence of second order effects is to examine a cumulative probability distribution function of nearest neighbour distances (Bailey and Gatrell, 1995). Two types are considered here, event-event and point-event. Event-event distances in this study are the shortest Euclidean distances between full and part-time student locations. Point-event distances are the shortest Euclidean distances between any arbitrary point on a plane (the two-dimensional plane bounded by the study area polygons) and a full or part-time student event. The cumulative probability distribution function of event-event nearest neighbour distances is given by:

$$\hat{G}(w) = \frac{\#(w_i \le w)}{n}$$

where $w_i$ is the event-event nearest neighbour distance of event $i$, # means "the number of", and $n$ is the total number of events in the study area (Bailey and Gatrell, 1995). The cumulative probability distribution function of point-event nearest neighbour distances is given by:

$$\hat{F}(x) = \frac{\#(x_i \le x)}{m}$$

where $x_i$ is the point-event nearest neighbour distance of point $i$, # means "the number of", and $m$ is the total number of arbitrary points visited in the plane (Bailey and Gatrell, 1995).

Separate estimates of $\hat{G}(w)$ and $\hat{F}(x)$ have been derived for full and part-time students in Ontario (Figure 7) and for the GTA (Figure 8). In all estimates of $\hat{G}(w)$ the distribution functions climb steeply, indicating that much of the distribution is comprised of short event-event nearest neighbour distances. This suggests the presence of spatial clustering. As for $\hat{F}(x)$, there is an observed high probability of long point-event distances, which also suggests clustering in the event distributions for all full and part-time students in Ontario and the GTA.
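The two nearest neighbour distribution functions can be computed directly from their verbal definitions: the share of event-event (or point-event) nearest neighbour distances that do not exceed a given threshold. The sketch below is a brute-force illustration with hypothetical function names and toy coordinates. It ignores edge effects and is not the software the authors used.

```python
import math


def nearest_neighbour_distances(events):
    """Distance from each event to its nearest other event (event-event)."""
    distances = []
    for i, (xi, yi) in enumerate(events):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(events)
            if j != i
        )
        distances.append(nearest)
    return distances


def point_event_distances(points, events):
    """Distance from each arbitrary point to its nearest event (point-event)."""
    return [
        min(math.hypot(px - ex, py - ey) for ex, ey in events)
        for px, py in points
    ]


def empirical_cdf(distances, threshold):
    """Proportion of distances less than or equal to the threshold."""
    return sum(d <= threshold for d in distances) / len(distances)


# Toy pattern: two events close together and one far away.
events = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
sample_points = [(0.0, 0.5), (10.0, 10.0)]

w = nearest_neighbour_distances(events)
x = point_event_distances(sample_points, events)
g_at_1 = empirical_cdf(w, 1.0)  # event-event function evaluated at 1.0
f_at_1 = empirical_cdf(x, 1.0)  # point-event function evaluated at 1.0
```

Events geocoded to the same centroid have a nearest neighbour distance of zero, which is one reason the event-event function can climb steeply at short distances.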

Describing the event distributions using $\hat{G}(w)$ and $\hat{F}(x)$ provides a useful summary of the distribution of nearest-neighbour distances. This in turn provides a fairly subjective means of assessing the extent to which clustering exists in event distributions. A more rigorous test for clustering can be conducted by comparing estimates of $\hat{G}(w)$ and $\hat{F}(x)$ with similar estimates for a completely spatially random (CSR) process (Bailey and Gatrell, 1995). Under a CSR process each event is considered independent with an equal probability of occurrence anywhere in the study area. The results for full and part-time students in the GTA are summarized in Figure 9. Modeling the event distributions of full and part-time students in the GTA against complete spatial randomness provides evidence against randomness and suggests clustering or spatial dependence. The hypothesis that the spatial distribution of full and part-time student events is random can be rejected. If the event distributions conformed to spatial randomness, the plot of the estimated distribution function against the CSR simulation would be roughly a 45 degree line. It is deviations from 45 degrees that suggest clustering. The upper and lower simulation envelopes are useful in visually assessing the extent to which deviation from 45 degrees occurs (Bailey and Gatrell, 1995). While not reported here, the same results were found for the Ontario-wide distributions.
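The comparison against complete spatial randomness can be sketched as a small Monte Carlo exercise: simulate many uniform random patterns with the same number of events, estimate the event-event nearest neighbour function for each, and keep the mean, minimum and maximum at every distance. The sketch assumes a rectangular study area and uses invented toy data. The real study area is an irregular polygon, and the function names and number of simulations are illustrative.

```python
import math
import random


def g_hat(events, thresholds):
    """Event-event nearest neighbour distribution function at each threshold."""
    nearest = [
        min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(events)
            if j != i
        )
        for i, (xi, yi) in enumerate(events)
    ]
    return [sum(d <= w for d in nearest) / len(nearest) for w in thresholds]


def csr_envelope(n_events, width, height, thresholds, n_sims=99, seed=0):
    """Mean, lower and upper envelopes of g_hat under CSR in a rectangle."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_sims):
        pattern = [
            (rng.uniform(0, width), rng.uniform(0, height))
            for _ in range(n_events)
        ]
        runs.append(g_hat(pattern, thresholds))
    columns = list(zip(*runs))
    mean = [sum(col) / n_sims for col in columns]
    lower = [min(col) for col in columns]
    upper = [max(col) for col in columns]
    return mean, lower, upper


# Toy clustered pattern: three tight groups of ten events in a 100 x 100 square.
rng = random.Random(1)
centres = [(20, 20), (50, 70), (80, 30)]
observed_events = [
    (cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5))
    for cx, cy in centres
    for _ in range(10)
]
thresholds = [1.0, 2.0, 5.0, 10.0]
observed = g_hat(observed_events, thresholds)
mean, lower, upper = csr_envelope(len(observed_events), 100, 100, thresholds)
```

Plotting `observed` against `mean`, together with `lower` and `upper`, gives the kind of display described in the text: a pattern consistent with randomness stays near the 45 degree line and inside the envelopes, while a clustered pattern rises above the upper envelope at short distances.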

The analysis of second order effects presented above provides evidence that the clustering observed in the dot maps of full and part-time students for both Ontario and the GTA is a true characteristic of the corresponding event distributions. The study of nearest-neighbour distances intrinsically treats spatial dependence at small scales (Bailey and Gatrell, 1995). A less scale-dependent assessment of clustering can be derived through a transformation of the K-function (Bailey and Gatrell, 1995). The transformed K-function is represented here as L(h) and the functional form is given by:

$$L(h) = \sqrt{\frac{K(h)}{\pi}}$$

where K(h) is the K-function and h is radial distance from an event. A computational form of the function, with all terms similarly defined, is applied here.

In Figure 10 the estimate of L(h) is plotted against h. Under randomness the plot of L(h) against h will approximate a 45 degree line. A 45 degree line has been placed on the plot as a point of reference. In estimating L(h) a maximum value for h of 10,000 metres was selected. This was done to partially control for edge effects and the existence of larger scale first order variation in intensity. The plots for full and part-time students in the GTA suggest clustering, as the estimated L(h) deviates from 45 degrees and lies above the line. The degree of clustering increases as h approaches 6,000 metres and then begins to decrease toward the maximum distance of the estimation. This result suggests that student events are generally clustered within a 60 km radius of the university site.

Further evidence for or against complete spatial randomness is acquired by plotting the estimate of L(h) against the upper and lower simulation envelopes of estimates of L(h) derived from 100 simulations of a set of points conforming to complete spatial randomness. Figure 11 summarizes the results. The estimate of L(h) for the full and part-time student event distributions is well above the upper simulation envelope. This provides enough evidence to reject the hypothesis of complete spatial randomness.
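A simple way to see how L(h) behaves is to estimate the K-function by counting ordered pairs of events that lie within distance h of one another and scaling by the study area. The sketch below uses this naive estimator without edge correction, which is an assumption: the computational form used in the paper is not reproduced here. It follows the text's convention that L(h) is the square root of K(h) divided by pi, so that randomness corresponds to a 45 degree line. Function names and data are illustrative.

```python
import math


def k_hat(events, h, area):
    """Naive K-function estimate: area / n^2 times the number of ordered
    pairs (i, j), i != j, separated by at most h. No edge correction."""
    n = len(events)
    pairs_within = 0
    for i, (xi, yi) in enumerate(events):
        for j, (xj, yj) in enumerate(events):
            if i != j and math.hypot(xi - xj, yi - yj) <= h:
                pairs_within += 1
    return area * pairs_within / (n * n)


def l_hat(events, h, area):
    """Transformed K-function, sqrt(K(h) / pi); compare against h itself."""
    return math.sqrt(k_hat(events, h, area) / math.pi)


# Toy clustered pattern: four events packed into one corner of a
# 100 x 100 study area (area = 10,000 square units).
cluster = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
l_value = l_hat(cluster, 1.5, 10000.0)
```

Here `l_value` is far larger than h = 1.5, so the point lies well above the 45 degree line, which is the signature of clustering described in the text.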

4. Controlling for Spatial Variation in the Underlying Population

Prior to exploring first order effects there was an initial expectation that the intensity of the student event distributions would vary with underlying population density. Much of the population of Ontario is located in the southern part of the province. The kernel estimates presented earlier indicate local peaks in intensity for full and part-time students in this region. In this section of the paper a ratio of kernel estimates is derived for full and part-time students in the province of Ontario in an attempt to correct for spatial variation in population density.

Controlling for population density required the derivation of an event distribution considered representative of the spatial variation of population in the province. The enumeration area (EA) is Statistics Canada's highest resolution of census geography. In urban areas an EA can contain up to 440 households, whereas in the rural case it contains 125 households (Statistics Canada, 1997). There are generally many more EAs in urban as opposed to rural areas and these urban EAs tend to cover much smaller areas. This means that the density of EA centroids will be much greater in urban areas, where population density is high, than in rural areas, where the converse is true. Figure 12 displays the spatial distribution of EA centroids in Ontario. The southern portion of the province is characterised by the occurrence of many EA centroids. This changes moving northward to the less populated regions of the province. The EA centroid is an event that occurs as a result of the underlying distribution of households within Canada and, in this instance, Ontario. The EA centroid therefore serves as a reasonable proxy for real population values in the context of kernel estimation. The ratio of kernels, $\hat{\rho}_{\tau}(s)$, is given by:

$$\hat{\rho}_{\tau}(s) = \frac{\sum_{i=1}^{n} \frac{1}{\tau^{2}} \, k\!\left(\frac{s - s_i}{\tau}\right)}{\sum_{j=1}^{m} \frac{1}{\tau^{2}} \, k\!\left(\frac{s - s_j}{\tau}\right)}$$

where $s - s_i$ is the distance of a student event from estimation point $s$, $s - s_j$ is the distance of an EA centroid from estimation point $s$, $k(\cdot)$ is the kernel function and $\tau$ is the bandwidth. Controlling for population density in this manner should elevate the intensity of regions characterised by relatively low estimates of population density and decrease the intensity in regions with relatively high estimates of population density.

Estimates of $\hat{\rho}_{\tau}(s)$ for both full and part-time students have been plotted in Figure 13. The northern regions of the province, characterised under ordinary kernel estimation by flat intensity, now show elevated intensity in certain areas. Local peaks remain within the part of the province known as the GTA. A visual comparison of the estimates for full and part-time students indicates greater variability in intensity for the full-time student events. The ratio of kernels represents a more thorough means of addressing first order effects, in that the local peaks in intensity highlighted through standard kernel estimation have been confirmed when variation in population density is controlled.
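The ratio of kernels can be sketched by evaluating the same kernel sum twice at each estimation point, once over student events and once over EA centroids, and dividing. As before, the quartic kernel, the function names and the toy coordinates are assumptions made for illustration. Locations with no EA centroid within the bandwidth are returned as `None` rather than dividing by zero.

```python
import math


def quartic_sum(s, locations, tau):
    """Sum of quartic kernel weights (assumed kernel) at point s."""
    sx, sy = s
    total = 0.0
    for x, y in locations:
        h2 = (sx - x) ** 2 + (sy - y) ** 2
        if h2 <= tau * tau:
            total += 3.0 / (math.pi * tau * tau) * (1.0 - h2 / (tau * tau)) ** 2
    return total


def kernel_ratio(s, student_events, ea_centroids, tau):
    """Student intensity divided by EA-centroid intensity at point s.

    Returns None where no EA centroid lies within the bandwidth.
    """
    denominator = quartic_sum(s, ea_centroids, tau)
    if denominator == 0.0:
        return None
    return quartic_sum(s, student_events, tau) / denominator


# Toy comparison: an "urban" site with many students and many centroids,
# and a "rural" site with few students but a single centroid.
students = [(0.0, 0.0)] * 10 + [(50.0, 50.0)] * 2
centroids = [(0.0, 0.0)] * 20 + [(50.0, 50.0)]
urban = kernel_ratio((0.0, 0.0), students, centroids, tau=1.0)
rural = kernel_ratio((50.0, 50.0), students, centroids, tau=1.0)
```

In this toy case the raw student intensity is five times higher at the urban site, yet the ratio is higher at the rural site, which mirrors the elevation of intensity in sparsely populated regions described above.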

5. Conclusions and Research Directions

In this paper a variety of spatial statistical methods have been applied to derive a better understanding of the spatial distributions of full and part-time student events in Ontario and the GTA. The analysis of first order effects highlighted peaks in the distribution of full and part-time students in the southeastern portion of the province. Shifting the analysis to the GTA revealed local peaks in event distributions within Hamilton. These local effects remained when variation in population density was controlled using a ratio-based estimate of intensity.

The exploratory analysis of spatial dependence suggests clustering of full and part-time students at both the provincial scale and in the GTA. Modeling against a completely spatially random process provided evidence against spatial randomness and supporting evidence for clustering in the event distributions. Clustering occurred within a 60 km radius of the institution. This research needs to be expanded to an explanatory analysis of the factors that give rise to the observed student event distributions. In other words, the existence of local peaks in the intensity of the event distributions and the tendency of student events to be spatially clustered need to be explained.

In practical terms, the analysis presented here indicates, in a general sense, the area from which McMaster students are drawn. In the context of this institution the process of university selection has resulted in locally focussed demand and clustering within 60 km of the university site. Clustering in the event distributions suggests that certain factors make McMaster an attractive option for potential students located within 60 km of the institution. This work can now be extended to an analysis of potential factors that explain the observed event distributions.

Another avenue worth pursuing is the question of demand for specialized programs (for example graduate versus undergraduate, or specific disciplines). The nature of the event distributions for these programs could be considerably different from those investigated in this study. Here, an assumption of homogeneity in the student population has been applied. It is expected that a relaxation of this assumption will result in very different event distributions. Finally, time has been treated discretely in this study. That is, only data for the 1998 fiscal year have been used. It would be interesting to study whether or not the conclusions reached hold over an expanded temporal horizon.

In the context of university planning this study has served to delimit the geographical extent and focus of demand for education at McMaster University. The results of the study suggest that the potential impacts of the double-cohort effect will be largely derived from significant increases in locally focussed demand for student positions. University administrators could help offset potential effects related to the double-cohort by engaging in constructive dialogue with secondary institutions throughout the region with a focus toward developing a reasonable course of action.

References

Advisory Panel (1996) Excellence Accessibility Responsibility: Report of the Advisory Panel on Future Directions for Postsecondary Education, Ministry of Education and Training, Government of Ontario, Toronto.

Bailey, T. C. and A. C. Gatrell (1995) Interactive Spatial Data Analysis, Longman Group Limited, Essex, England.

Data Management Group (1994) 1991 & 1986 Travel Survey Summary for the Greater Toronto Area, Joint Program in Transportation, University of Toronto, Toronto, Ontario.

Gondeck-Becker, D. (1999) Implementing an Enterprise-Wide Space Management System - A Case Study at the University of Minnesota, Proceedings of the 1999 ESRI Users Conference.

Marble, D. F., V. J. Mora and M. Granados (1997) Applying GIS Technology and Geodemographics to College and University Admissions Planning: Some Results from the Ohio State University, Proceedings of the 1997 ESRI Users Conference.

Marble, D. F., V. J. Mora and J. P. Herries (1995) Applying GIS Technology to the Freshman Admissions Process at a Large University, Proceedings of the 1995 ESRI Users Conference.

Statistics Canada (1997) 1996 Census Dictionary, Industry Canada, 1996 Census of Canada.
