Improving the availability of roof shape information in OpenStreetMap
Chair of Architectural Informatics
Prof. Dr.-Ing. Frank Petzold
Algorithmic Design
Ivan Bratoev, Frank Petzold
Wolfgang Hobmaier - 03653653
Aaron Farrell - 03764082
Topic
The goal of our project is to help solve a problem present in much of modern mapping: it is still largely focused on two dimensions, while three-dimensional aspects of visualisation are neglected. A three-dimensional marker such as the shape of a roof can help you quickly locate a place or orient yourself in an area. This data is very limited at present, but it is something that can be tagged in OSM (OpenStreetMap)[1]. We wish to improve this situation by using image classification to determine the roof shapes of buildings. The predictions would be trained on the available roof-shape tags in OSM, and the resulting classifications could then be used to tag further buildings in OSM, which in turn helps to improve the situation.
Storyboard Concepts
We imagine that our AI will allow large-scale roof shape recognition across wide areas. It will be trained on comprehensive, publicly available data sources covering roof shapes from around the world, and it won't be limited by data quality or human classification. This commitment to public data allows the AI to be trained on data from a wider selection of regions and ensures that it doesn't break down when provided with data that hasn't been edited to fit it. The result is a robust AI that provides reasonable accuracy across large areas and whose learning is transferable, as opposed to a very specialised AI that can very accurately predict roof shapes only on modified, manually classified data from small geographic regions. This reasonably accurate prediction can be used for large-scale digital modelling where total accuracy is not required: while not perfectly accurate, it provides a semi-representative picture of the roofscape of a region, reduces the manual labour of classifying building roofs when modelling 3D environments, and is more robust when faced with average or low data quality. If we achieve good enough results, we can resubmit our classifications to OSM. We will be conscious of the use of arbitrary data sources for feature extraction, as this is something we need to ensure is allowed.
In order to navigate and find data regarding the mapping of our environment, OpenStreetMap provides the most comprehensive database.
Unfortunately, that data is very sparse and limits the ability to realistically display the environment, e.g. the lack of definition for building shapes.
Often, you find yourself looking at satellite images to get a better picture of the actual infrastructure in the area, as they show spatial elements that plain two-dimensional maps do not.
We want to ask: by using widely available data sources that allow for feature extraction, can we improve the specificity of the OSM data for one important visual aspect, roofs?
Since the building geometry is always present, we can crop satellite image tiles to individual buildings and use machine learning to guess the roof's shape.
We can use these images from widely available data sources to train our AI to classify roof shapes, which we can then resubmit to OSM to improve its 3D visualisation.
Concept Development
Much of our concept development came from looking at previous studies and how they approached the problem. We looked at two in particular, which are shown on the next page. One used high-quality aerial imagery[2] to train its AI, while the other used a combination of satellite imagery and LiDAR data[3]. In the study classifying roof shapes using LiDAR and satellite images we noticed a variety of issues which we felt could be improved upon, and took this as inspiration for how to proceed. The first issue concerns the dataset: it contained only 4500 images, which is relatively small, and on top of that it is not balanced, with some roof classifications having significantly more images than others. The next issue is the classifications "unknown" and "complex", which can serve as a large catch-all for roof shapes the AI can't properly classify, driving the accuracy rating up without necessarily providing a better AI. The third issue is the geographical scale of the data collection: the data was sourced from only three cities, which does not provide a very representative picture of roofs across a region. The image data was also significantly modified, which reduced the scalability of the project. We wished to solve these issues in our own work and build an AI that is scalable and can be applied on a larger scale without the need for image editing and manual roof classification. Firstly, we would have a significantly larger dataset, with a much more balanced number of images per category. We would not use LiDAR data, as its availability is very limited and using it would reduce the scalability of the project. We would also not use any catch-all categories, and instead ensure that the AI tries to accurately classify all roof shapes. Our geographical scope would also be larger, with our images taken from across the entire state of Bavaria. Finally, the number of alterations made to our images would be kept to a minimum to preserve the scalability of the AI.
Here is a set of images used in the high-quality aerial imagery study. The editing done on each image is quite significant, and the images are all of extremely high quality. Both facts mean that the AI was highly specialised to a specific type of data that isn't widely available, so using it on a large scale to classify larger and different regions wouldn't be possible.
The LiDAR study has similar issues to the previous one: the images were highly modified, and LiDAR data, which is not widely available, was used. The combination of both limits the use of the AI on a larger scale.
Data Structure
In order to generate the training images of building roofs, the pipeline starts with gathering a specified OpenStreetMap extract. The extract is downloaded as a binary file in a format called PBF (Protocol buffer Binary Format), which packages OpenStreetMap data using Google Protocol Buffers as low-level storage. This data is unpacked, parsed and imported into a PostgreSQL database with spatial extensions (PostGIS) through imposm3, which additionally filters for ways and relations with a valid roof:shape tag. Subsequently, a Python script queries this database for a bounding box (bbox) around each tagged geometry (a (Multi-)Polygon) and the associated roof shape. Based on this bounding box, a 400x400px image is generated with the help of the Mapbox Static Images API and saved in a sub-folder named after the corresponding label (the roof shape).
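A minimal sketch of this step follows. The table and column names (osm_buildings, geometry, roof_shape) are assumptions that depend on the imposm3 mapping, and the satellite style is a placeholder; the bounding-box position form of the Mapbox Static Images API is used.

```python
# Sketch of the image-generation step; table/column names and the style
# are illustrative assumptions, not the exact production configuration.
import os
import psycopg2
import requests

MAPBOX_TOKEN = os.environ["MAPBOX_TOKEN"]
STYLE = "mapbox/satellite-v9"  # assumed satellite style

conn = psycopg2.connect("dbname=osm user=osm")
cur = conn.cursor()
# Bounding box of each tagged building, transformed to WGS84 lon/lat.
cur.execute("""
    SELECT osm_id, roof_shape,
           ST_XMin(g), ST_YMin(g), ST_XMax(g), ST_YMax(g)
    FROM (SELECT osm_id, roof_shape,
                 ST_Transform(geometry, 4326) AS g
          FROM osm_buildings
          WHERE roof_shape IS NOT NULL) AS b
""")

for osm_id, label, minx, miny, maxx, maxy in cur.fetchall():
    # Static Images API with a [minLon,minLat,maxLon,maxLat] bbox position.
    url = (f"https://api.mapbox.com/styles/v1/{STYLE}/static/"
           f"[{minx},{miny},{maxx},{maxy}]/400x400"
           f"?access_token={MAPBOX_TOKEN}")
    resp = requests.get(url)
    resp.raise_for_status()
    os.makedirs(label, exist_ok=True)  # one sub-folder per roof shape
    with open(os.path.join(label, f"{osm_id}.png"), "wb") as f:
        f.write(resp.content)
```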
(Diagram: program directory with one sub-folder of training images per label (Label 1, Label 2, Label 3) feeding the AI)
This structure allows the use of the Keras "image_dataset_from_directory" utility to easily import the images with the correct labels attributed to them. This is slightly limited by the use of the Mapbox free tier, which caps requests at 50k per month.
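For illustration, a minimal sketch of loading the labelled folders with this utility; the directory name and the 20% validation split are assumptions, not necessarily our exact configuration.

```python
# Loading the per-label image folders with Keras.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/",               # one sub-folder per roof-shape label
    validation_split=0.2,     # illustrative hold-out fraction
    subset="training",
    seed=42,                  # same seed for both subsets
    image_size=(400, 400),    # matches the generated tiles
    batch_size=32,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=(400, 400),
    batch_size=32,
)
print(train_ds.class_names)   # labels inferred from the folder names
```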
Artificial Intelligence Research
Image classification is a common problem, meaning there is a wide variety of pre-fabricated models and applications available for us to explore and work with. We looked at a variety of different solutions. The first was a SimpleCNN model, a variation/mix of the CNN the LiDAR paper used and a simple default CNN. The second was ResNet[4], a common pre-built architecture that the high-resolution aerial imagery paper used to benchmark its dataset; we used ResNet V2 in our prototype. The third model, EfficientNet (V2)[5], is another common pre-built architecture also mentioned in that paper. ImageNet is a common dataset used for benchmarking many of these models. We also looked at a variety of more complicated models, such as Xception and EfficientNetV2 S/M/L, which would have been very interesting to test but were too resource-intensive: even on dataset v8 they throw a ResourceExhaustedError, even on a high-RAM GPU instance (25GB RAM).
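As a sketch, one common way to set up such a pre-built model in Keras; the classifier head, class count and hyperparameters here are illustrative assumptions, not necessarily the exact prototype configuration.

```python
# Hedged sketch: ResNet50V2 with ImageNet weights and a small classifier
# head. NUM_CLASSES and the head layout are assumptions.
import tensorflow as tf

NUM_CLASSES = 8  # placeholder for the number of roof-shape labels

base = tf.keras.applications.ResNet50V2(
    include_top=False, weights="imagenet", input_shape=(400, 400, 3))
model = tf.keras.Sequential([
    # ResNetV2 expects inputs scaled to [-1, 1].
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0),
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # int labels
              metrics=["accuracy"])
```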
SimpleCNN architecture from the LiDAR Paper[3]
34 Layer ResNet architecture with skip connections[6]
EfficientNet (B0) Architecture Example[7]
Prototype
We created a number of dataset versions, with alterations made between each version as testing progressed. The first was simply an initial prototype with a limited dataset of 2.5k images. The second version again used an unbalanced dataset, with the number of images per roof type representative of its prevalence in Bavaria. The third and fourth versions balanced the dataset, ensuring a more similar number of images for both common and uncommon roof types. Here the category "many", which classified roofs with many different features, was removed, as it could serve as a catch-all and reduce the integrity of the results. In the fifth version, low-quality images were removed so that the remaining images had enough information to work with. In the sixth version, gambrel and pitched were removed: pitched because it isn't an official OSM classification, and gambrel because it is impossible to classify at low resolutions due to its similarity to other roof types. In version 7 the dataset was cleaned up with manual classification to see how this would affect results, and in version 8 the manually classified dataset was balanced.
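As an illustration of such a balancing step, a minimal sketch that undersamples every label folder down to the size of the rarest class; folder names and paths are assumptions.

```python
# Hedged sketch of balancing a dataset version by capping each label
# folder at the size of the smallest class.
import os
import random
import shutil

SRC, DST = "dataset_v7", "dataset_v8"   # illustrative version names
random.seed(42)                          # reproducible sampling

labels = os.listdir(SRC)
cap = min(len(os.listdir(os.path.join(SRC, l))) for l in labels)

for label in labels:
    files = random.sample(os.listdir(os.path.join(SRC, label)), cap)
    os.makedirs(os.path.join(DST, label), exist_ok=True)
    for name in files:
        shutil.copy(os.path.join(SRC, label, name),
                    os.path.join(DST, label, name))
```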
For our prototype we tested a variety of different models with our dataset to see which approach would provide the best results. This included a simple CNN model, a variation/mix of the CNN used in the LiDAR paper and a simple default CNN, as well as two pre-built models for image classification: ResNet (V2) and EfficientNet (V2).
Results - SimpleCNN (DSv7)
50 epochs, Adam optimizer, 71% validation accuracy. Final metrics: loss: 0.6193 - accuracy: 0.7739 - val_loss: 0.9091 - val_accuracy: 0.7122
Results - ResNet 50 V2 with light preprocessing (DSv7)
2x50 epochs, Adam optimizer, 73% validation accuracy. Final metrics: loss: 0.6241 - accuracy: 0.7808 - val_loss: 0.7744 - val_accuracy: 0.7397
Results - SimpleCNN (DSv8)
>200 epochs, Adam optimizer, 63% validation accuracy. Final metrics: loss: 0.3844 - accuracy: 0.8581 - val_loss: 1.4909 - val_accuracy: 0.6216
Results - SimpleCNN (DSv6)
50 epochs, Adam optimizer, 51% validation accuracy. Final metrics: loss: 1.0953 - accuracy: 0.5865 - val_loss: 1.3101 - val_accuracy: 0.5144
Results - ResNet 50 V2 with light preprocessing (DSv6)
50 epochs, Adam optimizer, 55% validation accuracy. Final metrics: loss: 0.7674 - accuracy: 0.7162 - val_loss: 1.3348 - val_accuracy: 0.5633
Results - ResNet 50 V2 with light preprocessing (DSv8)
~125 epochs, Adam optimizer, 70.5% validation accuracy. Epoch 100: loss: 0.2105 - accuracy: 0.9228 - val_loss: 1.5083 - val_accuracy: 0.7042. Final: loss: 0.0892 - accuracy: 0.9693 - val_loss: 2.4735 - val_accuracy: 0.7005
Results - EfficientNetB3 V2 with light preprocessing (DSv8)
~90 epochs, Adam optimizer, ~73% validation accuracy. Epoch 86: loss: 0.1081 - accuracy: 0.9629 - val_loss: 1.6014 - val_accuracy: 0.7343
While our results on the unbalanced datasets look promising, their value is substantially limited by the over-classification of images into the over-represented categories. To quantify this effect, we compared them to a balanced dataset and immediately saw a drop in overall accuracy, which is also visible in the confusion matrices for these datasets. Manually reviewing the images and creating a higher-quality dataset brought performance back to an accuracy competitive with the high-quality data sources used in related work; this suggests that additional (potentially AI-aided) cropping of the images may help make the input in densely built-up areas more precise. As for the remaining inaccuracies, the confusion matrices for the v8 dataset are surprisingly explainable from a human perspective. The distinction between hipped, half-hipped and pyramidal roofs in particular is hard to make even for a human reviewer in low-resolution satellite imagery. When monitoring the progress of the training, a similar confusion between skillion, flat and round roofs was noticeable, which also matches our experience from manual classification. While only a weak indicator, these symptoms increased our confidence in the validity of the neural network's learning progress. While we would not feel comfortable submitting the predictions back to OSM as a reliable source, incorporating them into 3D/AR mapping applications would likely enhance the experience considerably.
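The confusion matrices discussed above can be produced along these lines; a sketch that assumes the validation dataset was built with shuffle=False, so that labels and predictions stay aligned across the two passes.

```python
# Sketch: confusion matrix over the validation split. Assumes val_ds
# iterates images in a fixed order (e.g. created with shuffle=False).
import numpy as np
import tensorflow as tf

y_true = np.concatenate([labels.numpy() for _, labels in val_ds])
y_pred = np.argmax(model.predict(val_ds), axis=1)
cm = tf.math.confusion_matrix(y_true, y_pred).numpy()
print(cm)  # rows: true roof shapes, columns: predicted roof shapes
```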
Reflection and Outlook
With our data source there can be a number of problems with the images, because they are not manually classified. These include tagging mistakes, where the tags on OSM don't correspond to what the satellite image shows. Then there are unclear images, which can have a variety of causes, such as reflections, shadows and offset images; sometimes the OSM location and the satellite image don't line up correctly, resulting in an unclear crop. Construction sites can also be an issue: buildings on OSM may have since been demolished with a construction site now in their place, meaning that the AI trains on an image of a construction site rather than the roof type that OSM said should be there. Trees sometimes obscure a significant amount of the roof. And sometimes a building is tagged correctly but is very large, so the cropped image misses the whole roof and only captures a section in the middle, missing the defining features of the roof type.
Balance in the dataset is something we have tried to achieve: for every roof type we have tried to include a significant number of images. This still isn't perfect; the round roof type, for example, is very rare, and even using all tagged round roofs in Bavaria, the number of images for this type is still significantly smaller than for the other types.
Problem Images
(Example images: construction site, offset outlines, building too big, garage)
There are a few changes we would like to make if we were to continue with this project. The first concerns the dataset: we would only collect images for flat roofs above a certain size. This would prevent the collection of images of garages, which are often attached to houses and can lead to confusion in the training process and poor results (a possible filter is sketched below). The second is to reduce the need for manual review. We began without manually reviewing the data, but realised that many of the images had issues which were negatively impacting our results. Manually reviewing the data not only helped our results, it also made us aware of the problems with our images and how we might avoid them in the image-collection process in the future, so that the whole process can be automated. Fixing issues with bounding boxes is also something we would like to address: this would involve only selecting buildings where no other buildings are visible within the crop, so that the AI can be trained with less interference. We would also like to test out TPU training.
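For the flat-roof size limit, a possible filter at collection time might look like the following sketch; the 50 m² threshold, the metric CRS and the table/column names are all illustrative assumptions.

```python
# Hedged sketch: skip small flat roofs (e.g. garages) when querying the
# database. EPSG:25832 (UTM 32N) gives metric areas for Bavaria; the
# threshold and table/column names are assumptions.
MIN_FLAT_AREA_M2 = 50

QUERY = """
    SELECT osm_id, roof_shape
    FROM osm_buildings
    WHERE roof_shape IS NOT NULL
      AND (roof_shape != 'flat'
           OR ST_Area(ST_Transform(geometry, 25832)) >= %(min_area)s)
"""
# cur.execute(QUERY, {"min_area": MIN_FLAT_AREA_M2})
```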
Collaboration
Throughout the course of our project and the creation of our prototype there was constant communication and discussion regarding the direction the project should take and the adjustments that needed to be made, with regard to the state of the dataset, the process for image collection and the AI models used. The coding of the dataset image-collection program was undertaken by Wolfgang, with the process and the nature of the images collected decided collaboratively; these decisions were often informed by the papers mentioned previously. Throughout the creation of the prototype, various elements were undertaken by both students at different times: initial data importing and structuring was begun by Aaron, from which point work was done collaboratively, with the running of the training done by Wolfgang. The final presentation was put together collaboratively, with the code packaging done by Wolfgang and the booklet by Aaron.
[1] OpenStreetMap Taginfo. https://taginfo.openstreetmap.org/. Accessed on 04.09.22.
[2] Buyukdemircioglu, M., Can, R. & Kocaman, S. (2021). Deep Learning Based Roof Type Classification Using Very High Resolution Aerial Imagery. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLIII-B3-2021, pp. 55-60. doi: 10.5194/isprs-archives-XLIII-B3-2021-55-2021.
[3] Castagno, J., Atkins, E. Roof Shape Classification from LiDAR and Satellite Image Data Fusion Using Supervised Learning. Sensors (Basel). 2018 Nov 15;18(11):3960. doi: 10.3390/s18113960. PMID: 30445731; PMCID: PMC6264004.
[4] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778. doi: 10.1109/CVPR.2016.90.
[5] Tan, M. and Le, Q.V. (2019) EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv:1905.11946.
[6] Towards Data Science. https://towardsdatascience.com/review-resnet-winner-of-ilsvrc-2015-image-classification-localization-detection-e39402bfa5d8. Accessed on 04.09.22.
[7] Google AI Blog. https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html. Accessed on 04.09.22.