Mobile Mapping


MOBILE MAPPING Mapping the Human Experience of Spaces Adam Heisserer


Can we create an efficient process for mapping the human experience of the spaces we design? ABSTRACT Mobile Mapping is a workflow that creates maps of the human experience of spaces. Every space has unseen qualities such as temperature, humidity, and human biometric data that are rarely visualized. This is a series of post-occupancy evaluations that map air quality, sound, or occupant heart rate to visualize and understand the human experience of spaces. This data was gathered with mobile phone cameras and off-the-shelf sensors, and then mapped with Grasshopper. This automated process increases the precision and volume of data that can be collected from our work, which can then influence our design process.

KEYWORDS Biometrics, Post-Occupancy Evaluation, Indoor Positioning Systems, Photogrammetry.

INTRODUCTION Building performance data is currently limited to anecdotes, surveys, energy monitoring, and on-site measurement with hand-held devices. Post-occupancy evaluation through manual measurement is time-consuming and inefficient, and stands to benefit from automated positioning. By pairing timestamped data-collecting sensors with an indoor positioning device, data can be collected efficiently, making crowd-sourced post-occupancy evaluation feasible. The data can then be visualized spatially, allowing architects to evaluate the human experience of their work. In addition to this valuable design feedback loop, biometric data can be used to support the value of the high-quality spaces that architects provide.

RESEARCH SUMMARY The challenge is to collect as much data as possible from as few devices as possible. In this workflow, an Arduino microcontroller collects environmental and biometric data such as daylight, temperature, and heart rate, while a mobile phone records video. The video is used to detect objects in the space and reconstruct the camera path through a process called Simultaneous Localization and Mapping, or SLAM. The data and position are then synchronized in Grasshopper to form a map of each metric. The workflow is as follows:

1. Collect experiential data with a timestamp.
2. Determine position with a timestamp.
3. Synchronize data and position to create a map.

Mobile Mapping Workflow

MOBILE MAPPING WORKFLOW All of the data collection is done on-site with a mobile phone recording video, and an Arduino microcontroller with any combination of sensors. Timestamped data is logged from both devices, and the video frames are processed with a method called Simultaneous Localization and Mapping (SLAM) to create a 3D point cloud of the scanned space and the path of the camera moving through it. The resulting data points and corresponding position are processed in a Grasshopper script that interpolates a heat map of the data.
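The synchronization step can be illustrated outside of Grasshopper. The following is a minimal Python sketch, not the project's Grasshopper script. It assumes that sensor readings and camera positions each carry a timestamp in seconds on a shared clock, and it estimates the camera position at each sensor timestamp by linear interpolation along the camera path. The function names, data layout, and sample values are hypothetical.

```python
from bisect import bisect_left

def position_at(t, path):
    """Linearly interpolate an (x, y, z) position at time t.

    path: list of (timestamp_seconds, (x, y, z)) sorted by timestamp.
    Times outside the recorded path are clamped to its first or last position.
    """
    times = [entry[0] for entry in path]
    i = bisect_left(times, t)
    if i == 0:
        return path[0][1]
    if i == len(path):
        return path[-1][1]
    (t0, p0), (t1, p1) = path[i - 1], path[i]
    w = (t - t0) / (t1 - t0)
    return tuple(a + w * (b - a) for a, b in zip(p0, p1))

def synchronize(readings, path):
    """Pair each (timestamp, value) sensor reading with an interpolated position."""
    return [(position_at(t, path), value) for t, value in readings]

# Made-up sample data: a camera path and two temperature readings.
camera_path = [(0.0, (0.0, 0.0, 0.0)), (2.0, (2.0, 0.0, 0.0)), (4.0, (2.0, 4.0, 0.0))]
sensor_log = [(1.0, 21.5), (3.0, 23.0)]
mapped = synchronize(sensor_log, camera_path)
```

Each entry of `mapped` is a position paired with a sensor value, which is the form of input a heat map interpolation needs.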



The first potential set of data to be collected is typical environmental quality data such as:

1. Daylight
2. Temperature
3. Humidity
4. Wind Speed
5. Carbon Dioxide
6. Carbon Monoxide
7. Air Pollutants

This project emphasizes the importance of biometric data, a realm of data that has been less frequently explored because of the historical challenges of quantifying subjective human experiences. This data set includes:

1. Heart Rate
2. Blood Oxygen
3. Body Temperature
4. Movement
5. Neural Activity
6. Eye Movement
7. Stress
8. Sound
9. Color
10. Distance of Views
11. Vegetation
12. Object Recognition

Subjective data can be collected by post-occupancy surveys, but the growing accuracy, availability, and affordability of biometric devices have made it possible to map formerly subjective qualities with objective and accurate data. Biometric data comes closer to illuminating the actual human experience of a space than environmental quality data can.

SELECTING DATA-COLLECTING DEVICES The traditional toolkit for collecting spatial post-occupancy data can become expensive. We are currently limited both by cost and by the amount of data we can collect. Many of the traditional devices listed here are expensive, not readily available, or indiscreet, and do not include built-in geolocation and timestamping. Spatial data collection only yields valuable insights when a lot of data is collected. This requires an easy workflow with universally available devices so that data has the potential to be crowdsourced. The accessibility of mobile devices and sensors makes it possible for architects to perform detailed post-occupancy studies of the spaces they design. Photogrammetry software, augmented reality apps, and GPS are advancing fast enough that mobile phones are becoming a very inexpensive and accessible positioning device. These can be paired with sensors to map environments based on an individual’s experience. Mobile positioning and sensing allow any designer to gather data about the built environment at a very granular level.

Traditional toolkit (device, relative cost, metric measured):
• Daylight Meter, $$: Daylight
• Sound Meter, $$: Sound Level
• Thermometer, $: Temperature
• Humidity Meter, $: Humidity
• Anemometer, $: Wind Speed
• Carbon Dioxide Meter, $$: Carbon Dioxide
• Thermal Camera, $$$: Thermal Imaging
• Energy Monitor, $$$: Energy
• Laser Scanner, $$$$: 3D Scanning

Arduino, $:
• Temperature
• Humidity
• Daylight
• Carbon Dioxide
• Air Quality Metrics

Mobile Phone, $:
• GPS
• Movement
• Sound
• Sound Recognition
• Color
• Object Recognition
• Positioning
• 3D Scanning

For this project, data will be collected through a combination of an Arduino microcontroller and a mobile phone. Arduino is an open-source microcontroller platform that can be programmed to operate several sensors simultaneously and record the data. A mobile phone loaded with the right apps can record video and collect data such as sound (decibel level, sound recognition), image (color, light, vegetation, face and object recognition), movement (from the built-in gyroscope and accelerometer), rough global positioning from GPS, and more precise spatial localization from photogrammetry techniques, which use photos to estimate distances between objects and build maps accordingly.

CROWDSOURCING DATA COLLECTION Crowdsourcing is the process of building a data set with input from a large group of people. It has become more feasible with widespread mobile access to the internet. These data sets can be mined and displayed graphically to reveal both major trends and subtleties in the data. Applications such as Roast by Kieran Timberlake track survey responses of building occupants based on their location. Survey responses are valuable data, but they are subject to intentional or unintentional deception and bias. Collecting biometrics sheds light on human response to the built environment without the subjectivity, inaccuracy, or inconvenience of manual data entry.

Twitter and Flickr activity in New York City, Eric Fischer

eBird App, Cornell Lab of Ornithology

Nuclear Legacy, Radiation Mapping in Chernobyl, Greg McNevin

Long Exposure of LED on a Roomba, Andreas Dantz

SETTING UP THE ARDUINO The Arduino UNO is a portable microcontroller powered by a small battery pack. A 9-volt battery is sufficient to power the system, but six AA batteries last longer. For the first experiment, a program is uploaded to the Arduino that allows it to record light level and temperature from two sensors once every second. A data logging shield soldered to the top records the data to a memory card with a timestamp. The Arduino begins recording data as soon as it is plugged into the power source.

Arduino Microcontroller with data logging shield and breadboard.
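Once the memory card is read on a computer, the logged data can be loaded for further processing. The following is a minimal Python sketch under stated assumptions: the column names, the timestamp format, and the sample values are hypothetical, since the exact log layout written by the data logging shield is not specified here.

```python
import csv
import io
from datetime import datetime

# Hypothetical log layout with made-up values: one row per second.
SAMPLE_LOG = """timestamp,light_lux,temperature_c
2018-06-01 10:00:00,1200,27.5
2018-06-01 10:00:01,1250,27.6
"""

def read_log(text):
    """Parse timestamped sensor rows into a list of dictionaries."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "time": datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S"),
            "light_lux": float(row["light_lux"]),
            "temperature_c": float(row["temperature_c"]),
        })
    return rows

log = read_log(SAMPLE_LOG)
# Seconds between consecutive readings, to confirm the logging interval.
interval = (log[1]["time"] - log[0]["time"]).total_seconds()
```

Checking the interval between consecutive rows is a quick way to confirm that the logger recorded once per second as intended.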


Positioning Method: GPS (Phone), Wi-Fi Beacon, SLAM (LIDAR), SLAM (Video), Stationary Array, Manual Mapping

POSITIONING OPTIONS After establishing a workflow for collecting data, the next step is to establish a positioning system. Six options were considered. GPS is the simplest and most easily available option. Early tests were completed with GPS as the positioning system, which works well when covering large areas outdoors. A margin of error of a few meters is reasonable for mapping at an urban scale, but GPS is not accurate enough for an indoor application. Indoor positioning with a Wi-Fi signal is feasible, but it requires the placement of beacons within a space before every survey. The ideal positioning system would stand alone as its own device and not require any setup work.


SLAM is an acronym for Simultaneous Localization and Mapping. It is a method used in robotics and autonomous vehicles in which a moving object uses sensors such as cameras, radar, or lidar (like radar, but with lasers) to map its surroundings in the form of a 3D point cloud, and then computes how it is moving relative to those surroundings. The choice between lidar and a camera is a choice between investing more heavily in hardware or in software. Lidar requires more capable hardware but less work to process the data, and a camera requires the opposite. Lidar is more expensive and not as universally available as video on a mobile phone, making vision-based SLAM the better fit for this project, if not the easier one. Augmented reality applications for mobile phones are becoming increasingly common, and are likely to improve. The downside to vision-based SLAM is that the 3D point cloud is generated from visually unique features in photos and video, so it is not as uniform as a laser-generated point cloud, which scans a space evenly regardless of visual features.

Temperature and Light mapping at the Pearl District in San Antonio. GPS is sufficiently accurate for positioning at large, urban scales, though it takes some time to calibrate.
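For GPS-based surveys, latitude and longitude readings need to be converted to metric coordinates before they can be drawn on a plan. The following Python sketch uses an equirectangular approximation, which is one common choice for areas the size of a city district. It is an illustration rather than the method used in this project, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters

def to_local_meters(lat, lon, origin_lat, origin_lon):
    """Approximate (east, north) offsets in meters from an origin point.

    Uses an equirectangular approximation, which is adequate over a few
    kilometers but loses accuracy over larger distances.
    """
    north = math.radians(lat - origin_lat) * EARTH_RADIUS_M
    east = (math.radians(lon - origin_lon) * EARTH_RADIUS_M
            * math.cos(math.radians(origin_lat)))
    return east, north

# One degree of latitude is roughly 111 km anywhere on Earth.
east, north = to_local_meters(1.0, 0.0, 0.0, 0.0)
```

Because a GPS margin of error of a few meters is already larger than the error of this approximation at district scale, a more rigorous map projection is unlikely to improve the resulting map.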


Another method uses a stationary array of data-collecting sensors to bypass active positioning altogether. Finally, a human-powered compromise entails manually marking one’s location on a printed plan or touchscreen as data is being collected and timestamped separately. Vision-based SLAM (loosely synonymous with Structure from Motion (SfM) or Photogrammetric Range Imaging) is perhaps the most challenging option, but it is the most promising considering its ease of use and the ubiquity of smartphones. There is also reason to be optimistic about the open-source visual SLAM packages that have recently become available and are likely to continue improving.

INTRODUCTION TO SLAM This project employed COLMAP 3.4, a Structure from Motion package, to perform SLAM. COLMAP is a pipeline with a graphical interface that takes photos as input and outputs a 3D point cloud and, more importantly for this project, the location of the camera in each photo. It has three major processes: 1) Each frame of the video is analyzed to determine the focal length, dimensions, and other properties of the image. 2) Unique features are identified in each frame and matched with unique features in other frames. 3) When the same unique feature appears in multiple frames, its position is triangulated and a point is created with a position and color value. Camera locations are determined through the same process. The reconstruction process continues, occasionally correcting for errors as the point cloud is solidified.
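The triangulation in step 3 can be illustrated with a small geometric example. The Python sketch below is not COLMAP's implementation, which is considerably more sophisticated and refines all points and cameras together. It assumes two known camera centers and, for each, a viewing ray toward the same matched feature, and it returns the midpoint of the shortest segment between the two rays. All names are hypothetical.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a 3D point from two rays c1 + s*d1 and c2 + t*d2.

    c1, c2: camera centers. d1, d2: ray directions toward the feature.
    Returns the midpoint between the closest points on the two rays.
    """
    w = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = add(c1, scale(d1, s))
    p2 = add(c2, scale(d2, t))
    return scale(add(p1, p2), 0.5)

# Two cameras one unit apart, both looking at a feature two units away.
point = triangulate_midpoint((0.0, 0.0, 0.0), (0.5, 0.0, 2.0),
                             (1.0, 0.0, 0.0), (-0.5, 0.0, 2.0))
```

The parallel-ray check reflects a practical constraint of photogrammetry: a feature seen from nearly the same vantage point in both frames cannot be located reliably, which is why footage from several vantage points reconstructs best.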

SLAM FROM VIDEO AT CONFLUENCE PARK This test measured the environmental qualities of the Confluence Park pavilion in San Antonio, Texas. After turning on the Arduino, which begins collecting light and temperature data every second, I began recording video and walked around the pavilion for about ten minutes. The reconstruction works best when the video frames are in focus and the object or space is photographed from several vantage points. The video is then processed in Photoshop to split it into individual frames at about 4 frames per second:

a. Drag and drop video into Photoshop 2017
b. File > Export > Render Video
c. Select Photoshop Image Sequence
d. Set frames per second (probably at least 2 fps; aim for 500-5,000 total frames)
e. Render
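As a worked example of the frame-rate guidance above, the following Python sketch picks an extraction rate from the video length and a desired frame count. The function name and the rule that the frame cap takes precedence over the minimum rate are assumptions made for illustration.

```python
def frame_rate_for(duration_s, target_frames, min_fps=2.0, max_frames=5000):
    """Choose a frames-per-second value for splitting a video into frames.

    The rate is raised to at least min_fps, then lowered if necessary so
    that the total frame count does not exceed max_frames.
    """
    fps = max(min_fps, target_frames / duration_s)
    fps = min(fps, max_frames / duration_s)
    return fps

# A ten-minute (600 second) video with a target of 2,400 frames.
fps = frame_rate_for(600, 2400)
total_frames = round(600 * fps)
```

For very long videos the cap on total frames wins, so the returned rate can fall below 2 fps. In that case, recording a shorter walk is likely a better fix than extracting more frames.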

This produces a few thousand individual frames that are then processed in COLMAP:

a. Open COLMAP (double click colmap.bat)
b. File > New Project
c. Specify a name for the new database
d. Select file path to Images
e. Processing > Feature Extraction > Extract
f. Processing > Feature Matching > Run
g. Reconstruction > Begin Reconstruction
h. Save model as a .nvm file
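The same sequence can in principle be scripted through COLMAP's command-line interface instead of the graphical interface. The Python sketch below only assembles the commands and does not run them. The subcommand and flag names follow recent COLMAP documentation and are an assumption for version 3.4, where some flags may be named differently. The choice of exhaustive matching, the workspace layout, and the function name are also assumptions.

```python
def colmap_commands(image_dir, workspace, camera_model="SIMPLE_RADIAL"):
    """Build a list of COLMAP command lines for a sparse reconstruction.

    Nothing is executed here. Each command could be passed to
    subprocess.run once the flag names are verified against the
    installed COLMAP version.
    """
    database = f"{workspace}/database.db"
    sparse_dir = f"{workspace}/sparse"  # must exist before the mapper runs
    return [
        # Steps b-e: create the database and extract features from each frame.
        ["colmap", "feature_extractor",
         "--database_path", database,
         "--image_path", image_dir,
         "--ImageReader.camera_model", camera_model],
        # Step f: match features between frames.
        ["colmap", "exhaustive_matcher", "--database_path", database],
        # Step g: reconstruct the point cloud and camera path.
        ["colmap", "mapper",
         "--database_path", database,
         "--image_path", image_dir,
         "--output_path", sparse_dir],
        # Step h: export the first reconstructed model as an .nvm file.
        ["colmap", "model_converter",
         "--input_path", f"{sparse_dir}/0",
         "--output_path", f"{workspace}/model.nvm",
         "--output_type", "NVM"],
    ]

commands = colmap_commands("frames", "workspace")
```

Scripting the pipeline would not shorten the reconstruction itself, but it would remove the manual steps between recording a video and obtaining a camera path.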

Video Frames

Feature Matching



Point cloud of Confluence Park, San Antonio, Texas.

The reconstruction in COLMAP is the most time-consuming part of the process. The test at Confluence Park in San Antonio was constructed from a 10-minute video, with about 4,000 individual frames. It took a few hours to match unique features, and at least 24 hours for a complete reconstruction of about 300,000 points. Of course, 4,000 frames is far more than necessary to identify the path of the camera, which is all that is needed to identify the locations from which environmental data were collected. COLMAP is an incredible and generous open-source program, and it produced an impressive 3D point cloud and camera path. However, the processing time is a bottleneck, and the offline reconstruction needs to be replaced with real-time SLAM to make abundant data collection more feasible.

Pearl Brewery, San Antonio, Texas.

Incomplete reconstruction of Pearl Brewery

LIMITS OF MONOCULAR SLAM A successful reconstruction requires enough unique features in the video to match numerous features from frame to frame. This requires a video that captures a space from a wide variety of angles and viewsheds, with enough overlap and redundancy that it can be pieced together. Motion blur is a common problem in capturing successful video. A large percentage of video frames captured manually with a mobile phone are susceptible to motion blur, making feature recognition difficult. This is especially problematic in low-light environments, as in the incomplete reconstruction of the Pearl Brewery district in San Antonio, which was captured after dark. High-contrast or high-glare environments are also difficult to process. These issues can be minimized with a higher-quality camera, such as an action camera built for capturing video in motion, or by taking a fast time lapse of individual frames with a fast shutter speed. Repetitive patterns and reflective surfaces such as glass, water, or mirrors can also make 3D reconstruction difficult. One model was reconstructed from a video taken in the lobby of the Seattle Public Library. A part of the glass facade was visually mistaken for another part of the glass facade, causing a compounding error in the reconstructed model.
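One way to reduce the effect of motion blur is to discard blurry frames before reconstruction. The Python sketch below scores sharpness with the variance of a Laplacian filter, a common heuristic that was not part of this project's workflow. It assumes each frame is available as a 2D list of grayscale values at least three pixels wide and tall. Any cutoff score would have to be chosen by experiment.

```python
def laplacian_variance(gray):
    """Return the variance of the 4-neighbor Laplacian over interior pixels.

    gray: 2D list of brightness values. Sharp edges give large positive
    and negative responses, so a low variance suggests a blurry frame.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

# Made-up 4x4 test images: a hard edge, and the same edge smeared into a ramp.
sharp = [[0, 0, 255, 255]] * 4
blurry = [[0, 85, 170, 255]] * 4
sharp_score = laplacian_variance(sharp)
blurry_score = laplacian_variance(blurry)
```

Frames scoring below the chosen cutoff could be skipped when building the image sequence, which would also reduce the number of frames COLMAP has to match.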

Partially successful reconstruction of Seattle Public Library lobby.

ACTION CAMERAS A GoPro Session 5 action camera was used for a few experiments in reconstruction. Compared to a mobile phone camera, the GoPro Session has a wider angle of view and a better ability to focus while in motion, making it well suited to photogrammetry. Selecting the correct camera model in COLMAP is important for an accurate reconstruction. OpenCV Fisheye seems to be the most accurate camera model for reducing distortion from the wide-angle lens of the GoPro. These two images are reconstructions of the same studio space. One used an incorrect camera model, which caused distortion and misplacement of points on the periphery of the image. The other used the correct camera model, which produced a much more accurate model from the same wide-angle video frames.
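The effect of the camera model can be seen by projecting the same 3D point with two different models. The Python sketch below compares a simple pinhole projection with an equidistant fisheye projection that includes OpenCV-style radial distortion terms. It is a simplified illustration: it uses a single focal length where COLMAP's fisheye model has separate horizontal and vertical values, and the focal length and point are made-up values.

```python
import math

def project_pinhole(point, f, cx, cy):
    """Project a camera-space point (x, y, z) with an ideal pinhole model."""
    x, y, z = point
    return f * x / z + cx, f * y / z + cy

def project_fisheye(point, f, cx, cy, k=(0.0, 0.0, 0.0, 0.0)):
    """Project with an equidistant fisheye model.

    theta is the angle between the ray and the optical axis. The image
    radius grows with theta itself rather than tan(theta), optionally
    adjusted by the distortion coefficients k.
    """
    x, y, z = point
    r = math.hypot(x, y)
    if r == 0.0:
        return cx, cy
    theta = math.atan2(r, z)
    t2 = theta * theta
    theta_d = theta * (1 + k[0] * t2 + k[1] * t2 ** 2
                       + k[2] * t2 ** 3 + k[3] * t2 ** 4)
    s = theta_d / r
    return f * s * x + cx, f * s * y + cy

# A point 45 degrees off the optical axis, with a focal length of 100 pixels.
pinhole_uv = project_pinhole((1.0, 0.0, 1.0), 100.0, 0.0, 0.0)
fisheye_uv = project_fisheye((1.0, 0.0, 1.0), 100.0, 0.0, 0.0)
```

The two models agree near the image center and diverge toward the edges, which is consistent with the misplaced peripheral points seen when the wrong model is selected.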

Distorted reconstruction of a Lake Flato studio with fisheye lens.

Accurate reconstruction of a Lake Flato studio with fisheye lens.

360 DEGREE CAMERAS 360-degree cameras were considered for recording video; however, a few difficulties arise when recording in 360 degrees. Consumer-level 360-degree cameras are improving quickly, but are still a developing technology. An extremely wide angle of view produces low-resolution images and unmanageable file sizes. There are inconsistencies when stitching two hemispheres together, making it difficult to model the exact properties of the camera. There is also the issue of the camera operator being in frame, which is problematic for photogrammetry. Because this project is focused on the human experience of space, it makes sense to use a video recording method that is most similar to human vision: directional, horizontal, and with a relatively wide angle of view. The directionality of the video reveals something about how the camera operator navigates the space and what they choose to look at. This data can be enhanced by eye tracking to reveal how the operator visually experiences the space.

ROBOTICS AND DRONES There is also the potential of recording video from an unmanned drone or robot. Equipping a small roaming robot (such as a Roomba) with a camera and indoor air quality sensors is an interesting opportunity to autonomously collect data in a large space at multiple time intervals throughout the day. Time is a key variable in the qualities of a space, and it is difficult to explore without repetitive, autonomous data collection. In one experiment, online video footage from a small quadcopter drone with a first-person-view camera flying through Confluence Park was reconstructed and compared to the reconstruction from the video taken manually. The drone video was successfully reconstructed into a point cloud and a nearly complete camera path. This is another opportunity for quickly collecting data at inaccessible locations. Although video from online sources or autonomous vehicles can be used for spatial reconstruction, sound mapping, object recognition, construction administration, and other video-based exercises, this process cannot yield biometric data.

Camera path of quadcopter drone. One minute flight through Confluence Park, San Antonio.

INTERPOLATING THE MAP When the reconstruction is done, the point cloud and camera path data are exported and processed in Grasshopper, a visual programming language for the Rhino 3D modeling platform. Two data sets are imported into the script: the 3D point cloud and camera path data from COLMAP, formatted as an .nvm file, and the timestamped Arduino data, formatted as a .csv file. The COLMAP data includes a list of camera path locations, each with a corresponding file name that includes a universal timestamp. Using the timestamps in both data sets, the script synchronizes the location data with the corresponding light, temperature, and heart rate data, and produces a heat map of each metric. All coordinates are correct relative to each other, but they have no universal orientation, so the model has to be manually reoriented so that the X, Y, and Z axes point east, north, and up, respectively. A non-linear regression model interpolates between known data points to fill in the rest of the map outside of the path. The model can be adjusted to smooth out the gradient between data points, and the color scale can be calibrated to the maximum and minimum values of the data set being mapped.
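The idea of filling in a map from scattered measurements can be sketched in a few lines of Python. The example below uses inverse distance weighting, a simpler technique swapped in for illustration. It is not the non-linear regression model used in the Grasshopper script, and the function names and sample values are hypothetical.

```python
def idw(x, y, samples, power=2.0):
    """Estimate a value at (x, y) by inverse distance weighting.

    samples: list of ((sx, sy), value) measurements along the path.
    A larger power keeps each measurement's influence more local,
    while a smaller power produces a smoother gradient.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (sx, sy), value in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0.0:
            return value  # exactly on a measurement point
        weight = 1.0 / d2 ** (power / 2.0)
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

def heat_map(samples, xs, ys, power=2.0):
    """Evaluate the interpolation on a grid, one row per y value."""
    return [[idw(x, y, samples, power) for x in xs] for y in ys]

# Made-up temperature readings at two positions, two meters apart.
readings = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
midpoint = idw(1.0, 0.0, readings)
grid = heat_map(readings, [0.0, 1.0, 2.0], [0.0, 1.0])
```

The `power` parameter plays a role similar to the smoothing adjustment described above, in that it controls how gradually values blend between measurement points.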


Grasshopper Script for Interpolating the Map

Illuminance:

Heart Rate:

MAPPING CONFLUENCE PARK AND THE WITTE MUSEUM There were no surprises with the daylight map, which ranges from about 1,000 lux in blue to about 100,000 lux in red. North is to the right on this map. My heart rate fluctuated between 70 and 90 bpm over the 10-minute survey time. The temperature map produced a nice gradient from about 90 °F on the sunlit, paved north side to about 80 °F on the vegetated south side. While biometrics are compelling data to collect and map, they are also the most variable. Metrics such as illuminance and temperature are meaningful in a small sample size, but biometrics such as heart rate require a much larger statistical sample before any inferences can be made. At this point, these maps are a proof of concept of the workflow, and not yet meaningful data about the correlation between an occupant’s physical response and the environmental quality of the space they are in.

Temperature:

Front lawn of the Witte Museum, San Antonio, Texas.

Heart Rate:

OBJECT RECOGNITION Each video frame is uploaded to Google Photos to create a list of object tags present in the frame. The tags are then mapped onto the location where the video frame was captured. This adds another layer of qualitative mapping to the space. Image recognition through machine learning gives the designer a spatial understanding that is in some ways obvious, but never codified. Converting the subjective experience of recognition into real data allows for the comparison of different spaces, and the potential to identify correlations between the objects present and the biometric experience of the space. The following maps indicate the presence of certain objects at Confluence Park. This is a sample of dozens of object maps produced with object recognition.
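Mapping tags onto capture locations amounts to joining two tables on the frame name. The Python sketch below assumes the tags have already been obtained for each frame and that camera positions come from the reconstruction. It does not call any image recognition service, and the frame names, tags, and positions are made up for illustration.

```python
from collections import defaultdict

def map_tags(frame_tags, frame_positions):
    """Group capture positions by object tag.

    frame_tags: {frame_name: [tag, ...]}
    frame_positions: {frame_name: (x, y, z)} from the reconstruction.
    Frames without a reconstructed position are skipped.
    """
    positions_by_tag = defaultdict(list)
    for frame, tags in frame_tags.items():
        position = frame_positions.get(frame)
        if position is None:
            continue
        for tag in set(tags):  # count each tag once per frame
            positions_by_tag[tag].append(position)
    return dict(positions_by_tag)

tags = {
    "frame_0001.jpg": ["tree", "bench"],
    "frame_0002.jpg": ["tree"],
    "frame_0003.jpg": ["sky"],  # this frame failed to reconstruct
}
positions = {
    "frame_0001.jpg": (0.0, 0.0, 0.0),
    "frame_0002.jpg": (1.0, 0.0, 0.0),
}
object_map = map_tags(tags, positions)
```

Each entry of `object_map` is the set of locations for one object map, so plotting the positions for a single tag shows where in the space that object was visible.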

SITE VISITS AND CONSTRUCTION ADMINISTRATION Another application of the photogrammetry process is that photos taken on a site visit or during construction administration can be mapped based on their location, assuming there is enough overlap and common imagery between the set of photos. In addition to object recognition, there is potential that images could be automatically scanned for quality control to aid the punch-listing process.

Photos from a site visit mapped according to location and directionality.


INDOOR AIR QUALITY MAPPING The atrium of Lake Flato’s Austin Central Library was tested for indoor air quality. Positioning was done with a pair of GoPro Session 5 cameras. A combination of sensors on the Arduino microcontroller and other off-the-shelf data logging air quality sensors was used to collect temperature, humidity, carbon dioxide, PM2.5, PM10, VOCs, illuminance, sound, and the heart rate of the surveyor. These data points can be mapped onto the nearest point on the point cloud and colored with a gradient to visualize the properties of the space.

The Austin Central Library was evaluated for several air quality metrics such as temperature, humidity, carbon dioxide, PM2.5, PM10, and VOCs.



VISUALIZATION METHODS Some maps were visualized by overlaying a heat map on a drawing, or by inserting it within the 3D point cloud. These images were created by color-coding the 3D point cloud itself so that each point takes on a color from a gradient based on the nearest data collection point. The 3D point cloud is only a by-product of the SLAM process, but it can be rendered and densified to create a compelling 3D model. Visual photogrammetry produces point clouds based on visually unique objects, unlike LIDAR-based point clouds that are sampled evenly.
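Color-coding a point cloud by its nearest data collection point can be sketched as follows. This Python example is an illustration with hypothetical names and made-up values, not the project's Grasshopper definition. It uses a brute-force nearest-neighbor search and a simple blue-to-red gradient scaled to the minimum and maximum of the data set.

```python
def blue_to_red(value, vmin, vmax):
    """Map a value to an (r, g, b) color: blue at vmin, red at vmax."""
    if vmax == vmin:
        t = 0.5  # all samples are equal, so use the middle of the gradient
    else:
        t = min(1.0, max(0.0, (value - vmin) / (vmax - vmin)))
    return (round(255 * t), 0, round(255 * (1 - t)))

def color_points(points, samples):
    """Color each cloud point by the value of its nearest sample.

    points: list of (x, y, z) point cloud positions.
    samples: list of ((x, y, z), value) data collection points.
    """
    values = [value for _, value in samples]
    vmin, vmax = min(values), max(values)
    colored = []
    for p in points:
        # Brute-force search: compare squared distances to every sample.
        _, value = min(
            samples,
            key=lambda s: sum((a - b) ** 2 for a, b in zip(p, s[0])),
        )
        colored.append((p, blue_to_red(value, vmin, vmax)))
    return colored

samples = [((0.0, 0.0, 0.0), 20.0), ((10.0, 0.0, 0.0), 30.0)]
cloud = [(1.0, 0.0, 0.0), (9.0, 1.0, 0.0)]
colored_cloud = color_points(cloud, samples)
```

The brute-force search compares every cloud point with every sample, so for a cloud of several hundred thousand points a spatial index would likely be needed to keep processing time reasonable.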

FUTURE WORK The SLAM workflow needs to be streamlined to get as close to real-time processing as possible. Volume of data is critical for this kind of work, so it needs to be processed quickly and easily. The types of biometrics being collected could be expanded, for example galvanic skin response devices to approximate stress levels. Eye tracking and pupil dilation are other key biometrics that could be used to analyze the way an occupant experiences a space. There is also a lot more data to be mined from the video itself, such as color, sound, and more detailed object recognition.

Map of sound level at Confluence Park, mapped onto the 3D point cloud.

Temperature map at Confluence Park.

Illuminance map at Confluence Park.



3D point cloud of Confluence Park reconstructed as a mesh.


APPENDIX RELATED PROJECTS

Bird Migration Occurrence Maps - Cornell Lab of Ornithology
Thousands of users download a mobile app and record when they see a particular bird species. The time and location are recorded, and the aggregate results produce compelling maps of bird migration over time for several different species.

Visual Exploration (Light, Photography, and the Invisible) - Technorhetoric

Everywhere I’ve Been - Aaron Parecki
Data portraits powered by 3.5 years of data and 2.5 million GPS points.

Multidimensional Post-Occupancy Evaluation Tool - BOSSA

Automated 3D Modeling of Building Interiors - UC Berkeley Electrical Engineering and Computer Sciences

Stress Mapping, The Economics of Biophilia - Terrapin Bright Green

Circulation Visualization - ZGF

Behavioral Maps and GIS in Place Evaluation and Design - Urban Planning Institute of Slovenia

Manual Behavior Mapping in Emergency Rooms - University of Kentucky

Pointelist - Kieran Timberlake

Data Mining the City - The Living
Prototyping custom hardware to simultaneously gather both physical and personal data about the city, and then using machine learning algorithms to discover patterns and correlations between the physical realities of the city and our personal experiences of it.

ARCore - Google


Biometrics

Biometrics - metrics related to the body or human characteristics.

Galvanic Skin Response (GSR) - change in the electrical resistance of the skin caused by emotional stress, measurable with a galvanometer.

Electroencephalography (EEG) - the measurement and recording of electrical activity in different parts of the brain.

Urban Theory

Psychogeography - an exploration of urban environments that emphasizes playfulness and “drifting”, linked to the Situationist International.

Dérive - a mode of experimental behavior linked to the conditions of urban society: a technique of rapid passage through varied ambiances, put forward by Guy Debord.

Computer Vision

Computer Vision - the study of how computers can gain high-level understanding from digital images or videos.

Simultaneous Localization and Mapping (SLAM) - constructing or updating a map of an unknown environment while simultaneously keeping track of an agent’s location within it. Usually applied to autonomous vehicles and robotics.

Vision-based SLAM - Simultaneous Localization and Mapping based on photography, rather than rangefinding technology such as lidar.

Monocular SLAM - Simultaneous Localization and Mapping from a single visual camera as opposed to stereo vision. A benefit is that simpler hardware can be used, but the algorithm needed to perform SLAM is more complex.

Structure from Motion (SfM) - a photogrammetric range imaging technique for estimating three-dimensional structures from two-dimensional image sequences that may be coupled with local motion signals.

COLMAP - a general-purpose Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline with a graphical and command-line interface. It offers a wide range of features for reconstruction of ordered and unordered image collections. The software is licensed under the new BSD license. Copyright 2018, Johannes L. Schoenberger.

Large Scale Direct (LSD) SLAM - a novel approach to real-time monocular SLAM. It is fully direct (i.e. does not use keypoints / features) and creates large-scale, semi-dense maps in real-time on a laptop.


Lake|Flato Architects 311 Third Street, San Antonio, Texas 78205 210.227.3335