
Differentiating Two Daily Activities Through Analysis of Short Ambulatory Video Clips

Payam Moradshahi 1, James R. Green 1, Edward D. Lemaire 2,3, Natalie Baddour 4

1 Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada
2 Centre for Rehabilitation Research and Development, Ottawa Hospital Research Institute, Ottawa, Canada
3 Faculty of Medicine, University of Ottawa, Ottawa, Canada
4 Department of Mechanical Engineering, University of Ottawa, Ottawa, Canada
{mpayam, jrgreen}

Abstract— Automatically detecting daily activities using wearable smartphones would provide valuable information to clinicians. While accelerometer data are effective in this area, classifying stair ascent can be difficult. In this paper, video content analysis is performed on short videos captured from a wearable smartphone in order to distinguish between level-ground walking and stair climbing. High-contrast image features, such as corners, were tracked across consecutive video frames to create feature-paths. Computing the median of the feature-path slopes in each frame revealed substantial differences, in both magnitude and variation over time, between stair climbing and walking. A time series of median slope values was produced for each video clip, and the number of local maxima and minima exceeding a threshold of 1.0 was computed. Results revealed that the number of peaks during stair climbing was substantially larger than during walking and could therefore be used as a feature for distinguishing between these two activities.

Keywords— walk; stairs; video; video content analysis; wearable mobility monitoring system



I. INTRODUCTION

Understanding mobility as people move within their chosen environment is essential for healthcare decision-making and research. Ideally, a quantitative approach is used to monitor a person as they perform their activities of daily living, since self-reported activity is often biased and clinical mobility measures do not necessarily reflect mobility outside the clinic. Wearable mobility monitoring systems (WMMS) can provide this quantitative information to better understand activity and mobility.

Smartphones have been used as a WMMS because they are small, light, easily worn, easy to use for most consumers, and they provide multitasking computing platforms that can incorporate accelerometers, GPS, video cameras, light sensors, temperature sensors, gyroscopes, and magnetometers [1],[2]. Activities are typically recognized by analyzing the phone’s accelerometer or external sensor data [3]–[6]. A Blackberry smartphone-based system was developed that identifies each mobility change-of-state (CoS), attempts to classify the activity using accelerometer data, and triggers video capture for 3 seconds [6]. This single-subject case study reported an average sensitivity of 89.7% and specificity of 99.5% for walking-related activities, sensitivity

978-1-4673-5197-3/13/$31.00 ©2013 IEEE

of 72.2% for stair navigation, and sensitivity of 33.3% for ramp recognition. Specifically, climbing a staircase was identified as being difficult to distinguish using accelerometer data alone. Video analysis could improve activity categorization and provide context for the activity. However, appropriate, automated video analysis methods for wearable video are not currently available.

Automated video analysis for ambulatory video captured by a smartphone has numerous inherent challenges. First, the camera is not mounted on a stationary platform. In fact, the typical waist-clip mounting leads to potentially large vibrations and translations as the person’s hips move, as well as occasional occlusions by the arm as it swings in front of the camera. Second, there is no frame of reference that could be used for tracking purposes. Finally, there is no prior knowledge of the environment, and the environment content is continuously changing.

Previous work related to video content analysis focused on unsupervised clustering of audio and video inputs for event detection [7], system training in the user’s environment as an indoor personal positioning system [8], or capturing specific environment features (such as edges) for heading change detection [9] and absolute corridor orientation detection [10]. For the corridor orientation study, the camera was mounted in a fixed orientation on a rolling platform and was not subject to excessive vibration, swinging, or other environmental interference.

In this paper, short 3-second ambulatory video clips, captured using a wearable smartphone, were analyzed in order to extract features capable of distinguishing between walking and stair ascent.

II. METHODS


Video recording was performed at 30 frames per second using a BlackBerry 10™ Alpha device or BlackBerry Bold™ 9900 (Research In Motion, Waterloo, ON, Canada). A video database of level-ground walking and stair climbing was created from videos recorded by the previously developed WMMS software [6] and manually captured videos on the BlackBerry 10 device. All trials were performed by able-bodied participants with no mobility deficits. The smartphone was placed in a holster and clipped onto the front right side of the person’s waist belt. A total of 12 stair climbing videos (using the BlackBerry 10™ Alpha device) and 12 walking videos (3 from the BlackBerry 10™ Alpha device and 9 from the BlackBerry Bold™ 9900) were captured. Five different stairways were used to represent the diversity of stairwell design expected in an urban environment. Eight indoor and four outdoor videos were captured for walking. The video clip duration was 3 seconds for all trials, to remain consistent with videos captured in practice by the WMMS software.

Applying the same algorithm to the walking videos showed much less variation in slope, as shown in Figure 3. Furthermore, the maximum slope also appeared to be greater for the stair climbing video clips. These observations formed the basis for a method to distinguish between walking and stair ascent.

A Graphical User Interface (GUI) was designed to efficiently analyze the captured videos. The GUI was developed in C++ under Microsoft Visual Studio 2010 with Qt Designer 4.8.3. The Open Source Computer Vision Library version 2 (OpenCV2) was used for video and image processing and analysis.

III. VIDEO CONTENT ANALYSIS


The video database was studied to extract features capable of distinguishing between level-ground walking and stair ascent. Due to the lack of a frame of reference in the captured videos, or any pre-defined regions of interest, the video-tracking algorithm had no prior knowledge of what to track within a given frame or a series of consecutive frames. In the absence of prior knowledge, high-contrast visual features, such as edges and corners, can often be identified systematically within a frame. Using these features within each frame, a feature-tracking algorithm was developed following the framework outlined in [11].

Figure 2. Feature-tracking algorithm output for two frames of the same stair climbing video.

The feature-tracking algorithm uses the OpenCV2 goodFeaturesToTrack function, described in [12], to find high-contrast features within a given frame. The algorithm keeps track of each feature’s location across consecutive frames and discards features that are no longer found in the new frame. A straight line was used to visualize the path of a given feature across a series of frames, starting from its initial location. A circle was drawn on one end of the line to represent the current position of the tracked feature. We refer to these lines as “feature-paths”. Figure 1 depicts sample output from the feature-tracking algorithm for a video clip taken while walking.
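The bookkeeping behind these feature-paths can be sketched in plain C++, independent of OpenCV. This is an illustrative sketch only; the structure and function names (FeaturePath, updatePaths, pathSlope) are ours, not from the authors' implementation, and the tracker output is abstracted as per-feature position and status arrays.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// A feature-path: the first and most recent observed positions of one
// tracked feature (hypothetical representation for illustration).
struct FeaturePath {
    double x0, y0;   // position where the feature was first detected
    double x, y;     // current position in the latest frame
};

// Update paths with the tracker's output for a new frame: features whose
// status flag is 0 were not found again and are discarded, mirroring the
// paper's rule of dropping features no longer present in the new frame.
std::vector<FeaturePath> updatePaths(const std::vector<FeaturePath>& paths,
                                     const std::vector<double>& newX,
                                     const std::vector<double>& newY,
                                     const std::vector<unsigned char>& status) {
    std::vector<FeaturePath> kept;
    for (size_t i = 0; i < paths.size(); ++i) {
        if (!status[i]) continue;   // feature lost: discard its path
        FeaturePath p = paths[i];
        p.x = newX[i];              // extend the path to the new position
        p.y = newY[i];
        kept.push_back(p);
    }
    return kept;
}

// Slope of the straight line drawn from a path's initial point to its
// current point (the quantity visualized by the line-and-circle overlay).
double pathSlope(const FeaturePath& p) {
    double dx = p.x - p.x0;
    if (std::fabs(dx) < 1e-9) return 0.0;  // guard vertical paths
    return (p.y - p.y0) / dx;
}
```

For example, a feature that starts at (0, 0) and is tracked to (4, 2) yields a feature-path slope of 0.5, while a feature with status 0 is simply dropped from the set of paths.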

Figure 3. Feature-tracking algorithm output for two frames of the same walking video.

Figure 1. Feature-tracking algorithm output for a walking video.

Output from the feature-tracking algorithm revealed a specific pattern for stair climbing. As the user moved up the stairs, the upward and downward motion of the pelvis translated into a cyclic variation of feature-path slopes between video frames. This is illustrated in Figure 2, which depicts two frames from the same video clip.



The slope differences between walking and stair climbing were used as a way of distinguishing between the two activities. While most feature-paths within a video frame share the same slope, there are a number of outliers, as seen in Figures 1-3. Therefore, the median slope among all the feature-paths in a frame was used. By applying this algorithm to all frames in a video clip, a time series of median feature-path slopes can be computed for each clip. Figures 4-9 depict the time series of median feature-path slopes for six sample video clips from the stair climbing and walking database (three of each). With the exception of a few outliers, two key differences were identified in the signal behavior between stair climbing and walking videos:

1. Slope magnitude, reflected in the absolute value of peaks.

2. Slope variation over time, reflected in the number of peaks observed within 90 frames.
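The per-frame median and the resulting time series can be sketched as follows. The function names are ours for illustration; the median is used, rather than the mean, because it is insensitive to the outlier slopes visible in Figures 1-3.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Median of the feature-path slopes in one frame.  Outlier slopes from
// interference (moving objects, arm occlusions) do not shift the median.
double medianSlope(std::vector<double> slopes) {
    size_t n = slopes.size();
    // Partition so that index n/2 holds the (n/2)-th smallest value.
    std::nth_element(slopes.begin(), slopes.begin() + n / 2, slopes.end());
    double hi = slopes[n / 2];
    if (n % 2 == 1) return hi;
    // Even count: average the two middle values.
    double lo = *std::max_element(slopes.begin(), slopes.begin() + n / 2);
    return (lo + hi) / 2.0;
}

// Median feature-path slope of every frame in a clip: one value per frame,
// giving the time series plotted in Figures 4-9 (about 90 frames for a
// 3-second clip at 30 fps).
std::vector<double> slopeTimeSeries(
        const std::vector<std::vector<double>>& frames) {
    std::vector<double> series;
    series.reserve(frames.size());
    for (const auto& f : frames) series.push_back(medianSlope(f));
    return series;
}
```

A frame with slopes {0.1, 9.0, 0.2}, where 9.0 is an interference outlier, yields a median of 0.2 rather than a mean of over 3.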

Figure 6. Stair climbing feature-path slopes (sample #3).

These two differences were encapsulated in the final algorithm, which counts the number of “high magnitude” change-of-slope peaks within the video clip. To quantitatively distinguish walking from stair ascent, a peak-detection algorithm similar to [13] was applied to the median slope time series. In this method, local maxima or minima with an absolute magnitude greater than a threshold are considered “peaks”. Examination of relatively interference-free video clips revealed that a threshold value of 1.0 (i.e., 45 degrees, or rise:run = 1.0) effectively filtered small variations in slope amplitude and provided a robust means of distinguishing between walking and stair ascent.
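A minimal peak counter along these lines is sketched below. This is our illustrative reading of the criterion described above (local extremum with absolute value above the threshold); the exact peak-detection method of [13] may differ in detail.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Count "high magnitude" peaks in a median-slope time series: samples that
// are a strict local maximum or minimum and whose absolute value exceeds
// the threshold (1.0 corresponds to a 45-degree feature-path slope).
int countPeaks(const std::vector<double>& s, double threshold = 1.0) {
    int peaks = 0;
    for (size_t i = 1; i + 1 < s.size(); ++i) {
        bool localMax = s[i] > s[i - 1] && s[i] > s[i + 1];
        bool localMin = s[i] < s[i - 1] && s[i] < s[i + 1];
        if ((localMax || localMin) && std::fabs(s[i]) > threshold)
            ++peaks;
    }
    return peaks;
}
```

In the series {0, 2, 0, -3, 0, 0.5, 0}, the extrema at 2 and -3 exceed the threshold and count as peaks, while the small bump at 0.5 is filtered out, giving a count of 2.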

Figure 7. Walking feature-path slopes (sample #1).

Figure 8. Walking feature-path slopes (sample #2).

Figure 4. Stair climbing feature-path slopes (sample #1).

Figure 5. Stair climbing feature-path slopes (sample #2).

Table 1 summarizes the number of peaks detected from the walking and stair climbing videos. The peak count for stair climbing was substantially larger than for walking. Furthermore, a two-tailed t-test comparing the peak counts revealed a significant difference (p < 0.05) between conditions. While all video clips in the present study were the same length, peak counts should be normalized by frame count before application to videos of varying length.

Figure 9. Walking feature-path slopes (sample #3).

Table 1. Number of peaks for walking and stair ascent.

  Sample #     Walking    Stair Climbing
  1            0          16
  2            3          9
  3            1          10
  4            0          8
  5            0          18
  6            0          11
  7            2          16
  8            0          16
  9            0          16
  10           0          8
  11           0          22
  12           0          7
  t-test (p)              1.4 x 10^-6
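The separation in Table 1 can be checked by computing the two-sample t statistic from the listed peak counts. The sketch below computes only the statistic (Welch form, which does not assume equal variances); whether the authors used the Welch or pooled-variance variant is not stated, so the reported p-value of 1.4 x 10^-6 would follow from the corresponding t-distribution tail, which is not reproduced here.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Two-sample (Welch) t statistic for two groups of peak counts.
double welchT(const std::vector<double>& a, const std::vector<double>& b) {
    // Sample mean and unbiased sample variance of one group.
    auto meanVar = [](const std::vector<double>& v, double& m, double& s2) {
        m = 0.0;
        for (double x : v) m += x;
        m /= v.size();
        s2 = 0.0;
        for (double x : v) s2 += (x - m) * (x - m);
        s2 /= (v.size() - 1);
    };
    double ma, va, mb, vb;
    meanVar(a, ma, va);
    meanVar(b, mb, vb);
    // Standard error under unequal variances.
    return (mb - ma) / std::sqrt(va / a.size() + vb / b.size());
}
```

Applied to the walking counts (mean 0.5) and stair climbing counts (mean approximately 13.1) from Table 1, this gives a t statistic near 8.8, consistent with the highly significant difference reported.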


IV. CONCLUSIONS

In this research, ambulatory video recorded by a wearable smartphone was analyzed in order to extract features capable of distinguishing between level-ground walking and stair climbing. By automatically identifying high-contrast video features and tracking their progression across consecutive frames, it is possible to estimate the direction of flow in a video clip. By plotting the time series of this flow direction over the course of the video clip and counting the number of “high magnitude” peaks, a method for discriminating between walking and stair climbing is achieved; these activities are difficult to differentiate using accelerometer data alone.

The main challenge in the walking videos was the presence of interference, such as other people walking through the field of view, independently moving objects, and the subject’s hand blocking the camera due to arm swing. These error sources introduce a number of features that exhibit extreme slopes in a given frame. By computing the median, rather than the mean, these extreme slopes were effectively removed. However, in some cases the interference persisted for a longer period of time, or covered a larger portion of the scene, and therefore produced spuriously high peak counts (e.g., walking sample #2 of Table 1) or a lower number of peaks (e.g., stair climbing sample #12 of Table 1).

This paper focused only on stair climbing and did not examine stair descent. Because of the smartphone placement, videos captured while descending stairs did not capture stair rails or bars; the resulting video therefore differed from that captured while climbing and remains a future challenge. Addressing the descent task, and the detection of other activities of daily living from short ambulatory video clips, will be addressed in future work.

REFERENCES

[1] J.R. Kwapisz, G.M. Weiss, S.A. Moore, “Activity recognition using cell phone accelerometers,” in Proc. 4th Int. Workshop on Knowledge Discovery from Sensor Data, 2010, pp. 10-18.
[2] R.K. Ganti, S. Srinivasan, A. Gacic, “Multisensor fusion in smartphones for lifestyle monitoring,” in Proc. Int. Conf. Body Sensor Networks, 2010, pp. 36-43.
[3] S. Zhang, P. McCullagh, C. Nugent, H. Zheng, “Activity monitoring using a smart phone’s accelerometer with hierarchical classification,” in Proc. IEEE Int. Conf. Intelligent Environments, 2010, pp. 158-163.
[4] R.K. Ganti, S. Srinivasan, A. Gacic, “Multisensor fusion in smartphones for lifestyle monitoring,” in Proc. IEEE Int. Conf. Body Sensor Networks, 2010, pp. 36-43.
[5] G. Hache, E. Lemaire, N. Baddour, “Wearable mobility monitoring using a multimedia smartphone platform,” IEEE Trans. Instrum. Meas., 2010, pp. 1-9.
[6] H-H. Wu, E. Lemaire, N. Baddour, “Activity change-of-state identification using a Blackberry smartphone,” J. Med. Biol. Eng., vol. 32, no. 4, 2012, pp. 265-272.
[7] B. Clarkson, A. Pentland, “Unsupervised clustering of ambulatory audio and video,” in Proc. Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 1999.
[8] H. Aoki, B. Schiele, A. Pentland, “Realtime personal positioning system for wearable computers,” in Proc. Third Int. Symposium on Wearable Computers, 1999, pp. 37-43.
[9] L. Ruotsalainen, H. Kuusniemi, R. Chen, “Heading change detection for indoor navigation with a smartphone camera,” in Proc. Int. Conf. Indoor Positioning and Indoor Navigation, 2011, pp. 1-7.
[10] S. Segvic, S. Ribaric, “Determining the absolute orientation in a corridor using projective geometry and active vision,” IEEE Trans. Ind. Electron., vol. 48, no. 3, 2001, pp. 696-710.
[11] R. Laganière, OpenCV 2 Computer Vision Application Programming Cookbook, Packt Publishing, 2011.
[12] e_detection.html
[13] H. Chatrzarrin, A. Arcelus, R. Goubran, F. Knoefel, “Feature extraction for the differentiation of dry and wet cough sounds,” in Proc. IEEE Int. Symposium on Medical Measurements and Applications, Bari, Italy, May 2011, pp. 162-166.
