
Assessing the Quality of Actions

Hamed Pirsiavash, Carl Vondrick, Antonio Torralba

Massachusetts Institute of Technology


Abstract. While recent advances in computer vision have provided reliable methods to recognize actions in both images and videos, the problem of assessing how well people perform actions has been largely unexplored in computer vision. Since methods for assessing action quality have many real-world applications in healthcare, sports, and video retrieval, we believe the computer vision community should begin to tackle this challenging problem. To spur progress, we introduce a learning-based framework that takes steps towards assessing how well people perform actions in videos. Our approach works by training a regression model from spatiotemporal pose features to scores obtained from expert judges. Moreover, our approach can provide interpretable feedback on how people can improve their action. We evaluate our method on a new Olympic sports dataset, and our experiments suggest our framework is able to rank athletes more accurately than a non-expert human. While promising, our method is still a long way from rivaling the performance of expert judges, indicating that there is significant opportunity in computer vision research to improve on this difficult yet important task.

1 Introduction

Recent advances in computer vision have provided reliable methods for recognizing actions in videos and images. However, the problem of automatically quantifying how well people perform actions has been largely unexplored.

We believe the computer vision community should begin to tackle the challenging problem of assessing the quality of people’s actions because there are many important, real-world applications. For example, in health care, patients are often monitored and evaluated after hospitalization as they perform daily tasks, which is an expensive undertaking without an automatic assessment method.

In sports, action quality assessments would allow an athlete to practice in front of a camera and receive quality scores in real-time, providing the athlete with rapid feedback and an opportunity to improve their action. In retrieval, a video search engine may want to sort results based on the quality of the action performed instead of only its relevance.

Fig. 1: We introduce a learning framework for assessing the quality of human actions from videos. Since we estimate a model for what constitutes a high quality action, our method can also provide feedback on how people can improve their action (e.g. "Lower Feet", "Stretch Hands"; Quality of Action: 86.5 / 100).

However, automatically assessing the quality of actions is not an easy computer vision problem. Human experts for a particular domain, such as coaches or doctors, have typically been trained over many years to develop complex underlying rules to assess action quality. If machines are to assess action quality, then they must discover similar rules as well.

In this paper, we propose a data-driven method to learn how to assess the quality of actions in videos. To our knowledge, we are the first to propose a general framework for learning to assess the quality of human-based actions from videos. Our method works by extracting the spatio-temporal pose features of people, and with minimal annotation, estimating a regression model that predicts the scores of actions. Fig.1 shows an example output of our system.

In order to quantify the performance of our methods, we introduce a new dataset for action quality assessment comprised of Olympic sports footage. Although the methods in this paper are general, sports broadcast footage has the advantage that it is freely available, and comes already rigorously “annotated” by the Olympic judges. We evaluate our quality assessments on both diving and figure skating competitions. Our results are promising, and suggest that our method is significantly better at ranking people’s actions by their quality than non-expert humans. However, our method is still a long way from rivaling the performance of expert judges, indicating that there is significant opportunity in computer vision research to improve on this difficult yet important task.

Moreover, since our method leverages high level pose features to learn a model for action quality, we can use this model to help machines understand people in videos as well. Firstly, we can provide interpretable feedback to performers on how to improve the quality of their action. The red vectors in Fig.1 are output from our system that instructs the Olympic diver to stretch his hands and lower his feet. Our feedback system works by calculating the gradient for each body joint against the learned model that would have maximized people’s scores. Secondly, we can create highlights of videos by finding which segments contributed the most to the action quality, complementing work in video summarization. We hypothesize that further progress in building better quality assessment models can improve both feedback systems and video highlights.
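As a rough illustration of this feedback computation, suppose the learned model is linear in the DCT coefficients of a joint trajectory (a hypothetical sketch: the function names are ours, and the absolute-value step applied to the features is ignored for clarity). The gradient of the score with respect to the joint's per-frame positions is then the inverse transform of the weights:

```python
import numpy as np

def dct_rows(T, k):
    """First k rows of the orthonormal DCT-II matrix A, mirroring the
    A_{1:k} used for the pose features (small helper for this sketch)."""
    n = np.arange(T)
    A = np.cos(np.pi * (n + 0.5) * np.arange(k)[:, None] / T) * np.sqrt(2.0 / T)
    A[0] *= np.sqrt(0.5)  # scale DC row for orthonormality
    return A

def feedback_direction(w_joint, T):
    """Per-frame direction that most increases a linear score w . Q_{1:k}
    for one joint trajectory q of length T: the gradient A_{1:k}^T w.
    (Sketch only; the absolute value taken on coefficients is ignored.)"""
    return dct_rows(T, len(w_joint)).T @ w_joint
```

Rendering this time-domain gradient as an arrow at each joint yields feedback like the "Stretch Hands" vectors in Fig. 1.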

The three principal contributions of this paper revolve around automatically assessing the quality of people’s actions in videos. Firstly, we introduce a general learning-based framework for the quality assessment of human actions using spatiotemporal pose features. Secondly, we describe a system to generate feedback for performers in order to improve their score. Finally, we release a new dataset for action quality assessment in the hopes of facilitating future research on this task. The remainder of this paper describes these contributions in detail.

2 Related Work

This paper builds upon several areas of computer vision. We briefly review related work:

Action Assessment: The problem of action quality assessment has been relatively unexplored in the computer vision community. There have been a few promising efforts to judge how well people perform actions [1–3]; however, these previous works have so far been hand-crafted for specific actions. The motivation for assessing people's actions in healthcare applications has also been discussed before [4], but the technical method is limited to recognizing actions.

In this paper, we propose a generic learning-based framework with state-of-the-art features for action quality assessment that can be applied to most types of human actions. To demonstrate this generality, we evaluate on two distinct types of actions (diving and figure skating). Furthermore, our system is able to generate interpretable feedback on how performers can improve their action.

Photograph Assessment: There are several works that assess photographs, such as their quality [5], interestingness [6] and aesthetics [7, 8]. In this work, we instead focus on assessing the quality of human actions, and not the quality of the video capture or its artistic aspects.

Action Recognition: There is a large body of work studying how to recognize actions in both images [9–13] and videos [14–18], and we refer readers to excellent surveys [19, 20] for a full review. While this paper also studies actions, we are interested in assessing their quality rather than recognizing them.

Features: There are many features for action recognition using spatio-temporal bag-of-words [21, 22], interest points [23], feature learning [24], and human pose [25]. However, so far these features have primarily been shown to work for recognition. We found that some of these features, notably [24] and [25] with minor adjustments, can be used for the quality assessment of actions too.

Video Summarization: This paper complements work in video summarization [26–31]. Rather than relying on saliency features or priors, we instead can summarize videos by discarding segments that did not impact the quality score of an action, thereby creating a “highlights reel” for the video.

3 Assessing Action Quality

We now present our system for assessing the quality of an action from videos. At a high level, our system learns a regression model from spatio-temporal features to quality scores.

After presenting our model, we then show how our model can be used to provide feedback to the people in videos to improve their actions. We finally describe how our model can highlight segments of the video that contribute the most to the quality score.
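As a minimal stand-in for the learner (the paper's actual training procedure is given in Sec. 3.2; the function names here are illustrative), a closed-form ridge regressor from feature vectors to judge scores could look like:

```python
import numpy as np

def fit_ridge(Phi, y, lam=1.0):
    """Fit w minimizing ||Phi w - y||^2 + lam ||w||^2 in closed form.
    Phi: (n_videos, d) feature matrix; y: (n_videos,) judge scores.
    (A minimal sketch, not the paper's exact learner.)"""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

def predict(Phi, w):
    """Predicted quality score for each row of Phi."""
    return Phi @ w
```

With features of the kind described next, training amounts to one call to `fit_ridge` on the annotated videos, and scoring a new video to one call to `predict`.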

3.1 Features

To learn a regression model for action quality, we extract spatio-temporal features from videos. We consider two sets of features: low-level features that capture gradients and velocities directly from pixels, and high-level features based on the trajectory of human pose.

Low Level Features: Since there has been significant progress in developing features for recognizing actions, we tried using them for assessing actions too.

We use a hierarchical feature [24] that obtains state-of-the-art performance in action recognition by learning a filter bank with independent subspace analysis.

The learned filter bank consists of spatio-temporal Gabor-like filters that capture edges and velocities. In our experiments, we use the implementation by [24] with the network pre-trained on the Hollywood2 dataset [32].

High Level Pose Features: Since most low-level features capture statistics from pixels directly, they are often difficult to interpret. As we wish to provide feedback on how a performer can improve their actions, we want the feedback to be interpretable. Inspired by actionlets [25], we now present interpretable high-level features based on human pose.

Given a video, we assume that we know the pose of the human performer in every frame, obtained either through ground truth or automatic pose estimation.

Let p^(j)(t) be the x-component of the j-th joint in the t-th frame of the video. Since we want our features to be translation-invariant, we normalize the joint positions relative to the head position:

q^(j)(t) = p^(j)(t) − p^(0)(t)

where we have assumed that p^(0)(t) refers to the head. Note that q^(j) is a function of time, so we can represent it in the frequency domain by the discrete cosine transform (DCT): Q^(j) = A q^(j), where A is the discrete cosine transformation matrix. We then use the k lowest frequency components to create the feature vector φ_j = Q^(j)_{1:k} = A_{1:k} q^(j), where A_{1:k} selects the first k rows of A. We found that only using the low frequencies helps remove high-frequency noise due to pose estimation errors. We use the absolute value of the frequency coefficients Q^(j)_i.
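This feature construction can be sketched as follows (a sketch with illustrative names and dimensions; joint 0 is taken to be the head, and k is the number of retained frequencies):

```python
import numpy as np

def dct_matrix(T):
    """Orthonormal DCT-II matrix A (T x T); row f is frequency component f."""
    n = np.arange(T)
    f = n[:, None]
    A = np.cos(np.pi * (n + 0.5) * f / T) * np.sqrt(2.0 / T)
    A[0] *= np.sqrt(0.5)  # scale the DC row so that A @ A.T = I
    return A

def pose_features(p, k=5):
    """p: (T, J) array holding one coordinate (x or y) of J joints over
    T frames; joint 0 is assumed to be the head. Returns the k lowest
    absolute DCT coefficients of each head-relative joint trajectory,
    concatenated into a ((J-1)*k,) feature vector."""
    q = p[:, 1:] - p[:, [0]]                # head-relative positions q^(j)(t)
    Q = dct_matrix(p.shape[0]) @ q          # DCT along time, all joints at once
    return np.abs(Q[:k]).ravel(order="F")   # per-joint blocks of k coefficients
```

For a full video this would be computed for both the x- and y-coordinates (and per segment for long videos) and concatenated.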

We compute φ_j for every joint for both the x- and y-components, and concatenate them to create the final feature vector φ. We note that if the video is long, we break it up into segments and concatenate the per-segment features to produce one feature vector for the entire video. This increases the temporal resolution of our features for long videos.

Actionlets [25] use a similar method with the Discrete Fourier Transform (DFT) instead. Although there is a close relationship between the DFT and the DCT, we see better results using the DCT. We believe this is because the DCT provides a more compact representation. Additionally, DCT coefficients are real numbers instead of complex, so less information is lost in the absolute-value operation.

In order to estimate the joints of the performer throughout the video, p^(j)(t), we run a pose estimation algorithm to find the position of the joints in every frame. We estimate the pose using a Flexible Parts Model [33] for each frame independently. Since [33] finds the best pose for a single frame using dynamic programming and we want the best pose across the entire video, we find the N-best pose solutions per frame using [34]. Then we associate the poses using a dynamic programming algorithm to find the best track in the whole video. The association looks for the single best smooth track covering the whole temporal span of the video. Fig. 2 shows some successes and failures of this pose estimation.

Fig. 2: Pose Estimation Challenges: Some results for human pose estimation on our action quality dataset. Since the performers contort their bodies in unusual configurations, pose estimation is very challenging on our dataset.
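The association step can be sketched as a Viterbi-style dynamic program over the N candidate poses per frame (a sketch under assumed inputs: the detection scores and the squared-difference smoothness penalty are illustrative choices, not necessarily those of [34]):

```python
import numpy as np

def best_track(scores, poses):
    """Pick one of N candidate poses per frame, maximizing the sum of
    detection scores minus a smoothness penalty (here: squared pose
    change between consecutive frames), via dynamic programming.
    scores: (T, N) candidate scores; poses: (T, N, D) candidate joint
    vectors. Returns the chosen candidate index for each frame."""
    T, N, _ = poses.shape
    dp = scores[0].copy()                  # best value ending at each candidate
    back = np.zeros((T, N), dtype=int)     # backpointers
    for t in range(1, T):
        diff = poses[t][None, :, :] - poses[t - 1][:, None, :]
        penalty = (diff ** 2).sum(-1)      # (N_prev, N_cur) transition costs
        cand = dp[:, None] - penalty       # value of each transition
        back[t] = cand.argmax(0)
        dp = cand.max(0) + scores[t]
    path = [int(dp.argmax())]              # backtrack the best path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

Because the penalty couples only consecutive frames, the optimum over all N^T candidate sequences is found in O(T N^2) time.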

3.2 Learning
