
«Abstract. While recent advances in computer vision have provided reliable methods to recognize actions in both images and videos, the problem of ...»


We then pose quality assessment as a supervised regression problem. Let $\Phi_i \in \mathbb{R}^{k \times n}$ be the pose features for video $i$ in matrix form, where $n$ is the number of joints and $k$ is the number of low-frequency components. We write $y_i \in \mathbb{R}$ to denote the ground-truth quality score of the action in video $i$, obtained from an expert human judge. We then train a linear support vector regression (L-SVR) [35] to predict $y_i$ given features $\Phi_i$ over a training set. In our experiments, we use libsvm [36]. Optimization is fast, taking less than a second on typically sized problems. We perform cross validation to estimate hyperparameters.
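For reference, the regression step can be read against the textbook ε-insensitive linear SVR objective. This is the standard formulation and is not taken from the text: the flattening operator vec(·), the weight vector w, the bias b, and the hyperparameters C and ε are assumptions introduced here for illustration.

```latex
\min_{w,\,b}\quad \frac{1}{2}\lVert w \rVert^{2}
  + C \sum_{i} \max\Bigl(0,\ \bigl\lvert y_i - \bigl(w^{\top}\mathrm{vec}(\Phi_i) + b\bigr) \bigr\rvert - \epsilon \Bigr)
```

Under this reading, the predicted score for a new video would be $w^{\top}\mathrm{vec}(\Phi) + b$, and $C$ and $\epsilon$ would be the hyperparameters selected by cross validation.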

Domain Knowledge: We note that a comprehensive model for quality assessment might use domain experts to annotate fine-tuned knowledge on the action’s quality (e.g., “the leg must be straight”). However, relying on domain experts is expensive and difficult to scale to a large number of actions. By posing quality assessment as a machine learning problem with minimal interaction from an expert, we can scale more efficiently. In our system, we only require a single real number per video corresponding to the score of the quality.

Prototypical Example: Moreover, a fairly simple method to assess quality is to check the observed video against a ground truth video with perfect execution, and then determine the difference. However, in practice, many actions can have multiple ideal executions (e.g., a perfect overhand serve might be just as good as a perfect underhand serve). Instead, our model can handle multi-modal score distributions.

6 Pirsiavash, Vondrick, Torralba

3.3 Feedback Proposals

As a performer executes an action, in addition to assessing the quality, we also wish to provide feedback on how the performer can improve his action. Since our regression model operates over pose-based features, we can determine how the performer should move to maximize the score.

We accomplish this by differentiating the scoring function with respect to joint location. We calculate the gradient of the score with respect to the location of each joint, $\partial S / \partial p^{(j)}(t)$, where $S$ is the scoring function. By calculating the maximum gradient, we can find the joint and the direction in which the performer must move to most improve the score (see footnote 1).
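The idea of picking the joint with the largest score gradient can be sketched numerically. The following is a minimal illustration, not the authors' analytic derivation: it uses central finite differences on a single frame, and the names `feedback_proposal`, `Pose`, and `skip_joints`, as well as the convention that joint 0 is the head, are assumptions made for this example.

```python
from typing import Callable, List, Sequence, Tuple

Pose = List[Tuple[float, float]]  # one (x, y) location per joint in a single frame


def feedback_proposal(
    score: Callable[[Pose], float],
    pose: Pose,
    skip_joints: Sequence[int] = (0,),  # assumption: joint 0 is the head (normalization reference)
    eps: float = 1e-4,
) -> Tuple[int, Tuple[float, float]]:
    """Return (joint index, gradient vector) for the joint with the largest gradient norm."""
    best_joint, best_grad, best_norm = -1, (0.0, 0.0), -1.0
    for j, (x, y) in enumerate(pose):
        if j in skip_joints:
            continue
        grad = []
        for axis in range(2):
            dx = eps if axis == 0 else 0.0
            dy = eps if axis == 1 else 0.0
            plus, minus = list(pose), list(pose)
            plus[j] = (x + dx, y + dy)
            minus[j] = (x - dx, y - dy)
            # Central difference approximates the partial derivative of S along this axis.
            grad.append((score(plus) - score(minus)) / (2 * eps))
        norm = (grad[0] ** 2 + grad[1] ** 2) ** 0.5
        if norm > best_norm:
            best_joint, best_grad, best_norm = j, (grad[0], grad[1]), norm
    return best_joint, best_grad
```

The returned vector points in the direction that increases the score fastest for that joint. An analytic gradient, as described in the text, would avoid the extra score evaluations.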

3.4 Video Highlights

In addition to finding the joint that will result in the largest score improvement, we also wish to measure the impact a segment of the video has on the quality score. Such a measure could be useful in summarizing the segments of actions that contribute to high or low scores.

We define a segment’s impact as how much the quality score would change if the segment were removed. In order to remove a segment, we compute the most likely feature vector had we not observed the missing segment. The key observation is that since we only use the low frequency components in our feature vector, there are more equations than unknowns when estimating the DCT coefficients. Consequently, removing a segment corresponds to simply removing some equations.

Let $B = A^{+}$ be the inverse cosine transform, where $A^{+}$ is the pseudo-inverse of $A$. Then, the DCT equation can be written as $Q^{(j)} = B^{+} q^{(j)}$. If the data from …

Footnote 1: We do not differentiate with respect to the head location because it is used for normalization.

Assessing the Quality of Actions 7


Fig. 3: Interpolating Segments: This schematic shows how the displacement vector changes when a segment of the video is removed in order to compute impact. The dashed curve is the original displacement, and the solid curve is the most likely displacement given observations with a missing segment.
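The segment-removal idea (fewer unknowns than equations, so dropping frames just drops equations) can be sketched as a small least-squares problem. This is an illustrative sketch under stated assumptions, not the authors' implementation: the cosine basis in `dct_basis`, the linear score over raw coefficients with hypothetical `weights`, and the sign convention of `segment_impact` are all choices made for this example.

```python
import math
from typing import List, Sequence


def dct_basis(num_frames: int, k: int) -> List[List[float]]:
    """Rows are frames; columns are the k lowest-frequency cosine components."""
    return [
        [math.cos(math.pi * (t + 0.5) * f / num_frames) for f in range(k)]
        for t in range(num_frames)
    ]


def least_squares(B: List[List[float]], q: Sequence[float]) -> List[float]:
    """Solve min_Q ||B Q - q||^2 via the normal equations and Gauss-Jordan elimination."""
    k = len(B[0])
    # Augmented matrix [B^T B | B^T q].
    M = [
        [sum(row[a] * row[b] for row in B) for b in range(k)]
        + [sum(row[a] * qt for row, qt in zip(B, q))]
        for a in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def coefficients_without_segment(q: Sequence[float], k: int, u: int, v: int) -> List[float]:
    """Estimate the k coefficients after dropping the equations for frames u..v (inclusive)."""
    B = dct_basis(len(q), k)
    keep = [t for t in range(len(q)) if not (u <= t <= v)]
    return least_squares([B[t] for t in keep], [q[t] for t in keep])


def segment_impact(q: Sequence[float], weights: Sequence[float], u: int, v: int) -> float:
    """Score with all frames minus score with frames u..v removed (assumed sign convention)."""
    k = len(weights)
    full = least_squares(dct_basis(len(q), k), q)
    reduced = coefficients_without_segment(q, k, u, v)
    score = lambda Q: sum(w * c for w, c in zip(weights, Q))
    return score(full) - score(reduced)
```

If a displacement signal is already explained by the low-frequency components, removing any segment leaves the coefficients unchanged and the impact is zero. A segment containing an outlying motion shifts the fit and produces a nonzero impact.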


4 Experiments

In this section, we evaluate both our quality assessment method and feedback system for quality improvement with quantitative experiments. Since quality assessment has not yet been extensively studied in the computer vision community, we first introduce a new video dataset for action quality assessment.

4.1 Action Quality Dataset

There are two primary hurdles in building a large dataset for action quality assessment. Firstly, the score annotations are subjective and require an expert. Unfortunately, hiring an expert to annotate hundreds of videos is expensive. Secondly, in some applications such as health care, there are privacy and legal issues involved in collecting videos from patients. In order to establish a baseline dataset for further research, we desire freely available videos.

We introduce an Olympics video dataset for action quality assessment. Sports footage has the advantage that it can be obtained freely, and the expert judge’s scores are frequently released publicly. We collected videos from YouTube for two categories of sports, diving and figure skating, from recent Olympics and other worldwide championships. The videos are long with multiple instances of actions performed by multiple people. We annotated the videos with the start and end frame for each instance, and we extracted the judge’s score. The dataset will be publicly available.

Fig. 4: Diving Dataset: Some of the best dives from our diving dataset. Each column corresponds to one video. There is a large variation in the top-scoring actions. Hence, providing feedback is not as easy as pushing the action towards a canonical "good" performance.





Fig. 5: Figure Skating Dataset: Sample frames from our figure skating dataset. Notice the large variations of routines that the performers attempt. This makes automatic pose estimation challenging.

Diving: Fig.4 shows a few examples from our diving dataset. Our diving dataset consists of 159 videos. The videos are slow-motion footage from television broadcasts, so the effective frame rate is 60 frames per second. Each video is about 150 frames long, and the entire dataset consists of 25,000 frames. The ground-truth judge scores vary between 20 (worst) and 100 (best). In our experiments, we use 100 instances for training and the rest for testing. We repeated every experiment 200 times with different random splits and averaged the results. In addition to the Olympic judge’s score, we also consulted with the MIT varsity diving coach, who annotated which joints a diver should adjust to improve each dive. We use this data to evaluate our feedback system for the quality improvement algorithm.
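The evaluation protocol described above (100 training instances, the rest for testing, 200 random splits, averaged results) can be sketched as follows. The helper names `random_splits` and `mean_over_splits`, the fixed seed, and the `evaluate` callback are hypothetical and only illustrate the bookkeeping.

```python
import random
from typing import Callable, List, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, test indices)


def random_splits(num_videos: int, num_train: int, repeats: int, seed: int = 0) -> List[Split]:
    """Draw `repeats` independent random train/test partitions of the video indices."""
    rng = random.Random(seed)  # seeded so the splits are reproducible
    splits = []
    for _ in range(repeats):
        order = list(range(num_videos))
        rng.shuffle(order)
        splits.append((order[:num_train], order[num_train:]))
    return splits


def mean_over_splits(splits: List[Split], evaluate: Callable[[List[int], List[int]], float]) -> float:
    """Average a per-split metric, e.g. rank correlation on the test set."""
    return sum(evaluate(train, test) for train, test in splits) / len(splits)


# Diving dataset sizes from the text: 159 videos, 100 for training, 200 repetitions.
diving_splits = random_splits(num_videos=159, num_train=100, repeats=200)
```

With these sizes, each split leaves 59 diving videos for testing.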

Figure Skating: Fig.5 shows some frames from our figure skating dataset. This dataset contains 150 videos captured at 24 frames per second. Each video is almost 4,200 frames long, and the entire dataset is 630,000 frames. The judge’s score ranges between 0 (worst) and 100 (best). We use 100 instances for training and the rest for testing. As before, we repeated every experiment 200 times with different random splits and averaged the results. We note that our figure skating dataset tends to be more challenging for pose estimation since it has a lower frame rate and more variation in human pose and clothing (e.g., skirts).


4.2 Quality Assessment

We evaluate our quality assessment on both the figure skating and diving datasets. In order to compare our results against the ground truth, we use the rank correlation of the scores we predict against the scores the Olympic judges awarded. Tab.1 and Tab.2 show the mean performance over random train/test splits of our datasets. Our results suggest that pose-based features are competitive, and even obtain the best performance on the diving dataset. In addition, our results indicate that features learned to recognize actions can be used to assess the quality of actions too. We show some of the best and worst videos as predicted by our model in Fig.6.
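The rank-correlation metric can be illustrated with a short sketch. The text does not say which rank correlation is used, so Spearman's coefficient with average ranks for ties is assumed here; the function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman(predicted: Sequence[float], truth: Sequence[float]) -> float:
    """Pearson correlation of the two rank vectors (undefined if either is constant)."""
    rp, rt = average_ranks(predicted), average_ranks(truth)
    n = len(rp)
    mean_p, mean_t = sum(rp) / n, sum(rt) / n
    cov = sum((a - mean_p) * (b - mean_t) for a, b in zip(rp, rt))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_t = sum((b - mean_t) ** 2 for b in rt)
    return cov / (var_p * var_t) ** 0.5
```

Because only the ordering matters, a model whose predicted scores are on a different scale from the judges' scores can still reach a high correlation as long as it ranks the performances similarly.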

We compare our quality assessment against several baselines. Firstly, we compare to both space-time interest points (STIP) and pose-based features with Discrete Fourier Transform (DFT) instead of DCT (similar to [24]). Both of these features performed worse. Secondly, we also compare to ridge regression with all feature sets. Our results show that support vector regression often obtains significantly better performance.

We also asked non-expert human annotators to predict the quality of each diver in the diving dataset. Interestingly, after we instructed the subjects to read the Wikipedia page on diving, non-expert annotators were only able to achieve a rank correlation of 19%, which is half the performance of support vector regression with pose features. We believe this difference is evidence that our algorithm is starting to learn which human poses constitute good dives. We note, however, that our method is far from matching Olympic judges since they are able to predict the median judge’s score with a rank correlation of 96%, suggesting that there is still significant room for improvement (see footnote 2).

Footnote 2: Olympic diving competitions have two scores: the technical difficulty and the score. The final quality of the action is then the product of these two quantities. Judges are told the technical difficulty a priori, which gives them a slight competitive edge over our algorithms. We did not model the technical difficulty in the interest of building a general system.

Fig. 6: Examples of Diving Scores: We show the two best and worst videos sorted by the predicted score. Each column is one video with ground truth and predicted score written below. Notice that in the last place video, the diver lacked straight legs in the beginning and did not have a tight folding pose. These two pitfalls are part of common diving advice given by coaches, and our model has learned this independently.

4.3 Limitations

While our system is able to predict the quality of actions with some success, it has many limitations. One of the major bottlenecks is the pose estimation. Fig.2 shows a few examples of the successes and failures of the pose estimation. Pose estimation in our datasets is very challenging since the performers contort their body in many unusual configurations with significant variation in appearance. The frequent occlusion by clothing for figure skating noticeably harms the pose estimation performance. When the pose estimation is poor, the quality score is strongly affected, suggesting that advances in pose estimation or using depth sensors for pose can improve our system. Future work in action quality can be made robust against these types of failures as well by accounting for the uncertainty in the pose estimation.

Our system is designed to work for one human performer only, and does not model coordination between multiple people, which is often important for many types of sports and activities. We believe that future work in explicitly modeling team activities and interactions can significantly advance action quality assessment. Moreover, we do not model objects used during actions (such as sports balls or tools), and we do not consider physical outcomes (such as splashes in diving), which may be important features for some activities. Finally, while our representation captures the movements of human joint locations, we do not explicitly model their synchronization (e.g., keeping legs together) or repetitions (e.g., waving hands back and forth). We suspect a stronger quality assessment model will factor in these visual elements.

Fig. 7: Diving Feedback Proposals: We show feedback for some of the divers. The red vectors are instructing the divers to move their body in the direction of the arrow. In general, the feedback instructs divers to tuck their legs more and straighten their body before entering the pool.

4.4 Feedback for Improvement

In addition to quality assessment, we evaluate the feedback vectors that our method provides. Fig.7 and Fig.8 show qualitatively a sample of the feedback that our algorithm suggests. In general, the feedback is reasonable, often making modifications to the extremities of the performer.


