University:
Boston University
Course:
ENG EC 410 | Introduction to Electronics
Academic year:

2022
Author:

Deja Hood

Chapter 5 Practical considerations

We have described our proposed framework for action recognition, as well as two examples of localized features. Some practical considerations in action recognition still need to be addressed (Guo et al., 2010b).

5.1 Processing continuous video

In practice, we usually need to recognize actions in a continuous video. With limited memory, however, it is sometimes impossible to process the whole video at once. It is therefore necessary to partition the video into segments (Fig. 5·1) and process one segment at a time.

Figure 5·1: Illustration of video segments.

Using video segments can also enhance the robustness of our action recognition approach. A query video that is not partitioned into segments yields only one feature log-covariance matrix; if it is misclassified, there is no chance of obtaining the correct label for the query action. In contrast, if the query video is divided into segments, it can still be correctly classified even if a few segments are misclassified.

5.2 Length of segments

Many actions, such as walking, running, and jumping, are roughly repetitive (Fig. 5·2). For such actions, an appropriate segment length is the approximate number of frames in one action period. The typical period of many human actions is on the order of 0.4–0.8 seconds (except for very fast or very slow motion). For a camera operating at 25 fps, the typical length of an action segment is therefore 10–20 frames.

Figure 5·2: Illustration of the repetitive nature of the action “walking”.

5.3 Temporal misalignment

Suppose two videos A and B contain the same action class, and both are partitioned into video segments. Because of temporal misalignment, a video segment from A may not find a good match in video B. This motivates breaking a video sequence into successive overlapping action segments, as shown in Fig. 5·3.
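
As a rough illustration of Sections 5.2 and 5.3, the following Python sketch converts an action period into a segment length and lists the boundaries of overlapping segments. The function names, the `step` parameter (the shift in frames between successive segments), and the choice to drop trailing frames that do not fill a whole segment are assumptions made for illustration; the text does not specify the amount of overlap.

```python
def segment_length(period_s, fps):
    """Approximate number of frames in one action period."""
    return max(1, round(period_s * fps))


def overlapping_segments(num_frames, length, step):
    """Return (start, end) frame indices of successive segments.

    `end` is exclusive. Successive segments overlap whenever
    step < length. Assumption: trailing frames that cannot fill
    a whole segment are dropped.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    return [(start, start + length)
            for start in range(0, num_frames - length + 1, step)]
```

For example, a 0.8-second period at 25 fps gives `segment_length(0.8, 25) == 20`, and `overlapping_segments(50, 20, 10)` (a step of half the segment length) yields the segments (0, 20), (10, 30), (20, 40), and (30, 50).
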
Overlapping segments allow the actions in video segments to be better synchronized, so well-matched segments are more likely to be found. Overlapping video segments also enrich the training set, which can make action classification more reliable.

Figure 5·3: Illustration of overlapping segments.

5.4 Majority rule

After partitioning a query video into overlapping segments, we apply our action recognition approach to each video segment and obtain a sequence of action labels. Assuming that each query video involves only one action class, the segment-level action labels can be fused into a sequence-level decision using the majority rule, which takes the most frequent segment label as the label of the sequence. An example is shown in Fig. 5·4.

Figure 5·4: Illustration of the majority rule.

5.5 Summary of overall approach

Based on our action recognition framework and the practical considerations above, the overall approach can be summarized as follows:

1. Partition videos into overlapping action segments (Fig. 5·5);
2. Represent each action segment using the log-covariance matrix of features (Fig. 5·6);
3. Classify each query segment using the NN or SLA classification algorithm (Fig. 5·7);
4. Use the majority rule to decide the action label of the query video (Fig. 5·4).

Figure 5·5: Illustration of segment partitioning.

Figure 5·6: Illustration of action representation.

Figure 5·7: Illustration of action classification.
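
The majority rule of Section 5.4 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the function names are hypothetical, `classify_segment` stands in for whichever segment-level classifier is used, and the tie-breaking behavior (the label encountered first wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def majority_label(segment_labels):
    """Return the most frequent label in a sequence of segment labels.

    Assumption: ties are broken in favor of the label seen first,
    which is how Counter.most_common orders equal counts.
    """
    if not segment_labels:
        raise ValueError("need at least one segment label")
    label, _count = Counter(segment_labels).most_common(1)[0]
    return label


def classify_video(segments, classify_segment):
    """Label a query video from its segments.

    `classify_segment` is a hypothetical callable that maps one
    segment to an action label.
    """
    return majority_label([classify_segment(s) for s in segments])
```

For instance, the segment labels `["walk", "run", "walk", "walk", "jump"]` fuse into the sequence-level label `"walk"`, even though two of the five segments were labeled differently.
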