Automatic scene clustering and tracking in videos from multiple sources

Proceedings of EI - Mobile Devices and Multimedia: Enabling Technologies, Algorithms, and Applications 2016

F. L. M. Milotta, S. Battiato, F. Stanco
Image Processing Laboratory - University of Catania
{milotta, battiato, fstanco}

V. D'Amico, G. Torrisi, L. Addesso
JOL WAVE - Telecom Italia
{valeria1.damico, giovanni.torrisi, luca.addesso}

Download input video sequences

We used the RECfusion Dataset (2015) [1,2].


Download Ground Truth and Validation results

We manually labeled the 3 scenarios of the RECfusion Dataset (2015) [1,2] to define a set of ground truth clusters against which to evaluate the quality and soundness of the proposed approach. Each scenario consists of 3 clusters.

This archive contains both the Ground Truth and the Validation results.

Download Ground Truth and Validation results

Video results

This archive contains the results of cluster tracking on the 3 scenarios of the dataset. It also includes an example output sequence, which shows the most representative video stream for each cluster and the most popular video stream [1]. This is useful for assessing the advantages of the logged-clusters update phase.

Download video results


Image Demo 1

Figure 1. Example of Cluster Tracking on four different time slots of the video sequence "Foosball Room" (from the RECfusion dataset [1,2]).


Image Demo 2

Figure 2. Example of Cluster Tracking on two different time slots of the video sequence "Meeting" (from the RECfusion dataset [1,2]).

Experimental Results

Comparing computed centroids (clusters) with logged ones requires a distance between centroids and, consequently, a threshold TLC on that distance. We assume that sufficiently close (similar) centroids, and hence their clusters, represent the same semantic cluster. Although TLC=1 in ref. [1], in our experiments we used our new manually labeled dataset to investigate whether a better threshold value exists for cluster tracking. We tested several threshold values and compared the True Positive Rate (TPR, also called Recall), True Negative Rate (TNR, also called Specificity), and Accuracy of the Cluster Tracking Method for each one. Experimentally, the best value was TLC=0.15, which yielded the highest mean TPR (86%) together with mean TNR (98%) and mean Accuracy (99%) values that were among the highest.
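The thresholded comparison and the three evaluation measures can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the Euclidean distance, the function names match_cluster and rates, the two-dimensional centroids, and all numeric inputs are assumptions chosen only for illustration. Only the role of the threshold (0.15 here) and the standard definitions of TPR, TNR and Accuracy come from the text.

```python
import math


def match_cluster(centroid, logged, t_lc):
    """Return the index of the nearest logged centroid within t_lc, else None.

    Assumption: centroids are plain coordinate tuples compared with the
    Euclidean distance; the paper's actual distance may differ.
    """
    best_idx, best_dist = None, float("inf")
    for idx, ref in enumerate(logged):
        dist = math.dist(centroid, ref)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx if best_dist <= t_lc else None


def rates(tp, fp, tn, fn):
    """TPR (recall), TNR (specificity) and accuracy from confusion counts."""
    total = tp + fp + tn + fn
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / total if total else 0.0
    return tpr, tnr, acc


# Toy data, invented for this example (not taken from the dataset).
logged = [(0.0, 0.0), (1.0, 1.0)]
near = match_cluster((0.1, 0.0), logged, 0.15)  # close to logged cluster 0
far = match_cluster((0.5, 0.5), logged, 0.15)   # too far from both clusters
print(near, far)
print(rates(tp=8, fp=1, tn=9, fn=2))
```

In this sketch a computed centroid that falls within the threshold of a logged centroid is treated as the same semantic cluster, while a centroid farther than the threshold from every logged one returns None, which a tracker could interpret as a new cluster. Repeating the evaluation for several candidate thresholds and comparing the resulting rates mirrors the selection procedure described above.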



[1] A. Ortis, G.M. Farinella, V. D’Amico, G. Torrisi, L. Addesso, and S. Battiato, RECfusion: Automatic Video Curation Driven by Visual Content Popularity, Proc. ACM Multimedia, pp. 1179–1182 (2015).
[2] RECfusion website: