Workshop: New Frontiers in Graph Learning

Multimodal Video Understanding using Graph Convolutional Network

Ayush Singh · Vikram Gupta


The majority of existing semantic video understanding methods process every video independently, without considering the underlying inter-video relationships. However, videos uploaded by individuals to social media platforms such as YouTube and Instagram exhibit inter-video relationships that reflect the individual's interests, geography, culture, and so on. In this work, we explicitly model these inter-video relationships, which originate from the creators of the videos, using Graph Neural Networks (GNNs) in a multimodal setup. We perform video classification by using the creators of the videos and the semantic similarity between videos to create edges between videos, and observe improvements of 4% in accuracy.
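As a rough illustration of the edge construction described above, the following minimal sketch links two videos when they share a creator or when their feature vectors are sufficiently similar, and then applies one graph-convolution step. It is not the authors' implementation: the function names `build_video_graph` and `gcn_layer`, the cosine-similarity measure, the `sim_threshold` value, and the symmetric degree normalization are all assumptions made for illustration.

```python
import numpy as np


def build_video_graph(creator_ids, features, sim_threshold=0.8):
    """Build an adjacency matrix over videos (hypothetical helper).

    Two videos are connected if they have the same creator or if the
    cosine similarity of their feature vectors is at least
    sim_threshold (the threshold is an assumed hyperparameter).
    """
    creator_ids = np.asarray(creator_ids)
    features = np.asarray(features, dtype=float)

    # Creator edges: pairwise equality of creator identifiers.
    same_creator = creator_ids[:, None] == creator_ids[None, :]

    # Similarity edges: cosine similarity between row-normalized features.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    similar = (unit @ unit.T) >= sim_threshold

    adj = (same_creator | similar).astype(float)
    np.fill_diagonal(adj, 1.0)  # self-loops keep each video's own features
    return adj


def gcn_layer(adj, x, weight):
    """One graph-convolution step: ReLU(D^-1/2 A D^-1/2 X W).

    The symmetric normalization is an assumed, commonly used choice.
    """
    deg = adj.sum(axis=1)  # at least 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ x @ weight, 0.0)
```

In a multimodal setup, `features` could be a concatenation of per-modality embeddings for each video; how the modalities are actually fused is not specified in the abstract, so that choice is left open here.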
