Interaction recognition from multi-person videos is a challenging yet essential task in computer vision. Often the videos depict actions with multiple actors involved, some of whom participate in the main event, and the rest are present in the scene without being part of the actual event. This paper proposes a model to tackle the problem of interaction recognition from multi-person videos. Our model consists of a Recurrent Neural Network (RNN) equipped with a time-varying attention mechanism. It receives scene features and localized actors features to predict the interaction class. Additionally, the attention model identifies the people responsible for the main event. We chose penalty classification from ice hockey broadcast videos as our application. These videos are multi-persons and depict complex interactions between players in a non-laboratory recording setup. We evaluate our model on a new dataset of icehockey penalty videos and report 93.93% classification accuracy. We include a qualitative analysis of the attention mechanism by visualizing the attention weights.