A typical movie scene is comprised of a number of different shots, edited together to form a narrative thread. The intricate transitions between different shots within a movie scene allow filmmakers to tell the story or convey a message in a clear and vivid manner. As a result of the complexity in the interactions between individuals and their actions within a movie scene, a major challenge in movie semantic understanding is that of scene segmentation, where the goal is to identify the individual scenes within a movie. A key part of the challenge with scene segmentation is the fact that a movie scene may be comprised of multiple uncut shots filmed over an uninterrupted period of time, leading to a visually discontinuous yet semantically coherent segment. Therefore, while separating a movie into individual shots can be accomplished based on visual continuity between frames, separating a movie into individual scenes requires a much deeper understanding of the semantics of a film and the relationship between shots that are semantically consistent but physically distinct.