In digital media, a standard concept is the idea of layers. Layers enable artist to manipulate independent groups of objects in a disentangled way and organize a scene from back to front. Manually extracting these layers can be time-consuming, since it involves repeatedly segmenting groups of content over many frames. In , Lu et al. coin the problem of automatically decomposing a video into these layers as creating an Omnimatte. More specifically, given an initialization of each layer with a sequence of object masks, everything associated with that object must be attached to the same layer. This includes effects such as shadows and reflections. The current method of extracting Omnimattes takes approximately two hours per video and must be optimized from scratch for every new video. Motivated by the challenge of more quickly producing Omnimattes, we have designed and tested a train-and-evaluate network that generates Omnimattes for new videos with a single forward pass. Our network builds on the idea of learned gradient descent, a setup which has also been applied to generate multi-plane images used to render novel views of a scene. Initial results show that this approach can generate meaningful decompositions of videos into foreground and background layers.