We are interested in image manipulation via natural language text, a task that is extremely useful for many AI applications but requires complex reasoning over multi-modal spaces. Recent neuro-symbolic approaches have been quite effective at solving such tasks because they offer better modularity, interpretability, and generalizability. One notable example is NSCL [25], developed for the task of Visual Question Answering (VQA). We extend NSCL to the image manipulation task and propose a solution called NeuroSIM. Previous works either require supervised training data or can handle only very simple reasoning instructions over single-object scenes. In contrast, NeuroSIM can perform complex multi-hop reasoning over multi-object scenes and requires only weak supervision, in the form of annotated data for the VQA task. On the language side, NeuroSIM contains neural modules that parse an instruction into a symbolic program that guides the manipulation. These programs are based on a Domain Specific Language (DSL) comprising object attributes as well as manipulation operations. On the perceptual side, NeuroSIM contains neural modules that first generate a scene graph of the input image and then modify the scene graph representation in accordance with the parsed instruction. To train these modules, we design novel loss functions that test the correctness of the manipulated object and scene graph representations via query networks trained solely on the VQA dataset. An image decoder is trained to render the final image from the manipulated scene graph representation. The entire NeuroSIM pipeline is trained without any intermediate supervision. Extensive experiments demonstrate that our approach is highly competitive with state-of-the-art supervised baselines.
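To make the idea of "a symbolic program over a DSL of object attributes and manipulation operations" concrete, here is a toy sketch that executes a hand-written program on a symbolic scene graph. It is not NeuroSIM's DSL or executor: the operation names (`filter`, `unique`, `relate`, `change`), the dictionary-based scene, and the use of an `x` coordinate to decide "left of" are all hypothetical simplifications. NeuroSIM itself parses the instruction with neural modules and operates on learned object representations rather than discrete attribute strings. The example instruction is "Change the color of the cube to the left of the red sphere to green", which needs two reasoning hops: first locate the red sphere, then the cube relative to it.

```python
import copy

# Hypothetical symbolic scene graph: each object has discrete attributes
# and an x-coordinate used (as an assumption) to resolve "left"/"right".
SCENE = [
    {"id": 0, "color": "red", "shape": "sphere", "size": "small", "x": 2.0},
    {"id": 1, "color": "blue", "shape": "cube", "size": "large", "x": 0.5},
    {"id": 2, "color": "gray", "shape": "cube", "size": "small", "x": 3.5},
]

# Hand-written program for: "Change the color of the cube to the left
# of the red sphere to green." (op names are illustrative only)
PROGRAM = [
    ("filter", "color", "red"),
    ("filter", "shape", "sphere"),
    ("unique",),
    ("relate", "left"),
    ("filter", "shape", "cube"),
    ("unique",),
    ("change", "color", "green"),
]


def execute(program, scene):
    """Run a toy manipulation program and return a modified copy of the scene."""
    scene = copy.deepcopy(scene)  # leave the input scene untouched
    selected = list(scene)        # candidate objects, initially all of them
    for op, *args in program:
        if op == "filter":
            key, value = args
            selected = [o for o in selected if o[key] == value]
        elif op == "unique":
            if len(selected) != 1:
                raise ValueError(f"expected one object, got {len(selected)}")
        elif op == "relate":
            (direction,) = args
            if len(selected) != 1:
                raise ValueError("relate needs exactly one anchor object")
            anchor_x = selected[0]["x"]
            if direction == "left":
                selected = [o for o in scene if o["x"] < anchor_x]
            elif direction == "right":
                selected = [o for o in scene if o["x"] > anchor_x]
            else:
                raise ValueError(f"unknown direction {direction!r}")
        elif op == "change":
            key, value = args
            for o in selected:  # edit the copied scene objects in place
                o[key] = value
        else:
            raise ValueError(f"unknown op {op!r}")
    return scene


if __name__ == "__main__":
    for obj in execute(PROGRAM, SCENE):
        print(obj)
```

Running the program recolors only the large cube (id 1) and leaves the original `SCENE` unchanged. The `unique` checks mirror the multi-hop structure: each hop must resolve to a single referent before the next relation is applied.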
Author Information
Harman Singh (Meta)
AI Resident at Meta
Poorva Garg (University of California, Los Angeles)
Mohit Gupta (Indian Institute of Technology, Delhi)
Kevin Shah (Indian Institute of Technology Delhi)
Arnab Kumar Mondal (Indian Institute of Technology Delhi)
I received my Bachelor of Engineering in Electronics and Telecommunication from Jadavpur University, India, in 2013. Right after graduating, I joined the Centre for Development of Telematics (C-DOT), Delhi, and served as a research engineer there until July 2018. At C-DOT I had the opportunity to participate in cutting-edge projects such as Dense Wavelength Division Multiplexing (DWDM) and the Packet Optical Transport Platform (P-OTP). I joined IIT Delhi as a Ph.D. scholar in July 2018 under the guidance of Prof. Prathosh AP and Prof. Parag Singla. My research interests lie primarily in deep generative models and applied deep learning. My Ph.D. is supported by the Prime Minister's Research Fellows (PMRF) Scheme of the Govt. of India.
Dinesh Khandelwal (IBM Research AI)
I am a Research Scientist in the AI Reasoning group at IBM Research Lab, Delhi, India. I completed my Ph.D. in Machine Learning at IIT Delhi. My primary research interests lie in Deep Learning, Probabilistic Graphical Models, and Question Answering. I also hold a Master's degree in Machine Learning from IISc Bangalore.
Parag Singla (Indian Institute of Technology Delhi)
Dinesh Garg (IBM Research AI, India)
More from the Same Authors
2022: Learning Neuro-symbolic Programs for Language-Guided Robotic Manipulation
Namasivayam Kalithasan · Himanshu Singh · Vishal Bindal · Arnav Tuli · Vishwajeet Agrawal · Rahul Jain · Parag Singla · Rohan Paul
2022: Few Shot Generative Domain Adaptation Via Inference-Stage Latent Learning in GANs
Arnab Kumar Mondal · Piyush Tiwary · Parag Singla · Prathosh AP
2022 Poster: A Solver-free Framework for Scalable Learning in Neural ILP Architectures
Yatin Nandwani · Rishabh Ranjan · Mausam · Parag Singla
2020 Poster: Inductive Quantum Embedding
Santosh Kumar Srivastava · Dinesh Khandelwal · Dhiraj Madan · Dinesh Garg · Hima Karanam · L Venkata Subramaniam
2019 Poster: Quantum Embedding of Knowledge for Reasoning
Dinesh Garg · Shajith Ikbal Mohamed · Santosh Kumar Srivastava · Harit Vishwakarma · Hima Karanam · L Venkata Subramaniam
2019 Poster: A Primal Dual Formulation For Deep Learning With Constraints
Yatin Nandwani · Abhishek Pathak · Mausam · Parag Singla
2018: Spotlights 2
Mausam · Ankit Anand · Parag Singla · Tarik Koc · Tim Klinger · Habibeh Naderi · Sungwon Lyu · Saeed Amizadeh · Kshitij Dwivedi · Songpeng Zu · Wei Feng · Balaraman Ravindran · Edouard Pineau · Abdulkadir Celikkanat · Deepak Venugopal
2015 Poster: Fast Lifted MAP Inference via Partitioning
Somdeb Sarkhel · Parag Singla · Vibhav Gogate
2015 Poster: Lifted Inference Rules With Constraints
Happy Mittal · Anuj Mahajan · Vibhav Gogate · Parag Singla
2015 Poster: Lifted Symmetry Detection and Breaking for MAP Inference
Timothy Kopp · Parag Singla · Henry Kautz
2014 Poster: An Integer Polynomial Programming Based Framework for Lifted MAP Inference
Somdeb Sarkhel · Deepak Venugopal · Parag Singla · Vibhav Gogate
2014 Poster: New Rules for Domain Independent Lifted MAP Inference
Happy Mittal · Prasoon Goyal · Vibhav Gogate · Parag Singla