Topic models (TMs) are statistical models that learn the latent topics present in a collection of text documents. These topics are usually not independent; they represent concepts that are related hierarchically. Flat TMs such as LDA fail to capture this inherent hierarchy. To overcome this limitation, Hierarchical Topic Models (HTMs) have been proposed that discover latent topics while preserving the hierarchical structure between them (for example, aspect hierarchies in reviews and research-topic hierarchies in academic repositories).
Despite showing great promise, state-of-the-art HTMs fail to capture coherent hierarchical structures. Moreover, the number of topics at each level is usually unknown and must be determined empirically, wasting considerable time and resources. Finally, HTMs have very long training times, making them unsuitable for real-time production environments. Thus, there is a need for HTMs that (i) produce good hierarchical structures with more meaningful and interpretable topics and (ii) can automatically determine the number of topics at each level without multiple training runs.
In our work, we address the problems above by exploiting properties of hyperbolic geometry, which has been successfully applied to learning hierarchical structures such as ontologies in knowledge bases and latent hierarchies between words. Our initial experiments have yielded promising results: training time has dropped from weeks to under an hour, and quantitative metrics have improved, with significantly better hierarchical structures. We have attached an abstract with a brief overview of our approach and results.
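The appeal of hyperbolic geometry for hierarchies is that distances grow rapidly toward the boundary of the space, so tree-like structures embed with low distortion. As an illustration only (our abstract contains the actual model details), the sketch below computes the standard distance in the Poincaré ball, the geodesic metric commonly used in hyperbolic embedding work; the function name and example points are ours.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball.

    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    Both points must have Euclidean norm strictly less than 1.
    """
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    return math.acosh(1 + 2 * sq_diff / ((1 - sq_u) * (1 - sq_v)))

# Near the origin, hyperbolic distance is close to Euclidean distance;
# near the boundary, the same Euclidean gap corresponds to a much
# larger hyperbolic distance -- this is what gives the space room
# for exponentially growing hierarchies.
d_center = poincare_distance([0.0, 0.0], [0.5, 0.0])
d_boundary = poincare_distance([0.9, 0.0], [-0.9, 0.0])
```

Here `d_boundary` far exceeds `d_center` even though the boundary pair is only 1.8 apart in Euclidean terms, which is why parent topics can sit near the origin and fine-grained child topics near the boundary.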