
Affinity Workshop: Global South in AI

Novel method to preserve Sanskrit Shloka heritage using Transformer Language models and Semantic similarity

Chinmayi Ramasubramanian · Chandra Sekhar Gupta Aravapalli · Nishant Kumar


Sanskrit is an ancient, classical Indo-Aryan language that has been used in South Asia since the Bronze Age. It is the sacred language of Hinduism and was used in ancient Vedic scriptures, Hindu philosophy, literature, mythological epics, and historical texts. Sanskrit’s impact on India’s culture is well known, but despite revival efforts, there are no first-language speakers of Sanskrit now. However, Sanskrit remains prevalent in many Hindu, Jain, and Buddhist traditions even today in the form of chanting of shlokas. The word shloka means 'song'; a shloka is a couplet of Sanskrit verses that is repeated for spiritual benefit. Recent research into the effects of chanting has reported a variety of benefits, including gaining peace, feeling calm, and becoming more focused with positive energy. There are specific shlokas for specific benefits such as removing obstacles, health, knowledge, and happiness. Unfortunately, the knowledge of which shloka is meant for which purpose is not well documented; it is passed on by word of mouth by elders in the family. For example, a person with health issues consults family priests or grandparents to learn which shloka to chant to bring about healing.
The methodology employed here consists of data collection, transformation, and building an API to calculate semantic similarity. Sanskrit shloka samples and their corresponding benefits in English were collected from sources and stored in a database. There are existing databases with translations of the shlokas, but currently there is no database that maps a Sanskrit shloka to its benefit. The mapping is many-to-many: a single shloka can be associated with both “learning” and “knowledge”, and a benefit such as “knowledge” can have multiple shlokas associated with it (e.g., विद्या or ज्ञान related shlokas).
Embeddings of each of the benefits are computed using BERT, a pre-trained Transformer language model, and stored in the database. An API is built that takes input from the user. The user enters the benefit they want to gain (e.g., “knowledge”), and the input is matched against the benefits stored in the database. Semantic similarity between the input and each stored benefit is measured with cosine similarity, and the system recommends one or more suitable shlokas based on it (e.g., विद्या ददाति विनयं विनयाद् याति पात्रताम्।). In the future, inputs in different languages can be supported. Thus, this heritage can be preserved for many generations.
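The matching step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`cosine_similarity`, `recommend`), the data layout, and the three-dimensional vectors are all hypothetical. In the described system the vectors would be BERT embeddings of the benefit strings; here small hand-written vectors stand in for them so the ranking logic can be read on its own. The "health" entry is a placeholder, not a real shloka.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(
    query_vec: Vector,
    benefit_vecs: Dict[str, Vector],
    shlokas_by_benefit: Dict[str, List[str]],
    top_k: int = 1,
) -> List[Tuple[str, float, List[str]]]:
    """Rank stored benefits by similarity to the query and return their shlokas."""
    scored = [
        (benefit, cosine_similarity(query_vec, vec))
        for benefit, vec in benefit_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [
        (benefit, score, shlokas_by_benefit.get(benefit, []))
        for benefit, score in scored[:top_k]
    ]


# Assumption: toy vectors standing in for BERT embeddings of each benefit.
benefit_vecs = {
    "knowledge": [1.0, 0.0, 0.0],
    "health": [0.0, 1.0, 0.0],
}

# One benefit may map to several shlokas; the "health" entry is a placeholder.
shlokas_by_benefit = {
    "knowledge": ["विद्या ददाति विनयं विनयाद् याति पात्रताम्।"],
    "health": ["<health shloka placeholder>"],
}

# Assumption: toy vector standing in for the embedding of the user's input.
query_vec = [0.9, 0.1, 0.0]

result = recommend(query_vec, benefit_vecs, shlokas_by_benefit, top_k=1)
for benefit, score, shlokas in result:
    print(benefit, round(score, 3), shlokas)
```

Because the benefit embeddings are computed once and stored, only the user's input needs to be embedded at query time; raising `top_k` returns shlokas for several closely related benefits rather than a single best match.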
