
Affinity Workshop: Women in Machine Learning

P53 in Ovarian Cancer: Heterogeneous Analysis of KeyBERT, BERTopic, PyCaret and LDA Methods

Mary Adewunmi · Richard Oveh · Christopher Yeboah · Solomon Olorundare · Ezeobi Peace


In recent years, researchers with computational backgrounds have found it easier to engage with artificial intelligence, driven by advances in transformer models and the growing availability of unstructured medical data. This paper explores the heterogeneity of KeyBERT, BERTopic, PyCaret and LDA as keyphrase generators and topic-model extractors, using P53 in ovarian cancer as a case study. PubMed abstracts on mutant p53 were first retrieved from the Entrez global database and then preprocessed with regular expressions. KeyBERT was used to extract keyphrases, and BERTopic modelling was used to extract related themes. PyCaret was then used to derive unigram topics, and LDA to examine the interactions among topics in the word corpus. Finally, the Jaccard similarity index was used to measure the similarity among the four methods. The results showed no relationship between KeyBERT and the other methods (a score of 0.0), while a relationship exists among the other three topic models, with scores of 0.095, 0.235, 0.4 and 0.111. Based on these results, the keywords, keyphrases, similar topics and entities embedded in the data could be used within a closely related framework that provides insights into medical data for modelling.
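The Jaccard comparison step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each method's output has already been reduced to a collection of term strings, and the helper names `jaccard` and `pairwise_jaccard` and the term lists in `method_terms` are hypothetical placeholders, not data or results from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two term collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Choice made in this sketch: two empty outputs score 0.0
        # rather than raising a ZeroDivisionError.
        return 0.0
    return len(set_a & set_b) / len(union)


def pairwise_jaccard(method_terms):
    """Score every unordered pair of methods; keys are (method_x, method_y)."""
    return {
        (m1, m2): jaccard(t1, t2)
        for (m1, t1), (m2, t2) in combinations(method_terms.items(), 2)
    }


# Hypothetical, illustrative outputs only (NOT the paper's extracted terms).
method_terms = {
    "KeyBERT": ["mutant p53 protein", "ovarian carcinoma cells"],
    "BERTopic": ["p53", "mutant", "ovarian", "cancer"],
    "PyCaret": ["p53", "ovarian", "tumor"],
    "LDA": ["p53", "cancer", "cell"],
}

for (m1, m2), score in pairwise_jaccard(method_terms).items():
    print(f"{m1} vs {m2}: {score:.3f}")
```

Because membership here is exact string matching, multi-word keyphrases in this sketch never match single-word topic terms unless all outputs are tokenized the same way before comparison; four methods yield six pairwise scores.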
