PUBHOMICS: A Multispecies Biological Dataset to Catalyze AI-Driven Toxicity Assessment for Environmental and Public Health
Abstract
Environmental and public health remain underserved by the recent data revolution that enabled major AI advances in drug discovery. Existing toxicity datasets are biased toward drug-like molecules and are fragmented across repositories, limiting their use for machine learning and cross-species translation. We propose PUBHOMICS, a scalable, openly shareable dataset capturing transcriptional responses to environmentally relevant chemical perturbations across cell types, organs, and species. PUBHOMICS will expand chemical coverage to classes absent from existing resources, enable AI models to predict transcriptomic responses to novel exposures, and support mechanism-based toxicity prediction with cross-species translation for regulatory decision-making. By advancing exposomics toward causation and providing a foundation for New Approach Methodologies (NAMs), PUBHOMICS aims to accelerate regulatory adoption and enable “benign-by-design” strategies that bridge exposure science with systems biology.