In this study, we propose a deep learning architecture that employs both a text representation of molecules and a graph-based strategy with an attention mechanism to learn and predict aqueous solubility. The core contributions of this work are as follows. (1) We treat aqueous solubility prediction as a translation problem. Our architecture follows an encoder-decoder design; to learn a latent representation, the main encoder consists of two subencoders, namely a graph encoder and a Transformer-based encoder. We call this architecture M2M. (2) To address the limited availability of high-quality data and to improve aqueous solubility prediction performance, we incorporate transfer learning. We first pretrain the model on a pKa dataset that consists of more than 6000 chemical compounds and then transfer the learned knowledge to a smaller water solubility dataset. The final architecture is called TunedM2M. (3) We demonstrate that the proposed method outperforms state-of-the-art approaches, obtaining an RMSE of 0.587 both in cross-validation and in a test on an independent dataset. Specifically, the model is evaluated on molecules downloaded from the Online Chemical Database and Modeling Environment (OCHEM). Beyond aqueous solubility prediction, the strategy presented in this work may be useful for modeling any chemical or biological property for which only a limited amount of training data is available.
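
The pretrain-then-fine-tune idea in contribution (2), and the RMSE metric in contribution (3), can be illustrated with a deliberately small toy example. This is only a sketch under stated assumptions and is not the M2M or TunedM2M model: a one-variable linear regressor stands in for the network, the "source" and "target" datasets are synthetic, and the names `rmse` and `fit_linear`, the learning rate, and the step counts are all hypothetical choices made for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def fit_linear(xs, ys, w=0.0, b=0.0, lr=0.05, steps=200):
    """Gradient descent on mean squared error for y = w*x + b.

    Training starts from the supplied (w, b), so passing pretrained
    parameters turns this into a fine-tuning step.
    """
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic "source" task with plenty of data (stand-in for the large dataset).
source_x = [i / 25 - 1 for i in range(51)]
source_y = [2.0 * x + 1.0 for x in source_x]

# Synthetic, related "target" task with few examples (stand-in for the small dataset).
target_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
target_y = [2.2 * x + 0.8 for x in target_x]
test_x = [-0.75, -0.25, 0.25, 0.75]
test_y = [2.2 * x + 0.8 for x in test_x]

# Stage 1: pretrain on the source task.
w_pre, b_pre = fit_linear(source_x, source_y, steps=500)

# Stage 2: fine-tune on the target task, starting from the pretrained parameters.
w_ft, b_ft = fit_linear(target_x, target_y, w=w_pre, b=b_pre, steps=5)

# Baseline: same small training budget on the target task, but from scratch.
w_sc, b_sc = fit_linear(target_x, target_y, steps=5)

rmse_finetuned = rmse(test_y, [w_ft * x + b_ft for x in test_x])
rmse_scratch = rmse(test_y, [w_sc * x + b_sc for x in test_x])
print(f"fine-tuned RMSE: {rmse_finetuned:.3f}, from-scratch RMSE: {rmse_scratch:.3f}")
```

In this toy setting the fine-tuned parameters start close to the target solution, so a short training budget yields a lower held-out RMSE than training from scratch with the same budget. Whether and how much such a benefit carries over to a real model depends on how closely the source and target properties are related.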