Language is vast, and expressions across the globe are as diverse as the cultures that produce them. As new interfaces require models to understand text and generate responses in order to provide services, those models are expected to be resilient and generalizable to be helpful. In practice, however, models, especially those built for text problems, are known to be brittle and difficult to generalize. Several hypotheses exist for improving such models, such as introducing more inductive bias, adding more context, and evaluating on unseen distributions and tasks. We aim to dissect the generalization problem in sarcasm classification by evaluating texts from different regions of Latin America, written in different regional varieties of Spanish. This is done with both a model and human annotation, allowing us to assess the agreement between the two measurements. The results are then explored along the three dimensions of potential improvement mentioned above, providing a guideline for next steps toward making the model more resilient.