High resolution and accurate Land Use and Land Cover mapping (LULC) datasets are increasingly important and can be widely used in monitoring climate change impacts in agriculture, deforestation, and the carbon cycle. These datasets represent physical classifications of land types and spatial information over the surface of the Earth. These LULC datasets can be leveraged in a plethora of research topics and industries to mitigate and adapt to environmental changes. High resolution urban classifications can be used to better monitor and estimate building albedo and urban heat island impacts, and accurate representation of forest and vegetation can even be leveraged to better monitor the carbon cycle and climate change through improve land surface modelling. The advent of machine learning (ML) based CV techniques over the past decade provide a viable option to automate LULC mapping. One impediment to this has been the lack of large ML datasets. Large vector datasets for LULC are available, but can’t be used directly by ML practitioners due to a knowledge gap in transforming the input to a dataset of paired satellite images and segmentation masks. We demonstrate a novel end-to-end pipeline for LULC dataset creation which takes vector land cover data and provides a training-ready dataset. We will use Sentinel-2 satellite imagery and the European Urban Atlas LULC data. The pipeline manages everything from downloading satellite data, to creating and storing encoded segmentation masks and automating data checks. We then use the resulting dataset to train a semantic segmentation model. The aim of the pipeline is to provide a way for users to create their own custom datasets using various combinations of multispectral satellite and vector data. In addition to presenting the pipeline, we aim to provide an introduction to multispectral imagery, challenges in using it for ML, and give a brief perspective of the future work in ML for Earth observation.