While paraphrasing is a promising approach for data augmentation in classification tasks, its effect on named entity recognition (NER) is not investigated systematically due to the difficulty of span-level label preservation. In this paper, we utilize simple strategies to annotate entity spans in generations and compare established and novel methods of paraphrasing in NLP such as back translation, specialized encoder decoder models such as Pegasus, and GPT-3 variants for their effectiveness in improving downstream performance for NER across different levels of gold annotations and paraphrasing strength on 5 datasets. We also analyze the quality of generated paraphrases based on their entity preservation and paraphrasing language quality. We find that the choice of the paraphraser greatly impacts NER performance, with one of the larger GPT-3 variants exceedingly capable at generating high quality paraphrases, improving performance in most cases, and not hurting others, while other paraphrasers show more mixed results. We also find inline auto annotations generated by larger GPT-3 to be strictly better than heuristic based annotations. We find diminishing benefits of paraphrasing as gold annotations increase for most datasets. While larger GPT-3 variants perform well by both entity preservation and human evaluation of language quality, those two metrics do not necessarily correlate with downstream performance for other paraphrasers.