RT @johnjnay: LLM Alignment Limitations
-There exist prompts that trigger an LLM into outputting practically any bad behavior, with prob increasing w/ prompt length (see the sketch below)
-RLHF can make things worse
-Mimicking personas that demonstrate bad behavior is an efficient way to evoke it
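
A minimal sketch of the first bullet's shape in behavior-expectation notation (this framing, and the symbols B, P, and L, are assumptions of the sketch, not quoted from the tweet): score each model output s with B(s) ∈ [-1, 1], let B_P(x) be the expected behavior score after prompt x, and read the claim as: allowing longer prompts lets an adversary push the expected behavior arbitrarily close to the worst score.

\[
B_P(x) \;=\; \mathbb{E}_{s \sim P(\cdot \mid x)}\bigl[B(s)\bigr],
\qquad
\inf_{|x| \le L} B_P(x) \;\longrightarrow\; -1 \quad \text{as } L \to \infty .
\]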