RT @johnjnay: LLM Alignment Limitations
-There exist prompts that trigger an LLM into outputting practically any bad behavior, with prob increasing w/ prompt length (see the sketch below)
-RLHF can make things worse
-Mimicking personas that demonstrate bad behavior is an efficient way to evoke it
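
A minimal sketch of the first bullet's shape in behavior-expectation notation (this framing, and the symbols B, P, and L, are assumptions of the sketch, not quoted from the tweet): score each model output s with B(s) ∈ [-1, 1], let B_P(x) be the expected behavior score after prompt x, and read the claim as: allowing longer prompts lets an adversary push the expected behavior arbitrarily close to the worst score.

\[
B_P(x) \;=\; \mathbb{E}_{s \sim P(\cdot \mid x)}\bigl[B(s)\bigr],
\qquad
\inf_{|x| \le L} B_P(x) \;\longrightarrow\; -1 \quad \text{as } L \to \infty .
\]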