RT @_akhaliq: Scaling Transformer to 1M tokens and beyond with RMT

Recurrent Memory Transformer retains information across up to 2 million tokens.

During inference, the model effectively utilized memory for up to 4,096 segments with a total length of 2,048,000 tokens—significantly exceeding… t.co/MbIegSfyb0 t.co/Axggo0nSoH
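For a sense of how the memory mechanism in the post works, here is a minimal sketch of RMT-style segment-level recurrence (4,096 segments of roughly 500 tokens each account for the 2,048,000-token total above). It assumes a plain PyTorch encoder as a stand-in backbone; the segment length, memory size, and model dimensions here are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

d_model, num_mem, seg_len = 64, 4, 500  # hypothetical sizes, not the paper's

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)

def rmt_forward(segments, memory):
    """Process segments in order, carrying memory token states across them."""
    outputs = []
    for seg in segments:                     # seg: (batch, seg_len, d_model)
        x = torch.cat([memory, seg], dim=1)  # prepend current memory tokens
        y = backbone(x)                      # full attention within one segment
        memory = y[:, :num_mem, :]           # updated memory = output at memory slots
        outputs.append(y[:, num_mem:, :])
    return torch.cat(outputs, dim=1), memory

batch = 1
segments = [torch.randn(batch, seg_len, d_model) for _ in range(8)]  # 8 x 500 = 4,000 tokens
memory = torch.zeros(batch, num_mem, d_model)                        # initial memory state
out, final_memory = rmt_forward(segments, memory)
print(out.shape, final_memory.shape)  # (1, 4000, 64) (1, 4, 64)
```

The point of the design is that only the small set of memory token states crosses segment boundaries, so attention cost stays bounded by the segment length no matter how long the full input grows.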

This is 海行's personal instance.
Nice to meet you.

Homepage
http://soysoftware.sakura.ne.jp/