CanViT (Canvas Vision Transformer) is a scalable recurrent architecture for the Active Vision Foundation Model (AVFM) era.

See https://github.com/m2b3/CanViT-PyTorch for more details.