This is a PyTorch/GPU implementation of the paper Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation, which directly utilizes the features from the frozen ...