Vision-Language Models for Vision Tasks: A Survey Vision-Language Pretraining Methods

New RoboReward dataset and models automate robotic training and evaluation

The advancement of artificial intelligence (AI) algorithms has opened new possibilities for the development of robots that ...

IEEE

Leveraging Vision-Language Models to Select Trustworthy Super-Resolution Samples Generated by Diffusion Models

Abstract: Super-resolution (SR) is an ill-posed inverse problem with many feasible solutions that are consistent with a given low-resolution image. On one hand, regressive SR models aim to balance ...

New Apple model combines vision understanding and image generation with impressive results

Manzano combines visual understanding and text-to-image generation, while significantly reducing performance or quality trade-offs.

GitHub

Awesome Diffusion Language Models

[7 Jan 2023] ROIC-DM: Robust Text Inference and Classification via Diffusion Model ...

GitHub

NEO Series: Native Vision-Language Models

[2026/01] 🔥🔥🔥 The training code of NEO is released! 🔥 Native Architecture: NEO innovates a native VLM primitive that unifies pixel-word encoding, alignment, and reasoning within a dense, ...

IEEE

Vision–Language Pretraining for Image Captioning Using Facial Expression Recognition

Abstract: This paper presents a novel approach incorporating Facial Expression Recognition (FER) to improve emotional and contextual understanding in Vision-Language Pretraining (VLP) model-generated ...
