Abstract: We present Florence-VL, a new family of multimodal large language models (MLLMs) with enriched visual representations produced by Florence-2 [45], a generative vision foundation model.
Abstract: Point-interactive image colorization aims to colorize a grayscale image by letting the user specify colors at specific locations. The colors provided by the user (user hints) are ...