Fast Text with Keras Multi-Class Text Classification

AMITA: Attribute-Guided Masked Image-Text Alignment for Multi-Label Image Representation

Abstract: Multi-label image classification, which involves recognizing multiple objects within a single image, is a fundamental task in computer vision. Recently, Visual-Language Models (VLMs) have ...

IEEE

DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding

Abstract: Text-rich document understanding (TDU) requires comprehensive analysis of documents containing substantial textual content and complex layouts. While Multimodal Large Language Models (MLLMs) ...
