Abstract: Recent open-world representation learning approaches have leveraged CLIP to enable zero-shot 3D object recognition. However, performance on real point clouds with occlusions still falls ...
Whether you want to build a document scanner, digitize receipts, or add text recognition to your mobile app, this project is a perfect starting point. This project is provided for educational and ...
Abstract: In multimodal emotion recognition, the diversity and temporal misalignment of the speech and text modalities pose significant challenges for effective fusion. To address this issue, Multi-level ...
Pocket TTS is an open-source text-to-speech model that runs on CPUs, clones voices from 5 seconds of audio, and keeps voice ...
A real-time face recognition-based attendance system built with Flask, OpenCV, and face_recognition. This project enables automatic attendance marking, user management, live monitoring, and ...