We propose HtmlRAG, which uses HTML instead of plain text as the format of external knowledge in RAG systems. To tackle the long context brought by HTML, we propose Lossless HTML Cleaning and Two-Step ...
Abstract: In an era where social media platforms burgeon with diverse content, compelling moderation is imperative to filter harmful materials. Traditional methods often grapple with the dual ...
Abstract: Social Media Content Classification and Community Detection (SMCCCD) classify content and identify communities through deep learning and NLP. Traditional models are weak in scalability and ...