Parsing Documents Using Python

OpenOCR: An Open-Source Toolkit for General-OCR Research and Applications

OpenOCR is an open-source toolkit developed by the OCR team from FVL Lab, Fudan University, under the guidance of Prof. Yu-Gang Jiang and Prof. Zhineng Chen. It focuses on 「General-OCR」 tasks, ...

How to perform web scraping at scale

Web scraping automatically extracts large amounts of data from websites; a scraper can collect thousands of data points in a matter of seconds. It grabs the Hypertext Markup ...

MarkTechPost

A Coding Implementation on Document Parsing Benchmarking with LlamaIndex ParseBench Using Python, Hugging Face, and Evaluation Metrics

In this tutorial, we explore how to use the ParseBench dataset to evaluate document parsing systems in a structured, practical way. We begin by loading the dataset directly from Hugging Face, ...

IEEE

DocTTT: Test-Time Training for Handwritten Document Recognition Using Meta-Auxiliary Learning

Abstract: Despite recent significant advancements in Handwritten Document Recognition (HDR), the efficient and accurate recognition of text against complex backgrounds, diverse handwriting styles, and ...
