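VietOCR.NET is a character recognition program written in C#. First convert the JPG, GIF, or BMP images to TIFF; the Paint program that ships with Windows is enough for this. Then use VietOCR.NET-4.3 to merge the individual TIFF files into one. VietOCR.NET is an OCR-based application designed to help you convert typewritten, printed, or scanned images into editable text.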
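The convert-then-merge step can also be scripted instead of being done by hand in Paint and VietOCR.NET. The following is only a minimal sketch, not what VietOCR.NET does internally: it assumes the third-party Pillow library is installed, and the function names `tiff_target` and `merge_to_multipage_tiff` are made up for illustration.

```python
from pathlib import Path

# Input formats the text above says must be converted to TIFF first.
CONVERTIBLE = {".jpg", ".jpeg", ".gif", ".bmp"}


def tiff_target(source):
    """Return the .tif path a JPG/GIF/BMP source would be converted to."""
    path = Path(source)
    if path.suffix.lower() not in CONVERTIBLE:
        raise ValueError(f"unsupported input format: {path.suffix}")
    return path.with_suffix(".tif")


def merge_to_multipage_tiff(sources, output):
    """Write all source images as pages of a single multi-page TIFF.

    Assumption: Pillow is available (pip install Pillow). It is imported
    here so that the helper above works without it.
    """
    from PIL import Image

    pages = [Image.open(src).convert("RGB") for src in sources]
    if not pages:
        raise ValueError("no input images")
    # save_all + append_images is Pillow's way of writing multi-page files.
    pages[0].save(output, format="TIFF", save_all=True,
                  append_images=pages[1:])
```

For example, `merge_to_multipage_tiff(["page1.jpg", "page2.bmp"], "orderNo.tif")` would produce one two-page TIFF; the input file names here are placeholders.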
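The main window consists of two panels. The left panel lets you view the image you want to process and run the analysis that extracts text from it. The right panel, besides previewing the recognized content, is the area where you make any necessary corrections to the text.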
Features
Java & .NET GUI frontends for Tesseract OCR engine
Supports all languages provided by Tesseract
Supports automatic download and installation of language packs
PDF, TIFF, JPEG, GIF, PNG, BMP image formats
Paste image from clipboard
Selection box for Region of Interest (ROI)
File drag-and-drop
Bulk & batch operations
Text replacement postprocessing
Integrated scanning support
Spellcheck with Hunspell
Make Box Files. Open a command prompt in the directory that contains orderNo.tif and enter:
C:\Program Files\Tesseract-OCR>tesseract.exe lang.jhy.exp8.TIF lang.jhy.exp8 batch.nochop makebox
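When there are many images, the same command can be launched from a script. This is a rough sketch that assumes a `tesseract` executable is on the PATH. The helper names `makebox_command` and `make_box_file` are hypothetical; only the argument order is taken from the command line above.

```python
import subprocess
from pathlib import Path


def makebox_command(image, tesseract="tesseract"):
    """Build the argument list for the box-file command shown above."""
    # The output base is the image name without its last extension;
    # tesseract adds ".box" to it on its own.
    output_base = str(Path(image).with_suffix(""))
    return [tesseract, image, output_base, "batch.nochop", "makebox"]


def make_box_file(image, tesseract="tesseract"):
    """Run tesseract and return the path of the expected .box file."""
    subprocess.run(makebox_command(image, tesseract), check=True)
    return Path(image).with_suffix(".box")
```

Keeping the command construction separate from the call to `subprocess.run` makes it easy to print and check the arguments before anything is executed.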
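Open orderNo.tif with jTessBoxEditor. Keep in mind that the orderNo.box file generated in step 2 must be in the same directory as orderNo.tif. Correct the characters one by one, then save.
Download the jTessBoxEditor tool to correct each character (note the next-page control: the corrections have to be made page by page).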
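Before or after correcting the boxes by hand, it can help to sanity-check the .box file from a script. The sketch below assumes the usual Tesseract 3.x box layout, one character per line as `symbol left bottom right top page`; the `Box` type and both function names are illustrative and not part of jTessBoxEditor.

```python
from typing import NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line):
    """Parse one box-file line: symbol left bottom right top page."""
    # Split from the right so the five numeric fields are always isolated.
    parts = line.rstrip("\r\n").rsplit(" ", 5)
    if len(parts) != 6:
        raise ValueError(f"malformed box line: {line!r}")
    symbol = parts[0]
    left, bottom, right, top, page = (int(p) for p in parts[1:])
    return Box(symbol, left, bottom, right, top, page)


def degenerate_boxes(boxes):
    """Return boxes with no width or height, which likely need fixing."""
    return [b for b in boxes if b.right <= b.left or b.top <= b.bottom]
```

Because each box carries a page index, the result can be grouped by `page` to see which pages of a multi-page TIFF still need attention.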
Official description:
PDF, TIFF, JPEG, GIF, PNG, BMP image formats
Multi-page TIFF images
Screenshots
Selection box
File drag-and-drop
Paste image from clipboard
Postprocessing for Vietnamese to boost accuracy rate
Vietnamese input methods
Localized user interface for many languages (Localization project)
Integrated scanning support
Watch folder monitor for support of batch processing
Custom text replacement in postprocessing
Spellcheck with Hunspell
Support for downloading and installing language data packs and appropriate spell dictionaries