[Python] Reading Korean text from an image file (pytesseract)

2023. 12. 30. 23:22ㆍPYTHON

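1. Setting up Python to read Korean text from images

I needed to read the contents of an image file as Korean text. The installation steps differ between Windows and Linux, so these notes cover Windows only.

First, download the Tesseract installer from the link below and install it.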

https://digi.bib.uni-mannheim.de/tesseract/


In my case, I downloaded and installed tesseract-ocr-w64-setup-v5.1.0.20220510.exe.

https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v5.1.0.20220510.exe

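After installation, set the path.

Add the Tesseract install folder to the PATH environment variable in the Windows system settings. See the image below.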
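A quick way to confirm the PATH setting took effect is to look the executable up from Python. This is only a minimal sketch: the helper name `find_tesseract` and the fallback location are my own assumptions (the fallback is the default install folder used later in this post), so adjust them if you installed elsewhere.

```python
import os
import shutil

# Assumed default install location on Windows; change it if you installed elsewhere.
DEFAULT_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=DEFAULT_TESSERACT):
    """Return the path to the tesseract executable, or None if it cannot be found."""
    # shutil.which searches the directories listed in PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Fall back to the assumed default install location.
    if os.path.isfile(fallback):
        return fallback
    return None


if __name__ == "__main__":
    print(find_tesseract() or "tesseract not found; check the PATH setting")
```

If this prints a path, new terminals can see Tesseract. If not, reopen the terminal after editing PATH and try again.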

To read Korean, you also need the Korean language data. Download it from the link below.

https://github.com/tesseract-ocr/tessdata/blob/main/kor.traineddata

After downloading, put the file in the following folder.

C:\Program Files\Tesseract-OCR\tessdata

2. How to read Korean text from an image in Python

Now all that is left is to test it.
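Before running the full test, it can help to check that kor.traineddata really landed in the tessdata folder. The sketch below assumes the default folder shown above; `has_language` and `TESSDATA_DIR` are hypothetical names I made up for illustration.

```python
from pathlib import Path

# Assumed default tessdata folder from the step above.
TESSDATA_DIR = r"C:\Program Files\Tesseract-OCR\tessdata"


def has_language(lang, tessdata_dir=TESSDATA_DIR):
    """Return True if <lang>.traineddata exists in the tessdata folder."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


if __name__ == "__main__":
    for lang in ("kor", "eng"):
        status = "found" if has_language(lang) else "missing"
        print(f"{lang}.traineddata: {status}")
```

If kor shows as missing, the `-l kor+eng` option used in the test code will most likely fail to load the language.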

import cv2
import pytesseract


class ImageCaptureReaderClass:
    def __init__(self):
        print(" ____ INIT ____")
        self.getImageReader()

    def getImageReader(self):
        try:
            # After a search, decide whether a next item exists; this is judged from the captured image.
            # Point pytesseract at the Tesseract executable installed above.
            pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
            # Korean + English, default OCR engine mode, sparse-text page segmentation.
            config = '-l kor+eng --oem 3 --psm 11'
            ImgPath = "part_screenshot_guess_2.png"

            image = cv2.imread(ImgPath)
            if image is None:
                # cv2.imread returns None instead of raising when the file cannot be read.
                raise FileNotFoundError(f"Could not read image: {ImgPath}")

            # Convert to grayscale and pass the array straight to pytesseract,
            # so the original file is not overwritten.
            gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            text = pytesseract.image_to_string(gray, config=config)
            # print(f"text ____ {text}")

            text = self.setStringReplacer(text)
            print(f" ________ {text}")
        except Exception as e:
            print(f" _________ getImageReader __________ {e}")

    def setStringReplacer(self, strText):
        # Remove whitespace and the noise strings specific to this screenshot, in this order.
        for token in ('\n', '$', 'TT', ' ', 'Tr', 'tr', ';', 'Guess', 'G'):
            strText = strText.replace(token, '')
        return strText


if __name__ == "__main__":
    ImageCaptureReaderClass()
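The cleanup step can be tried on its own, without Tesseract, to see what it does to a string. This is a small sketch that repeats the same ordered replacements as setStringReplacer; `clean_ocr_text`, `NOISE_TOKENS`, and the sample strings are my own illustrative names and inputs, not output from the real screenshot.

```python
# Substrings removed one after another, in the same order as setStringReplacer.
NOISE_TOKENS = ('\n', '$', 'TT', ' ', 'Tr', 'tr', ';', 'Guess', 'G')


def clean_ocr_text(text, tokens=NOISE_TOKENS):
    """Remove each noise token from the text, in order."""
    for token in tokens:
        text = text.replace(token, '')
    return text


if __name__ == "__main__":
    print(clean_ocr_text("Guess 한글 TT\n$12;"))  # prints: 한글12
    print(clean_ocr_text("string"))              # prints: sing
```

The second line shows the catch: the tokens are removed anywhere they appear, so 'tr' and 'G' also disappear from ordinary English words. That is fine for this one screenshot, but the list probably needs adjusting for other images.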

Actual image

The result was printed.
