Trouble fetching checkbox and radio fields with PyPDF2

  • prahladyeri
    New Member
    • Aug 2022
    • 1

    My project involves reading text from a batch of PDF form files, for which I'm using the open-source PyPDF2 library. Extracting the text data works without issue:

    [code=python]
    from PyPDF2 import PdfReader

    reader = PdfReader("data/test.pdf")
    cnt = len(reader.pages)
    print("reading pdf (%d pages)" % cnt)
    # Extract text from the last page only
    page = reader.pages[cnt - 1]
    lines = page.extract_text().splitlines()
    print("%d lines extracted..." % len(lines))
    [/code]

    However, this text doesn't include the checked state of the radio buttons and checkboxes. I just get the plain label text (for example, "Yes No") instead of the selected values.

    I also tried the reader.get_fields() and reader.get_form_text_fields() methods as described in the PyPDF2 documentation, but both return empty values. I also tried reading the fields through annotations, but there is no "/Annots" entry on the page. When I open the PDF in Notepad++ to inspect its metadata, this is what I get:

    [code=bash]
    %PDF-1.4
    %²³´µ
    %Generated by ExpertPdf v9.2.2
    [/code]
    It appears to me that these checkboxes aren't the usual PDF form fields but instead resemble HTML elements. Is there any way to extract these fields using Python?
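
For reference, the raw-file inspection described above can also be done from Python. This is only a rough sketch using the standard library, not a PyPDF2 feature: the helper name scan_pdf_markers and the marker list are my own assumptions. It counts how often a few form-related name tokens appear in the raw bytes of the file. If every count is zero, that suggests (but does not prove) that the checkboxes are drawn as ordinary page content rather than stored as form fields. In PDF 1.5 and later, such tokens can sit inside compressed object streams, where a plain byte scan would miss them.

```python
from pathlib import Path

# Name tokens that usually accompany interactive form content in a PDF.
# This list is an assumption chosen for illustration, not an exhaustive set.
MARKERS = (b"/AcroForm", b"/Annots", b"/Widget", b"/Btn")


def scan_pdf_markers(data: bytes) -> dict:
    """Count occurrences of each marker token in raw PDF bytes.

    This is a crude substring scan, not a PDF parser: it cannot see
    tokens stored inside compressed streams.
    """
    return {marker.decode("ascii"): data.count(marker) for marker in MARKERS}


# Same path as in the snippet above; skipped if the file is not present.
pdf_path = Path("data/test.pdf")
if pdf_path.exists():
    for name, count in scan_pdf_markers(pdf_path.read_bytes()).items():
        print("%-10s %d" % (name, count))
```

A non-zero count for "/AcroForm" or "/Btn" would mean form data exists somewhere in the file even though get_fields() came back empty.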