Often we would like to retrieve the metadata stored in a given PDF. Here I show you two ways to do that: using pyPdf and pdfminer.
1) pyPdf
Install pyPdf using pip, then use this code:
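from pyPdf import PdfFileReader
# Open the PDF in binary mode; the file stays open while the reader is in use
pdf_toread = PdfFileReader(open("doc2.pdf", "rb"))
# getDocumentInfo() returns the document's Info dictionary (title, author, ...)
pdf_info = pdf_toread.getDocumentInfo()
print str(pdf_info)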
Also, you might not get all the metadata you want this way. For instance, in my case I was looking for the number of pages. If you check the methods available on pdf_toread, you can find the right one. For example:
print pdf_toread.getNumPages()
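One way to check which methods an object offers is to list its public callable attributes. Here is a minimal sketch that uses only built-in functions. The helper name list_public_methods and the FakeReader class are made up for this illustration (FakeReader is just a stand-in so the snippet runs without pyPdf installed); with a real reader you would pass pdf_toread instead.

```python
def list_public_methods(obj):
    """Return the sorted names of obj's public callable attributes."""
    names = []
    for name in dir(obj):
        if name.startswith("_"):
            continue  # skip private and special attributes
        if callable(getattr(obj, name)):
            names.append(name)
    return sorted(names)


class FakeReader(object):
    """Hypothetical stand-in for a PDF reader object, for the demo only."""

    def getNumPages(self):
        return 3

    def getDocumentInfo(self):
        return {"/Title": "Example"}


print(list_public_methods(FakeReader()))
# prints ['getDocumentInfo', 'getNumPages']
```

Scanning the printed names is usually enough to spot a likely candidate such as getNumPages.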
2) pdfminer
Install pdfminer using pip and then:
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
# Open the PDF in binary mode
fp = open('diveintopython.pdf', 'rb')
parser = PDFParser(fp)
doc = PDFDocument(parser)
# Associate the parser with the document
parser.set_document(doc)
print doc.info # The "Info" metadata
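The Info metadata usually comes back as raw strings, and date fields such as CreationDate typically look like D:20040520151901-05'00'. Here is a rough sketch of turning such a value into a datetime. It assumes Python 3 and the common D:YYYYMMDDHHmmSS plus optional timezone layout; parse_pdf_date is a helper name I made up, not part of pyPdf or pdfminer, and real files may contain variants it does not handle.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed layout: D:YYYY[MM[DD[HH[mm[SS]]]]] followed by an optional
# timezone part such as Z, +HH'mm' or -HH'mm'.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+-])(?:(\d{2})'?(\d{2})?'?)?)?$"
)


def parse_pdf_date(value):
    """Parse a PDF date string; return a datetime, or None if it does not fit."""
    if isinstance(value, bytes):
        value = value.decode("ascii", "replace")
    match = _PDF_DATE.match(value.strip())
    if match is None:
        return None
    year, month, day, hour, minute, second, sign, off_h, off_m = match.groups()
    try:
        naive = datetime(int(year), int(month or 1), int(day or 1),
                         int(hour or 0), int(minute or 0), int(second or 0))
    except ValueError:
        return None  # e.g. month 13 or day 32
    if sign is None:
        return naive  # no timezone information in the string
    offset = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
    if sign == "-":
        offset = -offset
    try:
        return naive.replace(tzinfo=timezone(offset))
    except ValueError:
        return None  # offset outside the range datetime accepts


print(parse_pdf_date(b"D:20040520151901-05'00'"))
# prints 2004-05-20 15:19:01-05:00
```

Returning None for anything unexpected keeps the helper safe to call on every field, so you can fall back to the raw string when parsing fails.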