Through the Car Library project website, I found ExifTool, free software by Phil Harvey that can analyze thousands of PDFs, extract information (miscellaneous properties and metadata), and generate a tabular report that opens easily in LibreOffice Calc or MS Excel.
I sent the method to Harry. Applied to his 294 GB collection, it analyzed 18,420 files in 2,212 folders. Processing took 3 hours and generated a 52 MB CSV file.
- Download the software
- Install it
  - Extract exiftool(-k).exe from the zip
  - Rename the file to exiftool.exe
  - Copy it into c:/windows/
- Extract the information
  - Start > Run > type cmd, then press Enter
  - Copy and paste this command:
    exiftool -csv -r -Encrypt -Info -Root -Linearized -All -ext pdf -m -t c:\collection > report.csv
  - c:\collection contains the PDFs
  - report.csv is generated at the root of the User folder
- Information extracted: file name, folder name, file size, number of pages, metadata
- I still need to know how to get: native PDF or scanned PDF; if scanned, OCR or not; if scanned, quality of the scan; PDF/A (yes/no). There is the software PDF-Analyzer Pro 5.0 by Ingo Schmoekel, but I didn't buy it.
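
With a collection this large, the report can be unwieldy in a spreadsheet, so here is a minimal Python sketch for getting a quick summary instead. It is not part of the method sent to Harry. It assumes the report has a SourceFile column (which ExifTool writes in CSV mode) and a PageCount column (the tag ExifTool normally reports for PDFs). The function name and the sample rows are hypothetical and only serve as illustration.

```python
import csv
import io
from collections import Counter
from pathlib import PureWindowsPath


def summarize(csv_text):
    """Return (file_count, total_pages, files_per_folder) for an ExifTool CSV report.

    Assumes a 'SourceFile' column and, when available, a 'PageCount' column.
    """
    files = 0
    pages = 0
    per_folder = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        files += 1
        # PageCount can be missing or empty (for example, for damaged PDFs).
        value = (row.get("PageCount") or "").strip()
        if value.isdigit():
            pages += int(value)
        # PureWindowsPath accepts both c:/... and c:\... style paths.
        folder = str(PureWindowsPath(row["SourceFile"]).parent)
        per_folder[folder] += 1
    return files, pages, per_folder


# Hypothetical sample rows, shaped like the start of a report.csv.
sample = (
    "SourceFile,PageCount\n"
    "c:/collection/a/one.pdf,12\n"
    "c:/collection/a/two.pdf,\n"
    "c:/collection/b/three.pdf,30\n"
)

files, pages, per_folder = summarize(sample)
print(files, "files,", pages, "pages")
for folder, count in per_folder.most_common():
    print(folder, count)
```

To run it on the real report, read report.csv into a string first, for example with open("report.csv", encoding="utf-8", newline="").read(), and pass that to summarize. The UTF-8 encoding is an assumption about how the report was written.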