Spoing dropped this informative bit into the bin: "Last week, a friend of mine griped that he didn't know of an easy way -- short of getting Adobe Capture and paying per-use licence fees -- of creating searchable PDFs. I scoffed, and told him I've done it many times, and it was free -- as in beer and speech. Dumbfounded, he pushed me to show him how, and I did: print to a PostScript file, and run ps2pdf on it...done! Since every document could be output as PostScript, his problem was solved. If he wanted to batch process the documents, he could set up a few scripts to simplify the task. While he was impressed, he ended up asking what seemed like an easy question: 'Can you do the same with a scanned image?'" And therein lies the question...

"After a week of on/off searching, I did find some good references as well as nearly all the parts necessary for the job, including open source OCR engines, PDF and PostScript tools, search engines, and the like. Unfortunately, I came up with only two solutions -- neither of them Open Source, and both quite costly (premium beer): Adobe Capture or dedicated 'PDF scanners' like this one. My question to the Slashdot crowd is this: Is there a cost-effective way of moving existing dead-tree documents into HTML, PDF, or another searchable mixed text-and-graphics format?

We all deal with a mix of electronic and printed documents -- and if you're like me, you've paid for some of them in both formats. If you're like me, you buy new documents in electronic, searchable format when you can. How many of us have O'Reilly's Networking Bookshelf, or some other CD texts ready to search on our notebooks and networks? Yet, I have a four-foot-wide stack of technical documents and books that just isn't going to come with me on each plane trip. I'm not going to get rid of them -- they are still valuable -- but I can't figure out how to make them useful more often.
The available tools for capturing paper and converting it into searchable PDFs are costly, and are geared toward corporations that can justify the costs by the number of users. To me, a per-use licence of Adobe's Capture (see Adobe Capture - Prices and Adobe Capture - Features) is just not cost effective.

If the document is already a text document -- even if it's in some word processor I don't use -- generating PDF files is easy and cheap. Print the document to a PostScript file, or create one. For a simple text document, this is trivial:

    enscript file.txt -p file.ps

Then convert the resulting PostScript file to PDF:

    ps2pdf file.ps file.pdf

Converting a paper document to PDF is also easy. Just scan the image and use tiff2ps or jpeg2ps to create the PostScript file. The only problem is that the resulting PDF is a bitmap image and isn't searchable. Interestingly enough, TIFF -- a format used extensively for scanned documents -- does support TIFF+Text, but usually only as an extension to TIFF, and it isn't really an optimal format (see The Unofficial TIFF Home Page). So, if you want to search the documents and keep the formatting and diagrams, you're back to paying Adobe for Capture or some other nearly as expensive method."
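
The "few scripts" for batch processing mentioned above might look something like the sketch below. It only wraps the two conversions described in the post (text to PostScript with enscript, scanned TIFF to PostScript with tiff2ps, then ps2pdf for both), so the PDFs it makes from scans are still plain bitmaps and not searchable. The function names `run` and `to_pdf` and the `DRY_RUN` variable are made up for this example, and it assumes enscript, libtiff's tiff2ps, and Ghostscript's ps2pdf are on the PATH. JPEG input via jpeg2ps is left out.

```shell
#!/bin/sh
# Sketch only: batch-convert text files and scanned TIFFs to PDF via PostScript.
# Assumes enscript, tiff2ps (libtiff) and ps2pdf (Ghostscript) are installed.
# Set DRY_RUN=1 to print the commands instead of running them.

# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: convert one file, choosing the tool by extension.
to_pdf() {
    src=$1
    base=${src%.*}          # strip the extension: notes.txt -> notes
    case $src in
        *.txt)
            run enscript -p "$base.ps" "$src" ;;
        *.tif|*.tiff)
            # -O writes the PostScript to a file instead of standard output
            run tiff2ps -O "$base.ps" "$src" ;;
        *)
            echo "skipping $src: unsupported type" >&2
            return 1 ;;
    esac && run ps2pdf "$base.ps" "$base.pdf"
}

# Convert every file named on the command line.
for f in "$@"; do
    to_pdf "$f"
done
```

Saved as, say, batch2pdf.sh (a made-up name), it could be run as `sh batch2pdf.sh *.txt *.tif`, or with `DRY_RUN=1` in the environment to preview the commands first.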