xisto Community
FirefoxRocks

Document Digitization: How to approach this?

I have been wondering if there is a fast way of doing this. I have 3 piles of documents that I would like to store digitally on the computer as PDFs or whatever. This translates into approximately 3000 documents, maybe a bit less.

 

Relevant hardware and software I have are:

Lexmark X5070 all-in-one

Windows XP SP3/Windows Vista SP1/Windows 7 Beta/Ubuntu 9

PDF printer software driver

Anything that came with the all-in-one

Windows Live Photo Gallery

- I can also download free software from Download.com if necessary

The whole process of scanning a document, waiting for the computer to process the image, and saving it takes about 50 seconds per document. Going non-stop, I estimate the whole job will take at least 42 hours.
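For anyone who wants to check the estimate with their own numbers, here is a minimal Python sketch of the arithmetic. The function name `scan_hours` is made up for illustration; the inputs are the roughly 3000 documents and 50 seconds per document quoted in this post.

```python
def scan_hours(documents: int, seconds_per_document: float) -> float:
    """Return the total scanning time in hours, assuming no breaks."""
    return documents * seconds_per_document / 3600


# Figures from the post: about 3000 documents at about 50 seconds each.
hours = scan_hours(3000, 50)
print(f"{hours:.1f} hours")  # 41.7 hours
```

That comes out just under 42 hours, and it does not count the time spent loading each sheet by hand.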

 

I was wondering if there was a way for the all-in-one to take, say 60 papers, scan them one by one and save them with image1.png, image2.png or whatever. I need to do this automatically so I can leave it unattended for an hour without sitting there putting papers in and watching the progress bar over and over.

 

The documents contain a lot of textual information (right now I'm not focused on newspaper clippings) and typing them up would take even longer. All I need is PNGs or PDFs automatically created, then I'll sort through them in My Documents. The first priority is getting rid of the physical papers and recycling them!!

 

Any idea on how to do this?



The real question is: what form are your documents in? For books, can you cut the bindings so that you have loose leaves? For a job like this, I would say the most comfortable option is to have a professional do it. The last high-end professional scanner I used was supposed to scan 5000 documents per minute (less than one second per document, double-sided). And of course everything ended up in one giant PDF or Microsoft Word file... Just have a look at the specs of professional scanners, they are really impressive. Of course, only big organizations (like a national Social Security agency) can pay $40,000 (yes, forty thousand dollars) for a scanner. That's why I'm telling you: don't try this with your small home toy, have it done on a real scanner.


5000, five thousand docs per minute. That is like flipping through a book and being done with the scanning. Amazing; the hardware must be really high-end to process something like that. It would also need good hard disks to write data that fast and efficiently. I hope somebody will soon suggest a better solution for digitizing those texts. This is a somewhat different topic, but you could have a look at the Gutenberg scanned copies. Maybe somebody in those forums can help.


It isn't books; 90% of it is regular 8.5"*11" letter-sized paper. Some of it is 11"*17", but that is not very common. Anyway, I have found a solution to this problem that will create one huge PDF file from approximately 100 sheets at a time.


Hey FirefoxRocks,

There are solutions, although these programs are very likely to create huge files. That's to be expected.

There's DocsVault or the Open Source KnowledgeTree.

This eliminates scanning and saving one page at a time. You can scan all your documents into one file, page by page, in different file formats, or scan them all individually and then stack them on top of one another. There are possibly other features that I haven't delved into yet.

There's no requirement for a PDF printer; these programs can create that format from the scanned files. Depending on the quality you need the documents to be at, the lowest resolution in black and white will make the files smaller, but readability could suffer.
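To give a rough feel for how much resolution and colour depth matter, here is a small Python sketch that estimates the uncompressed size of one scanned letter-sized page. The function name and the two settings compared (300 dpi full colour versus 150 dpi black and white) are only illustrative assumptions; real PDF and PNG files are compressed and will be much smaller, but the ratio between settings is the point.

```python
def uncompressed_bytes(width_in: float, height_in: float,
                       dpi: int, bits_per_pixel: int) -> int:
    """Approximate raw image size of one scanned page, ignoring compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# One 8.5" x 11" page at two hypothetical scanner settings.
colour = uncompressed_bytes(8.5, 11, dpi=300, bits_per_pixel=24)
mono = uncompressed_bytes(8.5, 11, dpi=150, bits_per_pixel=1)

print(f"300 dpi, 24-bit colour: {colour / 1e6:.1f} MB")
print(f"150 dpi, 1-bit black and white: {mono / 1e6:.2f} MB")
print(f"ratio: about {colour // mono}x")
```

Halving the resolution cuts the pixel count to a quarter, and going from 24-bit colour to 1-bit black and white cuts it by another factor of 24, which is why the low setting is so much smaller and also why small print can get hard to read.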


Cheers,


MC


If the scanner is big enough, you can put as many documents on it at one time as will fit, say 10. That way you could at least cut your labor to a tenth, since 90% of your documents are regular 8.5"*11" letter-sized paper. I am also curious about the solution you found to this problem; could you share it with us? Thank you!


The solution: I am requesting permission from the administration to use the school photocopier to do this. It can take approximately 100 documents at a time and create a PDF file with 100 pages. The PDF will be emailed to me when it is complete. This way I will only have to reload the documents 20 times or so (I found out I had fewer documents than estimated).


Congrats. You found the real solution. A big photocopier very often has an embedded scanner, and some of them allow saving the scanned file. Yes, it's a smart solution.


It turned out to take much longer than I expected because of 2 things:

Crinkled/ripped paper jams easily in the feeder

I didn't know that so many of my documents had staples in them.

Nonetheless, it is quite fast and now I have to download 50 PDF files and split them apart.
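For the splitting step, one way to plan the work is to note how many pages each original document has and turn that into page ranges for whatever PDF splitting tool is used. The Python sketch below only does that bookkeeping; the function name `page_ranges` and the example page counts are hypothetical, and the actual cutting of the PDF is left to a separate tool.

```python
def page_ranges(page_counts):
    """Turn per-document page counts into (first, last) 1-based page ranges."""
    ranges = []
    start = 1
    for count in page_counts:
        if count < 1:
            raise ValueError("each document needs at least one page")
        ranges.append((start, start + count - 1))
        start += count
    return ranges


# Hypothetical batch: a 2-page letter, a 1-page form, then a 3-page report.
print(page_ranges([2, 1, 3]))  # [(1, 2), (3, 3), (4, 6)]
```

The last range should end on the final page of the batch PDF; if it does not, a page count was entered wrong or a sheet was skipped in the feeder.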


The real key to scanning a large quantity of loose leaf paper is a printer that has a document feeder, which automatically feeds each document into the scanner to be scanned to the computer. Without this, you would be spending a lot more time and labour to get it done properly.


The real key to scanning a large quantity of loose leaf paper is a printer that has a document feeder

You probably wanted to say "a scanner that has a document feeder". Some fast scanners have a document feeder without being printers. And some printers have a very slow scanner.

