Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
OCR again
#21
Hi,
Currently rewriting and testing for a number of different document formats,
and the speed is beyond expectation!
Still, I would like to make 1 improvement, and I am looking for a suggestion:

When I launch the function "worker" it does it's job.
I can print the starttime, and when it is done, the endtime, but in between
the user has no clue how long this batch is ging to take.
If I print the tif filename from within "worker", that contains a sequence number,
but due to the multiprocessing, it prints "erratic", not very nice.

I could also be very arrogant, and print a "Predicted end time", based on some tests,
and the average time per document, knowing how many there are in the batch.

Something more clever ?
thx,
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
Reply
#22
For whom it may concern, i ran tests on a batch of 100 tifs,
with OCR and rather intricate analysis of the OCR text. PC has 6 cores.
Sensitivity analysis on max_workers =
1 : 197 seconds
2 : 97 seconds
3 : 69 seconds
4 : 54 seconds # more or less linear : 1000 tifs = 581 seconds (4 cores)
5 : 47 seconds
6 : 43 seconds
7 : 40 seconds
8 : 38 seconds

But, once you start on larger batches of tifs, like 6000, you run into problems:
"Too many files open", although I close everything i can possibly close during runtime.
?
Paul

Edit: the "too many files open" can be overcome by opening the image with a with... statement.
That will close it automatically at the end of the function.
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
Reply
#23
Again, to whom it may concern:
There are obviously more parameters that influence a sensitivity "test" like this.
The previous test was performed on an i5 processor, 16GB ram, 6 cores.

My laptop has an AMD ryzen, 16GB ram, 8 cores. (100 tifs)
max_workers =
4 = 95 seconds
5 = 79 seconds
6 = 68 seconds
7/
8 = 53 seconds

so more cores make up for a slower processor.
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
Reply


Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020