Unidecode issue
Python Forum (https://python-forum.io), thread /thread-40653.html
Unidecode issue - DPaul - Sep-02-2023

Hi,
In some PDFs I encounter references to the original parish register, like so:

    ref = ' RP 477; p. 148 r° '

I run unidecode on all strings in the document:

    fieldUni = unidecode.unidecode(field).upper()

This has never caused any problems, except in the above case, where I get this:

    ' RP 477; P. 148 RDEG '

The "°" has been "translated" into DEG, which is not what is meant here.
How do I avoid this translation in Python (other than a manual Ctrl-H replace of '°' with ... etc. in the text document)?
thx,
Paul

RE: Unidecode issue - Gribouillis - Sep-02-2023

(Sep-02-2023, 06:42 AM)DPaul Wrote: How do I avoid this translation in Python (other than a manual Ctrl-H replace of '°' with ... etc. in the text document)?

Which translation do you want instead of replacing '°' with 'deg'?
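
For the question as first asked (leaving '°' untouched), one possible approach is to split the string on the characters to preserve and transliterate only the pieces in between. This is a minimal sketch, not a feature of Unidecode itself: `transliterate_except` and its parameters are hypothetical names, and `transliterate` stands for any str -> str function, such as `unidecode.unidecode`.

```python
import re


def transliterate_except(text, keep, transliterate):
    """Transliterate `text`, leaving every character listed in `keep` as-is.

    `transliterate` is any str -> str callable (for example
    unidecode.unidecode); it is passed in so this sketch has no
    third-party dependency of its own.
    """
    if not keep:
        return transliterate(text)
    # A capturing group makes re.split() return the separators too,
    # at the odd indexes of the resulting list.
    pattern = re.compile("([" + re.escape(keep) + "])")
    parts = pattern.split(text)
    return "".join(
        part if index % 2 else transliterate(part)
        for index, part in enumerate(parts)
    )
```

With Unidecode installed, `transliterate_except(field, '°', unidecode.unidecode).upper()` should leave the degree sign in place while still converting the rest of the string.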
RE: Unidecode issue - DPaul - Sep-03-2023

(Sep-02-2023, 08:45 AM)Gribouillis Wrote: Which translation do you want instead

Fair question. Let me do some research, because I have to find out whether the 'degrees' symbol was meant to be there and has some genealogical meaning, or whether it is a faulty conversion of something earlier, if the original text was e.g. in Access or Lotus 1-2-3.
Paul

RE: Unidecode issue - DPaul - Sep-03-2023

(Sep-02-2023, 08:45 AM)Gribouillis Wrote: Which translation do you want instead

OK, there is a hidden meaning, known only to genealogists I suppose.
148 is the folio number, r° is recto, and v° is verso.
So recto and verso would be the right translations.
I have checked the document, and indeed, some records have r° and others v°.
Paul

RE: Unidecode issue - Gribouillis - Sep-03-2023

Use re.sub(), for example:

    >>> import re
    >>> dic = {'r°': 'recto', 'v°': 'verso'}
    >>> def repl(match):
    ...     return dic[match.group(0)]
    ...
    >>> s = ' RP 477; p. 148 r° '
    >>>
    >>> re.sub('[rv]°', repl, s)
    ' RP 477; p. 148 recto '

RE: Unidecode issue - DPaul - Sep-04-2023

(Sep-03-2023, 06:20 PM)Gribouillis Wrote: Use re.sub() for example

I thought I had to fiddle around with unidecode parameters, but this is nice and concise.
Thanks again,
Paul
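
To tie the re.sub() substitution back to the original unidecode call, the expansion has to run before the transliteration, otherwise '°' has already become 'deg'. The following is a minimal sketch of that ordering, under some assumptions: `expand_folio_marks` and `normalize_field` are hypothetical helper names, the `\b` word boundary is an addition not present in the thread (it avoids touching an r or v at the end of a longer word), and `transliterate` stands for any str -> str function such as `unidecode.unidecode`.

```python
import re

# Abbreviations explained in the thread: r° = recto, v° = verso.
FOLIO_MARKS = {"r°": "recto", "v°": "verso"}

# Assumption: only a standalone r or v followed by ° is a folio mark.
FOLIO_PATTERN = re.compile(r"\b[rv]°")


def expand_folio_marks(text):
    """Replace r°/v° with recto/verso."""
    return FOLIO_PATTERN.sub(lambda match: FOLIO_MARKS[match.group(0)], text)


def normalize_field(field, transliterate):
    """Expand folio marks first, then transliterate and upper-case.

    `transliterate` is any str -> str callable, e.g. unidecode.unidecode.
    """
    return transliterate(expand_folio_marks(field)).upper()
```

With Unidecode installed, `normalize_field(ref, unidecode.unidecode)` would take the place of `unidecode.unidecode(field).upper()`. Because the expanded string contains no '°' any more, the transliteration step has nothing left to turn into DEG.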