Thread: Reading in of line not working? (/thread-40761.html), Python Forum (https://python-forum.io)
Reading in of line not working? - garynewport - Sep-19-2023

I just posted about my file not working and realised, as soon as I finished, that I already knew the solution. However, I'd like a more flexible solution.

Some of my files are encoded as UTF-16 LE; others are likely to be UTF-8 or variants in between. So that I can read the various files, how might I easily open each file in the correct way, based on the file's encoding?

For example, my original line of Python read:

    file = open(datafile, 'r')

To cater for the UTF-16 LE files, I adapted this to:

    file = open(datafile, 'r', encoding='utf-16-le')

Which now means I cannot read in the older files.

RE: Reading in of line not working? - DPaul - Sep-19-2023

(Sep-19-2023, 08:01 AM)garynewport Wrote: Which now means I cannot read in the older files.

Hi,
I once had that problem, and I remember that there is a module called "chardet" which tells you the type of encoding. I do not know if it is still maintained to today's standards, but it is worth a try.
Paul

RE: Reading in of line not working? - snippsat - Sep-19-2023

I would first see whether the files can be converted to UTF-8, e.g. with an online "Convert files to UTF-8" tool. Chardet will detect (guess) the encoding.

    G:\div_code\file_test
    λ chardetect file_1.txt
    file_1.txt: ascii with confidence 1.0

    # The file I test in the code
    G:\div_code\file_test
    λ chardetect file_2.txt
    file_2.txt: utf-8 with confidence 0.99

    G:\div_code\file_test
    λ chardetect file_le.txt
    file_le.txt: UTF-16 with confidence 1.0

Another option is to read the file inside try/except: if no error is raised, use the content; if a decode error (UnicodeDecodeError) is raised, go on and try the next encoding.

    # Attempt 1: UTF-16 LE
    try:
        with open("file_2.txt", encoding='utf-16-le') as fp:
            content = fp.read()
            print(content)
    except Exception as error:
        print(f'{error}\n')

    # Attempt 2: cp1252 (accepts almost any byte sequence, so it rarely fails)
    try:
        with open("file_2.txt", encoding='cp1252') as fp:
            content = fp.read()
            print(content)
    except Exception as error:
        print(f'{error}\n')

    # Attempt 3: UTF-8; errors='ignore' silently drops bytes that are not valid UTF-8
    try:
        with open("file_2.txt", encoding='utf-8', errors='ignore') as fp:
            content = fp.read()
            print(content)
    except Exception as error:
        print(f'{error}\n')