.dat file to xlsxl - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: .dat file to xlsxl (/thread-25678.html)
RE: .dat file to xlsxl - buran - Apr-23-2020

Pass index=False when calling df.to_excel().
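A minimal sketch of that suggestion, for illustration only; the dataframe columns and sample values are placeholder assumptions, not taken from the thread:

import pandas as pd

# placeholder data standing in for the parsed .dat file
df = pd.DataFrame({"TMSTAMP": ["2020-03-25 21:11:00"], "Temp": [12.3]})

# index=False keeps pandas' row index out of the first spreadsheet column
df.to_excel("Map1.xlsx", index=False)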
RE: .dat file to xlsxl - Makada - Apr-23-2020

Yes, that's working. Thanks.

RE: .dat file to xlsxl - Makada - Apr-23-2020

Hi,

It takes some time to load the data every time. Is it possible to add only the new data to the .xlsx output file every minute, so it won't have to load data that is already in the output file? And when the script hasn't run for some time, can the output file be updated automatically with the new data, starting from where the script stopped?

The source file is at: "C:\\Campbellsci\\LoggerNet\\CR1000_Table1.dat"
The output file is at: "C:\\Users\\Makada\\Desktop\\Map1.xlsx"

With kind regards.

RE: .dat file to xlsxl - buran - Apr-23-2020

Actually, we don't know what you are doing. Why do you read everything every time? Do you need the data? When writing to an Excel file you can specify the start row by supplying startrow. Of course, you should know or find the number of rows already in the file. Please check the documentation for pandas.to_excel.

RE: .dat file to xlsxl - buran - Apr-23-2020

Also, if the Excel file serves as some sort of database, consider storing the data in a real database. Even simple sqlite (a lightweight disk-based database) would be a good choice.

RE: .dat file to xlsxl - Makada - Apr-23-2020

Hi,

Yes, I need all the data for an archive. So there is no need to load all the data on every one-minute update; only the new data should be added on top of all the previous data, so it will load much faster. I know how to do that with the code shown below, where the output file holds unlimited data while the source can only hold about 30 days of data. But I don't know how to do it when the output file is .xlsx and includes the changes you suggested, which worked.

import time
import schedule

def task1():
    existingLines = set(line.strip() for line in open("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie.dat"))
    outfile = open("C:\\Users\\Makada\\Desktop\\CR1000_Table1 - kopie.dat", "a+")
    for content in open("C:\\Campbellsci\\LoggerNet\\CR1000_Table1.dat", "r"):
        if content.strip() not in existingLines:  # to avoid duplicate lines
            outfile.write(content)
            existingLines.add(content.strip())
    outfile.close()

schedule.every().minute.at(":01").do(task1)

while True:
    schedule.run_pending()
    time.sleep(1)

RE: .dat file to xlsxl - buran - Apr-23-2020

I would return from task1 the last date/time (or some other identifying field) of the records in the output file, together with the total number of records. Then on the next call I would pass them back as arguments (row increased by one) and write from there.

And if it holds 30 days of records, why run it every 1 minute (if it is just an archive)? And again, a database would be a better choice than an Excel file.

RE: .dat file to xlsxl - Makada - Apr-23-2020

Hi buran,

Sorry, I lost you here :( Yes, I know about the SQL possibilities, but I'd like to have it in Excel, and from there save daily, monthly and yearly files.

With kind regards.

RE: .dat file to xlsxl - Makada - Apr-24-2020

Hi buran,

Do you have an example of what you last posted?

With kind regards.
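Before buran's fuller example below, here is a minimal sketch of the startrow idea he describes above: find how many rows the workbook already holds, then append only the new records below them. It assumes openpyxl is installed, the .xlsx file already exists, and new_rows already contains only the unwritten records; the column names and sample values are placeholders, not from the thread.

from openpyxl import load_workbook
import pandas as pd

xlsx_file = "C:\\Users\\Makada\\Desktop\\Map1.xlsx"   # existing output file

wb = load_workbook(xlsx_file)
ws = wb.active
print(f"{ws.max_row} rows already in the sheet")       # header row included

# new_rows is assumed to hold only the records not yet written to the workbook
new_rows = pd.DataFrame({"TMSTAMP": ["2020-03-25 21:12:00"], "Temp": [12.4]})

for row in new_rows.itertuples(index=False):
    ws.append(list(row))       # each new record lands below the existing data

wb.save(xlsx_file)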
RE: .dat file to xlsxl - buran - Apr-24-2020

A bit different from what I initially suggested, but in my opinion a more robust solution:

import pandas as pd
from datetime import datetime
import time

def read_data(dat_file, last_update):
    df = pd.read_table(dat_file, delimiter=',', parse_dates=[1,], low_memory=False, skiprows=1)
    df.drop("RECNBR", axis=1, inplace=True)
    df["TMSTAMP"] = pd.to_datetime(df["TMSTAMP"])
    cols = df.columns.drop(labels=["TMSTAMP"])
    df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
    return df[df['TMSTAMP'] > last_update]

def write_xlsx(xlsx_file, start_row, df):
    df.to_excel(xlsx_file, startrow=start_row, index=False)

dat_file = 'foo.dat'
xlsx_file = 'foo.xlsx'
start_row = 0  # change this to the actual start_row when run for the first time, e.g. if you have 30 records in the xlsx file this will be 30 (first row is at index 0)
last_update = datetime.strptime('2020-03-25 21:11:00', '%Y-%m-%d %H:%M:%S')  # change to the last datetime already in the xlsx file

while True:
    df = read_data(dat_file=dat_file, last_update=last_update)
    num_rows = df.shape[0]  # get the number of rows in the dataframe
    if num_rows:
        print(f'Writing {num_rows} records to {xlsx_file}')
        write_xlsx(xlsx_file=xlsx_file, start_row=start_row, df=df)
        start_row += num_rows
        last_update = df.TMSTAMP.iat[-1]
        print(f'Last update in xlsx file is {last_update}, start_row is {start_row}')
    else:
        print('No new data in dat file.')
    time.sleep(60)  # wait 1 minute
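For completeness, a minimal sketch of the SQLite alternative buran mentioned earlier in the thread; the database file, table and column names are assumptions, not from the thread. A primary key plus INSERT OR IGNORE makes repeated runs skip records that are already stored:

import sqlite3

conn = sqlite3.connect("logger.db")   # hypothetical database file

conn.execute(
    "CREATE TABLE IF NOT EXISTS readings (tmstamp TEXT PRIMARY KEY, temp REAL)"
)

# rows would normally come from parsing the .dat file
rows = [("2020-03-25 21:11:00", 12.3), ("2020-03-25 21:12:00", 12.4)]

# INSERT OR IGNORE skips any row whose timestamp is already in the table,
# so re-reading the whole source file never creates duplicates
conn.executemany("INSERT OR IGNORE INTO readings VALUES (?, ?)", rows)
conn.commit()
conn.close()

Daily, monthly or yearly Excel files could then be exported from such a database on demand, for example with pandas.read_sql followed by to_excel.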