Python Forum
Working with text data
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Working with text data
#1
Hey everyone,

I am working with text data which has the following format in excel:

Interview 1
Speaker
Date
Text

Interview 2
Speaker
Date
Text

Interview 3
Speaker
Date
Text

... and so forth (there are 400 such interviews).

Having read the xlsx file into Python I would now like to structure the text data as follows:

interview_number Speaker Date Text
1 A 01/02/1999
1 A 01/02/1999
1 A 01/02/1999
1 A 01/02/1999
2 B 05/08/2000
2 B 05/08/2000


I am afraid I’m struggling to come up with a Python code/loop that helps me split up the text into such columns. Could you please help me with this?

Thanks a lot!
Reply
#2
Looks like homework, make an attempt and we will be glad to help.
Hint: look at https://pypi.org/project/openpyxl/
Reply
#3
Hey, thanks a lot for the support!
I know openpyxl, have been working with it before, it’s just that I have never worked with data in this format, i.e. all information in rows and only one single column… My idea was to extract those specific rows, interview_number, speaker, date, and then create new columns so that they are all split up. However, I haven’t even been successful at extracting those rows. What I’d ideally do is establish that all the rows follow the same format, interview number, speaker, date text, and then loop over interview numbers 1 to 400 to repeat the process. I have been unsuccessfully trying something along those lines:

for row in rows:
    sheet.append(row)
    
for row in sheet.iter_rows(min_row=1, min_col=1, max_row=6, max_col=3):
    for cell in row:
        print(cell.value, end=" ")
    print()    

book.save('iterbyrows.xlsx')
However, I don’t know how to tell Python which rows “are special” to me, i.e. the first 3 rows after an empty row, interview number, speaker, date. Shall I loop over the interview number? Or use the fact that between interviews there is an empty row?

If I looped trough the rows how could I tell Python which ones to extract?

file = "enter_path_to_file_here"
wb = openpyxl.load_workbook(file, read_only=True)
ws = wb.active

for row in ws.iter_rows("E"):
    for cell in row:
        if cell.value == "interview":
            print(ws.cell(row=cell.row, column=2).value) 
I’m honestly at my wits’ end.
Reply
#4
There's a lot of example code here: https://openpyxl.readthedocs.io/en/stable/usage.html
Reply
#5
First of all, is it realy excel file (because you said text data)? Maybe it's a file that you just happen to view in Excel? If it is not xlsx file, using openpyxl may not be necessary.
In nay case the main issue is to parse the data you have. So the question is: Is the file structure consistent - always 4 lines, blank line, 4 lines of data, etc.
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Login Module Help - Comparing data in a text file to data held in a variable KeziaKar 0 2,294 Mar-08-2018, 11:41 AM
Last Post: KeziaKar

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020