Array manipulation
Python Forum (https://python-forum.io), Forum: General Coding Help (https://python-forum.io/forum-8.html), Thread: Array manipulation (/thread-24328.html)

Array manipulation - 0zMeister - Feb-09-2020

Hi,

I've got some data in a csv file in the following format:

ID  Number
a   1
a   2
a   3
a   4
b   5
b   6
b   7
b   8
c   9
c   10
c   11
c   12

I need it in a matrix/tabular format, i.e.:

a   b   c
1   5   9
2   6   10
3   7   11
4   8   12

I can read the csv into a dataframe, etc. I'm struggling to convert the array into the matrix using numpy and/or pandas array manipulations. I can isolate a specific set of data using something like data.Number[data.ID == 'a']. I can also get the unique IDs with data.ID.unique(). I was thinking of using a "for uniqueID in data.ID.unique()" loop to then append the data to a new array in a matrix format. Any suggestions on how to append the arrays and allocate headers to form the matrix?

Cheers,
OM

RE: Array manipulation - Larz60+ - Feb-09-2020

What have you tried so far? Show your code, working or not.

RE: Array manipulation - perfringo - Feb-09-2020

I would read it into a Python dictionary (where the key is the letter and the value is a list of numbers). From that it's easy to convert to a dataframe of the required format if needed.

RE: Array manipulation - 0zMeister - Feb-09-2020

I've found a 'work around' method. It works in this case because each 'ID' has the same number of values. It's not ideal, since you might have a case where not all IDs have the same number of values:

    import numpy as np
    import pandas as pd

    df = pd.read_csv("MBV2rawdata.csv")  # csv file with all the data
    cpid = df.CPID.unique()              # get unique IDs
    xnp = df.to_numpy()
    xval = xnp[:, 2]                     # only need the values in column 2 of the dataframe
    # rows are grouped by ID, so reshape to one row per ID,
    # then transpose so that each ID becomes a column
    xrs = np.reshape(xval, (len(cpid), len(df) // len(cpid))).T

RE: Array manipulation - jefsummers - Feb-09-2020

OK, I don't have time this morning to debug this - I KNOW it will not work as written, but I'm posting it to convey the ideas behind it. Import the csv into a pandas dataframe, then create 3 dataframes from that using conditionals to get your a, b, c frames. Then reassemble. I doubt I am doing the conditionals correctly - I leave that to you to debug. Adding the columns should work.

    import pandas as pd
    import numpy as np

    # read the csv file into a dataframe
    df = pd.read_csv('frame.csv')

    # create 3 dataframes for the 3 letters
    dfA = df[df.iloc[0] == 'a']
    dfB = df[df.iloc[0] == 'b']
    dfC = df[df.iloc[0] == 'c']

    # now combine the dataframes
    dfA['b'] = dfB
    dfA['c'] = dfC

RE: Array manipulation - 0zMeister - Feb-09-2020

Thanks jefsummers, that's exactly the idea I have in terms of concept. The only difference is that in this case I've got about 40 'a', 'b', 'c's etc., so I'd like to do it with a for loop instead of manually. Although I've been programming since forever, I'm new to Python and struggling a bit with the syntax, commands, etc.

RE: Array manipulation - baquerik - Feb-09-2020

You could create a set containing all the unique IDs. Then create a new dataframe and, in a loop, add each new column by querying the old dataframe for every element in the set.

RE: Array manipulation - 0zMeister - Feb-15-2020

Ok, so I've managed to get it working, although it might not be the most elegant solution. I build the new dataframe based on data extracted from the old dataframe. Initially I got a lot of NaNs in the new dataframe. I traced it to an indexing issue: the data I extract from the old dataframe keeps its index, which causes a mismatch with the new dataframe's index and therefore the NaNs. So before adding the data extracted from the old dataframe, I needed to reindex it to align with the new dataframe's index. See the code below. As I'm new to Python, I welcome comments on how to do this better and more efficiently. For some reason the reindexing takes quite a bit of processing time.

    import pandas as pd

    # read csv into dataframe
    df = pd.read_csv("MBV2rawdata.csv")  # CPID, Date, Value, Class
    # I need this in the format Date, CPID#1, CPID#2, ... , CPID#n, Class
    cpid = df.CPID.unique()
    sf = df.Date[df.CPID == 24021]  # sf is the newly created dataframe
    sf = pd.DataFrame(sf)
    i = 0
    for colname in cpid:
        col = pd.DataFrame(df.Value[df.CPID == colname])  # get the column from df
        for j in range(len(col)):  # have to reindex col to align with sf's index
            col.rename(index={j + len(col) * i: j}, inplace=True)
        sf.insert(i + 1, colname, col, True)  # add col to sf
        i += 1
    sf['Class'] = df.Class[df.CPID == 24021]  # add the 'Class' values to sf
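
The long-to-wide reshaping discussed in this thread can also be sketched with pandas' own groupby and pivot tools. The following is a minimal sketch, not code from any of the posters. It assumes the column names ID and Number from the first post, builds that small example frame inline instead of reading a csv, and uses a hypothetical helper name, long_to_wide. Rows are numbered within each ID and the frame is pivoted on that number, which should avoid the index mismatch that produced the NaNs and also copes with IDs that have different numbers of values (the shorter columns are padded with NaN).

```python
import pandas as pd


def long_to_wide(df, id_col="ID", value_col="Number"):
    """Return one column per unique ID (hypothetical helper, illustration only)."""
    # Position of each row inside its own ID group: 0, 1, 2, ...
    pos = df.groupby(id_col).cumcount()
    # Pivot on that position; note that pivot sorts the ID columns
    wide = df.assign(_pos=pos).pivot(index="_pos", columns=id_col, values=value_col)
    # Drop the axis names left behind by pivot, purely for cleaner printing
    wide.index.name = None
    wide.columns.name = None
    return wide


# Example data from the first post (built inline instead of read from a csv)
data = pd.DataFrame({"ID": list("aaaabbbbcccc"), "Number": range(1, 13)})
wide = long_to_wide(data)
print(wide)
```

For a file laid out as CPID, Date, Value, Class, pivoting directly on the date, along the lines of df.pivot(index="Date", columns="CPID", values="Value"), would presumably give the same result without any manual reindexing, assuming each (Date, CPID) pair occurs only once.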