Python Forum
Array manipulation - Printable Version




Array manipulation - 0zMeister - Feb-09-2020

Hi,

I've got some data in a csv file in the following format:
ID Number
a 1
a 2
a 3
a 4
b 5
b 6
b 7
b 8
c 9
c 10
c 11
c 12

I need it in a matrix/tabular format i.e.:
a b c
1 5 9
2 6 10
3 7 11
4 8 12

I can read the csv into a dataframe, etc. I'm struggling to convert the array into the matrix using numpy and/or pandas array manipulations.

I can isolate a specific set of data using something like data.Number[data.ID == 'a']. I can also get the unique IDs with data.ID.unique().

I was thinking of using a "for uniqueID in data.ID.unique()" loop to then append the data to a new array in a matrix format.

Any suggestions on how to append the arrays and allocate headers to form the matrix?

Cheers,

OM


RE: Array manipulation - Larz60+ - Feb-09-2020

What have you tried so far? Please show your code, whether it works or not.


RE: Array manipulation - perfringo - Feb-09-2020

I would read it into a Python dictionary (where each key is a letter and each value is the list of numbers for that letter). From that it's easy to convert to a dataframe of the required format if needed.
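
A minimal sketch of the dictionary idea, assuming the file is comma-separated with the ID and Number headers from the first post. The in-memory text stands in for the real csv file and is only there to make the example self-contained:

```python
import csv
import io
from collections import defaultdict

import pandas as pd

# Hypothetical stand-in for the csv file (ID/Number layout from the first post).
raw = io.StringIO(
    "ID,Number\n"
    "a,1\na,2\na,3\na,4\n"
    "b,5\nb,6\nb,7\nb,8\n"
    "c,9\nc,10\nc,11\nc,12\n"
)

columns = defaultdict(list)  # key: letter, value: list of numbers
for row in csv.DictReader(raw):
    columns[row["ID"]].append(int(row["Number"]))

# Each dictionary key becomes a column header.
# This constructor needs every list to have the same length.
table = pd.DataFrame(dict(columns))
print(table)
```

If the lists can differ in length, wrapping each list in pd.Series before building the dataframe pads the short columns with NaN instead of raising an error.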


RE: Array manipulation - 0zMeister - Feb-09-2020

I've found a 'work around' method. It works in this case because each 'ID' has the same number of values. It's not ideal, since you might have a case where not all IDs have the same number of values:

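import numpy as np
import pandas as pd

df = pd.read_csv("MBV2rawdata.csv")  # csv file with all the data
cpid = df.CPID.unique()  # get unique IDs
xnp = df.to_numpy()
xval = xnp[:, 2]  # only need the values in column 2 of the dataframe
# assuming the rows are grouped by ID, fill the matrix column by column (order="F")
xrs = np.reshape(xval, (len(df) // len(cpid), len(cpid)), order="F")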
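
For the case where the IDs do not all have the same number of values, one possible sketch is to number the rows within each ID and pivot on that counter. The data and the helper column name "row" are made up for illustration; the ID and Number names follow the first post:

```python
import pandas as pd

# Hypothetical data: 'c' has fewer values than 'a' and 'b'.
df = pd.DataFrame({
    "ID": ["a", "a", "a", "b", "b", "b", "c", "c"],
    "Number": [1, 2, 3, 4, 5, 6, 7, 8],
})

# Number the rows within each ID (0, 1, 2, ...) to act as the row label.
df["row"] = df.groupby("ID").cumcount()

# One column per ID; positions with no value are filled with NaN.
wide = df.pivot(index="row", columns="ID", values="Number")
print(wide)
```

Unlike a plain reshape, this does not depend on every ID having the same count, and it also does not require the rows to be grouped by ID in the file.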



RE: Array manipulation - jefsummers - Feb-09-2020

OK, I don't have time this morning to test this against your file, so treat it as a sketch that conveys the idea rather than finished code.
Import the csv into a pandas dataframe, then create 3 dataframes from that using conditionals to get your a, b, c frames. Then reassemble.
The conditionals compare the first column against each letter. Each piece keeps the row labels of the original dataframe, so reset the index before combining, otherwise the columns will not line up.
import pandas as pd
# read the csv file into a dataframe
df = pd.read_csv('frame.csv')
ids = df.iloc[:, 0]  # first column holds the letters
# create 3 dataframes for the 3 letters, each relabelled from row 0
dfA = df[ids == 'a'].reset_index(drop=True)
dfB = df[ids == 'b'].reset_index(drop=True)
dfC = df[ids == 'c'].reset_index(drop=True)
# now combine the second column (the numbers) of each dataframe
result = pd.DataFrame({'a': dfA.iloc[:, 1], 'b': dfB.iloc[:, 1], 'c': dfC.iloc[:, 1]})



RE: Array manipulation - 0zMeister - Feb-09-2020

Thanks jefsummers, that's exactly the idea I have in terms of concept. The only difference being that in this case I've got about 40 'a', 'b', 'c's etc. Thus I'd like to do it with a for loop instead of manually. Although I've been programming since forever, I'm new to Python and struggling a bit with the syntax, commands, etc.


RE: Array manipulation - baquerik - Feb-09-2020

You could create a set extracting all the unique IDs.

Then create a new dataframe and, in a loop, add a new column for every element in the set by querying the old dataframe.
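
A rough sketch of that loop, assuming the ID and Number column names from the first post and the same number of values per ID. The example data is made up, and unique() is used instead of a set so the columns keep the order in which the IDs first appear:

```python
import pandas as pd

# Hypothetical data in the ID/Number layout from the first post.
df = pd.DataFrame({
    "ID": list("aaaabbbbcccc"),
    "Number": range(1, 13),
})

new_df = pd.DataFrame()
for uid in df["ID"].unique():  # unique() keeps first-seen order; a set would not
    values = df.loc[df["ID"] == uid, "Number"]
    # Drop the old row labels so every column starts again at 0.
    new_df[uid] = values.reset_index(drop=True)

print(new_df)
```

The reset_index(drop=True) call matters here: without it the values for 'b' and 'c' would keep row labels 4 to 11, would not line up with rows 0 to 3, and would show up as NaN.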


RE: Array manipulation - 0zMeister - Feb-15-2020

Ok, so I've managed to get it working, although it might not be the most elegant solution. I build the new dataframe based on data extracted from the old dataframe. Initially I got a lot of 'nan's in the new dataframe. I traced it to an indexing issue: the data I extract from the old dataframe keeps its index, which causes a mismatch with the new dataframe's index and therefore the 'nan's. So before adding the data extracted from the old dataframe, I needed to reindex it to align with the new dataframe's index. See code below. As I'm new to Python, I welcome comments on how to do this better and more efficiently. For some reason the reindexing takes quite a bit of processing time.

import pandas as pd

# read csv into dataframe
df = pd.read_csv("MBV2rawdata.csv")  # columns: CPID, Date, Value, Class

# I need this in the format Date, CPID#1, CPID#2, ... , CPID#n, Class
cpid = df.CPID.unique()

# sf is the newly created dataframe, seeded with the dates of one CPID
sf = pd.DataFrame(df.Date[df.CPID == 24021])

i = 0
for colname in cpid:
    col = pd.DataFrame(df.Value[df.CPID == colname])  # get the column from df
    for j in range(len(col)):  # have to reindex col to align with sf's index
        col.rename(index={j + len(col) * i: j}, inplace=True)
    sf.insert(i + 1, colname, col, True)  # add col to sf
    i += 1

sf['Class'] = df.Class[df.CPID == 24021]  # add the 'Class' values to sf
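
On the processing time: the inner loop calls rename once per row, and each call works through the whole index of the column, so the cost likely grows with the square of the number of rows per CPID. A possible alternative, sketched here under the same assumptions as the code above (rows grouped by CPID, every CPID covering the same dates in the same order), is to relabel each column in one step with reset_index(drop=True). The small dataframe is made-up stand-in data, not the contents of MBV2rawdata.csv:

```python
import pandas as pd

# Hypothetical stand-in for MBV2rawdata.csv: rows grouped by CPID,
# each CPID covering the same dates in the same order.
df = pd.DataFrame({
    "CPID": [24021, 24021, 24021, 24022, 24022, 24022],
    "Date": ["2020-01-01", "2020-01-02", "2020-01-03"] * 2,
    "Value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "Class": ["x", "y", "x"] * 2,
})

cpid = df.CPID.unique()
first = df[df.CPID == cpid[0]].reset_index(drop=True)

sf = first[["Date"]].copy()  # new dataframe, seeded with the dates
for colname in cpid:
    # reset_index(drop=True) relabels the rows 0..n-1 in a single step
    sf[colname] = df.Value[df.CPID == colname].reset_index(drop=True)
sf["Class"] = first["Class"]
print(sf)
```

If each (Date, CPID) pair occurs only once, df.pivot(index="Date", columns="CPID", values="Value") is another option that aligns on the dates themselves rather than on row position.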