I have a file with linked data each two columns ( in this example is just 3, but could be more), in a tab delimited file:
Names SampleA Names SampleB Names SamplesC
Name1 5 Name3 7 Name1 8
Name2 9 Name2 1 Name2 2
Name4 4 Name4 8 Name3 8
And so on, what I want is to have a single column with the columns Names with not redundant data, and in this case 3 columns with samples; in those samples that are not values for x name will be fit with 0:
Names SampleA SampleB SampleC
Name1 5 0 8
Name2 9 1 2
Name3 0 7 8
Name4 4 8 0
How can I approach this matrix with pandas ???, I jus have tried with R and Perl, but I think will be easer with python using Pandas !!!
Thanks so much !!!!
You can do it as follows:
Make the column names unambiguous, so you have no columns with the same name (probably you could avoid this step also if you like, if you access the columns by index, but I would make them unambiguous): do something like:
your_df.columns = ['NamesA', 'SampleA', 'NamesB', 'SampleB', 'Names', 'SamplesC']
Create dataframes from the column pairs
Join the dataframes with the column parts together and collapse the Names columns into one column
Fill the na values
Testdata:
Here some example code (beginning with step2):