How can I use docker without sudo?

Question

abraham

Asked: 2019-06-04 21:05:31 +0800 CST2019-06-04 21:05:31 +0800 CST 2019-06-04 21:05:31 +0800 CST

Matrix with Python and Pandas

772

I have a file with linked data each two columns ( in this example is just 3, but could be more), in a tab delimited file:

Names   SampleA   Names   SampleB   Names   SamplesC
Name1      5      Name3      7      Name1         8
Name2      9      Name2      1      Name2         2
Name4      4      Name4      8      Name3         8

And so on, what I want is to have a single column with the columns Names with not redundant data, and in this case 3 columns with samples; in those samples that are not values for x name will be fit with 0:

Names     SampleA     SampleB    SampleC
Name1          5          0        8
Name2         9           1        2
Name3         0           7        8
Name4         4           8         0

How can I approach this matrix with pandas ???, I jus have tried with R and Perl, but I think will be easer with python using Pandas !!!

Thanks so much !!!!

1 Answers

Voted

jottbe · Answer 1 · 2019-06-06T00:52:25+08:00

You can do it as follows:

Make the column names unambiguous, so you have no columns with the same name (probably you could avoid this step also if you like, if you access the columns by index, but I would make them unambiguous): do something like:

your_df.columns = ['NamesA', 'SampleA', 'NamesB', 'SampleB', 'Names', 'SamplesC']
Create dataframes from the column pairs
Join the dataframes with the column parts together and collapse the Names columns into one column
Fill the na values

Testdata:

import pandas as pd
your_df= pd.DataFrame({
        'NamesA': ['Name1', 'Name2', 'Name4'], 
        'SampleA': [5, 9, 4], 
        'NamesB':['Name3', 'Name2', 'Name4'], 
        'SampleB': [5, 9, 4], 
        'NamesC':['Name1', 'Name2', 'Name3'], 
        'SampleC': [8, 2, 8]
        })

Here some example code (beginning with step2):

all_cols= list(your_df.columns)
joined_df= None
while all_cols:
    name_col, sample_col, *all_cols= all_cols
    # in case not all columns in your df are filled
    # you need to handle na values
    filled_indexer= ~your_df[name_col].isna()
    # Step 2:
    col_pair_df= your_df.loc[filled_indexer, [name_col, sample_col]]
    # Step 3:
    if joined_df is None:
        joined_df= col_pair_df
        joined_df.columns= ['Names', sample_col]
    else:
        joined_df= joined_df.merge(col_pair_df, how='outer', left_on='Names', right_on=name_col)
        # now we need to populate the one names column and remove the Names* columns
        names_na_indexer= joined_df['Names'].isna()
        joined_df.loc[names_na_indexer, 'Names']= joined_df[name_col]
        joined_df.drop(name_col, axis='columns', inplace=True)
# Step 4:
joined_df.fillna(0, inplace=True)

Matrix with Python and Pandas

How to install Google Chrome

Is there a command to list all users? Also to add, delete, modify users, in the terminal?

How to delete a non-empty directory in Terminal?

How to unzip a zip file from the Terminal?

How can I copy the contents of a folder to another folder in a different directory using terminal?

How do I install a .deb file via the command line?

How do I run .sh scripts?

How do I install a .tar.gz (or .tar.bz2) file?

How to list all installed packages

Unable to lock the administration directory (/var/lib/dpkg/) is another process using it?