[Posted]: 2020-09-07 04:21:02
[Question]:
Hi everyone. This question has been asked before: Splitting dictionary/list inside a Pandas Column into Separate Columns
I have also asked it myself already, but it was not solved: How to use pandas to build a column which are in a dataframe
Now I have a DataFrame that looks like this:
intron_id octamer
0 >ENSG00000183943.1 AGCCATGC:1 AGUAGCUG:1 GCCUGGCC:1 AGAUGAUG:1 AG...
1 >ENSG00000183943.2 CATATTTC:1 UCCCAAAA:1 AAGCCATA:1 TATTTTGC:1 TA...
2 >ENSG00000183943.3 AGUAGCUG:4 UCAACAGG:1 CCUUUCAU:1 UACCUUUU:1 GC...
3 >ENSG00000183943.4 AUGAGCAC:1 UCCUACGG:1 GGAGGATC:1 AUAGGGUG:1 CC...
4 >ENSG00000183943.5 UUGCCAAU:1 AUGCUGGG:1 ACUAUUUU:1 GGAGGATC:3 UG...
I want to reshape it into this, with one column per octamer and 0 where an octamer does not occur:
intron_id AGCCATGC AGUAGCUG GCCUGGCC ......
>ENSG00000183943.1 1 1 1
>ENSG00000183943.2 0 0 0
>ENSG00000183943.3 0 4 0
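The target layout is a long-to-wide reshape. A minimal sketch, assuming the octamer column holds space-separated OCTAMER:COUNT tokens exactly as printed above; the sample DataFrame is hypothetical and uses only the first four tokens of each row visible in the truncated printout, and the helper column names `token` and `kmer` are made up for illustration. It splits each string into tokens, turns them into (intron_id, kmer, count) rows, and pivots with `fill_value=0` so octamers absent from an intron become 0. `DataFrame.explode` needs pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical sample: only the first four tokens of each row visible in the printout above
df = pd.DataFrame({
    "intron_id": [">ENSG00000183943.1", ">ENSG00000183943.2", ">ENSG00000183943.3"],
    "octamer": [
        "AGCCATGC:1 AGUAGCUG:1 GCCUGGCC:1 AGAUGAUG:1",
        "CATATTTC:1 UCCCAAAA:1 AAGCCATA:1 TATTTTGC:1",
        "AGUAGCUG:4 UCAACAGG:1 CCUUUCAU:1 UACCUUUU:1",
    ],
})

# Long format: one row per "OCTAMER:COUNT" token
long_df = (
    df.assign(token=df["octamer"].str.split())
      .explode("token")
      .reset_index(drop=True)
)
parts = long_df["token"].str.split(":", expand=True)
long_df["kmer"] = parts[0]               # hypothetical column name for the octamer
long_df["count"] = parts[1].astype(int)  # counts as integers, not strings

# Wide format: one column per octamer; octamers absent from an intron become 0
wide = long_df.pivot_table(
    index="intron_id", columns="kmer", values="count",
    aggfunc="sum", fill_value=0,
)
print(wide)
```

`aggfunc="sum"` adds the counts together if the same octamer were ever listed twice for one intron; on the real data the same steps would be applied to the DataFrame read from count.txt.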
However, when I try apply(pd.Series) or df.octamer.values.tolist(), neither of them works, and I'm confused. I'd appreciate any advice. Thanks in advance. My code is below:
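import pandas as pd
# Tab-separated file without a header: column 0 = intron ID, column 1 = space-separated "OCTAMER:COUNT" tokens
df = pd.read_csv('~/10genomic/elife/octamer/intron_seq/count.txt', delimiter='\t', header=None)
df.rename(columns={0: "intron_id", 1: "octamer"}, inplace=True)
# Make sure every octamer cell is a string
df['octamer'] = df['octamer'].astype(str)
print(df)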
intron_id octamer
0 >ENSG00000183943.1 AGCCATGC:1 AGUAGCUG:1 GCCUGGCC:1 AGAUGAUG:1 AG...
1 >ENSG00000183943.2 CATATTTC:1 UCCCAAAA:1 AAGCCATA:1 TATTTTGC:1 TA...
2 >ENSG00000183943.3 AGUAGCUG:4 UCAACAGG:1 CCUUUCAU:1 UACCUUUU:1 GC...
3 >ENSG00000183943.4 AUGAGCAC:1 UCCUACGG:1 GGAGGATC:1 AUAGGGUG:1 CC...
4 >ENSG00000183943.5 UUGCCAAU:1 AUGCUGGG:1 ACUAUUUU:1 GGAGGATC:3 UG...
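# Drop a few specific rows by index label
df.drop(labels=[2370, 3967, 5728, 11875, 14464], axis=0, inplace=True)
def builddict(x):
    # "AGCCATGC:1 AGUAGCUG:1 ..." -> {'AGCCATGC': '1', 'AGUAGCUG': '1', ...} (counts stay strings)
    dictls = []
    for item in x.split():
        dictls.append(item.split(":"))
    return dict(dictls)
df['octamer'] = df['octamer'].apply(builddict)
print(df)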
intron_id octamer
0 >ENSG00000183943.1 {'AGCCATGC': '1', 'AGUAGCUG': '1', 'GCCUGGCC':...
1 >ENSG00000183943.2 {'CATATTTC': '1', 'UCCCAAAA': '1', 'AAGCCATA':...
2 >ENSG00000183943.3 {'AGUAGCUG': '4', 'UCAACAGG': '1', 'CCUUUCAU':...
3 >ENSG00000183943.4 {'AUGAGCAC': '1', 'UCCUACGG': '1', 'GGAGGATC':...
4 >ENSG00000183943.5 {'UUGCCAAU': '1', 'AUGCUGGG': '1', 'ACUAUUUU':...
print(df['octamer'].apply(pd.Series))
0
0 {'AGCCATGC': '1', 'AGUAGCUG': '1', 'GCCUGGCC':...
1 {'CATATTTC': '1', 'UCCCAAAA': '1', 'AAGCCATA':...
2 {'AGUAGCUG': '4', 'UCAACAGG': '1', 'CCUUUCAU':...
3 {'AUGAGCAC': '1', 'UCCUACGG': '1', 'GGAGGATC':...
4 {'UUGCCAAU': '1', 'AUGCUGGG': '1', 'ACUAUUUU':...
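If each cell really holds a Python dict (as builddict returns), one common approach is to pass the list of dicts straight to the DataFrame constructor instead of using apply(pd.Series). This is essentially the df.octamer.values.tolist() idea: each dict becomes a row, its keys become columns, and keys missing from a row become NaN, which `fillna(0)` turns into 0. A minimal sketch with a hypothetical two-row sample; converting the counts to int inside builddict is an addition, since the question's version keeps them as strings like '1'. If the cells are strings that only look like dicts (for example after a round-trip through a CSV file), they have to be parsed first.

```python
import pandas as pd

def builddict(x):
    # Same idea as the question's builddict, but counts are stored as int
    return {kmer: int(count) for kmer, count in (item.split(":") for item in x.split())}

# Hypothetical sample reusing tokens visible in the printout above
df = pd.DataFrame({
    "intron_id": [">ENSG00000183943.1", ">ENSG00000183943.3"],
    "octamer": ["AGCCATGC:1 AGUAGCUG:1", "AGUAGCUG:4 UCAACAGG:1"],
})
df["octamer"] = df["octamer"].apply(builddict)

# Each dict becomes one row; its keys become columns, absent keys become NaN -> 0
counts = pd.DataFrame(df["octamer"].tolist(), index=df.index).fillna(0).astype(int)
result = pd.concat([df[["intron_id"]], counts], axis=1)
print(result)
```

Passing `index=df.index` keeps the rows aligned with `intron_id` even after rows have been dropped, which matters here because the question removes several rows with df.drop.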
When I tried to solve it as follows, I got the broken result below instead. I'm really confused.
df = pd.read_csv('~/10genomic/elife/octamer/intron_seq/countdict.txt', delimiter=',', index_col=0)
# Keep only the first three rows for testing
df = df.iloc[:3, :]
print(df)
intron_id octamer
0 >ENSG00000183943.1 {'AGCCATGC': '1', 'AGUAGCUG': '1', 'GCCUGGCC':...
1 >ENSG00000183943.2 {'CATATTTC': '1', 'UCCCAAAA': '1', 'AAGCCATA':...
2 >ENSG00000183943.3 {'AGUAGCUG': '4', 'UCAACAGG': '1', 'CCUUUCAU':...
# Try to expand the octamer column into a separate DataFrame
temp_df = pd.DataFrame.from_records(df.pop("octamer"))
print(temp_df)
0 1 2 3 4 5 ... 73895 73896 73897 73898 73899 73900
0 { ' A G C C ... None None None None None None
1 { ' C A T A ... None None None None None None
2 { ' A G U A ... : ' 1 ' }
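A likely cause of the output above, assuming countdict.txt was written with to_csv from the DataFrame that held the dict column (the question does not say how it was created): a CSV file stores only text, so after read_csv each cell is the string "{'AGCCATGC': '1', ...}", not a dict. DataFrame.from_records then treats each string as a sequence and puts every character in its own column, which would explain the 73,901 single-character columns. The sketch below reproduces the round-trip in memory with `io.StringIO` and a hypothetical sample, then uses `ast.literal_eval` from the standard library to turn the text back into dicts before expanding them.

```python
import ast
import io

import pandas as pd

# Hypothetical sample: a DataFrame whose octamer column holds real dicts
df = pd.DataFrame({
    "intron_id": [">ENSG00000183943.1", ">ENSG00000183943.3"],
    "octamer": [{"AGCCATGC": "1", "AGUAGCUG": "1"}, {"AGUAGCUG": "4", "UCAACAGG": "1"}],
})

# Simulate writing it to a CSV file and reading it back (in memory)
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
df2 = pd.read_csv(buf, index_col=0)

# After the round-trip each cell is just text, e.g. "{'AGCCATGC': '1', 'AGUAGCUG': '1'}"
print(type(df2.loc[0, "octamer"]))  # <class 'str'>

# Parse the text back into dicts (with int counts), then expand into columns
dicts = df2["octamer"].apply(lambda s: {k: int(v) for k, v in ast.literal_eval(s).items()})
counts = pd.DataFrame(dicts.tolist(), index=df2.index).fillna(0).astype(int)
result = pd.concat([df2[["intron_id"]], counts], axis=1)
print(result)
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for parsing text read from a file. Skipping the intermediate CSV and working on the dict column directly avoids the parsing step altogether.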
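[Discussion]:
-
Please provide some sample data from count.txt so we can test it.
-
What is your expected result?
-
I have updated the question. I hope you can give me some advice. Thanks! @Mike67
-
I have updated the question. I hope you can give me some advice. Thanks! @juanpa.arrivillaga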