【Posted】: 2019-07-19 13:43:10
【Question】:
I have two CSV files. The first:
word,centroid
she,1
great,0
good,3
mother,2
father,2
After,4
before,4
.....
The second:
sentences,label
good mother,1
great father,1
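I want to check each sentence against the clustering result.
So if the sentence is "good mother", the word good is in centroid 3, so the array would be [0,0,0,1,0]; the word mother is in centroid 2, so the array would then become [0,0,1,1,0]...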
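For reference, the target encoding described here can be sketched in plain Python. This is only an illustration under two assumptions: the word-to-centroid mapping is copied by hand from the sample rows above (with "After" lowercased), and centroids are numbered 0 to 4. The names `word_to_centroid`, `N_CENTROIDS` and `encode` are hypothetical.

```python
# Word-to-centroid mapping copied from the first CSV sample in the question.
# Keys are lowercased so that "After" in the sample matches "after".
word_to_centroid = {
    "she": 1, "great": 0, "good": 3, "mother": 2,
    "father": 2, "after": 4, "before": 4,
}
N_CENTROIDS = 5  # assumption: centroids are numbered 0..4


def encode(sentence):
    """Return a list with a 1 at every centroid hit by a word in the sentence."""
    vector = [0] * N_CENTROIDS
    for word in sentence.lower().split():
        centroid = word_to_centroid.get(word)
        if centroid is not None:  # words missing from the mapping are skipped
            vector[centroid] = 1
    return vector


print(encode("good mother"))   # [0, 0, 1, 1, 0]
print(encode("great father"))  # [1, 0, 1, 0, 0]
```

Each word sets one position instead of appending five new elements, so the result always has one slot per centroid regardless of sentence length.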
My code is complicated and wrong... can anyone help me?
Here is my code:
import pandas as pd
import re
array=[]
data = pd.read_csv('data/data_komentar.csv',encoding = "ISO-8859-1")
df = pd.read_csv('data/hasil_cluster.csv',encoding = "ISO-8859-1")
for index,row in data.iterrows():
    kalimat=row[0]
    words=re.sub(r'([^\s\w]|_)', '', str(kalimat))
    words= re.sub(r'[0-9]+', '', words)
    for word in words.split():
        kata=word.lower()
        df = df[df.eq(kata)]
        if df.empty:
            print("empty")
        else:
            print(kata)
            if df['centroid;'] is 0:
                array=array+[1,0,0,0,0]
            if df['centroid'] is 1:
                array=array+[0,1,0,0,0]
            if df['centroid'] is 2:
                array=array+[0,0,1,0,0]
            if df['centroid;'] is 3:
                array=array+[0,0,0,1,0]
            if df['centroid;'] is 4:
                array=array+[0,0,0,0,1]
    print(array)
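Three things in the code above would keep it from producing the desired arrays. `df = df[df.eq(kata)]` replaces the cluster table with a masked copy on the very first word, so later lookups see mostly NaN. `df['centroid'] is 0` compares a pandas Series to an integer by identity, which is never true. And `array + [...]` concatenates lists, so the array grows by five elements per word instead of having one position set. The following is a minimal sketch of one possible fix, not a definitive implementation. It assumes the columns are named `word`, `centroid` and `sentences` as in the samples, that centroid ids run from 0 to the maximum value present, and it uses inline copies of the sample rows instead of the real files. The names `lookup`, `n_centroids` and `sentence_to_vector` are hypothetical. If the real file is semicolon-delimited (the `'centroid;'` column name hints at that), passing `sep=';'` to `pd.read_csv` may be needed.

```python
import io
import re

import pandas as pd

# Inline copies of the question's sample rows stand in for the real files
# (data/hasil_cluster.csv and data/data_komentar.csv).
clusters_csv = io.StringIO(
    "word,centroid\nshe,1\ngreat,0\ngood,3\nmother,2\nfather,2\nAfter,4\nbefore,4\n"
)
sentences_csv = io.StringIO("sentences,label\ngood mother,1\ngreat father,1\n")

clusters = pd.read_csv(clusters_csv)
data = pd.read_csv(sentences_csv)

# Build the word -> centroid lookup once; lowercase keys so "After" matches.
lookup = dict(zip(clusters["word"].str.lower(), clusters["centroid"]))
n_centroids = int(clusters["centroid"].max()) + 1  # assumption: ids are 0..max


def sentence_to_vector(sentence):
    # Same cleaning as the question: drop punctuation, underscores and digits.
    cleaned = re.sub(r"([^\s\w]|_)", "", str(sentence))
    cleaned = re.sub(r"[0-9]+", "", cleaned)
    vector = [0] * n_centroids
    for word in cleaned.lower().split():
        if word in lookup:  # words without a cluster are ignored
            vector[lookup[word]] = 1
    return vector


# One vector per sentence, stored next to the original columns.
data["vector"] = data["sentences"].apply(sentence_to_vector)
print(data[["sentences", "vector"]])
```

Building the lookup as a dictionary avoids filtering the DataFrame inside the loop, and a fresh vector is created for every sentence so results from different rows do not mix.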
【Discussion】: