Answers: "Expected 2D array, got 1D array" error when computing LSA

【Question title】: Getting an "Expected a 2D array, got a 1D array" error when computing LSA
【Posted】: 2020-11-17 23:11:11
【Question】:

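I am writing natural language processing preprocessing functions for LSA (latent semantic analysis). All of my other functions, such as tfidf and remove_stopwords, pass the unit tests I wrote. The LSA function, however, keeps raising the following error when I test it: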

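"Expected 2D array, got 1D array instead: array=['I ate dinner at Olive Garden', 'we are buying a house', 'I did not eat dinner at Olive Garden', 'our neighbors are buying a house']. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."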

Here is my LSA function and the test code:

import pandas as pd
import nltk
import string
import sklearn
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer
from sklearn.feature_extraction.text import TfidfVectorizer

def LSA(data, tfidf = True, remove_stopwords=True):
    # done with stop word removal and tf-idf weighting keeping the 100 most common concepts
    text = data.iloc[:,-1] #isolate text column
    
     
    #Define the LSA function
    vectors = sklearn.decomposition.TruncatedSVD(n_components = 2, algorithm = 'randomized', n_iter = 100, random_state = 100)

    vectors.fit(text.tolist())
    svd_matrix = vectors.fit_transform(text.tolist())
    svd_matrix = Normalizer(copy=False).fit_transform(text.tolist())

    dense = svd_matrix.todense()
    denselist = dense.tolist()
    
    data["cleaned_vectorized_document"] = denselist
    return data

This is the test code that raises the error:

p = pd.DataFrame({'two':[1,2,3,4],'test':['I ate dinner at Olive Garden', 'we are buying a house',
'I did not eat dinner at Olive Garden', 'our neighbors are buying a house']})

print(LSA(p))

【Comments on the question】:

Tags: python nlp sklearn-pandas lsa

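【Solution 1】:

I'm not sure whether this is your actual problem, but your array is missing a comma between items, which would at least raise the following error:

ValueError: arrays must all be same length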
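
To illustrate the mechanism (a minimal sketch, not the asker's data): Python silently concatenates adjacent string literals, so a missing comma makes the list one item shorter, and the DataFrame constructor then sees columns of unequal length. The list contents and variable names below are made up for the example.

```python
import pandas as pd

# Four items, as intended.
with_comma = ['a', 'b', 'c', 'd']

# The comma between 'b' and 'c' is missing, so Python joins them into 'bc'.
missing_comma = ['a', 'b' 'c', 'd']

print(len(with_comma), len(missing_comma))

# Building a DataFrame from columns of different lengths raises ValueError.
error_message = None
try:
    pd.DataFrame({'two': [1, 2, 3, 4], 'test': missing_comma})
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the ValueError differs between pandas versions.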

Try this:

p = pd.DataFrame({'two':[1,2,3,4],'test':['I ate dinner at Olive Garden', 'we are buying a house', 'I did not eat dinner at Olive Garden', 'our neighbors are buying a house']})
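
Separately, the error text quoted in the question suggests a different cause: TruncatedSVD and Normalizer expect a 2D numeric matrix, but the function passes them the raw list of strings. A possible fix is sketched below, assuming the text should first be converted to TF-IDF vectors with the TfidfVectorizer that the question already imports. The name lsa_sketch and the use of make_pipeline are illustrative choices, not part of the original code.

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer


def lsa_sketch(data, n_components=2, remove_stopwords=True):
    """Hypothetical rewrite of the LSA function: TF-IDF, then SVD, then L2 normalization."""
    text = data.iloc[:, -1]  # assumes the text is in the last column, as in the question

    # Turn the raw strings into a 2D sparse TF-IDF matrix (documents x terms).
    vectorizer = TfidfVectorizer(stop_words="english" if remove_stopwords else None)

    # Same SVD settings as in the question.
    svd = TruncatedSVD(n_components=n_components, algorithm="randomized",
                       n_iter=100, random_state=100)

    # Chain the steps so each one receives the previous step's 2D output.
    pipeline = make_pipeline(vectorizer, svd, Normalizer(copy=False))
    reduced = pipeline.fit_transform(text)  # dense array, one row per document

    result = data.copy()  # avoid mutating the caller's DataFrame
    result["cleaned_vectorized_document"] = pd.Series(reduced.tolist(), index=result.index)
    return result


p = pd.DataFrame({'two': [1, 2, 3, 4],
                  'test': ['I ate dinner at Olive Garden', 'we are buying a house',
                           'I did not eat dinner at Olive Garden',
                           'our neighbors are buying a house']})
out = lsa_sketch(p)
print(out)
```

TruncatedSVD returns a dense NumPy array, so the todense() call from the original function is not needed here.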

【Comments】: