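【Question title】: Found input variables with inconsistent numbers of samples: [24, 25]
【Posted】: 2020-03-14 02:08:05
【Question】:

I need help reshaping my input to match my output. I believe the problem is related to my target variable. I get the error stated in the title. I have tried .reshape and .flatten(). Please help, and thanks in advance.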

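import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Read the raw text, one document per line
NEnews_train = []
with open('/Users/db/Desktop/NE1.txt', 'r') as f:
    for line in f:
        NEnews_train.append(line.strip())

# Raw strings keep the regex escapes from being read as string escapes
REPLACE_NO_SPACE = re.compile(r"[.;:!'?,\"()\[\]]")
REPLACE_WITH_SPACE = re.compile(r"(<br\s*/><br\s*/>)|(-)|(/)")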

def preprocess_reviews(reviews):
    reviews = [REPLACE_NO_SPACE.sub("", line.lower()) for line in reviews]
    reviews = [REPLACE_WITH_SPACE.sub(" ", line) for line in reviews]


    return reviews

NE_train_clean = preprocess_reviews(NEnews_train)

from nltk.corpus import stopwords

english_stop_words = stopwords.words('english')
def remove_stop_words(corpus):
    removed_stop_words = []
    for review in corpus:
        removed_stop_words.append(
            ' '.join([word for word in review.split() 
                      if word not in english_stop_words])
        )
    return removed_stop_words

no_stop_words = remove_stop_words(NE_train_clean)



ngram_vectorizer = CountVectorizer(binary=True, ngram_range=(1, 2))
ngram_vectorizer.fit(no_stop_words)
X = ngram_vectorizer.transform(no_stop_words)
X_test = ngram_vectorizer.transform(no_stop_words)

target = [1 if i < 12 else 0 for i in range(25)]

X_train, X_val, y_train, y_val = train_test_split(
    X, target, train_size = 0.75
)

Here is the error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-281ec07b46bb> in <module>
      2 
      3 X_train, X_val, y_train, y_val = train_test_split(
----> 4     X, target, train_size = 0.75
      5 )

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
   2094         raise TypeError("Invalid parameters passed: %s" % str(options))
   2095 
-> 2096     arrays = indexable(*arrays)
   2097 
   2098     n_samples = _num_samples(arrays[0])

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in indexable(*iterables)
    228         else:
    229             result.append(np.array(X))
--> 230     check_consistent_length(*result)
    231     return result
    232 

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    203     if len(uniques) > 1:
    204         raise ValueError("Found input variables with inconsistent numbers of"
--> 205                          " samples: %r" % [int(l) for l in lengths])
    206 
    207 

ValueError: Found input variables with inconsistent numbers of samples: [24, 25]

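I have seen people with similar errors, but their code is a bit different from mine, so I got confused while trying to work it out.

【Question comments】:

  • Please add the error stack trace to the question.
  • Added the stack trace for better understanding @mitter
  • What is the shape of X here? It looks like X and target have different lengths. train_test_split requires X.shape[0] == target.shape[0] to be True.

Tags: python nlp data-science train-test-split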


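【Solution 1】:

X has 24 rows (samples), but your target list has 25 values: range(25) yields the integers 0, 1, 2, ..., 24, which is 25 values in total (25 itself is excluded). That is why train_test_split raises the error above: it requires every input to have the same number of samples, i.e. X.shape[0] == len(target).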
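The mismatch can be reproduced without scikit-learn. The sketch below is only an illustration: `documents` is a made-up stand-in for the 24 cleaned lines from the question's file, and `lengths_match` is a hypothetical helper, not a scikit-learn function.

```python
# Hypothetical stand-in for the 24 cleaned lines read from NE1.txt.
documents = [f"document {i}" for i in range(24)]

# The labeling line from the question: range(25) yields 25 integers (0..24).
target = [1 if i < 12 else 0 for i in range(25)]

print(len(documents))  # number of samples
print(len(target))     # number of labels


def lengths_match(*arrays):
    """Return True when every input has the same number of samples."""
    return len({len(a) for a in arrays}) == 1


# The two inputs disagree, which is the condition train_test_split rejects.
print(lengths_match(documents, target))
```

Printing both lengths before calling train_test_split is usually the quickest way to see which input is off by one.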

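Fix:

Your current line produces 25 labels:

target = [1 if i < 12 else 0 for i in range(25)]

If you want i to start at 1 and run through 24 (inclusive), change it to

target = [1 if i < 12 else 0 for i in range(1, 25)]

Note that this labels only 11 samples as 1 (i = 1 through 11). Or, if you want i to run from 0 through 23 (inclusive), which keeps 12 samples labeled 1, use

target = [1 if i < 12 else 0 for i in range(24)]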
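Rather than hard-coding the count, one option is to derive the number of labels from X itself so the two cannot drift apart. This is a minimal sketch under assumptions: the 24-document `corpus` is invented in place of NE1.txt, the rule "first 12 rows are class 1" is taken from the question, and `random_state=0` is added only to make the split repeatable.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical stand-in corpus: 24 short documents instead of NE1.txt.
corpus = [f"sample text number {i}" for i in range(24)]

vectorizer = CountVectorizer(binary=True, ngram_range=(1, 2))
X = vectorizer.fit_transform(corpus)

# Build one label per row of X, using the question's rule:
# the first 12 rows are labeled 1, the rest 0.
n_samples = X.shape[0]
target = [1 if i < 12 else 0 for i in range(n_samples)]

# Fail early with a clear message if the lengths ever disagree.
assert len(target) == n_samples, (len(target), n_samples)

X_train, X_val, y_train, y_val = train_test_split(
    X, target, train_size=0.75, random_state=0
)
print(X_train.shape[0], X_val.shape[0])
```

If the labels actually come from another source, such as a second file, the same assertion still helps: it reports the mismatch before train_test_split does.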

【Comments】:
