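[Posted]: 2020-03-14 02:08:05
[Question]:
I need help reshaping my input to match my output. I believe the problem is related to my target variable. I get the error stated in the title. I have tried .reshape and .flatten(). Please help, and thanks in advance.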
import re
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Read the corpus, one document per line.
NEnews_train = []
for line in open('/Users/db/Desktop/NE1.txt', 'r'):
    NEnews_train.append(line.strip())

REPLACE_NO_SPACE = re.compile(r"[.;:!'?,\"()\[\]]")
REPLACE_WITH_SPACE = re.compile(r"(<br\s*/><br\s*/>)|(\-)|(\/)")

def preprocess_reviews(reviews):
    # Lowercase, drop punctuation, and turn <br /> tags, hyphens and slashes into spaces.
    reviews = [REPLACE_NO_SPACE.sub("", line.lower()) for line in reviews]
    reviews = [REPLACE_WITH_SPACE.sub(" ", line) for line in reviews]
    return reviews

NE_train_clean = preprocess_reviews(NEnews_train)

english_stop_words = stopwords.words('english')

def remove_stop_words(corpus):
    # Rebuild each document without English stop words.
    removed_stop_words = []
    for review in corpus:
        removed_stop_words.append(
            ' '.join([word for word in review.split()
                      if word not in english_stop_words])
        )
    return removed_stop_words

no_stop_words = remove_stop_words(NE_train_clean)

# Binary bag of unigrams and bigrams.
ngram_vectorizer = CountVectorizer(binary=True, ngram_range=(1, 2))
ngram_vectorizer.fit(no_stop_words)
X = ngram_vectorizer.transform(no_stop_words)
X_test = ngram_vectorizer.transform(no_stop_words)

# Labels: the first 12 documents are 1, the rest 0 (length hard-coded to 25).
target = [1 if i < 12 else 0 for i in range(25)]

X_train, X_val, y_train, y_val = train_test_split(
    X, target, train_size = 0.75
)
Here is the error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-281ec07b46bb> in <module>
      2
      3 X_train, X_val, y_train, y_val = train_test_split(
----> 4     X, target, train_size = 0.75
      5 )

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
   2094         raise TypeError("Invalid parameters passed: %s" % str(options))
   2095
-> 2096     arrays = indexable(*arrays)
   2097
   2098     n_samples = _num_samples(arrays[0])

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in indexable(*iterables)
    228         else:
    229             result.append(np.array(X))
--> 230     check_consistent_length(*result)
    231     return result
    232

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    203     if len(uniques) > 1:
    204         raise ValueError("Found input variables with inconsistent numbers of"
--> 205                          " samples: %r" % [int(l) for l in lengths])
    206
    207

ValueError: Found input variables with inconsistent numbers of samples: [24, 25]
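I have seen other people hit similar errors, but their code is a bit different from mine, so I got confused when trying to apply their fixes.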
[Comments]:
-
Please add the error stack trace to the question.
-
Added the stack trace for better understanding, @mitter.
-
What is the shape of X here? It looks like X and target have different lengths. train_test_split requires X.shape[0] == target.shape[0] to be True.
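
The length check described in this comment can be sketched in plain Python. This is only an illustration, not scikit-learn's actual code: make_labels and check_same_length are hypothetical helpers, docs is made-up stand-in data for the 24 cleaned lines, and the 12-positive split is taken from the question's target expression.

```python
def make_labels(n_samples, n_positive):
    """Label the first n_positive samples 1 and the rest 0."""
    return [1 if i < n_positive else 0 for i in range(n_samples)]


def check_same_length(*arrays):
    """Raise ValueError if the inputs do not all have the same number of samples."""
    lengths = [len(a) for a in arrays]
    if len(set(lengths)) > 1:
        raise ValueError(
            "Found input variables with inconsistent numbers of samples: %r" % lengths
        )


# Hypothetical stand-in for the 24 cleaned documents read from the file.
docs = ["document %d" % i for i in range(24)]

# Hard-coding 25 reproduces the mismatch reported in the traceback.
bad_target = make_labels(25, 12)

# Deriving the label count from the data keeps both inputs the same length.
target = make_labels(len(docs), 12)
check_same_length(docs, target)  # passes without raising
```

In the question's code the equivalent change would be to build target with range(X.shape[0]) instead of range(25). Note that len(X) is not supported on SciPy sparse matrices, so X.shape[0] is the way to get the row count there. Whether the first 12 lines of the file really are the positive class is something only the asker can confirm.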
Tags: python nlp data-science train-test-split