【Posted】: 2017-06-27 02:21:31
【Question】:
I am reading Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. In one of the examples, I came across this syntax in a for loop.
from sklearn.model_selection import StratifiedShuffleSplit
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(housing, housing["income_cat"]):
    strat_train_set = housing.loc[train_index]
    strat_test_set = housing.loc[test_index]
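I printed train_index and test_index, and they are arrays of indices. What does this for loop mean? train_index and test_index have different numbers of elements, so how does the iteration work? Is this code equivalent to the code below?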
from sklearn.model_selection import StratifiedShuffleSplit
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_index, test_index = split.split(housing, housing["income_cat"])
strat_train_set = housing.loc[train_index]
strat_test_set = housing.loc[test_index]
【Discussion】:
- My guess is that split.split(housing, housing["income_cat"]) returns a 2-tuple, and that writing train_index, test_index in the for loop unpacks its two values into the two variables.
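To make the guess above concrete: split.split(...) returns a generator that yields one (train, test) pair per split, so the loop iterates over pairs, not over the individual indices. That is why the two arrays can have different lengths. Below is a minimal pure-Python sketch of that behaviour. toy_split is a hypothetical stand-in written only for illustration (it neither shuffles nor stratifies) and is not part of scikit-learn.

```python
from typing import Iterator, List, Tuple


def toy_split(
    n_samples: int, n_splits: int = 1, test_size: float = 0.2
) -> Iterator[Tuple[List[int], List[int]]]:
    """Hypothetical stand-in for split.split(): yields one (train, test) pair per split."""
    n_test = int(n_samples * test_size)
    indices = list(range(n_samples))
    for _ in range(n_splits):
        # No shuffling or stratification here; only the shape of the output matters.
        yield indices[n_test:], indices[:n_test]


# The loop body runs once per yielded pair, not once per index.
iterations = 0
for train_index, test_index in toy_split(10):
    iterations += 1

# With a single split, next() fetches the same pair without a loop.
train_again, test_again = next(toy_split(10))

# Unpacking the generator itself fails: it holds one item (the pair), not two.
try:
    a, b = toy_split(10)
    direct_unpack_failed = False
except ValueError:
    direct_unpack_failed = True
```

So the second snippet in the question is not equivalent as written: assigning the generator directly to two names raises ValueError when n_splits=1. With the real splitter, a loop-free form that should behave the same is train_index, test_index = next(split.split(housing, housing["income_cat"])).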