from sklearn.model_selecting import train_test_spilt()
参数stratify: 依据标签y,按原数据y中各类比例,分配给train和test,使得train和test中各类数据的比例与原数据集一样。

例如:A:B:C=1:2:3
split后,train和test中,都是A:B:C=1:2:3
将stratify=X就是按照X中的比例分配
将stratify=y就是按照y中的比例分配
一般都是=y

http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html

 

 TF-IDF (Term Frequency - Inverse Document Frequency)

TfidfVectorizer 参数意义:

 

https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#sklearn.feature_extraction.text.TfidfVectorizer.build_tokenizer

 

 

详细解释:

https://scikit-learn.org/stable/modules/feature_extraction.html#text-feature-extraction

相关文章:

  • 2022-01-07
  • 2022-12-23
  • 2022-12-23
  • 2022-12-23
  • 2022-12-23
  • 2021-07-09
  • 2021-06-06
  • 2021-04-07
猜你喜欢
  • 2022-12-23
  • 2022-12-23
  • 2022-12-23
  • 2021-07-20
  • 2022-12-23
  • 2021-08-01
相关资源
相似解决方案