I have this LDA code, and when I run it I keep getting an error that is hard to track down

【Question title】: I have this code for LDA; when I run it I keep getting an error which is difficult to track
【Posted】: 2021-08-26 03:48:46
【Question】:

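Below is the code I am trying to run. I keep getting the error shown beneath it, and I cannot work out where exactly the float is coming from: is it a variable in the code, or something in the data? If anyone understands this problem, please help.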

import gensim
import numpy as np
import pandas as pd
import tqdm
grid = {}
grid['Validation_Set'] = {}
# Topics range
min_topics = 10
max_topics = 20
step_size = 5
topics_range = range(min_topics, max_topics, step_size)
# Alpha parameter
alpha = list(np.arange(0.01, 1, 0.3))
alpha.append('symmetric')
alpha.append('asymmetric')
# Beta parameter
beta = list(np.arange(0.01, 1, 0.3))
beta.append('symmetric')
# Validation sets
num_of_doc = len(corpus)
num_of_docs = int(num_of_doc)
corpus_sets = [# gensim.utils.ClippedCorpus(corpus, num_of_docs*0.25), 
               # gensim.utils.ClippedCorpus(corpus, num_of_docs*0.5), 
               gensim.utils.ClippedCorpus(corpus, num_of_docs*0.75), 
               corpus]
corpus_title = ['75% Corpus', '100% Corpus']
model_results = {'Validation_Set': [],
                 'Topics': [],
                 'Alpha': [],
                 'Beta': [],
                 'Coherence': []
                }

if 1 == 1:
    pbar = tqdm.tqdm(total=540)
    
    # iterate through validation corpuses
    for i in range(len(corpus_sets)):
        # iterate through number of topics
        for k in topics_range:
            # iterate through alpha values
            for a in alpha:
                # iterate through beta values
                for b in beta:
                    # get the coherence score for the given parameters
                    cv = compute_coherence_values(corpus=corpus_sets[i], dictionary=id2word, 
                                                  k=k, a=a, b=b)
                    # Save the model results
                    model_results['Validation_Set'].append(corpus_title[i])
                    model_results['Topics'].append(k)
                    model_results['Alpha'].append(a)
                    model_results['Beta'].append(b)
                    model_results['Coherence'].append(cv)
                    
                    pbar.update(1)
    pd.DataFrame(model_results).to_csv('lda_tuning_results.csv', index=False)
    pbar.close()

Here is the error I keep running into:

>       0%|          | 0/540 [00:00<?, ?it/s]
>     ---------------------------------------------------------------------------
>     TypeError                                 Traceback (most recent call last)
>     /usr/local/lib/python3.7/dist-packages/gensim/models/ldamulticore.py in update(self, corpus, chunks_as_numpy)
>         212         try:
>     --> 213             lencorpus = len(corpus)
>         214         except TypeError:
>
>     TypeError: 'float' object cannot be interpreted as an integer
>
>     During handling of the above exception, another exception occurred:
>
>     ValueError                                Traceback (most recent call last)
>     5 frames
>     /usr/local/lib/python3.7/dist-packages/gensim/utils.py in __iter__(self)
>         992
>         993     def __iter__(self):
>     --> 994         return itertools.islice(self.corpus, self.max_docs)
>         995
>         996     def __len__(self):
>
>     ValueError: Stop argument for islice() must be None or an integer: 0 <= x <= sys.maxsize.

【Comments】:

Tags: python machine-learning nlp lda

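【Solution 1】:

When you build the validation sets in corpus_sets, num_of_docs*0.75 evaluates to a float. ClippedCorpus hands that value to itertools.islice as max_docs, and islice only accepts an integer (or None). Wrap the product in int() so the document limit is an integer rather than a float: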

gensim.utils.ClippedCorpus(corpus, int(num_of_docs*0.75))
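
To see why the cast matters without installing gensim, here is a minimal sketch. TinyClippedCorpus, try_corpus, and toy_corpus are hypothetical names made up for this illustration. The __iter__ body copies the line shown in the traceback, while the __len__ body is an assumption about how the clipped length is computed, not a quote of gensim's source.

```python
import itertools


class TinyClippedCorpus:
    """Hypothetical stand-in that mimics the two methods seen in the traceback."""

    def __init__(self, corpus, max_docs):
        self.corpus = corpus
        self.max_docs = max_docs

    def __iter__(self):
        # Same call as the line flagged in gensim/utils.py in the traceback.
        return itertools.islice(self.corpus, self.max_docs)

    def __len__(self):
        # Assumed behaviour: the clipped length is capped by max_docs.
        return min(self.max_docs, len(self.corpus))


def try_corpus(clipped):
    """Return the names of the exceptions raised by len() and by iter()."""
    errors = []
    for action in (len, iter):
        try:
            action(clipped)
        except (TypeError, ValueError) as exc:
            errors.append(type(exc).__name__)
    return errors


toy_corpus = [[(0, 1)], [(1, 2)], [(2, 1)], [(3, 4)]]  # made-up bag-of-words docs
num_of_docs = len(toy_corpus)

# 4 * 0.75 is 3.0: a whole number, but still a float.
broken = TinyClippedCorpus(toy_corpus, num_of_docs * 0.75)
# int() turns the limit into the integer 3.
fixed = TinyClippedCorpus(toy_corpus, int(num_of_docs * 0.75))

print(try_corpus(broken))
print(try_corpus(fixed), len(fixed))
```

With the float limit, both len() and iteration fail, which matches the TypeError followed by the ValueError in the traceback. With the int() cast, neither call raises. Note that int() truncates toward zero; integer arithmetic such as num_of_docs * 3 // 4 would also avoid the float.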

【Comments】: