Comparison of cell arrays in MATLAB: answers

【Question title】: Comparison of cell arrays in MATLAB
【Posted】: 2016-01-20 06:42:57
【Question description】:

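I have two cell arrays, one storing unigrams and the other storing bigrams, both extracted from a text file. I now have to compare each unigram with the bigrams to find how many times each unigram occurs in the bigrams, and later the probability. Can anyone help me with this? I have tried strcmp, but it does not work. My code is below: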

for i = 1
    for j = 1:bigramRow
        bigram1 = regexp(splitBigramCellsA{j},'<s>|\w*|</s>','match');
        b1 = cellfun(@(x,y)[x], bigram1(1:end-1)','un',0)
        match = strcmp(splitUnigramCellsA, splitBigramCellsA{j,1});

        if match ==1
            bigram1count = splitbigramCellsB{j};
            unigram1count = splitUnigramCellsB{j};
            disp(bigram1count)
            disp(unigram1count)
        end
    end
end

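【Comments】:

Can you explain what unigrams and bigrams are? What does splitBigramCells contain?
Unigrams are the unique words in a sentence. Bigrams take two consecutive words at a time. For example, 'It is a lovely day' contains the bigrams 'It is', 'is a', 'a lovely', and 'lovely day'.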
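
The definition above can be illustrated with a minimal sketch. It assumes whitespace-separated tokens and uses the example sentence from the comment; the variable names (words, bigrams, nIs) are hypothetical and not taken from the asker's code. It also shows one likely reason a plain strcmp between a unigram and a whole bigram string never matches: strcmp compares complete strings, so the bigram has to be compared word by word.

```matlab
% Example sentence from the comment above.
sentence = 'It is a lovely day';

% Tokens of the sentence, as a 1x5 cell array of words.
words = strsplit(sentence, ' ');

% Bigrams: every pair of consecutive words, joined with a space.
bigrams = strcat(words(1:end-1), {' '}, words(2:end));

% strcmp compares whole strings, so a unigram never equals a full bigram.
wholeMatch = strcmp('is', bigrams{1});   % compares 'is' with 'It is'

% Compare against the first and second word of each bigram instead.
firstWords  = words(1:end-1);
secondWords = words(2:end);
isInBigram  = strcmp('is', firstWords) | strcmp('is', secondWords);

% Number of bigrams that contain the unigram 'is'.
nIs = sum(isInBigram);
```

Here bigrams holds the four bigrams listed in the comment, wholeMatch is false, and nIs counts the bigrams in which 'is' appears as either word.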

Tags: arrays matlab

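【Solution 1】:

If you can fit the text in memory, you can do the following:

Create a cell array containing all the words, in order.
Call unique on the cell array and capture the third output. It represents the original text as an array of indices, where each index refers to a unigram.
Create all bigrams as bigrams = [indices(1:2:largestEven),indices(2:2:largestEven);indices(2:2:largestOdd),indices(3:2:largestOdd)], where largestEven is 2*floor(length(indices)/2) and largestOdd is 2*floor((length(indices)-1)/2)+1. This assumes indices is a column vector, so that each row of bigrams is one bigram.
Compute, for example, the frequency of each unigram in the bigrams with tabulate(bigrams(:)).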
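
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the answerer's exact code: the word list is made-up sample data, the bigrams are built with the simpler adjacent-pair form [indices(1:end-1), indices(2:end)] (which yields the same set of bigrams in text order), and accumarray from base MATLAB stands in for tabulate, which requires the Statistics and Machine Learning Toolbox. The conditional probability at the end is one common choice of estimate and is an assumption about what "probability" means in the question.

```matlab
% Hypothetical sample text; in practice the words come from the text file.
words = {'it'; 'is'; 'a'; 'lovely'; 'day'; 'it'; 'is'};

% Third output of unique: the text as indices into the sorted unigram list.
[unigrams, ~, indices] = unique(words);
indices = indices(:);                     % force a column vector

% Each row is one bigram: [index of first word, index of second word].
bigrams = [indices(1:end-1), indices(2:end)];

% How often each unigram occurs in any position of any bigram.
counts = accumarray(bigrams(:), 1, [numel(unigrams), 1]);

% Count each distinct bigram.
[uniqueBigrams, ~, bigramIdx] = unique(bigrams, 'rows');
bigramCounts = accumarray(bigramIdx, 1);

% Estimate P(second word | first word) = count(bigram) / count(first word
% appearing as the first word of a bigram).
firstCounts = accumarray(bigrams(:, 1), 1, [numel(unigrams), 1]);
prob = bigramCounts ./ firstCounts(uniqueBigrams(:, 1));

% Display each bigram with its count and estimated probability.
for k = 1:size(uniqueBigrams, 1)
    fprintf('%s %s: count %d, P = %.2f\n', ...
        unigrams{uniqueBigrams(k, 1)}, unigrams{uniqueBigrams(k, 2)}, ...
        bigramCounts(k), prob(k));
end
```

Working on integer indices rather than strings avoids repeated strcmp calls inside nested loops, which is the main point of the unique-based approach.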

【Discussion】: