Binary Search Tree Frequency Counter: Answers

【Question title】: Binary Search Tree Frequency Counter
【Posted】: 2018-11-09 16:30:43
【Question】:

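I need to read a text file, strip unnecessary punctuation, lowercase the words, and use binary search tree functions to build a binary search tree of the words in the file.

We are asked to count the frequency of repeated words, and to report the total word count and the total number of unique words.

So far I have the punctuation handling, the file reading, and the lowercasing done, and the binary search tree is basically done. I just need to figure out how to implement the "frequency" counter in the code.

My code is below: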

class BSearchTree:
    class _Node:
        def __init__(self, word, left = None, right = None):
            self._word = word
            self._freq = 0
            self._left = left
            self._right = right

    def __init__(self):
        self._root = None
        self._wordc = 0
        self._each = 0

    def isEmpty(self):
        return self._root is None

    def search(self, word):
        probe = self._root
        while probe is not None:
            if word == probe._word:
                return probe
            if word < probe._word:
                probe = probe._left
            else:
                probe = probe._right
        return None

    def insert(self, word):
        if self.isEmpty():
            self._root = self._Node(word)
            self._root._freq += 1   # <- is this correct?
            return

        parent = None   # keeps track of the parent; we need it later
                        # to attach the new node to the parent's link

        probe = self._root
        while probe is not None:
            # no branch yet for word == probe._word (a duplicate)
            if word < probe._word:      # go to left subtree
                parent = probe          # before we go to the child, save the parent
                probe = probe._left
            elif word > probe._word:    # go to right subtree
                parent = probe          # before we go to the child, save the parent
                probe = probe._right

        if word < parent._word:         # new value becomes the new left child
            parent._left = self._Node(word)
        else:                           # new value becomes the new right child
            parent._right = self._Node(word)

Since the formatting is killing me, here is the second half.

class NotPresent(Exception):
    pass

def main():
    t = BSearchTree()

    with open("sample.txt") as file:
        line = file.readline()

    # for word in line:
    #     t.insert(word)
    # The loop above crashes the program because there are too many
    # words to add. The lines below test the BSearchTree class.
    t.insert('all')
    t.insert('high')
    t.insert('fly')
    t.insert('can')
    t.insert('boars')
    # t.insert('all')  <- how do I handle duplicates so that the
    t.inOrder()        #    extras add to the node's frequency?

Thanks for helping / trying to help!
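A side note on the commented-out loop in main(): `file.readline()` returns only the first line of the file as a string, and iterating over a string yields single characters rather than words. Below is a minimal sketch of the preprocessing described in the question. It assumes that words are separated by whitespace and that "punctuation" means the characters in `string.punctuation` (so "don't" becomes "dont"). The helper names `words_from_text` and `words_from_file` are hypothetical and do not appear in the original post.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def words_from_text(text):
    """Hypothetical helper: lowercase, drop punctuation, split on whitespace."""
    return text.lower().translate(_STRIP_PUNCT).split()

def words_from_file(path):
    """Hypothetical helper: read the whole file, not just its first line."""
    with open(path, encoding="utf-8") as f:
        return words_from_text(f.read())

if __name__ == "__main__":
    print(words_from_text("All boars can fly high. Can ALL boars fly?"))
```

Each word returned by `words_from_file("sample.txt")` could then be passed to `t.insert(word)` one at a time.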

【Comments】:

Tags: python python-3.x search tree binary

【Solution 1】:

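First, it is better to initialize the Node's _freq to 1 in the node's constructor rather than setting it in the BST's insert().

(One more thing: the Python coding conventions in PEP 8 advise against putting spaces around the = sign when writing default argument values.)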

    def __init__(self, word, left=None, right=None):
        self._word = word
        self._freq = 1
        self._left = left
        self._right = right

Then just add the last 3 lines to the loop in insert():

    probe = self._root
    while probe is not None:
        if word < probe._word:      # go to left subtree
            parent = probe          # before we go to the child, save the parent
            probe = probe._left
        elif word > probe._word:    # go to right subtree
            parent = probe          # before we go to the child, save the parent
            probe = probe._right
        else:                       # the word is already in the tree
            probe._freq += 1
            return
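Putting both changes together, a minimal self-contained sketch of insert() might look like the following. It reuses the class and attribute names from the question (BSearchTree, _Node, _word, _freq). The `frequency()` method is a hypothetical helper added here only to check the result and is not part of the original code.

```python
class BSearchTree:
    class _Node:
        def __init__(self, word, left=None, right=None):
            self._word = word
            self._freq = 1          # a node is created on the word's first occurrence
            self._left = left
            self._right = right

    def __init__(self):
        self._root = None

    def insert(self, word):
        if self._root is None:
            self._root = self._Node(word)   # _freq is already 1
            return

        parent = None
        probe = self._root
        while probe is not None:
            if word < probe._word:          # go to left subtree
                parent = probe
                probe = probe._left
            elif word > probe._word:        # go to right subtree
                parent = probe
                probe = probe._right
            else:                           # duplicate: count it, add no node
                probe._freq += 1
                return

        # probe fell off the tree; attach the new node under parent
        if word < parent._word:
            parent._left = self._Node(word)
        else:
            parent._right = self._Node(word)

    def frequency(self, word):
        """Hypothetical helper: stored count for word, or 0 if absent."""
        probe = self._root
        while probe is not None:
            if word == probe._word:
                return probe._freq
            probe = probe._left if word < probe._word else probe._right
        return 0

if __name__ == "__main__":
    t = BSearchTree()
    for w in ["all", "high", "fly", "can", "boars", "all"]:
        t.insert(w)
    print(t.frequency("all"), t.frequency("fly"), t.frequency("pigs"))  # prints: 2 1 0
```

Because the `else` branch returns as soon as the word is found, the loop can no longer spin forever on a word that is neither smaller nor larger than the current node.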

【Comments】:

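Thank you! Now I just need to display each word's frequency counter, along with the total number of words and the number of unique words.
@bluekozo You have already written the in-order traversal. The remaining tasks are much easier than implementing the BST.
I was able to figure out how to print the words with their frequencies next to them. Any hints on how to count all the words and all the unique words/strings?
@bluekozo Accumulate each word's frequency as you print. Words with a frequency of 1 are unique. So you need two counters.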
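The counters suggested in the last comment can be sketched roughly as follows. This is a standalone illustration, not the asker's class: the names `Node`, `insert`, `in_order`, and `summarize` are hypothetical, and the traversal uses an explicit stack in place of the asker's inOrder() method, which is not shown in the post. Since "unique" can be read two ways, the sketch tracks both the number of distinct words (one per node) and the number of words that occur exactly once (frequency 1, as the comment describes). Which reading the assignment intends is an assumption to verify.

```python
class Node:
    def __init__(self, word):
        self.word = word
        self.freq = 1
        self.left = None
        self.right = None

def insert(root, word):
    """Insert word (or bump its count) and return the root of the tree."""
    if root is None:
        return Node(word)
    probe = root
    while True:
        if word == probe.word:
            probe.freq += 1
            return root
        if word < probe.word:
            if probe.left is None:
                probe.left = Node(word)
                return root
            probe = probe.left
        else:
            if probe.right is None:
                probe.right = Node(word)
                return root
            probe = probe.right

def in_order(root):
    """Yield (word, freq) pairs in sorted order using an explicit stack."""
    stack, probe = [], root
    while stack or probe is not None:
        while probe is not None:        # walk as far left as possible
            stack.append(probe)
            probe = probe.left
        probe = stack.pop()
        yield probe.word, probe.freq
        probe = probe.right

def summarize(root):
    """Print each word with its count; return (total, distinct, once)."""
    total = distinct = once = 0
    for word, freq in in_order(root):
        print(word, freq)
        total += freq                   # every occurrence of every word
        distinct += 1                   # one node per different word
        if freq == 1:
            once += 1                   # words that appear exactly once
    return total, distinct, once

if __name__ == "__main__":
    root = None
    for w in "all boars can fly high can all boars fly all".split():
        root = insert(root, w)
    print(summarize(root))              # last line printed: (10, 5, 1)
```

Accumulating the counters during the traversal means the tree is walked only once, which matches the advice to count while printing.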