【Question title】: Removing the helping words from a sentence
【Posted】: 2015-11-25 16:39:35
【Question】:

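I have made a simple program in which you type a phrase and it shows the video matching each individual word. Suppose I enter "I go to school". It should remove the word "to" from the sentence and return only three words. Below is the code I have tried. It runs, but when I enter some phrases it does not remove the helping verbs correctly, and on top of that it substitutes an empty string, which causes problems. Any suggestions?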

Code

    class MyPlayer
    {
        string complete_name;
        string root;
        string[] supportedExtensions;
        string videoname;

        public MyPlayer(string snt)
        {
            videoname = snt;
        }

        public List<VideosDetail> test()
        {
            complete_name = videoname.ToLower() + ".wmv";
            root = System.IO.Path.GetDirectoryName(@"C:\Users\Administrator\Desktop\VideosFrame\VideosFrame\Model\");
            supportedExtensions = new[] { ".wmv" };
            var files = Directory.GetFiles(Path.Combine(root, "Videos"), "*.*").Where(s => supportedExtensions.Contains(Path.GetExtension(s).ToLower()));

            List<VideosDetail> videos = new List<VideosDetail>();
            VideosDetail id;
            bool flagefilefound = false;
            foreach (var file in files)
            {
                id = new VideosDetail()
                {
                    Path = file,
                    FileName = Path.GetFileName(file),
                    Extension = Path.GetExtension(file)
                };
                FileInfo fi = new FileInfo(file);
                if (id.FileName == complete_name)
                {
                    id.FileName = fi.Name;
                    id.Size = fi.Length;
                    videos.Add(id);
                    flagefilefound = true;
                }

                if (flagefilefound)
                    break;
            }

            if (!flagefilefound)
            {
                MessageBox.Show("no such video is available. ");
            }
            return videos;
        }
    }

        private void play_Click(object sender, RoutedEventArgs e)
        {
            List<string> chk = new List<string>();
            chk.Add("is");
            chk.Add("am");
            chk.Add("are");
            chk.Add("were");
            chk.Add("was");
            chk.Add("do");
            chk.Add("does");
            chk.Add("has");
            chk.Add("have");

            chk.Add("an");
            chk.Add("the");
            chk.Add("to");
            chk.Add("of");
            string sen = vdo.Text;
            List<string> tmp = new List<string>();
            string[] split = sen.Split(' ');
            foreach (var item in split)
            {
                tmp.Add(item);
            }
            foreach (var item in chk)
            {
                if (sen.Contains(item))
                {
                    int index = sen.IndexOf(item);
                    sen = sen.Remove(index, item.Length);
                }
            }
            foreach (var i in tmp)
            {
                MyPlayer player = new MyPlayer(i);
                VideoList.ItemsSource = player.test();
            }

        }

【Comments】:

  • Can't see your question...
  • So what is the result of this code? And what is your question?
  • @MohitShrivastava I have edited my question.

Tags: c# wpf


【Solution 1】:

What you are actually doing is eliminating so-called stop words and, probably, creating a bag of words:

private static HashSet<String> s_StopWords = 
  new HashSet<String>(StringComparer.OrdinalIgnoreCase) {
    "is", "am", "are", "were", "was", "do", "does", "to", "from", // etc.
};

private static Char[] s_Separators = new Char[] {
  '\r', '\n', ' ', '\t', '.', ',', '!', '?', '"', //TODO: check this list 
};

...

String source = "I go to school";

// ["I", "go", "school"] - "to" being a stop word is removed
String[] words = source
  .Split(s_Separators, StringSplitOptions.RemoveEmptyEntries)
  .Where(word => !s_StopWords.Contains(word))
  .ToArray();

// Combine back: "I go school"
String result = String.Join(" ", words);
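
For comparison with the code in the question: there, `sen.Contains(item)` and `sen.IndexOf(item)` match substrings rather than whole words, so removing "is" also cuts into a word such as "this". In addition, `tmp` is filled before any removal happens and the final loop iterates over `tmp`, so the stop words are still passed to `MyPlayer`. The following is a minimal sketch of the whole-word approach as a reusable helper. The names `StopWordFilter` and `ContentWords` are made up for illustration, and the stop-word list simply copies the one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordFilter
{
    // Hypothetical helper; the list mirrors the words added to "chk" in the question.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "is", "am", "are", "were", "was", "do", "does",
            "has", "have", "an", "the", "to", "of"
        };

    private static readonly char[] Separators =
        { '\r', '\n', ' ', '\t', '.', ',', '!', '?', '"' };

    // Splits the sentence into words and drops every whole word that is a stop word.
    public static List<string> ContentWords(string sentence)
    {
        return sentence
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !StopWords.Contains(word))
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Substring removal, as in the question, damages an unrelated word.
        string sen = "this is a car";
        sen = sen.Remove(sen.IndexOf("is"), "is".Length);
        Console.WriteLine(sen); // th is a car

        // Whole-word filtering leaves "this" intact.
        Console.WriteLine(string.Join(" ", StopWordFilter.ContentWords("this is a car"))); // this a car
        Console.WriteLine(string.Join(" ", StopWordFilter.ContentWords("I go to school"))); // I go school
    }
}
```

In `play_Click`, one option would be to loop over `StopWordFilter.ContentWords(vdo.Text)` instead of `tmp`, collect the results of each `player.test()` call into a single list, and assign `VideoList.ItemsSource` once after the loop. As written in the question, each iteration overwrites the previous assignment, so only the last word's result remains visible.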

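【Comments】:

  • As we said yesterday, you can't parse sentences with string splitting alone. Sure, it will satisfy this homework requirement, but in the real world you would use a natural language processing library.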
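
As a small step beyond a fixed separator list, and still far short of a real NLP library, the words can be extracted with a regular expression instead of `Split`. This is only a sketch under the assumption that a "word" is a run of letters, digits, or apostrophes. `RegexTokenizer` and `Tokenize` are hypothetical names, and the pattern does not handle hyphenation, abbreviations, or anything language-specific.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexTokenizer
{
    // Assumption: a word is one or more Unicode letters, digits, or apostrophes.
    private static readonly Regex WordPattern = new Regex(@"[\p{L}\p{N}']+");

    // Returns the matched words in order; everything else acts as a separator.
    public static List<string> Tokenize(string text)
    {
        return WordPattern.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}

public static class TokenizerDemo
{
    public static void Main()
    {
        // Punctuation such as ';' or '(' needs no entry in a separator list.
        foreach (string word in RegexTokenizer.Tokenize("I don't go to school; (really)"))
        {
            Console.WriteLine(word);
        }
    }
}
```

The resulting list can be filtered against the stop-word set exactly as in the answer above.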