【Question title】: Finding word combinations on domain names
【Posted】: 2015-07-19 19:06:23
【Question】:

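I'm new to PHP and need some help finishing my script. I have a PHP script that extracts all the words from a domain name. I need the script to find the words that are most likely to be the domain's keywords.

Here is my script: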

<?php

$domain = trim(htmlspecialchars('where-amigoing.togoto.com'));
// Escape the dot before the TLD so it only matches a literal "."
preg_match('/(.*?)((\.co)?\.[a-z]{2,4})$/i', $domain, $m);
$ext = isset($m[2]) ? $m[2]: '';

$replace = array($ext,'-','.');
$domainWords = str_replace($replace,'',$domain);

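//Find Word in Dictionary (case-insensitive)
function pspell_icheck($dictionary_link, $word) {
  if ( pspell_check($dictionary_link, $word) ) {
    return true;
  }
  // reset() takes its argument by reference, so store the suggestions first
  $suggestions = pspell_suggest($dictionary_link, $word);
  return strtolower((string) reset($suggestions)) == strtolower($word);
}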

//Find Words
function getwords( $string ) {
    if( strpos($string,"xn--") !== false ) {
        return false;
    }
    $pspell = pspell_new( 'en' );
    $check = 0;
    $words = array();
    for( $j = 0; $j < ( strlen( $string ) ); $j++ ) {
        for( $i = 0; $i < strlen( $string ); $i++ ) {
            if( pspell_icheck( $pspell, substr( $string, $j, $i ) ) ) {
                $check++;
                $words[] = substr( $string, $j, $i );
            }
        }
    }
    $words = array_unique( $words );
    if( $check > 0 ) {
        return $words;
    }
    return false;
}

echo 'domain name: '.$domain .'<br>';
echo 'domain words: '.$domainWords .'<br>';
echo 'domain extension: '.$ext .'<br>';
print_r ( getWords( $domainWords ) );

?>

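The code outputs the following:

domain name: where-amigoing.togoto.com domain words: whereamigoingtogoto domain extension: .com Array ( [0] => [1] => w [2] => where [4] => h [5] => he [6] => her [7] => here [9] => e [10] => er [11] => ere [13] => r [14] => re [15] => rea [16] => ream [19] => ea [21] => a [22] => am [23] => ami [24] => amigo [26] => m [27] => mi [28] => mig [30] => i [32] => g [33] => go [34] => going [36] => o [37] => oi [40] => in [42] => n [45] => gt [47] => t [48] => to [49] => tog [50] => togo [56] => got [59] => ot

I want to take this array and find the combinations of words that do not overlap, in order to determine the domain's keywords.

Does anyone know how to do this? I know I need to loop through the words and check them against the original domain, but that seems a bit over my head.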

【Question discussion】:

    Tags: php text-extraction domain-name keyword-search


    【Solution 1】:

    First, unset the empty values from the array. Second, unset all the single letters that are not meaningful as words. Then try my code:
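The two cleanup steps can be sketched roughly as below. This is a minimal sketch and not part of the original answer: `cleanCandidateWords` and its `$allowedSingles` parameter are hypothetical names, and keeping only `a` and `i` as one-letter words is an assumption based on the word list the answer passes to its class.

```php
<?php
// Hypothetical helper (not from the original answer): removes blank entries
// and single letters that do not stand alone as English words.
function cleanCandidateWords(array $words, array $allowedSingles = array('a', 'i'))
{
    $kept = array();
    foreach ($words as $word) {
        $word = strtolower(trim((string) $word));
        if ($word === '') {
            continue; // blank value, like index 0 in the question's output
        }
        if (strlen($word) === 1 && !in_array($word, $allowedSingles, true)) {
            continue; // stray letter such as "w" or "h"
        }
        $kept[] = $word;
    }
    // array_unique keeps the first occurrence; array_values reindexes from 0
    return array_values(array_unique($kept));
}

$raw = array('', 'w', 'where', 'h', 'he', 'a', 'am', 'i', 'go', 'going', 't', 'to');
print_r(cleanCandidateWords($raw));
```

The cleaned list is what the answer's class, shown next, expects as its constructor argument.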

    <?php
    
    class domainWordsCutter
    {
        private $words;
        private $wordsArray = array();
    
        public function __construct($words)
        {
            $this->words = $words;
        }
    
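        public function cutWords($domainWords)
        {
            // Base case: nothing left to split, so the whole string was covered.
            // (empty() would also treat the string "0" as empty.)
            if((string) $domainWords === '')
            {
                return true;
            }
            foreach($this->words as $word)
            {
                $wordLen = strlen($word);
                if
                (
                    $wordLen > 0 &&
                    $wordLen <= strlen($domainWords) &&
                    substr($domainWords, 0, $wordLen) === $word &&
                    // Recurse on the remainder; only a complete split counts.
                    $this->cutWords(substr($domainWords, $wordLen))
                )
                {
                    // Appended while the recursion unwinds, so the last word is stored first.
                    $this->wordsArray[] = $word;
                    return true;
                }
            }
            return false;
        }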
    
        public function getWordsArray()
        {
            return $this->wordsArray;
        }
    }
    
    $domainWordsCutter = new domainWordsCutter(array ( 2 => 'where', 5 => 'he', 6 => 'her', 7 => 'here', 10 => 'er', 11 => 'ere', 14 => 're', 15 => 'rea', 16 => 'ream', 19 => 'ea', 21 => 'a', 22 => 'am', 23 => 'ami', 24 => 'amigo', 27 => 'mi', 28 => 'mig', 30 => 'i', 33 => 'go', 34 => 'going', 37 => 'oi', 40 => 'in', 45 => 'gt', 48 => 'to', 49 => 'tog', 50 => 'togo', 56 => 'got', 59 => 'ot', ));
    if($domainWordsCutter->cutWords('whereamigoingtogoto'))
    {
        var_dump($domainWordsCutter->getWordsArray());
    }
    else
    {
        echo 'Not found';
    }
    

    Output:

    array(7) { [0]=> string(2) "to" [1]=> string(2) "go" [2]=> string(2) "to" [3]=> string(5) "going" [4]=> string(2) "mi" [5]=> string(1) "a" [6]=> string(5) "where" }

    Note the reversed order.
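The class above stops at the first split it finds and stores it back to front. As a variation, the sketch below collects every non-overlapping split, each already in reading order. It is a hedged illustration, not the answer's method: `allSegmentations` is a hypothetical function name, the short word list is a hand-picked subset of the one above, and the recursion can become slow on long strings with many candidate words.

```php
<?php
// Hypothetical helper (not from the original answer): returns every way to
// split $text into words from $words, each split listed left to right.
function allSegmentations($text, array $words)
{
    $text = (string) $text; // substr() returned false on very old PHP versions
    if ($text === '') {
        return array(array()); // exactly one way to split nothing: no words
    }
    $results = array();
    foreach ($words as $word) {
        $len = strlen($word);
        // Skip empty words (they would recurse forever) and non-prefixes.
        if ($len === 0 || strncmp($text, $word, $len) !== 0) {
            continue;
        }
        // Prepend the matched word to every split of the remainder.
        foreach (allSegmentations(substr($text, $len), $words) as $rest) {
            $results[] = array_merge(array($word), $rest);
        }
    }
    return $results;
}

$words = array('where', 'a', 'am', 'i', 'mi', 'go', 'going', 'in', 'to', 'got');
foreach (allSegmentations('whereamigoingtogoto', $words) as $split) {
    echo implode(' ', $split), "\n";
}
```

With this small word list the sketch prints two candidates, `where a mi going to go to` and `where am i going to go to`. Picking the more plausible one still needs some ranking rule, which this sketch does not attempt.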

    【Discussion】:
