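Extending the str_word_count Function in PHP
The str_word_count() function in PHP does exactly what its name says. By default it simply counts the number of words in a string. Take the following string.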
$str = "This is a 'string' containing m0re than one word. This is a 'string' containing m0re than one word.";
If you pass it to str_word_count() with no other arguments, you get the number of words.
echo str_word_count($str); // prints 20
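The second parameter sets the type of value the function returns. The default is 0, but 1 and 2 are also available. With 1 as the second parameter, the function returns an array containing every word found in the string. With 2, it returns an associative array in which each key is the numeric position of a word in the string and each value is the word itself. This is the result with the second parameter set to 1.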
print_r(str_word_count($str, 1)); /* prints Array ( [0] => This [1] => is [2] => a [3] => 'string' [4] => containing [5] => m [6] => re [7] => than [8] => one [9] => word [10] => This [11] => is [12] => a [13] => 'string' [14] => containing [15] => m [16] => re [17] => than [18] => one [19] => word ) */
This is the result with the second parameter set to 2.
print_r(str_word_count($str, 2)); /* prints Array ( [0] => This [5] => is [8] => a [10] => 'string' [19] => containing [30] => m [32] => re [35] => than [40] => one [44] => word [50] => This [55] => is [58] => a [60] => 'string' [69] => containing [80] => m [82] => re [85] => than [90] => one [94] => word ) */
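To see the three return formats side by side, here is a minimal sketch on a much shorter string. The variable names and the sample text are made up for illustration; only str_word_count() itself comes from PHP.

```php
<?php
// Hypothetical sample text, chosen to keep the output short.
// Note the double space before "three".
$sample = "one two  three";

$count     = str_word_count($sample);    // format 0 (default): integer count
$words     = str_word_count($sample, 1); // format 1: list of words
$positions = str_word_count($sample, 2); // format 2: position => word

var_dump($count);    // int(3)
print_r($words);     // Array ( [0] => one [1] => two [2] => three )
print_r($positions); // Array ( [0] => one [4] => two [9] => three )
```

The keys in the last array are offsets into the original string, which is why the extra space pushes "three" to position 9.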
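The third parameter is a list of additional characters that should be treated as part of a word. Note that the string contains the word "m0re", with the letter o replaced by a zero. By default the function splits it into two words, "m" before the zero and "re" after it. To make the function treat zero as part of a word, pass it as a string in the third parameter.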
print_r(str_word_count($str, 1, '0')); /* prints Array ( [0] => This [1] => is [2] => a [3] => 'string' [4] => containing [5] => m0re [6] => than [7] => one [8] => word [9] => This [10] => is [11] => a [12] => 'string' [13] => containing [14] => m0re [15] => than [16] => one [17] => word ) */
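The third parameter can hold more than one character, and a range such as '0..9' covers every digit at once. The following is a small sketch of that idea; the sample text and variable names are assumptions made for the example.

```php
<?php
// Hypothetical sample text with digits and an underscore inside words.
$sample = "php8 beats php5 and snake_case stays whole";

// Default behaviour: digits and underscores act as word separators.
$default = str_word_count($sample, 1);

// '0..9' is a range covering all digits; '_' is added as a literal character.
$custom = str_word_count($sample, 1, '0..9_');

echo count($default), "\n"; // 8 (php, beats, php, and, snake, case, stays, whole)
echo count($custom), "\n";  // 7 (php8, beats, php5, and, snake_case, stays, whole)
```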
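So how can this function be extended? Well, suppose you only want to print an excerpt of some text. You can use str_word_count() as part of another function, like this.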
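function limit_text($text, $limit)
{
    // Remove any HTML tags before counting words.
    $text = strip_tags($text);
    // Format 2 keys each word by its starting position in the string.
    $words = str_word_count($text, 2);
    $pos = array_keys($words);
    if (count($words) > $limit) {
        // Cut the text just before word number $limit + 1.
        $text = trim(substr($text, 0, $pos[$limit])) . '...';
    }
    return $text;
}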
You can use this function as follows.
echo limit_text($str, 12); // prints - This is a 'string' containing m0re than one word. This is...
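This function is useful if you are writing a script that generates an RSS feed, or if you want to show the opening of a page's text on another page of your site.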
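One thing to watch for: limit_text() calls str_word_count() without the third parameter, so a word such as "m0re" counts as two words and the cut can land in the middle of it. The sketch below shows one possible way around that. The function name limit_text_with_chars() and its $characters parameter are hypothetical additions for illustration, not part of the original function.

```php
<?php
/**
 * Hypothetical variant of limit_text(): $characters is passed through to
 * str_word_count() so that, for example, digits can count as word characters.
 */
function limit_text_with_chars(string $text, int $limit, string $characters = ''): string
{
    $text = strip_tags($text);
    // Format 2 maps each word's starting position to the word itself.
    $offsets = array_keys(str_word_count($text, 2, $characters));

    if (count($offsets) > $limit) {
        // Cut just before word number $limit + 1 and mark the truncation.
        $text = rtrim(substr($text, 0, $offsets[$limit])) . '...';
    }

    return $text;
}

// Hypothetical sample text.
$sample = "This is a string containing m0re than one word.";

$plain  = limit_text_with_chars($sample, 6);      // "m0re" counted as "m" + "re"
$zeroOk = limit_text_with_chars($sample, 6, '0'); // "m0re" counted as one word

echo $plain, "\n";  // This is a string containing m0...
echo $zeroOk, "\n"; // This is a string containing m0re...
```

Whether to allow extra characters depends on the text being excerpted, so the parameter defaults to an empty string and leaves the standard behaviour unchanged.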
If you want to count how many times each word occurs, or need this as part of a keyword density calculation, use the following code.
$wordfreq = array_count_values(str_word_count($str, 1, '0'));
print_r($wordfreq);
/* prints Array ( [This] => 2 [is] => 2 [a] => 2 ['string'] => 2 [containing] => 2 [m0re] => 2 [than] => 2 [one] => 2 [word] => 2 ) */
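To go from raw counts to a keyword density figure, one option is to divide each count by the total number of words. This is a rough sketch under a few assumptions: the sample text is made up, words are compared case-insensitively, and density is expressed as a percentage rounded to one decimal place.

```php
<?php
// Hypothetical sample text; '0' is allowed inside words so "m0re" stays whole.
$text = "This is m0re text. This text is short.";

// Lower-case first so that "This" and "this" are counted together.
$words = str_word_count(strtolower($text), 1, '0');
$total = count($words);

$frequency = array_count_values($words);
arsort($frequency); // most frequent words first

// Density: share of all words taken up by each word, as a percentage.
$density = [];
foreach ($frequency as $word => $count) {
    $density[$word] = round($count / $total * 100, 1);
}

print_r($density);
```

With this sample, "this", "is" and "text" each account for 25% of the eight words, while "m0re" and "short" account for 12.5% each.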