【Question title】: How can I induce Term::ReadLine to set the UTF8 flag on the results from readline?
【Posted】: 2013-01-15 10:36:04
【Question】:

How can I induce Term::ReadLine to set the UTF8 flag on the results from readline?

#!/usr/local/bin/perl
use warnings FATAL => qw(all);
use strict;
use 5.10.1;
use utf8;
use open qw( :encoding(UTF-8) :std );
use Term::ReadLine;
use Devel::Peek;

my $term = Term::ReadLine->new( 'test', *STDIN, *STDOUT );
$term->ornaments( 0 );
my $char;

$char = $term->readline( 'Enter char: ' );
Dump $char;

print 'Enter char: ';
$char = <>;
chomp $char;
Dump $char;

Output:

Enter char: ü                                                                                                                                                                                 
SV = PV(0x11ce4c0) at 0x1090078
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x14552c0 "\374"\0
CUR = 1
LEN = 16
Enter char: ü
SV = PV(0x11ce4c0) at 0x1090078
REFCNT = 1
FLAGS = (PADMY,POK,pPOK,UTF8)
PV = 0x14552c0 "\303\274"\0 [UTF8 "\x{fc}"]
CUR = 2
LEN = 16

Comment:

This matters when I search in a MySQL database (with mysql_enable_utf8 enabled):

my $stmt = "SELECT * FROM $table WHERE City REGEXP ?";
say $stmt;

# my $term = Term::ReadLine->new( 'table_watch', *STDIN, *STDOUT );
# $term->ornaments( 0 ); 
# my $arg = $term->readline( 'Enter argument: ' ); # ü -> doesn't find 'München'

print "Enter argument: ";
my $arg = <>; # ü -> finds 'München'
chomp $arg;

【Discussion】:

    Tags: perl utf-8 readline


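    【Solution 1】:

    Why? Those two strings are equivalent. It's like 0 being stored as an IV versus as a UV.

    Well, it's possible you have to deal with buggy XS code. If that's the case, utf8::upgrade($s) and utf8::downgrade($s) can be used to change how the string is stored in the scalar.

    Unlike encoding and decoding, utf8::upgrade and utf8::downgrade don't change the string, only how it is stored.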

    $ perl -MDevel::Peek -E'
       $_="\xFC";
       utf8::downgrade($d=$_); Dump($d);
       utf8::upgrade($u=$_);   Dump($u);
       say $d eq $u ?1:0;
    '
    SV = PV(0x86875c) at 0x4a9214
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x8699b4 "\374"\0
      CUR = 1
      LEN = 12
    SV = PV(0x868784) at 0x4a8f44
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0x869d14 "\303\274"\0 [UTF8 "\x{fc}"]
      CUR = 2
      LEN = 12
    1
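
Applied to the question's case, this could look like the minimal sketch below. It assumes the value returned by readline is already a decoded character string that merely uses the downgraded storage format; the literal "\xFC" is a hypothetical stand-in for that return value, so the snippet runs without a terminal or a database.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the value returned by $term->readline:
# the single character U+00FC in the downgraded storage format.
my $arg = "\xFC";
utf8::downgrade($arg);

my $copy   = $arg;                          # keep a copy for comparison
my $before = utf8::is_utf8($arg) ? 1 : 0;   # UTF8 flag before the upgrade

# Change only the internal storage format; the characters stay the same.
utf8::upgrade($arg);

my $after = utf8::is_utf8($arg) ? 1 : 0;    # UTF8 flag after the upgrade
my $same  = $arg eq $copy ? 1 : 0;          # upgraded and original compare equal

print "before=$before after=$after same=$same length=", length($arg), "\n";
```

In the question's script, the same utf8::upgrade($arg) call would go right after the readline call and before $arg is passed to the database handle. Whether that resolves the REGEXP mismatch depends on the driver having the bug described here.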
    

    【Comments】:

    • I wrote a comment below my question.
    • Ah yes, many DBDs do suffer from "The Unicode Bug". Assuming Term::ReadLine does do the decoding, utf8::upgrade is definitely what you want.