A regular-expression tip that came up while I was chatting with Garth, a TypePad engineer from the US who is currently visiting Japan.
$char =~ m/\p{Han}|\p{Hiragana}|\p{Katakana}/;  # NG: alternation of three properties
$char =~ m/[\p{Han}\p{Hiragana}\p{Katakana}]/;   # OK: one character class
return if $char =~ m/abc|def|ghi/;  # NG: alternation
return if ($char =~ m/abc/ or $char =~ m/def/ or $char =~ m/ghi/);  # OK: separate literal matches
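The two "OK" forms above can be wrapped in small reusable helpers. The sketch below is only an illustration, not something from the conversation: `is_kana_or_kanji` and `make_any_matcher` are hypothetical names. Each word is compiled once with `qr//` (and `\Q...\E`, so characters like `.` stay literal), which assumes you want plain substring matches; interpolating a different pattern on every call would make Perl recompile it each time and could eat up the gain.

```perl
use strict;
use warnings;
use utf8;                            # the sample characters below are written in UTF-8

binmode STDOUT, ':encoding(UTF-8)';  # print wide characters without warnings

# Hypothetical helper: true if $char is kanji, hiragana or katakana.
# A single character class replaces \p{Han}|\p{Hiragana}|\p{Katakana}.
sub is_kana_or_kanji {
    my ($char) = @_;
    return $char =~ m/[\p{Han}\p{Hiragana}\p{Katakana}]/ ? 1 : 0;
}

# Hypothetical helper: returns a closure that is true if its argument
# contains any of the given words. Each word becomes its own plain-literal
# regex (compiled once via qr//, with \Q...\E so metacharacters stay
# literal), and the loop stops at the first hit, like the "or" chain above.
sub make_any_matcher {
    my @patterns = map { qr/\Q$_\E/ } @_;
    return sub {
        my ($text) = @_;
        for my $re (@patterns) {
            return 1 if $text =~ $re;
        }
        return 0;
    };
}

for my $char ('漢', 'ひ', 'カ', 'a') {
    printf "%s => %d\n", $char, is_kana_or_kanji($char);
}

my $has_word = make_any_matcher(qw(abc def ghi));
print $has_word->('xxdefxx'), "\n";    # 1
print $has_word->('xxxxxxx'), "\n";    # 0
```

Returning a closure keeps the compiled patterns around between calls, so the per-call cost is just the literal searches.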
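His advice: regular expressions that use "|" (the pipe, i.e. alternation) are extremely slow, so avoid them. Sure enough, when I benchmarked it, the version without the pipe was about 19 times faster (1.73 vs. 32.49 CPU seconds for 500,000 iterations). Presumably this is because a pattern consisting of a single literal lets Perl's optimizer search for that fixed string directly, while an alternation has no single required string and has to try each branch at every position. Keep in mind that this was measured on the Perl I was using and on a string containing none of the three words; Perl 5.10 and later compile alternations of literal strings into a trie, so the gap may be much smaller there.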
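#!/usr/local/bin/perl
use strict;
use warnings;
use Benchmark;    # exports timethese() by default

# Test string: it contains none of "abc", "def" or "ghi", so every match
# attempt fails only after looking through the whole string (the worst case).
my $text = ';lskjdf;klvckxv;zijxcv;oa;vlkaefiuqewizlkvnzlxkcnv'
    . '.z,xmc v/z.x,cmv.z,xnvlafda isjdnfl aksjdfauerfaie'
    . 'jnlfakjdsn;akj;v akjdfvoaijdhfvoiaheriufahpsdiufhaeuhr'
    . ' iuahriufhairuhfapsidfalksjfhaiuphrofiankfjas;dofha[s9'
    . 'hfskjdf;ase;f,sedhfaiuwhefs,dnvflk dfis fapoisf fqjr';
my $count = 500_000;
timethese($count, {
    # one regex with alternation
    '00_pipe'   => sub { $text =~ m/abc|def|ghi/ },
    # three plain-literal regexes, short-circuited with "or"
    '01_nopipe' => sub { $text =~ m/abc/ or $text =~ m/def/ or $text =~ m/ghi/ },
});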
__END__
$ perl regex.pl
Benchmark: timing 500000 iterations of 00_pipe, 01_nopipe...
00_pipe: 33 wallclock secs (32.48 usr + 0.01 sys = 32.49 CPU) @ 15389.35/s (n=500000)
01_nopipe: 1 wallclock secs ( 1.73 usr + 0.00 sys = 1.73 CPU) @ 289017.34/s (n=500000)
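If you want to re-run this on your own Perl, a variant along these lines may be more informative. It is a sketch that assumes Perl 5.10 or later (for `re::regmust`). It first checks that both forms give the same answer, then asks `regmust` which fixed strings the optimizer requires a match to contain (I would expect `'abc'` for the single literal and none for the alternation, but that is an expectation, not a measured result), and finally uses `cmpthese` from the standard `Benchmark` module, which prints the relative speed directly instead of leaving you to divide CPU times by hand. Expect different numbers from the run above, especially on Perls with the trie optimization.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use re qw(regmust);    # regmust() needs Perl 5.10 or later

# Same test string as the script above: none of the three words occurs.
my $text = ';lskjdf;klvckxv;zijxcv;oa;vlkaefiuqewizlkvnzlxkcnv'
    . '.z,xmc v/z.x,cmv.z,xnvlafda isjdnfl aksjdfauerfaie'
    . 'jnlfakjdsn;akj;v akjdfvoaijdhfvoiaheriufahpsdiufhaeuhr'
    . ' iuahriufhairuhfapsidfalksjfhaiuphrofiankfjas;dofha[s9'
    . 'hfskjdf;ase;f,sedhfaiuwhefs,dnvflk dfis fapoisf fqjr';

my %forms = (
    pipe   => sub { $text =~ m/abc|def|ghi/ ? 1 : 0 },
    nopipe => sub { ($text =~ m/abc/ or $text =~ m/def/ or $text =~ m/ghi/) ? 1 : 0 },
);

# Sanity check: a speed comparison only means something if both forms agree.
my %result = map { ($_ => $forms{$_}->()) } keys %forms;
die "forms disagree\n" unless $result{pipe} == $result{nopipe};

# Format one of regmust()'s return values; "(none)" when there is no string.
sub show_must {
    my ($s) = @_;
    return (defined $s && length $s) ? "'$s'" : '(none)';
}

# Ask the optimizer which fixed strings a match must contain; Perl can look
# for such a string with a fast substring search before running the engine.
for my $re (qr/abc/, qr/abc|def|ghi/) {
    my ($anchored, $floating) = regmust($re);
    printf "%-20s anchored=%s floating=%s\n",
        $re, show_must($anchored), show_must($floating);
}

# A negative count means "run each form for at least 1 CPU second"; cmpthese
# then prints each form's rate and the relative difference as a table.
cmpthese(-1, \%forms);
```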
