2. Character Decoding
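Unicode handling improved markedly in the move from Python 2.x to Python 3.0. The following program takes a buffer holding 1,000,000 copies of the Hebrew word "shalom" and encodes and decodes it back and forth between UTF-8 and UTF-16. The buffer holds five million characters in total.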
from __future__ import with_statement
import sys
import time


def test_encode_decode():
    # The Hebrew word "shalom", written with Unicode escapes.
    shalom = ' \u05dd\u05d5\u05dc\u05e9'
    text = shalom * 1000000

    # Time one full round trip through UTF-8 and UTF-16.
    start = time.time()
    text_utf8 = text.encode('utf-8')
    text_utf16 = text.encode('utf-16')
    assert text_utf8.decode() == text
    assert text_utf16.decode('utf-16') == text
    end = time.time() - start
    print(shalom, end)
    return end


test = test_encode_decode

if __name__ == '__main__':
    # Run ten times, drop the slowest and fastest runs, and average the rest.
    times = [test() for i in range(10)]
    times.remove(max(times))
    times.remove(min(times))
    print('Average:', sum(times) / len(times))
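To get a feel for how much data each round trip actually moves, the sketch below measures the same string in characters and in encoded bytes. It is only an illustration, and it rests on two assumptions: that it runs under Python 3, and that the string literal is taken exactly as written in the program (leading space included). The `sizes` dictionary is a name introduced here, not part of the benchmark.

```python
# Assumption: Python 3, where a plain string literal is Unicode.
# The literal is copied unchanged from the benchmark program.
shalom = ' \u05dd\u05d5\u05dc\u05e9'
text = shalom * 1000000

# Hypothetical helper dictionary: size of the buffer in each representation.
sizes = {
    'characters': len(text),
    'utf-8 bytes': len(text.encode('utf-8')),
    # Python's 'utf-16' codec prepends a 2-byte byte order mark (BOM).
    'utf-16 bytes': len(text.encode('utf-16')),
}

for name, size in sizes.items():
    print(name, size)
```

Each Hebrew letter takes two bytes in both encodings, while the space takes one byte in UTF-8 and two in UTF-16, so the encoded buffers are noticeably larger than the character count suggests.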
Running the program under Python 2.5, 2.6, 3.0, and 3.1 gave the following results (in seconds):
* Python 2.5 - 1.6552573442459106
* Python 2.6 - 1.6100345551967621
* Python 3.0 - 0.280230671167
* Python 3.1 - 0.205590486526
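Python 2.5 and 2.6 run this program at roughly the same speed. Python 3.0 is far faster, by a factor of about 5 to 6, and Python 3.1 is nearly eight times faster than Python 2.x and roughly 36% faster than Python 3.0 (0.206 s versus 0.280 s).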
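The ratios quoted here can be rechecked directly from the reported timings. The following is a minimal sketch; the `timings` dictionary and the `speedup` helper are names introduced for illustration, and the numbers are copied from the results listed earlier.

```python
# Average timings in seconds, copied from the reported results.
timings = {
    "2.5": 1.6552573442459106,
    "2.6": 1.6100345551967621,
    "3.0": 0.280230671167,
    "3.1": 0.205590486526,
}


def speedup(baseline, candidate):
    """Return how many times faster `candidate` ran than `baseline`."""
    return timings[baseline] / timings[candidate]


if __name__ == "__main__":
    pairs = [("2.6", "3.0"), ("2.5", "3.1"), ("2.6", "3.1"), ("3.0", "3.1")]
    for baseline, candidate in pairs:
        print("Python %s vs %s: %.2fx faster"
              % (candidate, baseline, speedup(baseline, candidate)))
```

A ratio of about 1.36 between Python 3.0 and 3.1 means roughly 36% more work per unit of time, which is the same as about 27% less elapsed time for the same work.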