Python Self-Study Notes
Textbook: Oreilly.Introducing.Python.2014_Bill_Lubanovic
Translated and compiled (txt) by: yingshaoxo
——————————————
About the textbook:
This book will introduce you to the Python programming language.
It’s aimed at beginning programmers, based on Python version 3.3.
——————————————
Where Python is used:
You’ll find Python in many computing environments, including the following:
• The command line in a monitor or terminal window
• Graphical user interfaces, including the Web
• The Web, on the client and server sides
• Backend servers supporting large popular sites
• The cloud (servers managed by third parties)
• Mobile devices
• Embedded devices
We’ll look at its uses in websites, system administration, and data manipulation.
We’ll also look at specific uses of Python in the arts, science, and business.
——————————————
Python in one sentence:
Python is a high-level, interpreted scripting language whose syntax reads much like natural language.
——————————————
Installing Python:
I. Windows
1. Visit https://www.python.org/downloads/windows/
2. Download the latest version (these notes suggest the x86 build)
3. Run the downloaded installer (search the web for "install Python" for details)
4. Open cmd.exe, type "python", and press Enter; the version information appears
5. Installation is complete
II. Android
Install QPython3 from the Google Play store
——————————————
say hello:
Open cmd, type python in the terminal window, and press Enter. Then, at the >>> prompt, type:
>>> print("hello, world!")
hello, world!
——————————————
Simple data types in Python:
• booleans (which have the value True or False)
• integers (whole numbers such as 42 and 100000000)
• floats (numbers with decimal points such as 3.14159, or sometimes exponents like 1.0e8, which means one times ten to the eighth power, or 100000000.0)
• strings (sequences of text characters)
——————————————
Objects:
In Python, everything (booleans, integers, floats, strings, even large data structures, functions, and programs) is implemented as an object.
Also, Python is strongly typed, which means that the type of an object does not change, even if its value is mutable.
——————————————
Declaring and assigning variables:
>>> number=5
This assigns the integer 5 to the variable number. Because the value is an integer, number now refers to an int.
Python does not need a separate declaration step; a variable is created the first time you assign to it.
——————————————
Printing a variable:
>>> a=25
>>> print(a)
25
——————————————
type("what am I?")
Returns the type of a value, for example:
>>> type(58)
<class 'int'>
>>> type(99.9)
<class 'float'>
>>> type('abc')
<class 'str'>
——————————————
Types and objects:
A class is the definition of an object.
In Python, “class” and “type” mean pretty much the same thing.
——————————————
Variable naming rules:
I. Allowed characters
1. Lowercase and uppercase letters a to z
2. Digits 0 to 9
3. The underscore _
II. Not allowed
1. Starting with a digit
2. Using any of these reserved words:
False, class, finally, is, return, None, continue, for, lambda, try, True, def, from, nonlocal, while, and, del, global, not, with, as, elif, if, or, yield, assert, else, import, pass, break, except, in, raise
——————————————
Math operators:
addition +
>>> 5 + 8
13
subtraction -
>>> 90 - 10
80
multiplication *
>>> 4 * 7
28
floating point division /
>>> 7 / 2
3.5
integer (truncating) division //
>>> 7 // 2
3
modulus (remainder) %
>>> 7 % 3
1
exponentiation ** (raise a number to a power)
>>> 3 ** 4
81
Special example (combined assignment operators):
>>> a=95
>>> a-=3
>>> a
92
>>> a+=8
>>> a
100
>>> a*=2
>>> a
200
>>> a/=3
>>> a
66.66666666666667
>>> a=13
>>> a//=4
>>> a
3
——————————————
int('123') converts a value of another type to an int
float('98.6') converts a value of another type to a float
str(520) converts a value of another type to a string
>>> a = 7
>>> float(a)
7.0
——————————————
print('\n') prints a newline
print('\t') prints a tab
print('\'') or print('\"') prints ' or "
print('\\') prints \
——————————————
Concatenating strings:
>>> print('I ' + 'love ' + 'you.')
I love you.
——————————————
Multiplying strings:
>>> print('ok ' * 3 + 'you win!')
ok ok ok you win!
——————————————
Extract one character from a string with []:
>>> Astr="do you do!"
Astr[0] is 'd'
Astr[1] is 'o'
Astr[3] is 'y'
…
Astr[-1] is '!'
>>> print('Just ' + Astr[0] + Astr[1] + Astr[-1])
Just do!
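——————————————
A small sketch of how offsets behave at the edges, added here as an illustration rather than taken from the textbook. It assumes the same sample string as above; the variable names (first, last, changed, out_of_range) are made up for this example.

```python
word = "do you do!"

# Offsets count from 0 on the left and from -1 on the right.
first = word[0]
last = word[-1]
same_last = word[len(word) - 1]

# Strings are immutable, so assigning to an offset raises TypeError.
try:
    word[0] = "D"
    changed = True
except TypeError:
    changed = False

# An offset past the end of the string raises IndexError.
try:
    word[100]
    out_of_range = False
except IndexError:
    out_of_range = True

print(first, last, same_last, changed, out_of_range)
```

To get a modified string, build a new one (for example with replace() or concatenation) instead of assigning to an offset.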
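——————————————
Replacing part of a string:
>>> sex = 'woman'
>>> sex = sex.replace('wo','')
>>> print(sex)
man
——————————————
Take characters with [ start : end : step ]:
Offsets start at 0. The slice begins at offset start, stops just before offset end, and takes every step-th character.
(If you count characters from 1 instead, this is [ position of the first character - 1 : position of the last character : number of characters skipped + 1 ].)
• [:] the entire string
• [ start :] from the start offset to the end of the string
• [: end ] from the beginning of the string to just before the end offset
• [ start : end ] from the start offset to just before the end offset
• [ start : end : step ] from the start offset to just before the end offset, skipping step-1 characters between picks
ABC = 'abcdefghijklmnopqrstuvwxyz'
ABC[20:] is 'uvwxyz'
ABC[10:] is 'klmnopqrstuvwxyz'
ABC[12:15] is 'mno'
ABC[-3:] is 'xyz'
ABC[18:-3] is 'stuvw'
ABC[::7] is 'ahov'
ABC[4:20:3] is 'ehknqt'
>>> print(ABC[:7])
abcdefg
>>> print(ABC[1:2])
b
——————————————
len('How long?') returns the length of a string (9 here)
——————————————
Definition of a list:
A list is a sequence of values, separated by commas and surrounded by square brackets.
like this ['YS','Pea']
——————————————
Split text into a list with split():
>>> text = 'dog,cat,bird'
>>> Alist = text.split(',')
>>> print(Alist)
['dog', 'cat', 'bird']
——————————————
Turn a list into text with join():
>>> Alist = ['dog','cat','bird']
>>> Astring = '; '.join(Alist)
>>> print(Astring)
dog; cat; bird
——————————————
Playing with text:
(In this poem each continuation line starts with one space; the counts and offsets below include those spaces.)
>>> poem='''Rain is falling all around,
 It falls on field and tree,
 It rains on the umbrella here,
 And on the ships at sea.'''
Get the first 15 characters
>>> poem[:15]
'Rain is falling'
Get the total number of characters in the poem, including spaces and newlines
>>> len(poem)
114
Does it start with Rain?
>>> poem.startswith('Rain')
True
Does it end with sea.?
>>> poem.endswith('sea.')
True
Find the offset of the first occurrence of on
>>> poem.find('on')
38
Find the offset of the last occurrence of on
>>> poem.rfind('on')
94
How many times does on occur?
>>> poem.count('on')
3
Does the text contain only letters and digits?
>>> poem.isalnum()
False
——————————————
Remove given characters from both ends of a string:
>>> text = 'Smile...'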
>>> text.strip('.')
'Smile'
——————————————
Changing letter case:
>>> en = 'i love English'
Capitalize the first letter of the string (the rest is lowercased)
>>> en.capitalize()
'I love english'
Capitalize the first letter of every word
>>> en.title()
'I Love English'
Convert all letters to uppercase
>>> en.upper()
'I LOVE ENGLISH'
Convert all letters to lowercase
>>> en.lower()
'i love english'
Swap upper and lower case
>>> en.swapcase()
'I LOVE eNGLISH'
——————————————
Aligning text:
>>> text = '>_<'
Center within 30 characters (13 spaces on the left, 14 on the right)
>>> text.center(30)
'             >_<              '
Left-justify (27 trailing spaces)
>>> text.ljust(30)
'>_<                           '
Right-justify (27 leading spaces)
>>> text.rjust(30)
'                           >_<'
——————————————
Exercise: print the number of seconds in a day
day.py
day = 60*60*24
print('Seconds in a single day: ' + str(day))
——————————————
About Lists
Lists are good for keeping track of things by their order, especially when the order and contents might change.
Lists are mutable.
You can change a list in-place, add new elements, and delete or overwrite existing elements.
The same value can occur more than once in a list.
——————————————
Create a list with [] or list():
>>> empty_list=[]
>>> weekdays=['Monday','Tuesday','Wednesday','Thursday','Friday']
>>> mixture=['text',66,'string',25.8,'']
>>> another_empty_list=list()
——————————————
Convert other types to a list:
From a string
>>> list('cat')
['c', 'a', 't']
From a tuple
>>> a_tuple=('do','not','worry')
>>> list(a_tuple)
['do', 'not', 'worry']
By splitting
>>> birthday='1/6/1952'
>>> birthday.split('/')
['1', '6', '1952']
>>> splitme='a/b//c/d///e'
>>> splitme.split('/')
['a', 'b', '', 'c', 'd', '', '', 'e']
>>> splitme='a/b//c/d///e'
>>> splitme.split('//')
['a/b', 'c/d', '/e']
——————————————
Get an item from a list:
>>> Man=['YS','XiaoLi','Pea']
>>> Man[0]
'YS'
>>> Man[1]
'XiaoLi'
>>> Man[2]
'Pea'
>>> Man[-1]
'Pea'
>>> Man[-2]
'XiaoLi'
>>> Man[-3]
'YS'
>>> Man[77]
IndexError: list index out of range
——————————————
Lists of lists:
(The outer list is named all_animals here, because the name all would hide Python's built-in all() function.)
>>> sky = ['sparrow', 'butterfly']
>>> ground = ['tiger', 'monkey']
>>> sea = ['whale', 'shark', 666]
>>> all_animals = [sky, ground, sea]
>>> all_animals[0][1]
'butterfly'
>>> all_animals[2][2]
666
——————————————
Change an item in a list:
>>> Alist = ['Get', 'up!']
>>> Alist[1] = 'down!'
>>> print(Alist)
['Get', 'down!']
——————————————
Take part of a list with [start:end:step]:
>>> Alist = [1,2,3,4,5,6,7]
>>> Alist[0:1]
[1]
>>> Alist[0:7:2]
[1, 3, 5, 7]
——————————————
append():
Add one item to the end of a list
>>> Alist = ['A','B','C']
>>> Alist.append('D')
>>> print(Alist)
['A', 'B', 'C', 'D']
Appending a list adds it as a single nested item
>>> Alist = ['A']
>>> another_list = ['GG', 'boy']
>>> Alist.append(another_list)
>>> print(Alist)
['A', ['GG', 'boy']]
——————————————
Combine lists with extend() or += (each item of the second list is added to the first; use append() for a single item):
>>> marxes=['Groucho','Chico','Harpo','Zeppo']
>>> others=['Gummo','Karl']
>>> marxes.extend(others)
>>> marxes
['Groucho', 'Chico', 'Harpo', 'Zeppo', 'Gummo', 'Karl']
>>> marxes=['Groucho','Chico','Harpo','Zeppo']
>>> others=['Gummo','Karl']
>>> marxes+=others
>>> marxes
['Groucho', 'Chico', 'Harpo', 'Zeppo', 'Gummo', 'Karl']
——————————————
Add an Item by Offset with insert():
>>> marxes=['Groucho','Chico','Harpo','Zeppo']
>>> marxes.insert(3,'Gummo')
>>> marxes
['Groucho', 'Chico', 'Harpo', 'Gummo', 'Zeppo']
——————————————
Delete an Item by Offset with del (del list_name[offset]):
>>> human = ['woman','man','child']
>>> del human[1]
>>> human
['woman', 'child']
——————————————
Delete an Item by Value with remove() (list_name.remove(value)):
>>> human = ['woman','man','child']
>>> human.remove('man')
>>> human
['woman', 'child']
——————————————
Get an item by offset and delete it at the same time with pop():
>>> marxes = ['Groucho','Chico','Harpo']
>>> marxes.pop(1)
'Chico'
>>> marxes
['Groucho', 'Harpo']
——————————————
Find an Item’s Offset by Value with index():
>>> country=['China','Japan','Germany']
>>> country.index('China')
0
——————————————
Test whether a value is in a list with in:
>>> country=['China','Japan','Germany']
>>> 'Japan' in country
True
——————————————
Count how many times a value occurs in a list:
>>> country=['China','Japan','Germany']
>>> country.count('China')
1
——————————————
join() and split() revisited:
>>> friends=['C++','Python','E']
>>> separator=' ** '
>>> joined=separator.join(friends)
>>> joined
'C++ ** Python ** E'
>>> separated=joined.split(separator)
>>> separated
['C++', 'Python', 'E']
>>> separated==friends
True
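——————————————
One detail the round trip above does not show: join() only accepts strings. The following is a minimal sketch, not from the textbook, of converting numbers before joining and converting them back after splitting. The names numbers, csv_line, and back are made up for the example.

```python
numbers = [1, 2, 3]

# join() raises TypeError if any item is not a string.
try:
    ','.join(numbers)
    join_failed = False
except TypeError:
    join_failed = True

# Convert each number to a string first, then join.
csv_line = ','.join(str(n) for n in numbers)

# split() always returns strings, so convert back explicitly.
back = [int(s) for s in csv_line.split(',')]

print(join_failed, csv_line, back)
```

The generator expression inside join() is the same idea as the list comprehensions covered later in these notes.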
——————————————
Reorder a list with sort():
sort() changes the list itself; the default order is ascending (alphabetical for strings)
>>> Alist = ['C','B','A']
>>> Alist.sort()
>>> Alist
['A', 'B', 'C']
sorted() builds a new list and leaves the original unchanged; same default order
>>> Alist = ['C','B','A']
>>> new_list = sorted(Alist)
>>> new_list
['A', 'B', 'C']
Mixed numeric types (integers and floats) can be sorted together:
>>> numbers=[2,1,4.0,3]
>>> numbers.sort()
>>> numbers
[1, 2, 3, 4.0]
Reverse the order (descending)
>>> numbers=[2,1,4.0,3]
>>> numbers.sort(reverse=True)
>>> numbers
[4.0, 3, 2, 1]
——————————————
len() returns the number of items in a list
>>> marxes=['Groucho','Chico']
>>> len(marxes)
2
——————————————
Assignment versus copying a list:
Plain assignment makes both names refer to the same list, so the original changes
>>> a=[1,2,3]
>>> b=a
>>> b[0]='surprise'
>>> a
['surprise', 2, 3]
These three ways make a copy, so the original does not change
>>> a=[1,2,3]
>>> b=a.copy()
>>> b[0]='surprise'
>>> a
[1, 2, 3]
>>> a=[1,2,3]
>>> b=list(a)
>>> b[0]='surprise'
>>> a
[1, 2, 3]
>>> a=[1,2,3]
>>> b=a[:]
>>> b[0]='surprise'
>>> a
[1, 2, 3]
——————————————
Tuples, like a fixed list:
Similar to lists, tuples are sequences of arbitrary items.
Unlike lists, tuples are immutable, meaning you can’t add, delete, or change items after the tuple is defined.
So, a tuple is similar to a constant list.
——————————————
Create a tuple with ():
>>> empty_tuple=()
>>> empty_tuple
()
>>> many_tuple='Groucho','Chico','Harpo'
>>> many_tuple
('Groucho', 'Chico', 'Harpo')
>>> mix_tuple=('Groucho',77,'Harpo')
>>> mix_tuple
('Groucho', 77, 'Harpo')
——————————————
Tuple unpacking: assign several variables at once (the values can be of any type):
>>> Atuple=('good','well')
>>> a,b=Atuple
>>> a
'good'
>>> b
'well'
——————————————
Assign several variables in one statement with tuples:
>>> he_said='shit'
>>> I_meanings='wow'
>>> He,I=he_said,I_meanings
>>> He
'shit'
>>> I
'wow'
——————————————
Get a tuple with tuple():
>>> marx_list=['Groucho','Chico','Harpo']
>>> tuple(marx_list)
('Groucho', 'Chico', 'Harpo')
——————————————
Dictionaries (dict)
A dictionary is similar to a list, but the order of items doesn’t matter, and they aren’t selected by an offset such as 0 or 1.
Instead, you specify a unique key to associate with each value.
This key is often a string, but it can actually be any of Python’s immutable types: boolean, integer, float, tuple, string, and others that you’ll see in later chapters.
Dictionaries are mutable, so you can add, delete, and change their key-value elements.
——————————————
Create a dict with {}:
>>> empty_dict={}
>>> empty_dict
{}
>>> about_me={'name':'YS','years':'18'}
>>> about_me
{'name': 'YS', 'years': '18'}
——————————————
Convert other types to a dict:
(Outputs are shown in insertion order, which Python 3.7 and later preserve; under the Python 3.3 used by the textbook the printed order may differ.)
From a list of two-item lists
>>> a=[[1,2],[3,4]]
>>> dict(a)
{1: 2, 3: 4}
From a tuple of two-item lists
>>> tol=(['a','b'],['c','d'],['e','f'])
>>> dict(tol)
{'a': 'b', 'c': 'd', 'e': 'f'}
From a list of two-character strings
>>> los=['ab','cd','ef']
>>> dict(los)
{'a': 'b', 'c': 'd', 'e': 'f'}
From a tuple of two-character strings
>>> tos=('ab','cd','ef')
>>> dict(tos)
{'a': 'b', 'c': 'd', 'e': 'f'}
——————————————
Add or Change an Item by [ key ]:
If the key is not in the dict yet, the item is added
>>> Adict={'A':1,'B':2}
>>> Adict['C']=3
>>> Adict
{'A': 1, 'B': 2, 'C': 3}
If the key is already in the dict, its value is updated
>>> Adict={'A':1,'B':2}
>>> Adict['A']=999
>>> Adict
{'A': 999, 'B': 2}
——————————————
Combine Dictionaries with update():
>>> dict1={'A':1,'B':2}
>>> dict2={'C':3,'D':4,'A':999}
>>> dict1.update(dict2)
>>> dict1
{'A': 999, 'B': 2, 'C': 3, 'D': 4}
——————————————
Delete an item with del name[key]:
>>> Adict={'A':1,'B':2}
>>> del Adict['A']
>>> Adict
{'B': 2}
——————————————
Delete all items of a dict with .clear():
>>> Adict={'A':1,'B':2}
>>> Adict.clear()
>>> Adict
{}
——————————————
Test for a Key by Using in:
>>> Adict={'A':1,'B':2}
>>> 'A' in Adict
True
——————————————
Two ways to get a value from a dict:
>>> Adict={'A':1,'B':2}
With [], an unknown key raises an error (KeyError)
>>> Adict['A']
1
With get(), an unknown key does not raise an error; get() returns None, so the interpreter prints nothing
>>> Adict.get('B')
2
>>> Adict.get('god')
——————————————
Get all keys or values of a dict:
>>> signals={'green':'go','yellow':'go faster','red':'smile for the camera'}
Get all keys:
>>> list(signals.keys())
['green', 'yellow', 'red']
Get all values:
>>> list(signals.values())
['go', 'go faster', 'smile for the camera']
Get all key-value pairs:
>>> list(signals.items())
[('green', 'go'), ('yellow', 'go faster'), ('red', 'smile for the camera')]
——————————————
copy() works for dicts too:
>>> a={1:'A',2:'B'}
>>> b=a.copy()
>>> b
{1: 'A', 2: 'B'}
——————————————
A set is like a dictionary with its values thrown away, leaving only the keys.
——————————————
Create a set with {}:
(Sets are unordered, so the order in which items are displayed may vary.)
>>> set_numbers={0,2,4,6,8}
>>> set_numbers
{0, 8, 2, 4, 6}
——————————————
Convert from Other Data Types to set:
>>> set(['Dark','White','Painter'])
{'Dark', 'White', 'Painter'}
——————————————
Sets also support some interesting operations.
——————————————
Set operations:
>>> a={1,2}
>>> b={2,3}
Two ways to get the intersection:
>>> a&b
{2}
>>> a.intersection(b)
{2}
Two ways to get the union:
>>> a|b
{1, 2, 3}
>>> a.union(b)
{1, 2, 3}
Items left in a after removing those in b (the difference):
>>> a-b
{1}
>>> a.difference(b)
{1}
Is a a subset of b (a⊆b)?
>>> a<=b
False
>>> a.issubset(b)
False
Is a a superset of b (a⊇b)?
>>> a>=b
False
>>> a.issuperset(b)
False
——————————————
Compare Data Structures
To review: you make a list by using square brackets ([]), a tuple by using commas, and a dictionary by using curly brackets ({}). In each case, you access a single element with square brackets:
>>> marx_list=['Groucho','Chico','Harpo']
>>> marx_tuple='Groucho','Chico','Harpo'
>>> marx_dict={'Groucho':'banjo','Chico':'piano','Harpo':'harp'}
>>> marx_list[2]
'Harpo'
>>> marx_tuple[2]
'Harpo'
>>> marx_dict['Harpo']
'harp'
For the list and tuple, the value between the square brackets is an integer offset. For the dictionary, it’s a key. For all three, the result is a value.
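——————————————
The comparison above covers how to read an element. As a complement, here is a minimal sketch, not from the textbook, of how the four structures differ when you try to change or index them. The set marx_set and the flag names are assumptions added for this illustration.

```python
marx_list = ['Groucho', 'Chico', 'Harpo']
marx_tuple = 'Groucho', 'Chico', 'Harpo'
marx_dict = {'Groucho': 'banjo', 'Chico': 'piano', 'Harpo': 'harp'}
marx_set = set(marx_list)

# Lists and dicts are mutable: assignment through [] works.
marx_list[2] = 'Zeppo'
marx_dict['Harpo'] = 'horn'

# Tuples are immutable: assignment through [] raises TypeError.
try:
    marx_tuple[2] = 'Zeppo'
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Sets have no offsets or keys; you can only test membership.
try:
    marx_set[0]
    set_indexed = True
except TypeError:
    set_indexed = False

chico_in_set = 'Chico' in marx_set
print(marx_list, marx_dict, tuple_changed, set_indexed, chico_in_set)
```

Note that marx_set was built before the list was changed, so it still holds 'Harpo' and not 'Zeppo'.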
——————————————
Make Bigger Data Structures
>>> marxes=['Groucho','Chico','Harpo']
>>> pythons=['Chapman','Cleese','Gilliam','Jones','Palin']
>>> stooges=['Moe','Curly','Larry']
Make a tuple whose elements are lists
>>> tuple_of_lists=marxes,pythons,stooges
>>> tuple_of_lists
(['Groucho', 'Chico', 'Harpo'],
 ['Chapman', 'Cleese', 'Gilliam', 'Jones', 'Palin'],
 ['Moe', 'Curly', 'Larry'])
Make a list whose elements are lists
>>> list_of_lists=[marxes,pythons,stooges]
>>> list_of_lists
[['Groucho', 'Chico', 'Harpo'],
 ['Chapman', 'Cleese', 'Gilliam', 'Jones', 'Palin'],
 ['Moe', 'Curly', 'Larry']]
Make a dict whose values are lists (the printed order of the keys may differ on older Python versions)
>>> dict_of_lists={'Marxes':marxes,'Pythons':pythons,'Stooges':stooges}
>>> dict_of_lists
{'Marxes': ['Groucho', 'Chico', 'Harpo'],
 'Pythons': ['Chapman', 'Cleese', 'Gilliam', 'Jones', 'Palin'],
 'Stooges': ['Moe', 'Curly', 'Larry']}
——————————————
Summary:
In this chapter, you saw more complex data structures: lists, tuples, dictionaries, and sets.
——————————————
Now you’ll see how to structure Python code, not just data.
——————————————
Python is unusual in this use of white space to define program structure.
——————————————
Comment with #
print('YS') # just prints a name
——————————————
Continue Lines with \
If a line of code grows past 80 characters, end it with a \ and keep writing on the next line; Python treats the two as one line.
>>> 1+2+\
... 3
6
——————————————
Compare with if, elif, and else
Just if and else:
>>> disaster=True
>>> if disaster:
...     print("Woe!")
... else:
...     print("Whee!")
...
Woe!
• Assigned the boolean value True to the variable named disaster
• Performed a conditional comparison by using if and else, executing different code depending on the value of disaster
• Called the print() function to print some text
if, then elif (else if), then more elif … and finally else:
>>> color="puce"
>>> if color=="red":
...     print("It's a tomato")
... elif color=="green":
...     print("It's a green pepper")
... elif color=="bee purple":
...     print("I don't know what it is, but only bees can see it")
... else:
...     print("I've never heard of the color")
...
I've never heard of the color
——————————————
By convention, each level of a code block is indented by 4 spaces.
——————————————
Python’s comparison operators are:
equality ==
inequality !=
less than <
less than or equal <=
greater than >
greater than or equal >=
membership in …
——————————————
Comparison operator examples:
>>> x=7
Now, let’s try some tests:
>>> x==5
False
>>> x==7
True
>>> 5 < x
True
>>> x < 10
True
>>> 5 < x and x < 10
True
>>> 5 < x or x < 10
True
>>> 5 < x and x > 10
False
>>> 5 < x and not x > 10
True
>>> 5 < x < 10
True
>>> 5 < x < 10 < 999
True
——————————————
Loop with while:
>>> count = 1
>>> while count <= 5:
...     print(count)
...     count += 1
...
1
2
3
4
5
>>>
The loop continues while the test is true and exits when it becomes false.
——————————————
Exit a while loop with break:
>>> x = 1
>>> while x <= 100:
...     print(x)
...     if x == 7:
...         break
...     x += 1
...
1
2
3
4
5
6
7
>>>
——————————————
Use continue to skip the rest of the loop body and start the next iteration:
>>> x = 1
>>> while x < 10:
...     x += 1
...     if x <= 7:
...         continue
...     print(x)
...
8
9
10
>>>
——————————————
From here on, code is given as plain source rather than as interactive sessions.
Run it yourself to see the results.
——————————————
Iterate over a list with for:
A=['S','B','is','not','me']
for i in A:
    print(i)
——————————————
Iterate over a dict with for:
Adict={'A':1,'B':2,'C':3}
Get the keys
for someone in Adict.keys():
    print(someone)
Get the values
for someone in Adict.values():
    print(someone)
Get the (key, value) items
for someone in Adict.items():
    print(someone)
——————————————
In a for loop you can still use break to leave the whole loop, and continue to skip the rest of the body and start the next iteration.
——————————————
Check break Use with else
In a while or for loop, if break was not called, the else statement is run.
for i in [1,2,3,4,5]:
    if i == -99:
        break
    print(i)
else:
    print("You finished it without a break, didn't you?")
——————————————
Iterate Multiple Sequences with zip()
days=['Monday','Tuesday','Wednesday']
fruits=['banana','orange','peach']
drinks=['coffee','tea','beer']
desserts=['tiramisu','ice cream','pie','pudding']
for day,fruit,drink,dessert in zip(days,fruits,drinks,desserts):
    print(day,": drink",drink,"- eat",fruit,"- enjoy",dessert)
zip() stops when the shortest sequence is done, so no one gets any pudding unless we extend the other lists.
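——————————————
If you do want the pudding, one option from the standard library (not covered in this part of the notes) is itertools.zip_longest(), which keeps going until the longest sequence is done. A minimal sketch; the fill text '(no day)' is just a placeholder chosen for this example.

```python
from itertools import zip_longest

days = ['Monday', 'Tuesday', 'Wednesday']
desserts = ['tiramisu', 'ice cream', 'pie', 'pudding']

# zip() stops at the shortest sequence: 3 pairs, pudding is dropped.
short = list(zip(days, desserts))

# zip_longest() pads the shorter sequences with fillvalue instead.
padded = list(zip_longest(days, desserts, fillvalue='(no day)'))

for day, dessert in padded:
    print(day, ": enjoy", dessert)
```

Without fillvalue, the missing entries are filled with None.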
——————————————
Pair up tuples with zip(), then convert the result to a list or dict:
english='Monday','Tuesday','Wednesday'
french='Lundi','Mardi','Mercredi'
# Use zip() to pair these tuples.
A=list(zip(english,french))
print(A)
# Feed the result of zip() directly to dict()
B=dict(zip(english,french))
print(B)
——————————————
Generate Number Sequences with range()
range( start, stop, step )
for i in range(0,3):
    print(i) # prints the numbers 0 to 2
——————————————
Vocabulary: comprehension
understanding, the ability to grasp something; inclusion, the act of comprising; (logic) intension
——————————————
Vocabulary: iterable
able to be iterated over, one item at a time
——————————————
List Comprehensions (building a list from a loop expression)
looking something like this:
[ expression for item in iterable if condition ]
Let’s make a new comprehension that builds a list of only the odd numbers between 1 and 5 (remember that number % 2 is True for odd numbers and False for even numbers):
a_list=[number for number in range(1,6) if number%2==1]
print(a_list)
——————————————
Nesting for loops:
rows=range(1,4)
cols=range(1,3)
for row in rows:
    for col in cols:
        print(row,col)
——————————————
Nesting for loops in comprehension form:
[ expression for item in iterable for item in iterable ... ]
Now, let’s use a comprehension and assign it to the variable cells, making it a list of (row, col) tuples:
rows=range(1,4)
cols=range(1,3)
cells=[(row,col) for row in rows for col in cols]
for cell in cells:
    print(cell)
Note that the for clauses inside the brackets run from left to right: the leftmost for is the outer loop.
——————————————
The two nested forms are written differently, but the results are the same.
Use plain nested for loops when readability matters most.
Use the comprehension form when you want less code.
——————————————
Dictionary Comprehensions (building a dict from a loop expression)
Not to be outdone by mere lists, dictionaries also have comprehensions.
The simplest form looks familiar:
{ key_expression : value_expression for expression in iterable }
Similar to list comprehensions, dictionary comprehensions can also have if tests and multiple for clauses:
>>> word='letters'
>>> letter_counts={letter : word.count(letter) for letter in word}
>>> letter_counts
{'l': 1, 'e': 2, 't': 2, 'r': 1, 's': 1}
We are running a loop over each of the seven letters in the string 'letters' and counting how many times that letter appears. Two of our uses of word.count(letter) are a waste of time because we have to count all the e’s twice and all the t’s twice. But, when we count the e’s the second time, we do no harm because we just replace the entry in the dictionary that was already there; the same goes for counting the t’s. So, the following would have been a teeny bit more Pythonic:
>>> word='letters'
>>> letter_counts={letter : word.count(letter) for letter in set(word)}
>>> letter_counts
{'t': 2, 'l': 1, 'e': 2, 'r': 1, 's': 1}
The dictionary’s keys are in a different order than the previous example, because iterating set(word) returns letters in a different order than iterating the string word.
word.count(letter) counts how many times the single character letter occurs in the string word.
Why do t and e not show up twice in the result? Because dict keys cannot repeat: a repeated key simply overwrites the entry that is already there.
——————————————
A play on the word: you can think of a list comprehension as a list with "comprehension" built in, one that carries an expression and a logical test.
In the same way, a dict comprehension is a dict with "comprehension".
——————————————
Set Comprehensions
{ expression for expression in iterable }
>>> a_set={number for number in range(1,6) if number%3==1}
>>> a_set
{1, 4}
That is, take the numbers from 1 to 5 whose remainder when divided by 3 is 1.
——————————————
Functions
A function can take any number and type of input parameters and return any number and type of output results.
You can do two things with a function:
• Define it
• Call it
——————————————
Define a Python function, and use it:
>>> def How_about_your_feeling():
...     print('Feels great!')
...
>>> How_about_your_feeling()
Feels great!
——————————————
return something
>>> def echo(anything):
...     return anything + ' ' + anything
...
>>> echo('you know')
'you know you know'
——————————————
None Is Useful
None is a special Python value that holds a place when there is nothing to say.
Remember that zero-valued integers or floats, empty strings (''), lists ([]), tuples (()), dictionaries ({}), and sets (set()) are all False, but are not equal to None.
——————————————
Call a function using parameter names (keyword arguments):
def division(dividend, divisor):
    return dividend/divisor
print(division(divisor=2, dividend=8)) # prints 4.0
——————————————
Specify Default Parameter Values
def division(dividend, divisor=2):
    return dividend/divisor
print(division(dividend=16)) # prints 8.0
——————————————
Gather positional arguments into a tuple with *
def print_args(A, *args):
    print('Positional argument tuple:',args)
print_args('first one') # args is the empty tuple ()
print_args(3,2,1,'wait!','uh...') # args is (2, 1, 'wait!', 'uh...')
——————————————
Gather keyword arguments into a dict with **
def print_kwargs(**kwargs):
    print('Keyword arguments:',kwargs)
print_kwargs(girl='Alice',place='bed',act='sleeping')
——————————————
The English terms for function inputs:
1. A parameter is the name used in the function definition; an argument is the actual value passed when the function is called.
2. In short: parameter = formal parameter, argument = actual parameter.
——————————————
Add a description (docstring) to a function, and read it back:
def echo(anything):
    'This is an introduction about the function: echo returns its input argument'
    return anything
help(echo) # prints the formatted help
print(echo.__doc__) # prints the raw docstring text
——————————————
Pass a function by its name
def answer():
    print(77)
def run_something(func):
    func()
run_something(answer) # will print 77
Notice that you passed answer, not answer(). In Python, those parentheses mean call this function. With no parentheses, Python just treats the function like any other object. That’s because, like everything else in Python, it is an object.
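——————————————
Because functions are ordinary objects, they can also be stored in data structures. The sketch below is an illustration added to these notes, not textbook code: it keeps two functions as dict values and picks one by key. The names add, subtract, operations, and calculate are all made up for the example.

```python
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

# The dict values are the function objects themselves (no parentheses).
operations = {'+': add, '-': subtract}

def calculate(symbol, a, b):
    # Look up the function by key, then call it with parentheses.
    func = operations[symbol]
    return func(a, b)

print(calculate('+', 5, 9))
print(calculate('-', 5, 9))
```

An unknown symbol raises KeyError here, exactly as any missing dict key would.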
——————————————
Pass a function together with other arguments:
Ordinary arguments
def add_args(arg1,arg2):
    print(arg1+arg2)
def run_something_with_args(func,arg1,arg2):
    func(arg1,arg2)
run_something_with_args(add_args,5,9) # prints 14
Arguments gathered into a tuple with *
def sum_args(*args):
    return sum(args)
def run_with_positional_args(func,*args):
    return func(*args)
print(run_with_positional_args(sum_args,1,2,3,4)) # prints 10
——————————————
Inner Functions
def outer(a,b):
    def inner(c,d):
        return c+d
    return inner(a,b)
print(outer(4,7)) # prints 11
——————————————
Closures
def knights2(saying):
    def inner2():
        return "We are the knights who say: "+saying
    return inner2
a=knights2('Duck')
b=knights2('Hasenpfeffer')
print(a())
print(b())
——————————————
Anonymous Functions: the lambda() Function
def edit_story(words,func):
    for word in words:
        print(func(word))
stairs=['thud','meow','thud','hiss']
# The traditional way, with a named function
def enliven(word): # give that prose more punch
    return word.capitalize()+'!'
edit_story(stairs,enliven)
# The key point: a lambda gives the same result as the named function above
edit_story(stairs,lambda word:word.capitalize()+'!')
——————————————
Generators
A generator is a Python sequence creation object. Every time you iterate through a generator, it keeps track of where it was the last time it was called and returns the next value. This is different from a normal function, which has no memory of previous calls and always starts at its first line with the same state.
If you want to create a potentially large sequence, and the code is too large for a generator comprehension, write a generator function. It’s a normal function, but it returns its value with a yield statement rather than return.
In other words, it hands back its values only through yield.
——————————————
Let’s write our own version of range():
def my_range(first=0,last=10,step=1):
    number=first
    while number < last:
        yield number
        number += step
——————————————
Count Items with Counter()
>>> from collections import Counter
>>> breakfast=['spam','spam','eggs','spam']
>>> breakfast_counter=Counter(breakfast)
>>> breakfast_counter
Counter({'spam': 3, 'eggs': 1})
The most_common() function returns all elements in descending order, or just the top count elements if given a count:
>>> breakfast_counter.most_common()
[('spam', 3), ('eggs', 1)]
>>> breakfast_counter.most_common(1)
[('spam', 3)]
We’ll make a new list called lunch, and a counter called lunch_counter:
>>> lunch=['eggs','eggs','bacon']
>>> lunch_counter=Counter(lunch)
>>> lunch_counter
Counter({'eggs': 2, 'bacon': 1})
The first way we combine the two counters is by addition, using +:
>>> breakfast_counter+lunch_counter
Counter({'spam': 3, 'eggs': 3, 'bacon': 1})
Subtract one counter from another by using -:
>>> breakfast_counter-lunch_counter
Counter({'spam': 3})
Okay, now what can we have for lunch that we can’t have for breakfast?
>>> lunch_counter-breakfast_counter
Counter({'bacon': 1, 'eggs': 1})
You can get common items by using the intersection operator &:
>>> breakfast_counter & lunch_counter
Counter({'eggs': 1})
You can get all items by using the union operator |:
>>> breakfast_counter | lunch_counter
Counter({'spam': 3, 'eggs': 2, 'bacon': 1})
The item 'eggs' was again common to both. Unlike addition, union didn’t add their counts, but picked the one with the larger count.
——————————————
Order by Key with OrderedDict()
>>> from collections import OrderedDict
>>> quotes=OrderedDict([
...     ('Moe','A wise guy, huh?'),
...     ('Larry','Ow!'),
...     ('Curly','Nyuk nyuk!'),
...     ])
>>>
>>> for stooge in quotes:
...     print(stooge)
...
Moe
Larry
Curly
——————————————
Vocabulary:
palindrome: n. a word or phrase that reads the same forward and backward
deque: a double-ended queue
stack: in computer science, a linear list in which items are inserted and removed at one end only
queue: a special linear list in which items are removed only at the front and inserted only at the rear
——————————————
Stack + Queue == deque
A deque (pronounced deck) is a double-ended queue, which has features of both a stack and a queue. It’s useful when you want to add and delete items from either end of a sequence. Here, we’ll work from both ends of a word to the middle to see if it’s a palindrome. The function popleft() removes the leftmost item from the deque and returns it; pop() removes the rightmost item and returns it. Together, they work from the ends toward the middle. As long as the end characters match, it keeps popping until it reaches the middle:
>>> def palindrome(word):
...     from collections import deque
...     dq=deque(word)
...     while len(dq)>1:
...         if dq.popleft()!=dq.pop():
...             return False
...     return True
...
>>> palindrome('a')
True
>>> palindrome('racecar')
True
>>> palindrome('')
True
>>> palindrome('radar')
True
>>> palindrome('halibut')
False
I used this as a simple illustration of deques. If you really wanted a quick palindrome checker, it would be a lot simpler to just compare a string with its reverse. Python doesn’t have a reverse() function for strings, but it does have a way to reverse a string with a slice, as illustrated here:
>>> def another_palindrome(word):
...     return word==word[::-1]
...
>>> another_palindrome('radar')
True
>>> another_palindrome('halibut')
False
——————————————
Iterate over Code Structures with itertools
itertools contains special-purpose iterator functions. Each returns one item at a time when called within a for … in loop, and remembers its state between calls.
chain() runs through its arguments as though they were a single iterable:
>>> import itertools
>>> for item in itertools.chain([1,2],['a','b']):
...     print(item)
...
1
2
a
b
cycle() is an infinite iterator, cycling through its arguments:
>>> import itertools
>>> for item in itertools.cycle([1,2]):
...     print(item)
...
1
2
1
2
.
.
.
…and so on.
accumulate() calculates accumulated values. By default, it calculates the sum:
>>> import itertools
>>> for item in itertools.accumulate([1,2,3,4]):
...     print(item)
...
1
3
6
10
You can provide a function as the second argument to accumulate(), and it will be used instead of addition. The function should take two arguments and return a single result. This example calculates an accumulated product:
>>> import itertools
>>> def multiply(a,b):
...     return a*b
...
>>> for item in itertools.accumulate([1,2,3,4],multiply):
...     print(item)
...
1
2
6
24
The itertools module has many more functions, notably some for combinations and permutations that can be time savers when the need arises.
——————————————
Print Nicely with pprint()
>>> from collections import OrderedDict
>>> from pprint import pprint
>>> quotes=OrderedDict([('Moe','A wise guy, huh?'),('Larry','Ow!'),('Curly','Nyuk nyuk!')])
>>>
>>> pprint(quotes)
OrderedDict([('Moe', 'A wise guy, huh?'),
             ('Larry', 'Ow!'),
             ('Curly', 'Nyuk nyuk!')])
——————————————
Third-party Python software:
https://pypi.python.org/pypi
——————————————
Chapter 6. Oh Oh: Objects and Classes
No object is mysterious. The mystery is your eye.
(Elizabeth Bowen)
Take an object. Do something to it. Do something else to it.
(Jasper Johns)
Up to this point, you’ve seen data structures such as strings and dictionaries, and code structures such as functions and modules. In this chapter, you’ll deal with custom data structures: objects.
——————————————
What Are Objects?
An object contains both data (variables, called attributes) and code (functions, called methods). It represents a unique instance of some concrete thing.
For example, the integer object with the value 7 is an object that facilitates methods such as addition and multiplication, as is demonstrated in Numbers.
8 is a different object. This means there’s an Integer class in Python, to which both 7 and 8 belong. The strings 'cat' and 'duck' are also objects in Python, and have string methods that you’ve seen, such as capitalize() and replace().
When you create new objects no one has ever created before, you must create a class that indicates what they contain.
Think of objects as nouns and their methods as verbs. An object represents an individual thing, and its methods define how it interacts with other things.
Unlike modules, you can have multiple objects at the same time, each one with different values for its attributes. They’re like super data structures, with code thrown in.
——————————————
Vocabulary: initialization
n. setting initial values
——————————————
Define a Class with class:
class OneClass():
    def __init__(self,name):
        self.get_name=name
OneObject=OneClass('YS')
print(OneObject.get_name)
Here’s what this code does:
• Looks up the definition of the OneClass class
• Instantiates (creates) a new object in memory
• Calls the object’s __init__ method, passing this newly-created object as self and the other argument ('YS') as name
• Stores the value of name in the object
• Returns the new object
• Attaches the name OneObject to the object
1. self simply refers to the object itself. The name is only a convention, so any valid name (for example, myself) would work, but self is what other Python programmers expect.
2. __init__ is short for initialization.
——————————————
Vocabulary: inheritance
n. receiving something passed down from a parent; heredity; a legacy
——————————————
Inheritance
When you’re trying to solve some coding problem, often you’ll find an existing class that creates objects that do almost what you need. What can you do? You could modify this old class or write a new class, cutting and pasting from the old one and merging your new code. But either approach makes the code more complicated.
The solution is inheritance: creating a new class from an existing class but with some additions or changes. It’s an excellent way to reuse code.
When you use inheritance, the new class can automatically use all the code from the old class but without copying any of it. You define only what you need to add or change in the new class, and this overrides the behavior of the old class. The original class is called a parent, superclass, or base class; the new class is called a child, subclass, or derived class. These terms are interchangeable in object-oriented programming.
——————————————
instance (n.): an example; a single concrete occurrence of something
——————————————
Let's inherit something:

class Car():
    def exclaim(self):
        print("I'm a Car!")

class Yugo(Car):
    pass

give_me_a_car=Car()
give_me_a_yugo=Yugo()

>>> give_me_a_car.exclaim()
I'm a Car!
>>> give_me_a_yugo.exclaim()
I'm a Car!

The object named give_me_a_yugo is an instance of class Yugo, but it also inherits whatever a Car can do. (Without doing anything special, Yugo inherited the exclaim() method from Car.)
——————————————
Override: to rewrite; method overriding. The subclass replaces an inherited method with its own rewritten version.
——————————————
Override a Method

>>> class Car():
...     def exclaim(self):
...         print("I'm a Car!")
...
>>> class Yugo(Car):
...     def exclaim(self):
...         print("I'm a Yugo! Much like a Car, but more Yugo-ish.")
...
#Now, make two objects from these classes:
>>> give_me_a_car=Car()
>>> give_me_a_yugo=Yugo()
#What do they say?
>>> give_me_a_car.exclaim()
I'm a Car!
>>> give_me_a_yugo.exclaim()
I'm a Yugo! Much like a Car, but more Yugo-ish.

In these examples, we overrode the exclaim() method. We can override any method, including __init__().
——————————————
Override __init__()

>>> class Person():
...     def __init__(self,name):
...         self.name=name
...
>>> class MDPerson(Person):
...     def __init__(self,name):
...         self.name="Doctor "+name
...
>>> class JDPerson(Person):
...     def __init__(self,name):
...         self.name=name+", SB"
...
In these cases, the initialization method __init__() takes the same arguments as the parent Person class but stores the value of name differently inside the object instance:

>>> person=Person('YS')
>>> doctor=MDPerson('YS')
>>> lawyer=JDPerson('YS')
>>> print(person.name)
YS
>>> print(doctor.name)
Doctor YS
>>> print(lawyer.name)
YS, SB
——————————————
Add a Method To Subclass

The child class can also add a method that was not present in its parent class. Going back to classes Car and Yugo, we'll define the new method need_a_push() for class Yugo only:

>>> class Car():
...     def exclaim(self):
...         print("I'm a Car!")
...
>>> class Yugo(Car):
...     def exclaim(self):
...         print("I'm a Yugo! Much like a Car, but more Yugo-ish.")
...     def need_a_push(self):
...         print("A little help here?")
...

Next, make a Car and a Yugo:

>>> give_me_a_car=Car()
>>> give_me_a_yugo=Yugo()

A Yugo object can react to a need_a_push() method call:

>>> give_me_a_yugo.need_a_push()
A little help here?

But a generic Car (parent class) object cannot.
——————————————
Get Help (Code) from Your Parent with super

class Person():
    def __init__(self,name):
        self.name=name*2

class EmailPerson1(Person):
    def __init__(self,name,email):
        super().__init__(name)
        self.email=email

class EmailPerson2(Person):
    def __init__(self,name,email):
        self.name=name*2
        self.email=email

A=EmailPerson1('YS', '1576570260@qq.com')
B=EmailPerson2('YS', '1576570260@qq.com')
print(A.name, A.email)
print(B.name, B.email)

What's the difference between EmailPerson1 and EmailPerson2?

EmailPerson1:
• super() gets the definition of the parent class, Person.
• The __init__() method calls the Person.__init__() method. It takes care of passing the self argument to the superclass, so you just need to give it any other arguments. In our case, the only other argument Person() accepts is name.
• If the definition of Person changes in the future, using super() will ensure that the attributes and methods that EmailPerson1 inherits from Person will reflect the change.
• This is inheritance put to work: the parent's initialization code is reused rather than copied.

EmailPerson2:
• It is still a subclass of Person, but its __init__() never calls the parent's version. It copies the parent's logic (name*2) instead, so a later change to Person.__init__() would not be picked up.
——————————————
self Defense

One criticism of Python is the need to include self as the first argument to instance methods. Python uses the self argument to find the right object's attributes and methods.
——————————————
Get and Set Attribute Values with Properties(1)

Some object-oriented languages support private object attributes that can't be accessed directly from the outside; programmers often need to write getter and setter methods to read and write the values of such private attributes.

Python doesn't need getters or setters, because all attributes and methods are public, and you're expected to behave yourself. If direct access to attributes makes you nervous, you can certainly write getters and setters. But be Pythonic: use properties.
——————————————
Get and Set Attribute Values with Properties(2)

In this example, we'll define a Duck class with a single attribute called hidden_name. We don't want people to access this directly, so we'll define two methods: a getter (get_name()) and a setter (set_name()). I've added a print() statement to each method to show when it's being called. Finally, we define these methods as properties of the name attribute:

>>> class Duck():
...     def __init__(self,input_name):
...         self.hidden_name=input_name
...     def get_name(self):
...         print('inside the getter')
...         return self.hidden_name
...     def set_name(self,input_name):
...         print('inside the setter')
...         self.hidden_name=input_name
...     name=property(get_name,set_name)

The new methods act as normal getters and setters until that last line; it defines the two methods as properties of the attribute called name. The first argument to property() is the getter method, and the second is the setter.
Now, when you refer to the name of any Duck object, it actually calls the get_name() method to return it:

>>> fowl=Duck('Howard')
>>> fowl.name
inside the getter
'Howard'

You can still call get_name() directly, too, like a normal getter method:

>>> fowl.get_name()
inside the getter
'Howard'

When you assign a value to the name attribute, the set_name() method will be called:

>>> fowl.name='Daffy'
inside the setter
>>> fowl.name
inside the getter
'Daffy'

You can still call the set_name() method directly:

>>> fowl.set_name('Daffy')
inside the setter
>>> fowl.name
inside the getter
'Daffy'
——————————————
Get and Set Attribute Values with Properties(3)

Another way to define properties is with decorators. In this next example, we'll define two different methods, each called name() but preceded by different decorators:
• @property, which goes before the getter method
• @name.setter, which goes before the setter method

Here's how they actually look in the code:

>>> class Duck():
...     def __init__(self,input_name):
...         self.hidden_name=input_name
...     @property
...     def name(self):
...         print('inside the getter')
...         return self.hidden_name
...     @name.setter
...     def name(self,input_name):
...         print('inside the setter')
...         self.hidden_name=input_name

You can still access name as though it were an attribute, but there are no visible get_name() or set_name() methods:

>>> fowl=Duck('Howard')
>>> fowl.name
inside the getter
'Howard'
>>> fowl.name='Donald'
inside the setter
>>> fowl.name
inside the getter
'Donald'
——————————————
Get and Set Attribute Values with Properties(4)

In both of the previous examples, we used the name property to refer to a single attribute (ours was called hidden_name) stored within the object. A property can refer to a computed value, as well. Let's define a Circle class that has a radius attribute and a computed diameter property:

>>> class Circle():
...     def __init__(self,radius):
...         self.radius=radius
...     @property
...     def diameter(self):
...         return 2*self.radius
...
We create a Circle object with an initial value for its radius:

>>> c=Circle(5)
>>> c.radius
5

We can refer to diameter as if it were an attribute such as radius:

>>> c.diameter
10

Here's the fun part: we can change the radius attribute at any time, and the diameter property will be computed from the current value of radius:

>>> c.radius=7
>>> c.diameter
14

If you don't specify a setter property for an attribute, you can't set it from the outside. This is handy for read-only attributes:

>>> c.diameter=20
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute

There's one more big advantage of using a property over direct attribute access: if you ever change the definition of the attribute, you only need to fix the code within the class definition, not in all the callers.
——————————————
Name Mangling for Privacy

In the Duck class example in the previous section, we called our (not completely) hidden attribute hidden_name. Python has a naming convention for attributes that should not be visible outside of their class definition: begin the name with two underscores (__).

Let's rename hidden_name to __name, as demonstrated here:

>>> class Duck():
...     def __init__(self,input_name):
...         self.__name=input_name
...     @property
...     def name(self):
...         print('inside the getter')
...         return self.__name
...     @name.setter
...     def name(self,input_name):
...         print('inside the setter')
...         self.__name=input_name
...

Take a moment to see if everything still works:

>>> fowl=Duck('Howard')
>>> fowl.name
inside the getter
'Howard'
>>> fowl.name='Donald'
inside the setter
>>> fowl.name
inside the getter
'Donald'

Looks good. And, you can't access the __name attribute:

>>> fowl.__name
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Duck' object has no attribute '__name'

This naming convention doesn't make it private, but Python does mangle the name to make it unlikely for external code to stumble upon it.
If you're curious and promise not to tell everyone, here's what it becomes:

>>> fowl._Duck__name
'Donald'

Notice that it didn't print inside the getter. Although this isn't perfect protection, name mangling discourages accidental or intentional direct access to the attribute.
——————————————
Method Types

Some data (attributes) and functions (methods) are part of the class itself, and some are part of the objects that are created from that class.

When you see an initial self argument in methods within a class definition, it's an instance method. These are the types of methods that you would normally write when creating your own classes. The first parameter of an instance method is self, and Python passes the object to the method when you call it.

In contrast, a class method affects the class as a whole. Any change you make to the class affects all of its objects. Within a class definition, a preceding @classmethod decorator indicates that the following function is a class method. Also, the first parameter to the method is the class itself. The Python tradition is to call the parameter cls, because class is a reserved word and can't be used here. Let's define a class method for A that counts how many object instances have been made from it:

>>> class A():
...     count=0
...     def __init__(self):
...         A.count+=1
...     def exclaim(self):
...         print("I'm an A!")
...     @classmethod
...     def kids(cls):
...         print("A has",cls.count,"little objects.")
...
>>>
>>> easy_a=A()
>>> breezy_a=A()
>>> wheezy_a=A()
>>> A.kids()
A has 3 little objects.

Notice that we referred to A.count (the class attribute) rather than self.count (which would be an object instance attribute). In the kids() method, we used cls.count, but we could just as well have used A.count.

A third type of method in a class definition affects neither the class nor its objects; it's just in there for convenience instead of floating around on its own.
It's a static method, preceded by a @staticmethod decorator, with no initial self or class parameter. Here's an example that serves as a commercial for the class CoyoteWeapon:

>>> class CoyoteWeapon():
...     @staticmethod
...     def commercial():
...         print('This CoyoteWeapon has been brought to you by Acme')
...
>>>
>>> CoyoteWeapon.commercial()
This CoyoteWeapon has been brought to you by Acme

Notice that we didn't need to create an object from class CoyoteWeapon to access this method. Very class-y.
——————————————
Duck Typing

Python has a loose implementation of polymorphism; this means that it applies the same operation to different objects, regardless of their class.

Let's use the same __init__() initializer for all three Quote classes now, but add two new functions:
• who() just returns the value of the saved person string
• says() returns the saved words string with the specific punctuation

And here they are in action:

>>> class Quote():
...     def __init__(self,person,words):
...         self.person=person
...         self.words=words
...     def who(self):
...         return self.person
...     def says(self):
...         return self.words+'.'
...
>>> class QuestionQuote(Quote):
...     def says(self):
...         return self.words+'?'
...
>>> class ExclamationQuote(Quote):
...     def says(self):
...         return self.words+'!'
...
>>>

We didn't change how QuestionQuote or ExclamationQuote were initialized, so we didn't override their __init__() methods. Python then automatically calls the __init__() method of the parent class Quote to store the instance variables person and words. That's why we can access self.words in objects created from the subclasses QuestionQuote and ExclamationQuote.

Next up, let's make some objects:

>>> hunter=Quote('Elmer Fudd',"I'm hunting wabbits")
>>> print(hunter.who(),'says:',hunter.says())
Elmer Fudd says: I'm hunting wabbits.
>>> hunted1=QuestionQuote('Bugs Bunny',"What's up, doc")
>>> print(hunted1.who(),'says:',hunted1.says())
Bugs Bunny says: What's up, doc?
>>> hunted2=ExclamationQuote('Daffy Duck',"It's rabbit season")
>>> print(hunted2.who(),'says:',hunted2.says())
Daffy Duck says: It's rabbit season!

Three different versions of the says() method provide different behavior for the three classes. This is traditional polymorphism in object-oriented languages. Python goes a little further and lets you run the who() and says() methods of any objects that have them. Let's define a class called BabblingBrook that has no relation to our previous woodsy hunter and huntees (descendants of the Quote class):

>>> class BabblingBrook():
...     def who(self):
...         return 'Brook'
...     def says(self):
...         return 'Babble'
...
>>> brook=BabblingBrook()

Now, run the who() and says() methods of various objects, one (brook) completely unrelated to the others:

>>> def who_says(obj):
...     print(obj.who(),'says',obj.says())
...
>>> who_says(hunter)
Elmer Fudd says I'm hunting wabbits.
>>> who_says(hunted1)
Bugs Bunny says What's up, doc?
>>> who_says(hunted2)
Daffy Duck says It's rabbit season!
>>> who_says(brook)
Brook says Babble

This behavior is sometimes called duck typing, after the old saying:

If it walks like a duck and quacks like a duck, it's a duck.
- A Wise Person
——————————————
implement: to carry out, execute; to put into effect, realize
concatenation: a series of interconnected things; a linking together
——————————————
Special Methods(1)

You can now create and use basic objects, but now let's go a bit deeper and do more.

When you type something such as a = 3 + 8, how do the integer objects with values 3 and 8 know how to implement +? Also, how does a know how to use = to get the result? You can get at these operators by using Python's special methods (you might also see them called magic methods). You don't need Gandalf to perform any magic, and they're not even complicated.

The names of these methods begin and end with double underscores (__). You've already seen one: __init__ initializes a newly created object from its class definition and any arguments that were passed in.
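As a small illustration of the question above (a sketch of the idea, not a description of how the interpreter is implemented internally): for integers, the + operator corresponds to the int type's __add__ special method, so the three spellings below give the same value. The variable names a, b, and c are arbitrary.

```python
# The + operator and the __add__ special method give the same result.
a = 3 + 8
b = (3).__add__(8)      # parentheses stop "3." from being parsed as a float
c = int.__add__(3, 8)   # the same method, looked up on the class

print(a, b, c)
```

In everyday code you would always write 3 + 8; calling __add__ directly is only useful for seeing what the operator stands for.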
Suppose that you have a simple Word class, and you want an equals() method that compares two words but ignores case. That is, a Word containing the value 'ha' would be considered equal to one containing 'HA'.

The example that follows is a first attempt, with a normal method we're calling equals(). self.text is the text string that this Word object contains, and the equals() method compares it with the text string of word2 (another Word object):

>>> class Word():
...     def __init__(self,text):
...         self.text=text
...
...     def equals(self,word2):
...         return self.text.lower()==word2.text.lower()
...

Then, make three Word objects from three different text strings:

>>> first=Word('ha')
>>> second=Word('HA')
>>> third=Word('eh')

When the strings 'ha' and 'HA' are both converted to lowercase, they should compare equal:

>>> first.equals(second)
True

But the string 'eh' will not match 'ha':

>>> first.equals(third)
False

We defined the method equals() to do this lowercase conversion and comparison. It would be nice to just say if first == second, just like Python's built-in types. So, let's do that. We change the equals() method to the special name __eq__() (you'll see why in a moment):

>>> class Word():
...     def __init__(self,text):
...         self.text=text
...     def __eq__(self,word2):
...         return self.text.lower()==word2.text.lower()
...

Let's see if it works:

>>> first=Word('ha')
>>> second=Word('HA')
>>> third=Word('eh')
>>> first==second
True
>>> first==third
False

Magic! All we needed was Python's special method name for testing equality, __eq__().
——————————————
Special Methods(2)

Tables 6-1 and 6-2 list the names of the most useful magic methods.

Table 6-1. Magic methods for comparison
__eq__( self, other )    self == other
__ne__( self, other )    self != other
__lt__( self, other )    self < other
__gt__( self, other )    self > other
__le__( self, other )    self <= other
__ge__( self, other )    self >= other

Table 6-2.
Magic methods for math
__add__( self, other )         self + other
__sub__( self, other )         self - other
__mul__( self, other )         self * other
__floordiv__( self, other )    self // other
__truediv__( self, other )     self / other
__mod__( self, other )         self % other
__pow__( self, other )         self ** other

You aren't restricted to using the math operators such as + (magic method __add__()) and - (magic method __sub__()) with numbers. For instance, Python string objects use + for concatenation and * for duplication.

Table 6-3. Other, miscellaneous magic methods
__str__( self )     str( self )
__repr__( self )    repr( self )
__len__( self )     len( self )

Besides __init__(), you might find yourself using __str__() the most in your own classes. It's how you print your object. It's used by print(), str(), and the string formatters that you can read about in Chapter 7. The interactive interpreter uses the __repr__() function to echo variables to output. If you fail to define either __str__() or __repr__(), you get Python's default string version of your object:

>>> first=Word('ha')
>>> first
<__main__.Word object at 0x1006ba3d0>
>>> print(first)
<__main__.Word object at 0x1006ba3d0>

Let's add both __str__() and __repr__() methods to the Word class to make it prettier:

>>> class Word():
...     def __init__(self,text):
...         self.text=text
...     def __eq__(self,word2):
...         return self.text.lower()==word2.text.lower()
...     def __str__(self):
...         return self.text
...     def __repr__(self):
...         return 'Word("'+self.text+'")'
...
>>> first=Word('ha')
>>> first # uses __repr__
Word("ha")
>>> print(first) # uses __str__
ha
——————————————
Composition (n.): makeup; a creative work; organization; an essay; a composite
hierarchy (n.): [computing] layering, levels; a ranking system
——————————————
Composition

Inheritance is a good technique to use when you want a child class to act like its parent class most of the time (when child is-a parent). It's tempting to build elaborate inheritance hierarchies, but sometimes composition or aggregation (when x has-a y) make more sense. A duck is-a bird, but has-a tail.
A tail is not a kind of duck, but part of a duck. In this next example, let's make bill and tail objects and provide them to a new duck object:

>>> class Bill():
...     def __init__(self,description):
...         self.description=description
...
>>> class Tail():
...     def __init__(self,length):
...         self.length=length
...
>>> class Duck():
...     def __init__(self,bill,tail):
...         self.bill=bill
...         self.tail=tail
...     def about(self):
...         # use the duck's own parts, not the global bill and tail
...         print('This duck has a',self.bill.description,'bill and a',self.tail.length,'tail')
...
>>> tail=Tail('long')
>>> bill=Bill('wide orange')
>>> duck=Duck(bill,tail)
>>> duck.about()
This duck has a wide orange bill and a long tail
——————————————
When to Use Classes and Objects versus Modules

Here are some guidelines for deciding whether to put your code in a class or a module:
• Objects are most useful when you need a number of individual instances that have similar behavior (methods), but differ in their internal states (attributes).
• Classes support inheritance, modules don't.
• If you want only one of something, a module might be best. No matter how many times a Python module is referenced in a program, only one copy is loaded. (Java and C++ programmers: if you're familiar with the book Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma, you can use a Python module as a singleton.)
• If you have a number of variables that contain multiple values and can be passed as arguments to multiple functions, it might be better to define them as classes. For example, you might use a dictionary with keys such as size and color to represent a color image. You could create a different dictionary for each image in your program, and pass them as arguments to functions such as scale() or transform(). This can get messy as you add keys and functions. It's more coherent to define an Image class with attributes size or color and methods scale() and transform(). Then, all the data and methods for a color image are defined in one place.
• Use the simplest solution to the problem. A dictionary, list, or tuple is simpler, smaller, and faster than a module, which is usually simpler than a class.

Guido's advice:

Avoid overengineering datastructures. Tuples are better than objects (try namedtuple too though). Prefer simple fields over getter/setter functions … Built-in datatypes are your friends. Use more numbers, strings, tuples, lists, sets, dicts. Also check out the collections library, esp. deque.
- Guido van Rossum
——————————————
Module

A module is just a module; don't forget how to import one:

import requests
——————————————
+      plus; plus sign, positive sign
-      minus; minus sign, negative sign
±      plus or minus
×      is multiplied by; multiplication sign
÷      is divided by; division sign
=      is equal to
≠      is not equal to
≡      is equivalent to; is identical to
≌      is equal to or approximately equal to
≈      is approximately equal to
<      is less than
>      is more than
≮      is not less than
≯      is not more than
≤      is less than or equal to
≥      is more than or equal to
%      per cent
‰      per mill
∞      infinity
∝      varies as; is proportional to
√      (square) root
∵      since; because
∴      hence; therefore
∷      equals, as (proportion)
∠      angle
⌒      semicircle
⊙      circle
○      circumference
π      pi
△      triangle
⊥      perpendicular to
∪      union of
∩      intersection of
∫      the integral of
∑      (sigma) summation of
°      degree
′      minute
″      second
℃      degrees Celsius
{      open brace, open curly
}      close brace, close curly
(      open parenthesis, open paren
)      close parenthesis, close paren
()     brackets / parentheses
[      open bracket
]      close bracket
[]     square brackets
.      period, dot
|      vertical bar, vertical virgule
&      ampersand, and, reference, ref
*      asterisk, multiply, star, pointer
/      slash, divide, oblique
//     slash-slash, comment
#      pound
\      backslash, sometimes escape (escape or line-continuation character)
~      tilde
.      full stop
,      comma
:      colon
;      semicolon
?      question mark
!      exclamation mark (British English), exclamation point (American English)
'      apostrophe
-      hyphen
--     dash
...    dots / ellipsis
' '    single quotation marks
" "    double quotation marks
‖      parallel
&      ampersand = and
~      swung dash
§      section; division
→      arrow; see-also sign
——————————————
Named Tuples

Because Guido just mentioned them and I haven't yet, this is a good place to talk about named tuples. A named tuple is a subclass of tuples with which you can access values by name (with .name) as well as by position (with [ offset ]).

Let's take the example from the previous section and convert the Duck class to a named tuple, with bill and tail as simple string attributes. We'll call the namedtuple function with two arguments:
• The name
• A string of the field names, separated by spaces

Named tuples are not automatically supplied with Python, so you need to load a module before using them. We do that in the first line of the following example:

>>> from collections import namedtuple
>>> Duck=namedtuple('Duck','bill tail')
>>> duck=Duck('wide orange','long')
>>> duck
Duck(bill='wide orange', tail='long')
>>> duck.bill
'wide orange'
>>> duck.tail
'long'

You can also make a named tuple from a dictionary:

>>> parts={'bill':'wide orange','tail':'long'}
>>> duck2=Duck(**parts)
>>> duck2
Duck(bill='wide orange', tail='long')

In the preceding code, take a look at **parts. This is a keyword argument. It extracts the keys and values from the parts dictionary and supplies them as arguments to Duck().
It has the same effect as:

>>> duck2=Duck(bill='wide orange',tail='long')

Named tuples are immutable, but you can replace one or more fields and return another named tuple:

>>> duck3=duck2._replace(tail='magnificent',bill='crushing')
>>> duck3
Duck(bill='crushing', tail='magnificent')

We could have defined duck as a dictionary:

>>> duck_dict={'bill':'wide orange','tail':'long'}
>>> duck_dict
{'tail': 'long', 'bill': 'wide orange'}

You can add fields to a dictionary:

>>> duck_dict['color']='green'
>>> duck_dict
{'color': 'green', 'tail': 'long', 'bill': 'wide orange'}

But not to a named tuple:

>>> duck.color='green'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Duck' object has no attribute 'color'

To recap, here are some of the pros of a named tuple:
• It looks and acts like an immutable object.
• It is more space- and time-efficient than objects.
• You can access attributes by using dot notation instead of dictionary-style square brackets.
• You can use it as a dictionary key.
——————————————
Things to Do

6.1. Make a class called Thing with no contents and print it. Then, create an object called example from this class and also print it. Are the printed values the same or different?

class Thing:
    pass

print(Thing)
example = Thing()
print(example)
——————————————
6.2. Make a new class called Thing2 and assign the value 'abc' to a class attribute called letters. Print letters.

class Thing2:
    # a class attribute belongs to the class itself, not to an instance
    letters = 'abc'

print(Thing2.letters)
——————————————
6.3. Make yet another class called, of course, Thing3. This time, assign the value 'xyz' to an instance (object) attribute called letters. Print letters. Do you need to make an object from the class to do this?

class Thing3:
    def __init__(self, letters):
        self.letters = letters

# yes: an instance attribute only exists once an object has been made
example = Thing3('xyz')
print(example.letters)
——————————————
6.4. Make a class called Element, with instance attributes name, symbol, and number. Create an object of this class with the values 'Hydrogen', 'H', and 1.

class Element:
    def __init__(self, name, symbol, number):
        self.name = name
        self.symbol = symbol
        self.number = number

example = Element('Hydrogen', 'H', 1)
——————————————
6.5. Make a dictionary with these keys and values: 'name': 'Hydrogen', 'symbol': 'H', 'number': 1. Then, create an object called hydrogen from class Element using this dictionary.

Adict = {'name': 'Hydrogen', 'symbol': 'H', 'number': 1}

class Element:
    def __init__(self, name, symbol, number):
        self.name = name
        self.symbol = symbol
        self.number = number

hydrogen = Element(**Adict)
——————————————
6.6. For the Element class, define a method called dump() that prints the values of the object's attributes (name, symbol, and number). Create the hydrogen object from this new definition and use dump() to print its attributes.

class Element:
    def __init__(self, name, symbol, number):
        self.name = name
        self.symbol = symbol
        self.number = number
    def dump(self):
        print(self.name, self.symbol, self.number)

hydrogen = Element('Hydrogen', 'H', 1)
hydrogen.dump()
——————————————
6.7. In the definition of Element, change the name of method dump to __str__, create a new hydrogen object, and call print(hydrogen).

class Element:
    def __init__(self, name, symbol, number):
        self.name = name
        self.symbol = symbol
        self.number = number
    def __str__(self):
        return('name=%s, symbol=%s, number=%s'%(self.name,self.symbol,self.number))

hydrogen = Element('Hydrogen', 'H', 1)
print(hydrogen)
——————————————
6.8. Modify Element to make the attributes name, symbol, and number private. Define a getter property for each to return its value.

class Element:
    def __init__(self, name, symbol, number):
        self.__name = name
        self.__symbol = symbol
        self.__number = number
    @property
    def name(self):
        return self.__name
    @property
    def symbol(self):
        return self.__symbol
    @property
    def number(self):
        return self.__number

hydrogen = Element('Hydrogen', 'H', 1)
print(hydrogen.name)
print(hydrogen.symbol)
print(hydrogen.number)
——————————————
6.9.
Define three classes: Bear, Rabbit, and Octothorpe. For each, define only one method: eats(). This should return 'berries' (Bear), 'clover' (Rabbit), or 'campers' (Octothorpe). Create one object from each and print what it eats.

class Bear:
    def eats(self):
        return 'berries'

class Rabbit:
    def eats(self):
        return 'clover'

class Octothorpe:
    def eats(self):
        return 'campers'

a = Bear()
b = Rabbit()
c = Octothorpe()
print(a.eats(),b.eats(),c.eats())
——————————————
6.10. Define these classes: Laser, Claw, and SmartPhone. Each has only one method: does(). This returns 'disintegrate' (Laser), 'crush' (Claw), or 'ring' (SmartPhone). Then, define the class Robot that has one instance (object) of each of these. Define a does() method for the Robot that prints what its component objects do.

class Laser:
    def does(self):
        return 'disintegrate'

class Claw:
    def does(self):
        return 'crush'

class SmartPhone:
    def does(self):
        return 'ring'

class Robot:
    def __init__(self):
        # the robot owns one instance of each component (composition)
        self.laser = Laser()
        self.claw = Claw()
        self.smartphone = SmartPhone()
    def does(self):
        print(self.laser.does(), self.claw.does(), self.smartphone.does())

a_robot = Robot()
a_robot.does()
——————————————
Chapter 7. Mangle Data Like a Pro

In this chapter, you'll learn many techniques for taming data. Most of them concern these built-in Python data types:

strings: Sequences of Unicode characters, used for text data.
bytes and bytearrays: Sequences of eight-bit integers, used for binary data.
——————————————
Unicode

All of the text examples in this book thus far have been plain old ASCII. ASCII was defined in the 1960s, when computers were the size of refrigerators and only slightly better at performing computations. The basic unit of computer storage is the byte, which can store 256 unique values in its eight bits. For various reasons, ASCII only used 7 bits (128 unique values): 26 uppercase letters, 26 lowercase letters, 10 digits, some punctuation symbols, some spacing characters, and some nonprinting control codes.
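A quick way to see the 7-bit limit for yourself (a minimal sketch using only built-in functions and string methods; the sample characters are arbitrary choices): every ASCII character has a code below 128 and fits in one byte, while a character outside that range cannot be encoded with the 'ascii' codec at all.

```python
# ASCII covers code points 0 through 127 (7 bits).
letter = 'A'
code = ord(letter)                    # numeric code of the character
ascii_bytes = letter.encode('ascii')  # one byte per ASCII character

# A character outside ASCII cannot be encoded with the 'ascii' codec.
try:
    'é'.encode('ascii')
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(code, ascii_bytes, encodable)
```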
Unfortunately, the world has more letters than ASCII provides. You could have a hot dog at a diner, but never a Gewürztraminer at a café. Many attempts have been made to add more letters and symbols, and you'll see them at times. Just a couple of those include:
• Latin-1, or ISO 8859-1
• Windows code page 1252

Each of these uses all eight bits, but even that's not enough, especially when you need non-European languages. Unicode is an ongoing international standard to define the characters of all the world's languages, plus symbols from mathematics and other fields.

Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
- The Unicode Consortium

The Unicode Code Charts page has links to all the currently defined character sets with images. Version 6.2, cited here as the latest, defines over 110,000 characters, each with a unique name and identification number. The Unicode code space is divided into 17 planes of 65,536 code points each. The first one, plane 0, is the Basic Multilingual Plane (BMP). See the Wikipedia page about Unicode planes for details.
——————————————
Python 3 Unicode strings(1)

Python 3 strings are Unicode strings, not byte arrays. This is the single largest change from Python 2, which distinguished between normal byte strings and Unicode character strings.

If you know the Unicode ID or name for a character, you can use it in a Python string. Here are some examples:
• A \u followed by four hex digits specifies a character in the Basic Multilingual Plane (code points 0000 to FFFF). The first 128 code points (0000 to 007F) are good old ASCII, at the same positions as in ASCII.
• For characters in the higher planes, we need more bits. The Python escape sequence for these is \U followed by eight hex characters; the leftmost ones need to be 0.
• For all characters, \N{name} lets you specify it by its standard name. The Unicode Character Name Index page lists these. The Python unicodedata module has functions that translate in both directions: • lookup(): takes a case-insensitive name and returns a Unicode character • name(): takes a Unicode character and returns an uppercase name —————————————— Python 3 Unicode strings(2) In the following example, we’ll write a test function that takes a Python Unicode character, looks up its name, and looks up the character again from the name (it should match the original character): >>>def unicode_test(value): ... import unicodedata ... name=unicodedata.name(value) ... value2=unicodedata.lookup(name) ... print('value="%s", name="%s", value2="%s"'%(value,name,value2)) ... Let’s try some characters, beginning with a plain ASCII letter: >>>unicode_test('A') value="A", name="LATIN CAPITAL LETTER A", value2="A" ASCII punctuation: >>>unicode_test('$') value="$", name="DOLLAR SIGN", value2="$" A Unicode currency character: >>>unicode_test('\u00a2') value="¢", name="CENT SIGN", value2="¢" Another Unicode currency character: >>>unicode_test('\u20ac') value="€", name="EURO SIGN", value2="€" The only problem you could potentially run into is limitations in the font you’re using to display text. Not all fonts have images for all Unicode characters, and some might display a placeholder character instead. For instance, here’s the Unicode symbol for SNOWMAN, like symbols in dingbat fonts: >>>unicode_test('\u2603') value="☃", name="SNOWMAN", value2="☃" Suppose that we want to save the word café in a Python string. One way is to copy and paste it from a file or website and hope that it works: >>>place='café' >>>place 'café' This worked because I copied and pasted from a source that used UTF-8 encoding (which you’ll see in a few pages) for its text. —————————————— Python 3 Unicode strings(3) How can we specify that final é character?
If you look at the character index for E, you see that the name E WITH ACUTE, LATIN SMALL LETTER has the value 00E9. Let’s check with the name() and lookup() functions that we were just playing with. First give the code to get the name: >>>import unicodedata >>>unicodedata.name('\u00e9') 'LATIN SMALL LETTER E WITH ACUTE' Next, give the name to look up the code: >>>unicodedata.lookup('E WITH ACUTE, LATIN SMALL LETTER') Traceback (most recent call last): File "<stdin>", line 1, in <module> KeyError: "undefined character name 'E WITH ACUTE, LATIN SMALL LETTER'" Note: The names listed on the Unicode Character Name Index page were reformatted to make them sort nicely for display. To convert them to their real Unicode names (the ones that Python uses), remove the comma and move the part of the name that was after the comma to the beginning. Accordingly, change E WITH ACUTE, LATIN SMALL LETTER to LATIN SMALL LETTER E WITH ACUTE: >>>unicodedata.lookup('LATIN SMALL LETTER E WITH ACUTE') 'é' Now, we can specify the string café by code or by name: >>>place='caf\u00e9' >>>place 'café' >>>place='caf\N{LATIN SMALL LETTER E WITH ACUTE}' >>>place 'café' In the preceding snippet, we inserted the é directly in the string, but we can also build a string by appending: >>>u_umlaut='\N{LATIN SMALL LETTER U WITH DIAERESIS}' >>>u_umlaut 'ü' >>>drink='Gew'+u_umlaut+'rztraminer' >>>print('Now I can finally have my',drink,'in a',place) Now I can finally have my Gewürztraminer in a café The string len function counts Unicode characters, not bytes: >>>len('$') 1 >>>len('\U0001f47b') 1 —————————————— Encode and decode with UTF-8 You don’t need to worry about how Python stores each Unicode character when you do normal string processing. However, when you exchange data with the outside world, you need a couple of things: • A way to encode character strings to bytes • A way to decode bytes to character strings If there were fewer than 64,000 characters in Unicode, we could store each Unicode character ID in two bytes. Unfortunately, there are more.
We could encode each ID in three or four bytes, but that would increase the memory and disk storage space needs for common text strings by three or four times. Ken Thompson and Rob Pike, whose names will be familiar to Unix developers, designed the UTF-8 dynamic encoding scheme one night on a placemat in a New Jersey diner. It uses one to four bytes per Unicode character: • One byte for ASCII • Two bytes for most Latin-derived languages (and also for Greek, Cyrillic, Hebrew, and Arabic) • Three bytes for the rest of the basic multilingual plane • Four bytes for the rest, including some Asian languages and symbols UTF-8 is the standard text encoding in Python, Linux, and HTML. It’s fast, complete, and works well. If you use UTF-8 encoding throughout your code, life will be much easier than trying to hop in and out of various encodings. —————————————— 代数 ALGEBRA 1. 数论 natural number 自然数 positive number 正数 negative number 负数 odd integer, odd number 奇数 even integer, even number 偶数 integer, whole number 整数 positive whole number 正整数 negative whole number 负整数 consecutive number 连续整数 real number, rational number 实数,有理数 irrational(number) 无理数 inverse 倒数 composite number 合数 e.g. 4,6,8,9,10,12,14,15… prime number 质数 e.g. 2,3,5,7,11,13,17… reciprocal 倒数 common divisor 公约数 multiple 倍数 (minimum) common multiple (最小)公倍数 (prime) factor (质)因子 common factor 公因子 ordinary scale, decimal scale 十进制 nonnegative 非负的 tens 十位 units 个位 mode 众数 mean 平均数 median 中值 common ratio 公比 2. 基本数学概念 arithmetic mean 算术平均值 weighted average 加权平均值 geometric mean 几何平均数 exponent指数,幂 base 乘幂的底数,底边 cube 立方数,立方体 square root 平方根 cube root 立方根 common logarithm 常用对数 digit 数字 constant 常数 variable 变量 inverse function 反函数 complementary function 余函数 linear 一次的,线性的 factorization 因式分解 absolute value 绝对值,e.g. |-32|=32 round off 四舍五入数学 3.
基本运算 add,plus 加 subtract 减 difference 差 multiply, times 乘 product 积 divide 除 divisible 可被整除的 divided evenly 被整除 dividend 被除数,红利 divisor 因子,除数,公约数 quotient 商 remainder 余数 factorial 阶乘 power 乘方 radical sign, root sign 根号 round to, to the nearest 四舍五入 4. 代数式,方程,不等式 algebraic term 代数项 like terms, similar terms 同类项 numerical coefficient 数字系数 literal coefficient 字母系数 inequality 不等式 triangle inequality 三角不等式 range 值域 original equation 原方程 equivalent equation 同解方程,等价方程 linear equation 线性方程(e.g. 5x+6=22) 5. 分数,小数 proper fraction 真分数 improper fraction 假分数 mixed number 带分数 vulgar fraction,common fraction 普通分数 simple fraction 简分数 complex fraction 繁分数 numerator 分子 denominator 分母 (least) common denominator (最小)公分母 quarter 四分之一 decimal fraction 纯小数 infinite decimal 无穷小数 recurring decimal 循环小数 tenths unit 十分位 6. 集合 union 并集 proper subset 真子集 solution set 解集 7. 数列 arithmetic progression(sequence) 等差数列 geometric progression(sequence) 等比数列 8. 其它 approximate 近似 (anti)clockwise (逆) 顺时针方向 cardinal 基数 ordinal 序数 direct proportion 正比 distinct 不同的 estimation 估计,近似 parentheses 括号 proportion 比例 permutation 排列 combination 组合 table 表格 trigonometric function 三角函数 unit 单位,位 —————————————— 几何 GEOMETRY 1. 角 alternate angle 内错角 corresponding angle 同位角 vertical angle 对顶角 central angle 圆心角 interior angle 内角 exterior angle 外角 supplementary angles 补角 complementary angle 余角 adjacent angle 邻角 acute angle 锐角 obtuse angle 钝角 right angle 直角 round angle 周角 straight angle 平角 included angle 夹角 2. 三角形 equilateral triangle 等边三角形 scalene triangle 不等边三角形 isosceles triangle 等腰三角形 right triangle 直角三角形 oblique 斜三角形 inscribed triangle 内接三角形 3. 收敛的平面图形,除三角形外 semicircle 半圆 concentric circles 同心圆 quadrilateral 四边形 pentagon 五边形 hexagon 六边形 heptagon 七边形 octagon 八边形 nonagon 九边形 decagon 十边形 polygon 多边形 parallelogram 平行四边形 equilateral 等边形 plane 平面 square 正方形,平方 rectangle 长方形 regular polygon 正多边形 rhombus 菱形 trapezoid 梯形 4. 其它平面图形 arc 弧 line, straight line 直线 line segment 线段 parallel lines 平行线 segment of a circle 弧形 5. 
立体图形 cube 立方体,立方数 rectangular solid 长方体 regular solid/regular polyhedron 正多面体 circular cylinder 圆柱体 cone 圆锥 sphere 球体 solid 立体的 6. 图形的附属概念 plane geometry 平面几何 trigonometry 三角学 bisect 平分 circumscribe 外切 inscribe 内切 intersect 相交 perpendicular 垂直 Pythagorean theorem 勾股定理(毕达哥拉斯定理) congruent 全等的 multilateral 多边的 altitude 高 depth 深度 side 边长 circumference, perimeter 周长 radian 弧度 surface area 表面积 volume 体积 arm 直角三角形的股 cross section 横截面 center of a circle 圆心 chord 弦 diameter 直径 radius 半径 angle bisector 角平分线 diagonal 对角线 edge 棱 face of a solid 立体的面 hypotenuse 斜边 included side 夹边 leg 三角形的直角边 median (三角形的)中线 base 底边,底数(e.g. 2的5次方,2就是底数) opposite 直角三角形中的对边 midpoint 中点 endpoint 端点 vertex (复数形式vertices) 顶点 tangent 切线的 transversal 截线 intercept 截距 7. 坐标 coordinate system 坐标系 rectangular coordinate 直角坐标系 origin 原点 abscissa 横坐标 ordinate 纵坐标 number line 数轴 quadrant 象限 slope 斜率 complex plane 复平面 8. 计量单位 cent 美分 penny 一美分硬币 nickel 5美分硬币 dime 一角硬币 dozen 打(12个) score 廿(20个) Centigrade 摄氏 Fahrenheit 华氏 quart 夸脱 gallon 加仑(1 gallon = 4 quart) yard 码 meter 米 micron 微米 inch 英寸 foot 英尺 minute 分(角度的度量单位,60分=1度) square measure 平方单位制 cubic meter 立方米 pint 品脱(干量或液量的单位) —————————————— Encoding You encode a string to bytes. The string encode() function’s first argument is the encoding name. The choices include those presented in Table 7-1. Table 7-1. Encodings • 'ascii': good old seven-bit ASCII • 'utf-8': eight-bit variable-length encoding, and what you almost always want to use • 'latin-1': also known as ISO 8859-1 • 'cp1252': a common Windows encoding (Python also accepts 'windows-1252'; the hyphenated spelling 'cp-1252' is not a recognized alias) • 'unicode-escape': Python Unicode literal format, \uxxxx or \Uxxxxxxxx You can encode anything as UTF-8.
Let’s assign the Unicode string '\u2603' to the name snowman: >>>snowman='\u2603' snowman is a Python Unicode string with a single character, regardless of how many bytes might be needed to store it internally: >>>len(snowman) 1 Next let’s encode this Unicode character to a sequence of bytes: >>>ds=snowman.encode('utf-8') As I mentioned earlier, UTF-8 is a variable-length encoding. In this case, it used three bytes to encode the single snowman Unicode character: >>>len(ds) 3 >>>ds b'\xe2\x98\x83' Now, len() returns the number of bytes (3) because ds is a bytes variable. You can use encodings other than UTF-8, but you’ll get errors if the Unicode string can’t be handled by the encoding. For example, if you use the ascii encoding, it will fail unless your Unicode characters happen to be valid ASCII characters as well: >>>ds=snowman.encode('ascii') Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeEncodeError: 'ascii' codec can't encode character '\u2603' in position 0: ordinal not in range(128) The encode() function takes a second argument to help you avoid encoding exceptions. Its default value, which you can see in the previous example, is 'strict'; it raises a UnicodeEncodeError if it sees a character that the chosen encoding cannot represent. There are other error handlers. Use 'ignore' to throw away anything that won’t encode: >>>snowman.encode('ascii','ignore') b'' Use 'replace' to substitute ? for unknown characters: >>>snowman.encode('ascii','replace') b'?' Use 'backslashreplace' to produce a Python Unicode character string, like unicode-escape: >>>snowman.encode('ascii','backslashreplace') b'\\u2603' You would use this if you needed a printable version of the Unicode escape sequence. The following produces character entity strings that you can use in web pages: >>>snowman.encode('ascii','xmlcharrefreplace') b'&#9731;' —————————————— Decoding We decode byte strings to Unicode strings.
Whenever we get text from some external source (files, databases, websites, network APIs, and so on), it’s encoded as byte strings. The tricky part is knowing which encoding was actually used, so we can run it backward and get Unicode strings. The problem is that nothing in the byte string itself says what encoding was used. I mentioned the perils of copying and pasting from websites earlier. You’ve probably visited websites with odd characters where plain old ASCII characters should be. Let’s create a Unicode string called place with the value 'café': >>>place='caf\u00e9' >>>place 'café' >>>type(place) <class 'str'> Encode it in UTF-8 format in a bytes variable called place_bytes: >>>place_bytes=place.encode('utf-8') >>>place_bytes b'caf\xc3\xa9' >>>type(place_bytes) <class 'bytes'> Notice that place_bytes has five bytes. The first three are the same as ASCII (a strength of UTF-8), and the final two encode the 'é'. Now, let’s decode that byte string back to a Unicode string: >>>place2=place_bytes.decode('utf-8') >>>place2 'café' This worked because we encoded to UTF-8 and decoded from UTF-8. What if we told it to decode from some other encoding? >>>place3=place_bytes.decode('ascii') Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128) The ASCII decoder threw an exception because the byte value 0xc3 is illegal in ASCII. There are some 8-bit character set encodings in which values between 128 (hex 80) and 255 (hex FF) are legal but not the same as UTF-8: >>>place4=place_bytes.decode('latin-1') >>>place4 'cafÃ©' >>>place5=place_bytes.decode('windows-1252') >>>place5 'cafÃ©' Both calls succeed, but each of the two UTF-8 bytes is decoded as a separate character, so the text comes out wrong. The moral of this story: whenever possible, use UTF-8 encoding. It works, is supported everywhere, can express every Unicode character, and is quickly decoded and encoded. —————————————— Format We’ve pretty much ignored text formatting, until now.
Chapter 2 shows a few string alignment functions, and the code examples have used simple print() statements, or just let the interactive interpreter display values. But it’s time we look at how to interpolate data values into strings (in other words, put the values inside the strings) using various formats. You can use this to produce reports and other outputs for which appearances need to be just so. Python has two ways of formatting strings, loosely called old style and new style. Both styles are supported in Python 2 and 3 (new style in Python 2.6 and up). Old style is simpler, so we’ll begin there. —————————————— Old style with % The old style of string formatting has the form string % data. Inside the string are interpolation sequences. Table 7-2 illustrates that the very simplest sequence is a % followed by a letter indicating the data type to be formatted. Table 7-2. Conversion types • %s: string • %d: decimal integer • %x: hex integer • %o: octal integer • %f: decimal float • %e: exponential float • %g: decimal or exponential float • %%: a literal % Following are some simple examples. First, an integer: >>>'%s'%42 '42' >>>'%d'%42 '42' >>>'%x'%42 '2a' >>>'%o'%42 '52' A float: >>>'%s'%7.03 '7.03' >>>'%f'%7.03 '7.030000' >>>'%e'%7.03 '7.030000e+00' >>>'%g'%7.03 '7.03' An integer and a literal %: >>>'%d%%'%100 '100%' Some string and integer interpolation: >>>actor='Richard Gere' >>>cat='Chester' >>>weight=28 >>>"My wife's favorite actor is %s" %actor "My wife's favorite actor is Richard Gere" >>>"Our cat %s weighs %s pounds" %(cat,weight) 'Our cat Chester weighs 28 pounds' That %s inside the string means to interpolate a string. The number of % appearances in the string needs to match the number of data items after the %. A single data item such as actor goes right after the %. Multiple data must be grouped into a tuple (bounded by parentheses, separated by commas) such as (cat, weight). Even though weight is an integer, the %s inside the string converted it to a string.
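The tuple rule above is easy to get wrong, so here is a minimal sketch of it. It reuses the cat and weight names from the notes; the other variable names (single, several, mismatch, pair) are made up for this illustration.

```python
# Old-style interpolation: one value goes right after %,
# several values must be grouped into a tuple.
cat = 'Chester'
weight = 28

single = "Our cat is %s" % cat
several = "Our cat %s weighs %d pounds" % (cat, weight)
print(single)
print(several)

# Too few values for the placeholders raises TypeError
# rather than leaving a gap in the string.
try:
    "Our cat %s weighs %d pounds" % (cat,)
except TypeError as err:
    mismatch = str(err)
    print('TypeError:', mismatch)

# To interpolate a tuple as one value, wrap it in a one-element tuple.
pair = "%s" % ((cat, weight),)
print(pair)
```

The last case is the usual surprise with old-style formatting: a bare tuple on the right of % is always unpacked into separate values.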
You can add other values between the % and the type specifier to designate minimum and maximum widths, alignment, and character filling. For variables, let’s define an integer, n; a float, f; and a string, s: >>>n=42 >>>f=7.03 >>>s='string cheese' Format them using default widths: >>>'%d %f %s' %(n,f,s) '42 7.030000 string cheese' Set a minimum field width of 10 characters for each variable, and align them to the right, filling unused spots on the left with spaces: >>>'%10d %10f %10s' %(n,f,s) '        42   7.030000 string cheese' Use the same field width, but align to the left: >>>'%-10d %-10f %-10s' %(n,f,s) '42         7.030000   string cheese' This time, the same field width, but a maximum character width of 4, and aligned to the right. This setting truncates the string, limits the float to 4 digits after the decimal point, and pads the integer with zeros to 4 digits: >>>'%10.4d %10.4f %10.4s' %(n,f,s) '      0042     7.0300       stri' The same precision as before, but with no minimum field width: >>>'%.4d %.4f %.4s' %(n,f,s) '0042 7.0300 stri' Finally, get the field widths from arguments rather than hard-coding them: >>>'%*.*d %*.*f %*.*s' %(10,4,n,10,4,f,10,4,s) '      0042     7.0300       stri' —————————————— New style formatting with {} and format Old style formatting is still supported. In Python 2, which will freeze at version 2.7, it will be supported forever. However, new style formatting is recommended if you’re using Python 3. The simplest usage is demonstrated here: >>>'{} {} {}'.format(n,f,s) '42 7.03 string cheese' Old-style arguments needed to be provided in the order in which their % placeholders appeared in the string. With new-style, you can specify the order: >>>'{2} {0} {1}'.format(f,s,n) '42 7.03 string cheese' The value 0 referred to the first argument, f, whereas 1 referred to the string s, and 2 referred to the last argument, the integer n.
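As a small sketch of positional indexes, the snippet below reorders the arguments, reuses one of them, and shows what happens when automatic {} and numbered {0} fields are mixed. The n, f, and s values come from the notes; by_index, repeated, and mixing_error are names invented for this example.

```python
n, f, s = 42, 7.03, 'string cheese'

# Indexes refer to positions in the format() argument list.
by_index = '{2} {0} {1}'.format(f, s, n)
print(by_index)

# An index may be used more than once.
repeated = '{0} and {0} again, then {1}'.format(s, n)
print(repeated)

# Automatic ({}) and manual ({0}) numbering cannot be mixed in one string.
try:
    '{} {0}'.format(n)
except ValueError as err:
    mixing_error = str(err)
    print('ValueError:', mixing_error)
```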
The arguments can be a dictionary or named arguments, and the specifiers can include their names: >>>'{n} {f} {s}'.format(n=42,f=7.03,s='string cheese') '42 7.03 string cheese' In this next example, let’s try combining our three values into a dictionary, which looks like this: >>>d={'n':42,'f':7.03,'s':'string cheese'} In the following example, {0} is the entire dictionary, whereas {1} is the string 'other' that follows the dictionary: >>>'{0[n]} {0[f]} {0[s]} {1}'.format(d,'other') '42 7.03 string cheese other' These examples all printed their arguments with default formats. Old-style allows a type specifier after the % in the string, but new-style puts it after a :. First, with positional arguments: >>>'{0:d} {1:f} {2:s}'.format(n,f,s) '42 7.030000 string cheese' In this example, we’ll use the same values, but as named arguments: >>>'{n:d} {f:f} {s:s}'.format(n=42,f=7.03,s='string cheese') '42 7.030000 string cheese' The other options (minimum field width, maximum character width, alignment, and so on) are also supported. 
Minimum field width 10, right-aligned (default): >>>'{0:10d} {1:10f} {2:10s}'.format(n,f,s) '        42   7.030000 string cheese' Same as the preceding example, but the > characters make the right-alignment more explicit: >>>'{0:>10d} {1:>10f} {2:>10s}'.format(n,f,s) '        42   7.030000 string cheese' Minimum field width 10, left-aligned: >>>'{0:<10d} {1:<10f} {2:<10s}'.format(n,f,s) '42         7.030000   string cheese' Minimum field width 10, centered: >>>'{0:^10d} {1:^10f} {2:^10s}'.format(n,f,s) '    42      7.030000  string cheese' There is one change from old-style: the precision value (after the decimal point) still means the number of digits after the decimal for floats, and the maximum number of characters for strings, but you can’t use it for integers: >>>'{0:>10.4d} {1:>10.4f} {2:10.4s}'.format(n,f,s) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: Precision not allowed in integer format specifier >>>'{0:>10d} {1:>10.4f} {2:>10.4s}'.format(n,f,s) '        42     7.0300       stri' The final option is the fill character. If you want something other than spaces to pad your output fields, put it right after the :, before any alignment (<, >, ^) or width specifiers: >>>'{0:!^20s}'.format('BIG SALE') '!!!!!!BIG SALE!!!!!!' —————————————— Match with Regular Expressions Chapter 2 touched on simple string operations. Armed with that introductory information, you’ve probably used simple “wildcard” patterns on the command line, such as ls *.py, which means list all filenames ending in .py. It’s time to explore more complex pattern matching by using regular expressions. These are provided in the standard module re, which we’ll import. You define a string pattern that you want to match, and the source string to match against. For simple matches, usage looks like this: result=re.match('You','Young Frankenstein') Here, 'You' is the pattern and 'Young Frankenstein' is the source, the string you want to check. match() checks whether the source begins with the pattern.
For more complex matches, you can compile your pattern first to speed up the match later: youpattern=re.compile('You') Then, you can perform your match against the compiled pattern: result=youpattern.match('Young Frankenstein') match() is not the only way to compare the pattern and source. Here are several other methods you can use: • search() returns the first match, if any. • findall() returns a list of all non-overlapping matches, if any. • split() splits source at matches with pattern and returns a list of the string pieces. • sub() takes another replacement argument, and changes all parts of source that are matched by pattern to replacement. —————————————— Exact match with match() Does the string 'Young Frankenstein' begin with the word 'You'? Here’s some code with comments: >>>import re >>>source='Young Frankenstein' >>>m=re.match('You',source)# match starts at the beginning of source >>>if m:# match returns an object; do this to see what matched ... print(m.group()) ... You >>>m=re.match('^You',source)# start anchor does the same >>>if m: ... print(m.group()) ... You How about 'Frank'? >>>m=re.match('Frank',source) >>>if m: ... print(m.group()) ... This time match() returned None and the if did not run the print statement. As I said earlier, match() works only if the pattern is at the beginning of the source. But search() works if the pattern is anywhere: >>>m=re.search('Frank',source) >>>if m: ... print(m.group()) ... Frank Let’s change the pattern: >>>m=re.match('.*Frank',source) >>>if m:# match returns an object ... print(m.group()) ... Young Frank Following is a brief explanation of how our new pattern works: • . means any single character. • * means any number of the preceding thing. Together, .* means any number of characters (even zero). • Frank is the phrase that we wanted to match, somewhere. match() returned the string that matched .*Frank: 'Young Frank'.
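The difference between match() and search() can be checked side by side. This is only a sketch using the same source string; the variable names at_start, anywhere, and leading are invented here.

```python
import re

source = 'Young Frankenstein'

# match() only succeeds at position 0, so this returns None.
at_start = re.match('Frank', source)
print(at_start)

# search() scans the whole string and reports where the match was found.
anywhere = re.search('Frank', source)
print(anywhere.group(), anywhere.span())

# Prefixing .* lets match() succeed, but the result includes the prefix.
leading = re.match('.*Frank', source)
print(leading.group())
```

Checking the result against None (or using it in an if, as the notes do) before calling group() avoids an AttributeError when nothing matched.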
—————————————— wildcard 通配符 —————————————— First match with search() You can use search() to find the pattern 'Frank' anywhere in the source string 'Young Frankenstein', without the need for the .* wildcards: >>>m=re.search('Frank',source) >>>if m:# search returns an object ... print(m.group()) ... Frank —————————————— All matches with findall() The preceding examples looked for one match only. But what if you want to know how many instances of the single-letter string 'n' are in the string? >>>m=re.findall('n',source) >>>m# findall returns a list ['n','n','n','n'] >>>print('Found',len(m),'matches') Found 4 matches How about 'n' followed by any character? >>>m=re.findall('n.',source) >>>m ['ng','nk','ns'] Notice that it did not match that final 'n'. We need to say that the character after 'n' is optional with ?: >>>m=re.findall('n.?',source) >>>m ['ng','nk','ns','n'] —————————————— Split at matches with split() The example that follows shows you how to split a string into a list by a pattern rather than a simple string (as the normal string split() method would do): >>>m=re.split('n',source) >>>m# split returns a list ['You','g Fra','ke','stei',''] —————————————— Replace at matches with sub() This is like the string replace() method, but for patterns rather than literal strings: >>>m=re.sub('n','?',source) >>>m# sub returns a string 'You?g Fra?ke?stei?' —————————————— sub 替换 —————————————— Patterns: special characters Many descriptions of regular expressions start with all the details of how to define them. I think that’s a mistake. Regular expressions are a not-so-little language in their own right, with too many details to fit in your head at once. They use so much punctuation that they look like cartoon characters swearing. With these expressions (match(), search(), findall(), and sub()) under your belt, let’s get into the details of building them. The patterns you make apply to any of these functions. 
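To see that one pattern really does work with all of these functions, here is a short sketch that compiles the pattern 'n' once and applies each method to the same source. The names pattern and first are chosen for this example only.

```python
import re

source = 'Young Frankenstein'
pattern = re.compile('n')      # compile once, reuse for every call

first = pattern.search(source)
print(first.group(), first.start())   # first match and its position

all_matches = pattern.findall(source)
print(all_matches)                    # list of every match

pieces = pattern.split(source)
print(pieces)                         # source cut at each match

replaced = pattern.sub('?', source)
print(replaced)                       # every match replaced
```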
You’ve seen the basics: • Literal matches with any non-special characters • Any single character except \n with . • Any number (including zero) with * • Optional (zero or one) with ? First, special characters are shown in Table 7-3: Table 7-3. Special characters Pattern Matches \d a single digit \D a single non-digit \w an alphanumeric character \W a non-alphanumeric character \s a whitespace character \S a non-whitespace character \b a word boundary (between a \w and a \W, in either order) \B a non-word boundary The Python string module has predefined string constants that we can use for testing. We’ll use printable, which contains 100 printable ASCII characters, including letters in both cases, digits, space characters, and punctuation: >>>import string >>>printable=string.printable >>>len(printable) 100 >>>printable[0:50] '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN' >>>printable[50:] 'OPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c' Which characters in printable are digits? >>>re.findall('\d',printable) ['0','1','2','3','4','5','6','7','8','9'] Which characters are digits, letters, or an underscore? >>>re.findall('\w',printable) ['0','1','2','3','4','5','6','7','8','9','a','b', 'c','d','e','f','g','h','i','j','k','l','m','n', 'o','p','q','r','s','t','u','v','w','x','y','z', 'A','B','C','D','E','F','G','H','I','J','K','L', 'M','N','O','P','Q','R','S','T','U','V','W','X', 'Y','Z','_'] Which are spaces? >>>re.findall('\s',printable) [' ','\t','\n','\r','\x0b','\x0c'] Regular expressions are not confined to ASCII. A \d will match whatever Unicode calls a digit, not just ASCII characters '0' through '9'. 
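As a quick check of the claim about \d, the sketch below mixes ASCII digits with two Arabic-Indic digits (U+0664 and U+0662). The sample text is made up; re.ASCII is the standard flag that restricts \d, \w, and \s to ASCII.

```python
import re

# 'room 42' followed by ARABIC-INDIC DIGIT FOUR and DIGIT TWO.
text = 'room 42, \u0664\u0662'

# By default, \d on a str pattern matches any Unicode decimal digit.
unicode_digits = re.findall(r'\d', text)
print(len(unicode_digits))

# With re.ASCII, only the characters 0 through 9 match.
ascii_digits = re.findall(r'\d', text, flags=re.ASCII)
print(ascii_digits)
```

If you are validating input that must be plain ASCII digits, re.ASCII or an explicit [0-9] class is the safer choice.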
Let’s add two non-ASCII lowercase letters. In this test, we’ll throw in the following: • Three ASCII letters • Three punctuation symbols that should not match a \w • A Unicode LATIN SMALL LETTER E WITH CIRCUMFLEX (\u00ea) • A Unicode LATIN SMALL LETTER E WITH BREVE (\u0115) >>>x='abc'+'-/*'+'\u00ea'+'\u0115' As expected, this pattern found only the letters: >>>re.findall('\w',x) ['a','b','c','ê','ĕ'] —————————————— Patterns: using specifiers(1) Now, let’s make “punctuation pizza,” using the main pattern specifiers for regular expressions, which are presented in Table 7-4. In the table, expr and the other italicized words mean any valid regular expression. Table 7-4. Pattern specifiers (pattern: what it matches) • abc: literal abc • (expr): expr • expr1|expr2: expr1 or expr2 • . : any character except \n • ^ : start of source string • $ : end of source string • prev?: zero or one prev • prev*: zero or more prev, as many as possible • prev*?: zero or more prev, as few as possible • prev+: one or more prev, as many as possible • prev+?: one or more prev, as few as possible • prev{m}: m consecutive prev • prev{m,n}: m to n consecutive prev, as many as possible • prev{m,n}?: m to n consecutive prev, as few as possible • [abc]: a or b or c (same as a|b|c) • [^abc]: not (a or b or c) • prev(?=next): prev if followed by next • prev(?!next): prev if not followed by next • (?<=prev)next: next if preceded by prev • (?<!prev)next: next if not preceded by prev Define a source string to test against: >>>source='''I wish I may, I wish I might ... Have a dish of fish tonight.''' First, find wish anywhere: >>>re.findall('wish',source) ['wish','wish'] Next, find wish or fish anywhere: >>>re.findall('wish|fish',source) ['wish','wish','fish'] Find wish at the beginning: >>>re.findall('^wish',source) [] Find I wish at the beginning: >>>re.findall('^I wish',source) ['I wish'] Find fish at the end: >>>re.findall('fish$',source) [] Finally, find fish tonight.
at the end: >>>re.findall('fish tonight.$',source) ['fish tonight.'] The characters ^ and $ are called anchors: ^ anchors the search to the beginning of the search string, and $ anchors it to the end. .$ matches any character at the end of the line, including a period, so that worked. To be more precise, we should escape the dot to match it literally: >>>re.findall('fish tonight\.$',source) ['fish tonight.'] Begin by finding w or f followed by ish: >>>re.findall('[wf]ish',source) ['wish','wish','fish'] Find one or more runs of w, s, or h: >>>re.findall('[wsh]+',source) ['w','sh','w','sh','h','sh','sh','h'] Find ght followed by a non-alphanumeric: >>>re.findall('ght\W',source) ['ght\n','ght.'] Find I followed by wish: >>>re.findall('I (?=wish)',source) ['I ','I '] And last, wish preceded by I: >>>re.findall('(?<=I) wish',source) [' wish',' wish'] There are a few cases in which the regular expression pattern rules conflict with the Python string rules. The following pattern should match any word that begins with fish: >>>re.findall('\bfish',source) [] Why doesn’t it? As is discussed in Chapter 2, Python employs a few special escape characters for strings. For example, \b means backspace in strings, but in the mini-language of regular expressions it means a word boundary. Avoid the accidental use of escape characters by using Python’s raw strings when you define your regular expression string. Always put an r character before your regular expression pattern string, and Python escape characters will be disabled, as demonstrated here: >>>re.findall(r'\bfish',source) ['fish'] —————————————— anchor 锚 —————————————— There will always be someone who says you are not a real programmer. Just look at these claims!
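“HTML isn’t a real programming language.” “If you don’t use vi, you’re not a real programmer.” “Real programmers know C.” “Some people just aren’t cut out for programming.” “Some people just can’t learn it.” “You’re not a real programmer at all; I am.” If you ask me, programming means different things to different people, and its meaning also changes over time. Interestingly, the tools, packages, and frameworks that help beginners, and even veterans, get started faster and with less effort are often labeled “not something real programmers use.” Behind this labeling is a fear: if anyone can call themselves a programmer, the title becomes meaningless. I think this kind of gatekeeping is harmful. Go ahead and use the tools that make programming easier. If that means building a game with Stencyl or GameMaker instead of writing one from scratch, that’s fine; just do it. If your first attempt at programming starts with HTML or Excel macros, that’s fine too. Use whichever approach you can stick with. As your skills grow, you’ll find that those convenient tools limit you more than they help, and then you’ll go looking for more powerful ones. Most of the time, though, few people will read your code or ask what tools you used. What really matters is whether your program works well. —————————————— There’s a joke that goes: a girl asked a programmer how to liven up a group of quiet programmers, and he answered, “PHP is the best language.” Many programmers like to pick sides: while working in a language, they come to feel it is powerful and naturally end up in its camp. All around us, Java developers look down on .NET developers, C developers look down on everyone, Node has suddenly risen, Python people criticize Ruby’s poor performance, the glamorous Swift arrived out of nowhere, PHP is the greatest language, and so on. Twelve years ago, when I had just started working, I wrote medical software, building the interface in VB and wrapping the functionality in DLLs with VC. Back then I thought VB and VC were incredibly powerful; today they are basically dead. The same goes for others: I wrote Delphi; it died. I wrote ASP; it died. I wrote for Symbian; it died. What we should do is step outside the box of any one language: programming is about ideas, not languages. —————————————— Patterns: specifying match output When using match() or search(), all matches are returned from the result object m as m.group(). If you enclose a pattern in parentheses, the match will be saved to its own group, and a tuple of them will be available as m.groups(), as shown here (reminder: \b is a word boundary, between a \w and a \W, in either order): >>>source='''I wish I may, I wish I might ... Have a dish of fish tonight.''' >>>m=re.search(r'(. dish\b).*(\bfish)',source) >>>m.group() 'a dish of fish' >>>m.groups() ('a dish','fish') If you use the pattern (?P<name>expr), it will match expr, saving the match in group name: >>>m=re.search(r'(?P<DISH>. dish\b).*(?P<FISH>\bfish)',source) >>>m.group() 'a dish of fish' >>>m.groups() ('a dish','fish') >>>m.group('DISH') 'a dish' >>>m.group('FISH') 'fish' —————————————— regex 正则表达式 (regular expression) —————————————— Examples of Regular Expressions In this section, I will show you some examples of regex to help you understand the concept further. Say that you had this regex: /abder/ This is simply telling us to match the word abder only. What about this regex?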
/a[nr]t/ You can read this regex as follows: find a text pattern such that the first letter is a and the last letter is t, and between those letters comes either n or r. So the matching words are ant and art. Let me give you a small quiz at this point. How would you write a regular expression that starts with ca, and ends with one or all of the following characters tbr? Yes, this regex can be written as follows: /ca[tbr]/ If you see a regex that starts with a circumflex accent ^, this means match the string that starts with the string mentioned after ^. So, if you had the regex below, it is matching the string that begins with This. /^This/ Thus, in the following string: My name is Abder This is Abder This is Tom Based on the regex /^This/, the following strings will be matched: This is Abder This is Tom What if we wanted to match a string that ends with some string? In this case, we use the dollar sign $. Here is an example: Abder$ Thus, in the above string (the three lines), the following patterns would be matched using this regex: My name is Abder This is Abder Well, what do you think about this regex? ^[A-Z][a-z] I know it might seem complex at first glance, but let's go through it piece by piece. We already saw what a circumflex accent ^ is. It means match a string which starts with some string. [A-Z] refers to the upper case letters. So, if we read this part of the regex: ^[A-Z], it is telling us to match the string which begins with an uppercase letter. The last part, [a-z], means that after finding a string that starts with an uppercase letter, it would be followed by lowercase letters from the alphabet. So, which of the following strings will be matched using this regex? If you are not sure, you can use Python to figure out. abder Abder ABDER ABder Regular expressions are a very broad topic, and those examples are just to give you a feel for what they are and why we use them. 
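As the quiz suggests, Python can settle the question. The sketch below tests the four candidate words against ^[A-Z][a-z]; the names candidates and matched are chosen for this illustration.

```python
import re

# One uppercase letter at the start, followed by one lowercase letter.
pattern = re.compile(r'^[A-Z][a-z]')

candidates = ['abder', 'Abder', 'ABDER', 'ABder']
matched = [word for word in candidates if pattern.search(word)]
print(matched)

# The pattern only consumes the first two characters of the word.
prefix = pattern.search('Abder').group()
print(prefix)
```

Note that [a-z] without a quantifier matches exactly one lowercase letter; to require that the rest of the word be lowercase you would need something like ^[A-Z][a-z]+$.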
A nice reference to learn more about regular expressions and see more examples is http://www.rexegg.com/.
——————————————
pattern: a template or model that text is matched against
metacharacters: characters with a special meaning inside a pattern
specifiers: quantifiers
——————————————
If you want to know more about regular expressions, look for the book Mastering Python Regular Expressions.
——————————————
Binary Data
Text data can be challenging, but binary data can be, well, interesting. You need to know about concepts such as endianness (how your computer’s processor breaks data into bytes) and sign bits for integers. You might need to delve into binary file formats or network packets to extract or even change data. This section will show you the basics of binary data wrangling in Python.
——————————————
bytes and bytearray(1)
Python 3 introduced the following sequences of eight-bit integers, with possible values from 0 to 255, in two types:
• bytes is immutable, like a tuple of bytes
• bytearray is mutable, like a list of bytes
Beginning with a list called blist, this next example creates a bytes variable called the_bytes and a bytearray variable called the_byte_array:
>>> blist = [1, 2, 3, 255]
>>> the_bytes = bytes(blist)
>>> the_bytes
b'\x01\x02\x03\xff'
>>> the_byte_array = bytearray(blist)
>>> the_byte_array
bytearray(b'\x01\x02\x03\xff')
——————————————
bytes and bytearray(2)
This next example demonstrates that you can’t change a bytes variable:
>>> the_bytes[1] = 127
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bytes' object does not support item assignment
But a bytearray variable is mellow and mutable:
>>> the_byte_array = bytearray(blist)
>>> the_byte_array
bytearray(b'\x01\x02\x03\xff')
>>> the_byte_array[1] = 127
>>> the_byte_array
bytearray(b'\x01\x7f\x03\xff')
Each of these would create a 256-element result, with values from 0 to 255:
>>> the_bytes = bytes(range(0, 256))
>>> the_byte_array = bytearray(range(0, 256))
When printing bytes or bytearray data, Python uses \xNN (two hex digits) for non-printable bytes and their ASCII equivalents for printable ones (plus some
common escape characters, such as \n instead of \x0a). Here’s the printed representation of the_bytes (manually reformatted to show 16 bytes per line): >>>the_bytes b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f \x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./ 0123456789:;<=>? @ABCDEFGHIJKLMNO PQRSTUVWXYZ[\\]^_ `abcdefghijklmno pqrstuvwxyz{|}~\x7f \x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f \x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f \xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf \xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf \xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf \xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf \xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef \xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff' This can be confusing, because they’re bytes (teeny integers), not characters. —————————————— The struct module: Using struct, you can convert binary data to and from Python data structures. —————————————— Other Binary Data Tools Some third-party open source packages offer the following, more declarative ways of defining and extracting binary data: • bitstring • construct • hachoir • binio —————————————— Things to Do —————————————— 7.1. Create a Unicode string called mystery and assign it the value '\U0001f4a9'. Print mystery. Look up the Unicode name for mystery. mystery = '\U0001f4a9' print(mystery) import unicodedata name = unicodedata.name(mystery) print(name) —————————————— 7.2. Encode mystery, this time using UTF-8, into the bytes variable pop_bytes. Print pop_bytes. mystery = '\U0001f4a9' pop_bytes = mystery.encode('utf-8') print(pop_bytes) —————————————— 7.3. Using UTF-8, decode pop_bytes into the string variable pop_string. Print pop_string. Is pop_string equal to mystery? 
mystery = '\U0001f4a9' pop_bytes = mystery.encode('utf-8') pop_string = pop_bytes.decode('utf-8') print(pop_string) print(pop_string==mystery) —————————————— 7.4. Write the following poem by using old-style formatting. Substitute the strings 'roast beef', 'ham', 'head', and 'clam' into this string: My kitty cat likes %s, My kitty cat likes %s, My kitty cat fell on his %s And now thinks he's a %s. poem = '''My kitty cat likes %s, My kitty cat likes %s, My kitty cat fell on his %s And now thinks he's a %s.'''%('roast beef','ham','head','clam') print(poem) —————————————— 7.5. Write a form letter by using new-style formatting. Save the following string as letter (you’ll use it in the next exercise): Dear {salutation} {name}, Thank you for your letter. We are sorry that our {product} {verbed} in your {room}. Please note that it should never be used in a {room}, especially near any {animals}. Send us your receipt and {amount} for shipping and handling. We will send you another {product} that, in our tests, is {percent}% less likely to have {verbed}. Thank you for your support. Sincerely, {spokesman} {job_title} —————————————— 7.6. Make a dictionary called response with values for the string keys 'salutation', 'name', 'product', 'verbed' (past tense verb), 'room', 'animals', 'percent', 'spokesman', and 'job_title'. Print letter with the values from response. letter = '''Dear {salutation} {name}, Thank you for your letter. We are sorry that our {product} {verbed} in your {room}. Please note that it should never be used in a {room}, especially near any {animals}. Send us your receipt and {amount} for shipping and handling. We will send you another {product} that, in our tests, is {percent}% less likely to have {verbed}. Thank you for your support. 
Sincerely, {spokesman} {job_title}''' response = { 'salutation':'Colonel', 'name':'Hackenbush', 'product':'duck blind', 'verbed':'imploded', 'room':'conservatory', 'animals':'emus', 'amount':'$1.38', 'percent':'1', 'spokesman':'Edgar Schmeltz', 'job_title':'Licensed Podiatrist' } print(letter.format(**response)) #** can from dict get parameters. —————————————— 7.8 Import the re module to use Python’s regular expression functions. Use re.findall() to print all the words that begin with 'c'. mammoth=''' We have seen thee, queen of cheese, Lying quietly at your ease, Gently fanned by evening breeze, Thy fair form no flies dare seize. All gaily dressed soon you'll go To the great Provincial show, To be admired by many a beau In the city of Toronto. Cows numerous as a swarm of bees, Or as the leaves upon the trees, It did require to make thee please, And stand unrivalled, queen of cheese. May you not receive a scar as We have heard that Mr. Harris Intends to send you off as far as The great world's show at Paris. Of the youth beware of these, For some of them might rudely squeeze And bite your cheek, then songs or glees We could not sing, oh! queen of cheese. We'rt thou suspended from balloon, You'd cast a shade even at noon, Folks would think it was the moon About to fall and crush them soon. ''' import re m = re.findall(r'\bc\w*\b', mammoth) print(m) —————————————— 7.9 Find all four-letter words that begin with c. import re m = re.findall(r'\bc\w{3}\b', mammoth) print(m) —————————————— 7.10. Find all the words that end with r. import re m = re.findall(r'\w*r\b', mammoth) print(m) —————————————— row 行,排 英语中的元音字母有a,e,i,o,u五个。 —————————————— 7.11. Find all the words that contain exactly three vowels in a row. import re m = re.findall(r'\b\w*[aeiou]{3}\w*\b', mammoth) print(m) —————————————— 7.12. 
Use unhexlify() to convert this hex string (combined from two strings to fit on a page) to a bytes variable called gif: '47494638396101000100800000000000ffffff21f9'+ '0401000000002c000000000100010000020144003b' import binascii hex_str='47494638396101000100800000000000ffffff21f9'+\ '0401000000002c000000000100010000020144003b' gif=binascii.unhexlify(hex_str) —————————————— Chapter 8. Data Has to Go Somewhere An active program accesses data that is stored in Random Access Memory, or RAM. RAM is very fast, but it is expensive and requires a constant supply of power; if the power goes out, all the data in memory is lost. Disk drives are slower than RAM but have more capacity, cost less, and retain data even after someone trips over the power cord. Thus, a huge amount of effort in computer systems has been devoted to making the best tradeoffs between storing data on disk and RAM. As programmers, we need persistence: storing and retrieving data using nonvolatile media such as disks. This chapter is all about the different flavors of data storage, each optimized for different purposes: flat files, structured files, and databases. File operations other than input and output are covered in Files. —————————————— File Input/Output The simplest kind of persistence is a plain old file, sometimes called a flat file. This is just a sequence of bytes stored under a filename. You read from a file into memory and write from memory to a file. Python makes these jobs easy. Its file operations were modeled on the familiar and popular Unix equivalents. Before reading or writing a file, you need to open it: fileobj=open(filename,mode) Here’s a brief explanation of the pieces of this call: • fileobj is the file object returned by open() • filename is the string name of the file • mode is a string indicating the file’s type and what you want to do with it The first letter of mode indicates the operation: • r means read. • w means write. If the file doesn’t exist, it’s created. 
If the file does exist, it’s overwritten. • x means write, but only if the file does not already exist. • a means append (write after the end) if the file exists. The second letter of mode is the file’s type: • t (or nothing) means text. • b means binary. After opening the file, you call functions to read or write data; these will be shown in the examples that follow. Last, you need to close the file. Let’s create a file from a Python string in one program and then read it back in the next. —————————————— Write a Text File with write() For some reason, there aren’t many limericks about special relativity. This one will just have to do for our data source: >>>poem='''There was a young lady named Bright, ... Whose speed was far faster than light; ... She started one day ... In a relative way, ... And returned on the previous night.''' >>>len(poem) 150 The following code writes the entire poem to the file 'relativity' in one call: >>>fout=open('relativity','wt') >>>fout.write(poem) 150 >>>fout.close() The write() function returns the number of bytes written. It does not add any spaces or newlines, as print() does. You can also print() to a text file: >>>fout=open('relativity','wt') >>>print(poem,file=fout) >>>fout.close() This brings up the question: should I use write() or print()? By default, print() adds a space after each argument and a newline at the end. In the previous example, it appended a newline to the relativity file. To make print() work like write(), pass the following two arguments: • sep (separator, which defaults to a space, ' ') • end (end string, which defaults to a newline, '\n') print() uses the defaults unless you pass something else. 
We’ll pass empty strings to suppress all of the fussiness normally added by print():
>>> fout = open('relativity', 'wt')
>>> print(poem, file=fout, sep='', end='')
>>> fout.close()
If you have a large source string, you can also write chunks until the source is done:
>>> fout = open('relativity', 'wt')
>>> size = len(poem)
>>> offset = 0
>>> chunk = 100
>>> while True:
...     if offset > size:
...         break
...     fout.write(poem[offset:offset+chunk])
...     offset += chunk
...
100
50
>>> fout.close()
This wrote 100 characters on the first try and the last 50 characters on the next.
If the relativity file is precious to us, let’s see if using mode x really protects us from overwriting it:
>>> fout = open('relativity', 'xt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileExistsError: [Errno 17] File exists: 'relativity'
You can use this with an exception handler:
>>> try:
...     fout = open('relativity', 'xt')
...     fout.write('stomp stomp stomp')
... except FileExistsError:
...     print('relativity already exists! That was a close one.')
...
relativity already exists! That was a close one.
——————————————
Read a Text File with read(), readline(), or readlines()
You can call read() with no arguments to slurp up the entire file at once, as shown in the example that follows. Be careful when doing this with large files; a gigabyte file will consume a gigabyte of memory.
>>> fin = open('relativity', 'rt')
>>> poem = fin.read()
>>> fin.close()
>>> len(poem)
150
You can provide a maximum character count to limit how much read() returns at one time. Let’s read 100 characters at a time and append each chunk to a poem string to rebuild the original:
>>> poem = ''
>>> fin = open('relativity', 'rt')
>>> chunk = 100
>>> while True:
...     fragment = fin.read(chunk)
...     if not fragment:
...         break
...     poem += fragment
...
>>> fin.close()
>>> len(poem)
150
After you’ve read all the way to the end, further calls to read() will return an empty string (''), which is treated as False in if not fragment. This breaks out of the while True loop.
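The chunked read loop can also be wrapped in a small generator, so the caller decides what to do with each piece. This is a sketch only: read_in_chunks is a hypothetical helper name, the file is written to a temporary directory so the example is self-contained, and it uses a with block so the file is closed automatically.

```python
import os
import tempfile

poem = '''There was a young lady named Bright,
Whose speed was far faster than light;
She started one day
In a relative way,
And returned on the previous night.'''


def read_in_chunks(path, chunk_size=100):
    """Yield pieces of a text file, at most chunk_size characters each."""
    with open(path, 'rt') as fin:
        while True:
            fragment = fin.read(chunk_size)
            if not fragment:  # read() returns '' at end of file
                break
            yield fragment


# Write the poem to a throwaway file, then rebuild it chunk by chunk.
path = os.path.join(tempfile.mkdtemp(), 'relativity')
with open(path, 'wt') as fout:
    fout.write(poem)

fragments = list(read_in_chunks(path))
rebuilt = ''.join(fragments)
print([len(fragment) for fragment in fragments], rebuilt == poem)
```

Joining the fragments once at the end avoids building the string with repeated += in the loop.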
You can also read the file a line at a time by using readline(). In this next example, we’ll append each line to the poem string to rebuild the original:
>>> poem = ''
>>> fin = open('relativity', 'rt')
>>> while True:
...     line = fin.readline()
...     if not line:
...         break
...     poem += line
...
>>> fin.close()
>>> len(poem)
150
For a text file, even a blank line has a length of one (the newline character), and is evaluated as True. When the file has been read, readline() (like read()) also returns an empty string, which is also evaluated as False.
The easiest way to read a text file is by using an iterator. This returns one line at a time. It’s similar to the previous example, but with less code:
>>> poem = ''
>>> fin = open('relativity', 'rt')
>>> for line in fin:
...     poem += line
...
>>> fin.close()
>>> len(poem)
150
All of the preceding examples eventually built the single string poem. The readlines() call reads a line at a time, and returns a list of one-line strings:
>>> fin = open('relativity', 'rt')
>>> lines = fin.readlines()
>>> fin.close()
>>> print(len(lines), 'lines read')
5 lines read
>>> for line in lines:
...     print(line, end='')
...
There was a young lady named Bright,
Whose speed was far faster than light;
She started one day
In a relative way,
And returned on the previous night.>>>
We told print() to suppress the automatic newlines because the first four lines already had them. The last line did not, causing the interactive prompt >>> to occur right after the last line.
——————————————
Write a Binary File with write()
If you include a 'b' in the mode string, the file is opened in binary mode. In this case, you read and write bytes instead of a string.
We don’t have a binary poem lying around, so we’ll just generate the 256 byte values from 0 to 255:
>>> bdata = bytes(range(0, 256))
>>> len(bdata)
256
Open the file for writing in binary mode and write all the data at once:
>>> fout = open('bfile', 'wb')
>>> fout.write(bdata)
256
>>> fout.close()
Again, write() returns the number of bytes written.
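To make the difference between text and binary mode concrete, here is a small sketch. It assumes a throwaway file in a temporary directory and uses with blocks to close the files. It shows that a file opened with 'b' accepts bytes and rejects a str.

```python
import os
import tempfile

bdata = bytes(range(0, 256))
path = os.path.join(tempfile.mkdtemp(), 'bfile')  # throwaway location

# Binary mode: write() takes bytes and returns how many were written.
with open(path, 'wb') as fout:
    written = fout.write(bdata)

# Passing a str to a binary-mode file raises TypeError.
try:
    with open(path, 'ab') as fout:
        fout.write('text, not bytes')
    mode_error = None
except TypeError as err:
    mode_error = err

# Reading in binary mode gives back a bytes object.
with open(path, 'rb') as fin:
    restored = fin.read()

print(written, type(restored).__name__, restored == bdata)
```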
As with text, you can write binary data in chunks: >>>fout=open('bfile','wb') >>>size=len(bdata) >>>offset=0 >>>chunk=100 >>>while True: ... if offset>size: ... break ... fout.write(bdata[offset:offset+chunk]) ... offset+=chunk ... 100 100 56 >>>fout.close() —————————————— Read a Binary File with read() This one is simple; all you need to do is just open with 'rb': >>>fin=open('bfile','rb') >>>bdata=fin.read() >>>len(bdata) 256 >>>fin.close() —————————————— Close Files Automatically by Using with If you forget to close a file that you’ve opened, it will be closed by Python after it’s no longer referenced. This means that if you open a file within a function and don’t close it explicitly, it will be closed automatically when the function ends. But you might have opened the file in a long-running function or the main section of the program. The file should be closed to force any remaining writes to be completed. Python has context managers to clean up things such as open files. You use the form with expression as variable: >>>with open('relativity','wt') as fout: ... fout.write(poem) ... That’s it. After the block of code under the context manager (in this case, one line) completes (normally or by a raised exception), the file is closed automatically. —————————————— Change Position with seek() As you read and write, Python keeps track of where you are in the file. The tell() function returns your current offset from the beginning of the file, in bytes. The seek() function lets you jump to another byte offset in the file. This means that you don’t have to read every byte in a file to read the last one; you can seek() to the last one and just read one byte. For this example, use the 256-byte binary file 'bfile' that you wrote earlier: >>>fin=open('bfile','rb') >>>fin.tell() 0 Use seek() to one byte before the end of the file: >>>fin.seek(255) 255 Read until the end of the file: >>>bdata=fin.read() >>>len(bdata) 1 >>>bdata[0] 255 seek() also returns the current offset. 
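As a further sketch of tell() and seek(), the example below jumps into the middle of a 256-byte file and reads just four bytes. The file is created in a temporary directory so the example runs on its own; the variable names are illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'bfile')
with open(path, 'wb') as fout:
    fout.write(bytes(range(0, 256)))

with open(path, 'rb') as fin:
    start = fin.tell()       # offset right after opening
    position = fin.seek(100) # jump to byte offset 100
    piece = fin.read(4)      # read only four bytes
    after = fin.tell()       # offset has advanced by four

print(start, position, list(piece), after)
```

Because each byte in this file equals its own offset, the four values read are 100 through 103.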
You can call seek() with a second argument: seek(offset, origin):
• If origin is 0 (the default), go offset bytes from the start
• If origin is 1, go offset bytes from the current position
• If origin is 2, go offset bytes relative to the end
These values are also defined in the standard os module:
>>> import os
>>> os.SEEK_SET
0
>>> os.SEEK_CUR
1
>>> os.SEEK_END
2
So, we could have read the last byte in different ways:
>>> fin = open('bfile', 'rb')
One byte before the end of the file:
>>> fin.seek(-1, 2)
255
>>> fin.tell()
255
Read until the end of the file:
>>> bdata = fin.read()
>>> len(bdata)
1
>>> bdata[0]
255
Note: You don’t need to call tell() for seek() to work. I just wanted to show that they both report the same offset.
Here’s an example of seeking from the current position in the file:
>>> fin = open('bfile', 'rb')
This next example ends up two bytes before the end of the file:
>>> fin.seek(254, 0)
254
>>> fin.tell()
254
Now, go forward one byte:
>>> fin.seek(1, 1)
255
>>> fin.tell()
255
Finally, read until the end of the file:
>>> bdata = fin.read()
>>> len(bdata)
1
>>> bdata[0]
255
These functions are most useful for binary files. You can use them with text files, but unless the file is ASCII (one byte per character), you would have a hard time calculating offsets. These would depend on the text encoding, and the most popular encoding (UTF-8) uses varying numbers of bytes per character.
——————————————
Structured Text Files
With simple text files, the only level of organization is the line. Sometimes, you want more structure than that. You might want to save data for your program to use later, or send data to another program.
There are many formats, and here’s how you can distinguish them:
• A separator, or delimiter, character like tab ('\t'), comma (','), or vertical bar ('|'). This is an example of the comma-separated values (CSV) format.
• '<' and '>' around tags. Examples include XML and HTML.
• Punctuation. An example is JavaScript Object Notation (JSON).
• Indentation.
An example is YAML (which depending on the source you use means “YAML Ain’t Markup Language;” you’ll need to research that one yourself). • Miscellaneous, such as configuration files for programs. Each of these structured file formats can be read and written by at least one Python module. —————————————— parsing v.语法分析 n.分析; 解析 obligingly 亲切地;勤快 columns 纵列 rows 行列 omitting 省略 CSV(Comma Separated Values) 逗号分隔型取值格式,是一种纯文本格式,用来存储数据。 —————————————— CSV(1) Delimited files are often used as an exchange format for spreadsheets and databases. You could read CSV files manually, a line at a time, splitting each line into fields at comma separators, and adding the results to data structures such as lists and dictionaries. But it’s better to use the standard csv module, because parsing these files can get more complicated than you think. • Some have alternate delimiters besides a comma: '|' and '\t' (tab) are common. • Some have escape sequences. If the delimiter character can occur within a field, the entire field might be surrounded by quote characters or preceded by some escape character. • Files have different line-ending characters. Unix uses '\n', Microsoft uses '\r\n', and Apple used to use '\r' but now uses '\n'. • There can be column names in the first line. First, we’ll see how to read and write a list of rows, each containing a list of columns: >>>import csv >>>villains=[ ... ['Doctor','No'], ... ['Rosa','Klebb'], ... ['Mister','Big'], ... ['Auric','Goldfinger'], ... ['Ernst','Blofeld'], ... ] >>>with open('villains','wt') as fout:# a context manager ... csvout=csv.writer(fout) ... csvout.writerows(villains) This creates the file villains with these lines: Doctor,No Rosa,Klebb Mister,Big Auric,Goldfinger Ernst,Blofeld Now, we’ll try to read it back in: >>>import csv >>>with open('villains','rt') as fin:# context manager ... cin=csv.reader(fin) ... villains=[row for row in cin]# This uses a list comprehension ... 
>>>print(villains) [['Doctor','No'],['Rosa','Klebb'],['Mister','Big'], ['Auric','Goldfinger'],['Ernst','Blofeld']] Take a moment to think about list comprehensions (To feel better you can go back to see comprehension syntax). We took advantage of the structure created by the reader() function. It obligingly created rows in the cin object that we can extract in a for loop. Using reader() and writer() with their default options, the columns are separated by commas and the rows by line feeds. —————————————— CSV(2) The data can be a list of dictionaries rather than a list of lists. Let’s read the villains file again, this time using the new DictReader() function and specifying the column names: >>>import csv >>>with open('villains','rt') as fin: ... cin=csv.DictReader(fin,fieldnames=['first','last']) ... villains=[row for row in cin] ... >>>print(villains) [{'last':'No','first':'Doctor'}, {'last':'Klebb','first':'Rosa'}, {'last':'Big','first':'Mister'}, {'last':'Goldfinger','first':'Auric'}, {'last':'Blofeld','first':'Ernst'}] Let’s rewrite the CSV file by using the new DictWriter() function. We’ll also call writeheader() to write an initial line of column names to the CSV file: import csv villains=[ {'first':'Doctor','last':'No'}, {'first':'Rosa','last':'Klebb'}, {'first':'Mister','last':'Big'}, {'first':'Auric','last':'Goldfinger'}, {'first':'Ernst','last':'Blofeld'}, ] with open('villains','wt') as fout: cout=csv.DictWriter(fout,['first','last']) cout.writeheader() cout.writerows(villains) That creates a villains file with a header line: first,last Doctor,No Rosa,Klebb Mister,Big Auric,Goldfinger Ernst,Blofeld Now we’ll read it back. By omitting the fieldnames argument in the DictReader() call, we instruct it to use the values in the first line of the file (first,last) as column labels and matching dictionary keys: >>>import csv >>>with open('villains','rt') as fin: ... cin=csv.DictReader(fin) ... villains=[row for row in cin] ... 
>>> print(villains)
[{'last': 'No', 'first': 'Doctor'},
{'last': 'Klebb', 'first': 'Rosa'},
{'last': 'Big', 'first': 'Mister'},
{'last': 'Goldfinger', 'first': 'Auric'},
{'last': 'Blofeld', 'first': 'Ernst'}]
——————————————
hierarchy: levels; a layered structure
prominent: important; outstanding
derive: to obtain from a source
——————————————
XML(1)
Delimited files convey only two dimensions: rows (lines) and columns (fields within a line). If you want to exchange data structures among programs, you need a way to encode hierarchies, sequences, sets, and other structures as text.
XML is the most prominent markup format that suits the bill. It uses tags to delimit data, as in this sample menu.xml file:
<?xml version="1.0"?>
<menu>
  <breakfast hours="7-11">
    <item price="$6.00">breakfast burritos</item>
    <item price="$4.00">pancakes</item>
  </breakfast>
  <lunch hours="11-3">
    <item price="$5.00">hamburger</item>
  </lunch>
  <dinner hours="3-10">
    <item price="$8.00">spaghetti</item>
  </dinner>
</menu>
Following are a few important characteristics of XML:
• Tags begin with a < character. The tags in this sample were menu, breakfast, lunch, dinner, and item.
• Whitespace is ignored.
• Usually a start tag such as <menu> is followed by other content and then a final matching end tag such as </menu>.
• Tags can nest within other tags to any level. In this example, item tags are children of the breakfast, lunch, and dinner tags; they, in turn, are children of menu.
• Optional attributes can occur within the start tag. In this example, price is an attribute of item.
• Tags can contain values. In this example, each item has a value, such as pancakes for the second breakfast item.
• If a tag named thing has no values or children, it can be expressed as the single tag by including a forward slash just before the closing angle bracket, such as <thing/>, rather than a start and end tag, like <thing></thing>.
• The choice of where to put data (attributes, values, child tags) is somewhat arbitrary. For instance, we could have written the last item tag as <item price="$8.00" food="spaghetti"/>.
——————————————
XML(2)
XML is often used for data feeds and messages, and has subformats like RSS and Atom. Some industries have many specialized XML formats, such as the finance field.
XML’s über-flexibility has inspired multiple Python libraries that differ in approach and capabilities.
The simplest way to parse XML in Python is by using ElementTree. Here’s a little program to parse the menu.xml file and print some tags and attributes:
>>> import xml.etree.ElementTree as et
>>> tree = et.ElementTree(file='menu.xml')
>>> root = tree.getroot()
>>> root.tag
'menu'
>>> for child in root:
...     print('tag:', child.tag, 'attributes:', child.attrib)
...     for grandchild in child:
...         print('\ttag:', grandchild.tag, 'attributes:', grandchild.attrib)
...
tag: breakfast attributes: {'hours': '7-11'}
	tag: item attributes: {'price': '$6.00'}
	tag: item attributes: {'price': '$4.00'}
tag: lunch attributes: {'hours': '11-3'}
	tag: item attributes: {'price': '$5.00'}
tag: dinner attributes: {'hours': '3-10'}
	tag: item attributes: {'price': '$8.00'}
>>> len(root)  # number of menu sections
3
>>> len(root[0])  # number of breakfast items
2
For each element in the nested lists, tag is the tag string and attrib is a dictionary of its attributes. ElementTree has many other ways of searching XML-derived data, modifying it, and even writing XML files. The ElementTree documentation has the details.
Other standard Python XML libraries include:
• xml.dom: The Document Object Model (DOM), familiar to JavaScript developers, represents Web documents as hierarchical structures. This module loads the entire XML file into memory and lets you access all the pieces equally.
• xml.sax: Simple API for XML, or SAX, parses XML on the fly, so it does not have to load everything into memory at once. Therefore, it can be a good choice if you need to process very large streams of XML.
——————————————
enormous: huge; very large in amount
——————————————
HTML
Enormous amounts of data are saved as Hypertext Markup Language (HTML), the basic document format of the Web. The problem is that so much of it doesn’t follow the HTML rules, which can make it difficult to parse. Also, much of HTML is intended more to format output than to interchange data.
Because this chapter is intended to describe fairly well-defined data formats, I have separated out the discussion about HTML to Chapter 9. —————————————— notation 记号 dump 倾销;转储文件 —————————————— JSON(1) JavaScript Object Notation (JSON) has become a very popular data interchange format, beyond its JavaScript origins. The JSON format is a subset of JavaScript, and often legal Python syntax as well. Its close fit to Python makes it a good choice for data interchange among programs. You’ll see many examples of JSON for web development in Chapter 9. Unlike the variety of XML modules, there’s one main JSON module, with the unforgettable name json. This program encodes (dumps) data to a JSON string and decodes (loads) a JSON string back to data. In this next example, let’s build a Python data structure containing the data from the earlier XML example: >>> menu = \ ... { ... "breakfast": { ... "hours": "7-11", ... "items": { ... "breakfast burritos": "$6.00", ... "pancakes": "$4.00" ... } ... }, ... "lunch" : { ... "hours": "11-3", ... "items": { ... "hamburger": "$5.00" ... } ... }, ... "dinner": { ... "hours": "3-10", ... "items": { ... "spaghetti": "$8.00" ... } ... } ... } . Next, encode the data structure (menu) to a JSON string (menu_json) by using dumps(): >>> import json >>> menu_json = json.dumps(menu) >>> menu_json '{"dinner": {"items": {"spaghetti": "$8.00"}, "hours": "3-10"}, "lunch": {"items": {"hamburger": "$5.00"}, "hours": "11-3"}, "breakfast": {"items": {"breakfast burritos" : "$6.00", "pancakes": "$4.00"}, "hours": "7-11"}}' And now, let’s turn the JSON string menu_json back into a Python data structure (menu2) by using loads(): >>> menu2 = json.loads(menu_json) >>> menu2 {'breakfast': {'items': {'breakfast burritos': '$6.00', 'pancakes': '$4.00'}, 'hours': '7-11'}, 'lunch': {'items': {'hamburger': '$5.00'}, 'hours': '11-3'}, 'dinner': {'items': {'spaghetti': '$8.00'}, 'hours': '3-10'}} menu and menu2 are both dictionaries with the same keys and values. 
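The round trip through dumps() and loads() can be checked directly. The sketch below rebuilds the menu dictionary in compact form and adds two optional json.dumps() arguments, sort_keys and indent, which are not used in the text but give a stable, readable string.

```python
import json

menu = {
    "breakfast": {"hours": "7-11",
                  "items": {"breakfast burritos": "$6.00", "pancakes": "$4.00"}},
    "lunch": {"hours": "11-3", "items": {"hamburger": "$5.00"}},
    "dinner": {"hours": "3-10", "items": {"spaghetti": "$8.00"}},
}

# sort_keys orders the keys alphabetically; indent pretty-prints the output.
menu_json = json.dumps(menu, sort_keys=True, indent=2)
menu2 = json.loads(menu_json)

print(menu2 == menu)               # dictionaries compare by keys and values
print(menu_json.splitlines()[1])   # first key after sorting
```

Comparing with == ignores key order, so the check succeeds no matter how the keys were arranged in the JSON string.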
As always with standard dictionaries, the order in which you get the keys varies. —————————————— JSON(2) You might get an exception while trying to encode or decode some objects, including objects such as datetime (covered in detail in Calendars and Clocks), as demonstrated here. >>> import datetime >>> now = datetime.datetime.utcnow() >>> now datetime.datetime(2013, 2, 22, 3, 49, 27, 483336) >>> json.dumps(now) Traceback (most recent call last): # ... (deleted stack trace to save trees) TypeError: datetime.datetime(2013, 2, 22, 3, 49, 27, 483336) is not JSON serializable >>> This can happen because the JSON standard does not define date or time types; it expects you to define how to handle them. You could convert the datetime to something JSON understands, such as a string or an epoch value (coming in Chapter 10): >>> now_str = str(now) >>> json.dumps(now_str) '"2013-02-22 03:49:27.483336"' >>> from time import mktime >>> now_epoch = int(mktime(now.timetuple())) >>> json.dumps(now_epoch) '1361526567' If the datetime value could occur in the middle of normally converted data types, it might be annoying to make these special conversions. You can modify how JSON is encoded by using inheritance, which is described in Inheritance. Python’s JSON documentation gives an example of this for complex numbers, which also makes JSON play dead. Let’s modify it for datetime: >>> class DTEncoder(json.JSONEncoder): ... def default(self, obj): ... # isinstance() checks the type of obj ... if isinstance(obj, datetime.datetime): ... return int(mktime(obj.timetuple())) ... # else it's something the normal decoder knows: ... return json.JSONEncoder.default(self, obj) ... >>> json.dumps(now, cls=DTEncoder) '1361526567' The new class DTEncoder is a subclass, or child class, of JSONEncoder. We only need to override its default() method to add datetime handling. Inheritance ensures that everything else will be handled by the parent class. 
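As a variation on the subclass above, a datetime can be written as an ISO 8601 string instead of an epoch number, which also makes it easy to convert back. This is a sketch under stated assumptions: ISODateTimeEncoder and the event dictionary are hypothetical names, and datetime.fromisoformat() requires Python 3.7 or later.

```python
import datetime
import json


class ISODateTimeEncoder(json.JSONEncoder):  # hypothetical class name
    def default(self, obj):
        # Dates and datetimes become ISO 8601 strings.
        if isinstance(obj, (datetime.datetime, datetime.date)):
            return obj.isoformat()
        # Anything else is left to the parent class, which raises TypeError.
        return super().default(obj)


event = {'name': 'launch',
         'when': datetime.datetime(2013, 2, 22, 3, 49, 27)}
encoded = json.dumps(event, cls=ISODateTimeEncoder)
print(encoded)

# Decoding gives back a plain string; convert it explicitly.
decoded = json.loads(encoded)
restored = datetime.datetime.fromisoformat(decoded['when'])
print(restored == event['when'])
```

JSON itself has no date type, so the decoding side has to know which fields hold dates; the encoder only solves half of the problem.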
The isinstance() function checks whether the object obj is of the class datetime.datetime. Because everything in Python is an object, isinstance() works everywhere:
>>> type(now)
<class 'datetime.datetime'>
>>> isinstance(now, datetime.datetime)
True
>>> type(234)
<class 'int'>
>>> isinstance(234, int)
True
>>> type('hey')
<class 'str'>
>>> isinstance('hey', str)
True
Note: For JSON and other structured text formats, you can load from a file into data structures without knowing anything about the structures ahead of time. Then, you can walk through the structures by using isinstance() and type-appropriate methods to examine their values. For example, if one of the items is a dictionary, you can extract contents through keys(), values(), and items().
——————————————
third-party library: a library that is not part of the standard distribution
manipulate: to operate on; to process
YAML: a highly readable format for expressing serialized data
——————————————
YAML
Similar to JSON, YAML has keys and values, but handles more data types such as dates and times. The standard Python library does not yet include YAML handling, so you need to install a third-party library, PyYAML (imported as yaml), to manipulate it. load() converts a YAML string to Python data, whereas dump() does the opposite.
The following YAML file, mcintyre.yaml, contains information on the Canadian poet James McIntyre, including two of his poems:
name:
  first: James
  last: McIntyre
dates:
  birth: 1828-05-25
  death: 1906-03-31
details:
  bearded: true
  themes: [cheese, Canada]
books:
  url: http://www.gutenberg.org/files/36068/36068-h/36068-h.htm
poems:
  - title: 'Motto'
    text: |
      Politeness, perseverance and pluck,
      To their possessor will bring good luck.
  - title: 'Canadian Charms'
    text: |
      Here industry is not in vain,
      For we have bounteous crops of grain,
      And you behold on every field
      Of grass and roots abundant yield,
      But after all the greatest charm
      Is the snug home upon the farm,
      And stone walls now keep cattle warm.
Values such as true, false, on, and off are converted to Python Booleans. Integers and strings are converted to their Python equivalents.
Other syntax creates lists and dictionaries:
>>> import yaml
>>> with open('mcintyre.yaml', 'rt') as fin:
...     text = fin.read()
...
>>> data = yaml.safe_load(text)
>>> data['details']
{'themes': ['cheese', 'Canada'], 'bearded': True}
>>> len(data['poems'])
2
The data structures that are created match those in the YAML file, which in this case are more than one level deep in places. You can get the title of the second poem with this dict/list/dict reference:
>>> data['poems'][1]['title']
'Canadian Charms'
Warning
PyYAML can load Python objects from strings, and this is dangerous. Use safe_load() instead of load() if you’re importing YAML that you don’t trust. Better yet, always use safe_load(). Read war is peace for a description of how unprotected YAML loading compromised the Ruby on Rails platform.
——————————————
A Security Note
You can use all the formats described in this chapter to save objects to files and read them back again. It’s possible to exploit this process and cause security problems.
For example, the following XML snippet from the billion laughs Wikipedia page defines ten nested entities, each expanding the lower level ten times for a total expansion of one billion:
<?xml version="1.0"?>
<!DOCTYPE lolz [
 <!ENTITY lol "lol">
 <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
 <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
 <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
 <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
 <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
 <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
 <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
 <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
 <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>
The bad news: billion laughs would blow up all of the XML libraries mentioned in the previous sections. Defused XML lists this attack and others, along with the vulnerability of Python libraries. The link shows how to change the settings for many of the libraries to avoid these problems. Also, you can use the defusedxml library as a security frontend for the other libraries:
>>> # insecure:
>>> from xml.etree.ElementTree import parse
>>> et = parse(xmlfile)
>>> # protected:
>>> from defusedxml.ElementTree import parse
>>> et = parse(xmlfile)
——————————————
Configuration Files
Most programs offer various options or settings. Dynamic ones can be provided as program arguments, but long-lasting ones need to be kept somewhere.
The temptation to define your own quick and dirty config file format is strong, but resist it. It often turns out to be dirty, but not so quick. You need to maintain both the writer program and the reader program (sometimes called a parser). There are good alternatives that you can just drop into your program, including those in the previous sections.
Here, we’ll use the standard configparser module, which handles Windows-style .ini files. Such files have sections of key = value definitions. Here’s a minimal settings.cfg file:
[english]
greeting = Hello

[french]
greeting = Bonjour

[files]
home = /usr/local
# simple interpolation:
bin = %(home)s/bin
Here’s the code to read it into Python data structures:
>>> import configparser
>>> cfg = configparser.ConfigParser()
>>> cfg.read('settings.cfg')
['settings.cfg']
>>> cfg
<configparser.ConfigParser object at 0x...>
>>> cfg['french']
<Section: french>
>>> cfg['french']['greeting']
'Bonjour'
>>> cfg['files']['bin']
'/usr/local/bin'
Other options are available, including fancier interpolation. See the configparser documentation. If you need deeper nesting than two levels, try YAML or JSON.
——————————————
Other Interchange Formats
These binary data interchange formats are usually more compact and faster than XML or JSON:
• MsgPack
• Protocol Buffers
• Avro
• Thrift
Because they are binary, none can be easily edited by a human with a text editor.
——————————————
serialize: to publish in installments; in computing, to convert data into a form that can be stored or transmitted
——————————————
Serialize by Using pickle
Saving data structures to a file is called serializing. Formats such as JSON might require some custom converters to serialize all the data types from a Python program. Python provides the pickle module to save and restore any object in a special binary format.
Remember how JSON lost its mind when encountering a datetime object?
Not a problem for pickle:
>>> import pickle
>>> import datetime
>>> now1 = datetime.datetime.utcnow()
>>> pickled = pickle.dumps(now1)
>>> now2 = pickle.loads(pickled)
>>> now1
datetime.datetime(2014, 6, 22, 23, 24, 19, 195722)
>>> now2
datetime.datetime(2014, 6, 22, 23, 24, 19, 195722)
pickle works with your own classes and objects, too. We’ll define a little class called Tiny that returns the string 'tiny' when treated as a string:
>>> import pickle
>>> class Tiny():
...     def __str__(self):
...         return 'tiny'
...
>>> obj1 = Tiny()
>>> obj1
<__main__.Tiny object at 0x10076ed10>
>>> str(obj1)
'tiny'
>>> pickled = pickle.dumps(obj1)
>>> pickled
b'\x80\x03c__main__\nTiny\nq\x00)\x81q\x01.'
>>> obj2 = pickle.loads(pickled)
>>> obj2
<__main__.Tiny object at 0x10076e550>
>>> str(obj2)
'tiny'
pickled is the pickled binary string made from the object obj1. We converted that back to the object obj2 to make a copy of obj1. Use dump() to pickle to a file, and load() to unpickle from one.
Note
Because pickle can create Python objects, the same security warnings that were discussed in earlier sections apply. Don’t unpickle something that you don’t trust.
——————————————
Structured Binary Files
Some file formats were designed to store particular data structures but are neither relational nor NoSQL databases. The sections that follow present some of them.
Spreadsheets
Spreadsheets, notably Microsoft Excel, are widespread binary data formats. If you can save your spreadsheet to a CSV file, you can read it by using the standard csv module that was described earlier. If you have a binary xls file, xlrd is a third-party package for reading it.
HDF5
HDF5 is a binary data format for multidimensional or hierarchical numeric data. It’s used mainly in science, where fast random access to large datasets (gigabytes to terabytes) is a common requirement.
Even though HDF5 could be a good alternative to databases in some cases, for some reason HDF5 is almost unknown in the business world. It’s best suited to WORM (write once/read many) applications for which database protection against conflicting writes is not needed. Here are a couple of modules that you might find useful:
• h5py is a full-featured low-level interface.
• PyTables is a bit higher-level, with database-like features.
Read each project’s documentation and code for details. Both of these are discussed in terms of scientific applications of Python in Appendix C. I’m mentioning HDF5 here in case you have a need to store and retrieve large amounts of data and are willing to consider something outside the box, as well as the usual database solutions. A good example is the Million Song dataset, which has downloadable song data in HDF5 format.
——————————————
Relational Databases(1)
Relational databases are only about 40 years old but are ubiquitous in the computing world. You’ll almost certainly have to deal with them at one time or another. When you do, you’ll appreciate what they provide:
• Access to data by multiple simultaneous users
• Protection from corruption by those users
• Efficient methods to store and retrieve the data
• Data defined by schemas and limited by constraints
• Joins to find relationships across diverse types of data
• A declarative (rather than imperative) query language: SQL (Structured Query Language)
These are called relational because they show relationships among different kinds of data in the form of tables (as they are usually called nowadays). For instance, in our menu example earlier, there is a relationship between each item and its price.
A table is a grid of rows and columns, similar to a spreadsheet. To create a table, name it and specify the order, names, and types of its columns. Each row has the same columns, although a column may be defined to allow missing data (called nulls).
In the menu example, you could create a table with one row for each item being sold. Each item has the same columns, including one for the price.
A column or group of columns is usually the table’s primary key; its values must be unique in the table. This prevents adding the same data to the table more than once. This key is indexed for fast lookups during queries. An index works a little like a book index, making it fast to find a particular row.
Each table lives within a parent database, like a file within a directory. Two levels of hierarchy help keep things organized a little better.
Note
Yes, the word database is used in multiple ways: as the server, the table container, and the data stored therein. If you’ll be referring to all of them at the same time, it might help to call them database server, database, and data.
If you want to find rows by some non-key column value, define a secondary index on that column. Otherwise, the database server must perform a table scan, a brute-force search of every row for matching column values.
Tables can be related to each other with foreign keys, and column values can be constrained to these keys.
——————————————
Relational Databases(2)
SQL
SQL is not an API or a protocol, but a declarative language: you say what you want rather than how to do it. It’s the universal language of relational databases. SQL queries are text strings that a client sends to the database server, which figures out what to do with them.
There have been various SQL standard definitions, but all database vendors have added their own tweaks and extensions, resulting in many SQL dialects. If you store your data in a relational database, SQL gives you some portability. Still, dialect and operational differences can make it difficult to move your data to another type of database.
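The table concepts described above (primary key, secondary index, foreign key) can be tried out with the standard sqlite3 module and an in-memory database. This is only a sketch: the table names menu and orders, the index name menu_price, and the helper try_sql() are invented for illustration, and the PRAGMA line is needed because SQLite does not enforce foreign keys unless asked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# item is the primary key: its values must be unique.
conn.execute("CREATE TABLE menu (item TEXT PRIMARY KEY, price REAL NOT NULL, note TEXT)")
# A secondary index on a non-key column, to avoid a full table scan on price.
conn.execute("CREATE INDEX menu_price ON menu (price)")
# orders.item is a foreign key constrained to values that exist in menu.item.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "item TEXT NOT NULL REFERENCES menu (item))"
)

conn.execute("INSERT INTO menu VALUES ('taco', 2.5, NULL)")  # note is a null


def try_sql(sql):
    """Run one statement and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


duplicate = try_sql("INSERT INTO menu VALUES ('taco', 3.0, NULL)")  # same key twice
orphan = try_sql("INSERT INTO orders (item) VALUES ('pizza')")      # not on the menu
valid = try_sql("INSERT INTO orders (item) VALUES ('taco')")

print(duplicate, orphan, valid)
cheap = conn.execute("SELECT item FROM menu WHERE price < 3").fetchall()
print(cheap)
conn.close()
```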
There are two main categories of SQL statements:
DDL (data definition language)
Handles creation, deletion, constraints, and permissions for tables, databases, and users
DML (data manipulation language)
Handles data insertions, selects, updates, and deletions
For more information, consult an SQL reference or tutorial.
——————————————
Relational Databases(3)
DB-API
An application programming interface (API) is a set of functions that you can call to get access to some service. DB-API is Python’s standard API for accessing relational databases. Using it, you can write a single program that works with multiple kinds of relational databases instead of writing a separate program for each one. It’s similar to Java’s JDBC or Perl’s DBI.
Its main functions are the following:
connect()
Make a connection to the database; this can include arguments such as username, password, server address, and others.
cursor()
Create a cursor object to manage queries.
execute() and executemany()
Run one or more SQL commands against the database.
fetchone(), fetchmany(), and fetchall()
Get the results from execute.
The Python database modules in the coming sections conform to DB-API, often with extensions and some differences in details.
——————————————
Relational Databases(4)
SQLite
SQLite is a good, light, open source relational database. Python’s interface to it, the sqlite3 module, is part of the standard library, and SQLite stores databases in normal files. These files are portable across machines and operating systems, making SQLite a very portable solution for simple relational database applications. It isn’t as full-featured as MySQL or PostgreSQL, but it does support SQL, and it manages multiple simultaneous users. Web browsers, smart phones, and other applications use SQLite as an embedded database.
You begin with a connect() to the local SQLite database file that you want to use or create. This file is the equivalent of the directory-like database that parents tables in other servers.
The special string ':memory:' creates the database in memory only; this is fast and useful for testing but will lose data when your program terminates or if your computer goes down.
For the next example, let’s make a database called enterprise.db and the table zoo to manage our thriving roadside petting zoo business. The table columns are as follows:
critter
A variable length string, and our primary key
count
An integer count of our current inventory for this animal
damages
The dollar amount of our current losses from animal-human interactions
>>> import sqlite3
>>> conn = sqlite3.connect('enterprise.db')
>>> curs = conn.cursor()
>>> curs.execute('''CREATE TABLE zoo
...     (critter VARCHAR(20) PRIMARY KEY,
...     count INT,
...     damages FLOAT)''')
<sqlite3.Cursor object at 0x...>
Python’s triple quotes are handy when creating long strings such as SQL queries.
Now, add some animals to the zoo. SQL string literals take single quotes, so the Python strings here use double quotes on the outside:
>>> curs.execute("INSERT INTO zoo VALUES('duck', 5, 0.0)")
<sqlite3.Cursor object at 0x...>
>>> curs.execute("INSERT INTO zoo VALUES('bear', 2, 1000.0)")
<sqlite3.Cursor object at 0x...>
There’s a safer way to insert data, using a placeholder:
>>> ins = 'INSERT INTO zoo (critter, count, damages) VALUES(?, ?, ?)'
>>> curs.execute(ins, ('weasel', 1, 2000.0))
<sqlite3.Cursor object at 0x...>
This time, we used three question marks in the SQL to indicate that we plan to insert three values, and then passed those three values as a tuple to the execute() function. Placeholders handle tedious details such as quoting. They protect you against SQL injection, a kind of external attack, common on the Web, that inserts malicious SQL commands into the system.
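To make the placeholder point concrete, here is a small sketch that uses an in-memory database so it leaves no file behind. It loads several rows with executemany() and then runs the same lookup twice, once by pasting text into the SQL and once with a placeholder. The variable hostile is an invented example of malicious input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
curs = conn.cursor()
curs.execute("CREATE TABLE zoo (critter VARCHAR(20) PRIMARY KEY, count INT, damages FLOAT)")

# executemany() runs the same statement once per tuple in the sequence.
animals = [("duck", 5, 0.0), ("bear", 2, 1000.0), ("weasel", 1, 2000.0)]
curs.executemany("INSERT INTO zoo (critter, count, damages) VALUES (?, ?, ?)", animals)
conn.commit()

# Hypothetical hostile input: the quote characters change the meaning of the SQL.
hostile = "x' OR '1'='1"

# Unsafe: building SQL by string concatenation lets the input rewrite the query.
unsafe_sql = "SELECT critter FROM zoo WHERE critter = '" + hostile + "'"
unsafe_rows = curs.execute(unsafe_sql).fetchall()

# Safe: with a placeholder the whole input is treated as one literal value.
safe_rows = curs.execute("SELECT critter FROM zoo WHERE critter = ?", (hostile,)).fetchall()

print(len(unsafe_rows))  # every row matched
print(safe_rows)         # nothing matched
conn.close()
```

Note that the parameters are passed as a one-element tuple, (hostile,), even when there is only a single placeholder.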
Now, let’s see if we can get all our animals out again:
>>> curs.execute('SELECT * FROM zoo')
<sqlite3.Cursor object at 0x...>
>>> rows = curs.fetchall()
>>> print(rows)
[('duck', 5, 0.0), ('bear', 2, 1000.0), ('weasel', 1, 2000.0)]
Let’s get them again, but ordered by their counts:
>>> curs.execute('SELECT * FROM zoo ORDER BY count')
<sqlite3.Cursor object at 0x...>
>>> curs.fetchall()
[('weasel', 1, 2000.0), ('bear', 2, 1000.0), ('duck', 5, 0.0)]
Hey, we wanted them in descending order:
>>> curs.execute('SELECT * FROM zoo ORDER BY count DESC')
<sqlite3.Cursor object at 0x...>
>>> curs.fetchall()
[('duck', 5, 0.0), ('bear', 2, 1000.0), ('weasel', 1, 2000.0)]
Which type of animal is costing us the most?
>>> curs.execute('''SELECT * FROM zoo WHERE
...     damages = (SELECT MAX(damages) FROM zoo)''')
<sqlite3.Cursor object at 0x...>
>>> curs.fetchall()
[('weasel', 1, 2000.0)]
You would have thought it was the bears. It’s always best to check the actual data.
Before we leave SQLite, we need to clean up. If we opened a connection and a cursor, we need to close them when we’re done:
>>> curs.close()
>>> conn.close()
——————————————
Relational Databases(5)
MySQL
MySQL is a very popular open source relational database. Unlike SQLite, it’s an actual server, so clients can access it from different devices across the network.
MySQLdb has been the most popular MySQL driver, but the original package was never ported to Python 3; its fork mysqlclient and the pure-Python PyMySQL both support Python 3.
PostgreSQL
PostgreSQL is a full-featured open source relational database, in many ways more advanced than MySQL.
The most popular driver is psycopg2, but its installation requires the PostgreSQL client libraries.
——————————————
# Sort 2,6,4,8,10,12,89,68,45,37
num_str = '2,6,4,8,10,12,89,68,45,37'
raw_items = [int(num) for num in num_str.split(',')]
print('Raw items:', raw_items)
ascending_items = sorted(raw_items)
print('Ascending items:', ascending_items)
——————————————
Relational Databases(6)
SQLAlchemy
SQL is not quite the same for all relational databases, and DB-API takes you only so far. Each database implements a particular dialect reflecting its features and philosophy.
Many libraries try to bridge these differences in one way or another. The most popular cross-database Python library is SQLAlchemy.
It isn’t in the standard library, but it’s well known and used by many people. You can install it on your system by using this command:
$ pip install sqlalchemy
You can use SQLAlchemy on several levels:
• The lowest level handles database connection pools, executing SQL commands, and returning results. This is closest to the DB-API.
• Next up is the SQL expression language, a Pythonic SQL builder.
• Highest is the ORM (Object-Relational Mapper) layer, which uses the SQL Expression Language and binds application code with relational data structures.
As we go along, you’ll understand what the terms mean in those levels. SQLAlchemy works with the database drivers documented in the previous sections. You don’t need to import the driver; the initial connection string you provide to SQLAlchemy will determine it. That string looks like this:
dialect + driver :// user : password @ host : port / dbname
The values you put in this string are as follows:
dialect
The database type
driver
The particular driver you want to use for that database
user and password
Your database authentication strings
host and port
The database server’s location (: port is only needed if it’s not the standard one for this server)
dbname
The database to initially connect to on the server
——————————————
Relational Databases(7)
The engine layer
First, we’ll try the lowest level of SQLAlchemy, which does little more than the base DB-API functions.
Let’s try it with SQLite, which is already built into Python. The connection string for SQLite skips the host, port, user, and password. The dbname informs SQLite as to what file to use to store your database. If you omit the dbname, SQLite builds a database in memory. If the dbname starts with a slash (/), it’s an absolute filename on your computer (as in Linux and OS X; for example, C:\\ on Windows).
Otherwise, it’s relative to your current directory.
The following segments are all part of one program, separated here for explanation.
To begin, import what we need. The following is an example of an import alias, which lets us use the string sa to refer to SQLAlchemy methods. I do this mainly because sa is a lot easier to type than sqlalchemy:
>>> import sqlalchemy as sa
Connect to the database and create the storage for it in memory (the argument string 'sqlite:///:memory:' also works):
>>> conn = sa.create_engine('sqlite://')
Create a database table called zoo that comprises three columns:
>>> conn.execute('''CREATE TABLE zoo
...     (critter VARCHAR(20) PRIMARY KEY,
...     count INT,
...     damages FLOAT)''')
Running conn.execute() returns a SQLAlchemy object called a ResultProxy. You’ll soon see what to do with it.
By the way, if you’ve never made a database table before, congratulations. Check that one off your bucket list.
Now, insert three sets of data into your new empty table:
>>> ins = 'INSERT INTO zoo (critter, count, damages) VALUES (?, ?, ?)'
>>> conn.execute(ins, 'duck', 10, 0.0)
>>> conn.execute(ins, 'bear', 2, 1000.0)
>>> conn.execute(ins, 'weasel', 1, 2000.0)
Next, ask the database for everything that we just put in:
>>> rows = conn.execute('SELECT * FROM zoo')
In SQLAlchemy, rows is not a list; it’s that special ResultProxy thing that we can’t print directly:
>>> print(rows)
<sqlalchemy.engine.result.ResultProxy object at 0x...>
However, you can iterate over it like a list, so we can get a row at a time:
>>> for row in rows:
...     print(row)
...
('duck', 10, 0.0)
('bear', 2, 1000.0)
('weasel', 1, 2000.0)
That was almost the same as the SQLite DB-API example that you saw earlier. The one advantage is that we didn’t need to import the database driver at the top; SQLAlchemy figured that out from the connection string. Just changing the connection string would make this code portable to another type of database.
Another plus is SQLAlchemy’s connection pooling, which you can read about at its documentation site.
——————————————
Relational Databases(8)
The SQL Expression Language
The next level up is SQLAlchemy’s SQL Expression Language. It introduces functions to create the SQL for various operations. The Expression Language handles more of the SQL dialect differences than the lower-level engine layer does. It can be a handy middle-ground approach for relational database applications.
Here’s how to create and populate the zoo table. Again, these are successive fragments of a single program.
The import and connection are the same as before:
>>> import sqlalchemy as sa
>>> conn = sa.create_engine('sqlite://')
To define the zoo table, we’ll begin using some of the Expression Language instead of SQL:
>>> meta = sa.MetaData()
>>> zoo = sa.Table('zoo', meta,
...     sa.Column('critter', sa.String, primary_key=True),
...     sa.Column('count', sa.Integer),
...     sa.Column('damages', sa.Float)
...     )
>>> meta.create_all(conn)
Check out the parentheses in that multiline call in the preceding example. The structure of the Table() method matches the structure of the table. Just as our table contains three columns, there are three calls to Column() inside the parentheses of the Table() method call.
Meanwhile, zoo is some magic object that bridges the SQL database world and the Python data structure world.
Insert the data with more Expression Language functions:
>>> conn.execute(zoo.insert(('bear', 2, 1000.0)))
>>> conn.execute(zoo.insert(('weasel', 1, 2000.0)))
>>> conn.execute(zoo.insert(('duck', 10, 0)))
Next, create the SELECT statement (zoo.select() selects everything from the table represented by the zoo object, as SELECT * FROM zoo would do in plain SQL):
>>> result = conn.execute(zoo.select())
Finally, get the results:
>>> rows = result.fetchall()
>>> print(rows)
[('bear', 2, 1000.0), ('weasel', 1, 2000.0), ('duck', 10, 0.0)]
——————————————
mapper: something that maps one representation onto another
declarative: stating what is wanted rather than how to do it
——————————————
Relational Databases(9)
The Object-Relational Mapper(1)
In the last section, the zoo object was a mid-level connection between SQL and Python. At the top layer of SQLAlchemy, the Object-Relational Mapper (ORM) uses the SQL Expression Language but tries to make the actual database mechanisms invisible. You define classes, and the ORM handles how to get their data in and out of the database. The basic idea behind that complicated phrase, “object-relational mapper,” is that you can refer to objects in your code, and thus stay close to the way Python likes to operate, while still using a relational database.
We’ll define a Zoo class and hook it into the ORM. This time, we’ll make SQLite use the file zoo.db so that we can confirm that the ORM worked.
As in the previous two sections, the snippets that follow are actually one program separated by explanations. Don’t worry if you don’t understand some of it. The SQLAlchemy documentation has all the details, and this stuff can get complex. I just want you to get an idea of how much work it is to do this, so that you can decide which of the approaches discussed in this chapter suits you.
The initial import is the same, but this time we also need something else:
>>> import sqlalchemy as sa
>>> from sqlalchemy.ext.declarative import declarative_base
Here, we make the connection:
>>> conn = sa.create_engine('sqlite:///zoo.db')
Now, we get into SQLAlchemy’s ORM.
We define the Zoo class and associate its attributes with table columns:
>>> Base = declarative_base()
>>> class Zoo(Base):
...     __tablename__ = 'zoo'
...     critter = sa.Column('critter', sa.String, primary_key=True)
...     count = sa.Column('count', sa.Integer)
...     damages = sa.Column('damages', sa.Float)
...     def __init__(self, critter, count, damages):
...         self.critter = critter
...         self.count = count
...         self.damages = damages
...     def __repr__(self):
...         return "<Zoo({}, {}, {})>".format(self.critter, self.count, self.damages)
...
The following line magically creates the database and table:
>>> Base.metadata.create_all(conn)
You can then insert data by creating Python objects. The ORM manages these internally:
>>> first = Zoo('duck', 10, 0.0)
>>> second = Zoo('bear', 2, 1000.0)
>>> third = Zoo('weasel', 1, 2000.0)
>>> first
<Zoo(duck, 10, 0.0)>
——————————————
Relational Databases(9)
The Object-Relational Mapper(2)
Next, we get the ORM to take us to SQL land. We create a session to talk to the database:
>>> from sqlalchemy.orm import sessionmaker
>>> Session = sessionmaker(bind=conn)
>>> session = Session()
Within the session, we write the three objects that we created to the database. The add() function adds one object, and add_all() adds a list:
>>> session.add(first)
>>> session.add_all([second, third])
Finally, we need to force everything to complete:
>>> session.commit()
Did it work? Well, it created a zoo.db file in the current directory. You can use the command-line sqlite3 program to check it:
$ sqlite3 zoo.db
SQLite version 3.6.12
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .tables
zoo
sqlite> select * from zoo;
duck|10|0.0
bear|2|1000.0
weasel|1|2000.0
The purpose of this section was to show what an ORM is and how it works at a high level. The author of SQLAlchemy has written a full tutorial.
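To show the bare idea of object-relational mapping without any third-party code, here is a hand-rolled sketch built on the standard sqlite3 and dataclasses modules. It is not how SQLAlchemy is implemented; the helpers create_table(), save(), and load_all() are invented for illustration, and the Zoo dataclass only mirrors the columns used in these notes.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class Zoo:
    critter: str
    count: int
    damages: float


def create_table(conn):
    """Create a table whose columns match the dataclass fields, in order."""
    conn.execute("CREATE TABLE zoo (critter TEXT PRIMARY KEY, count INTEGER, damages REAL)")


def save(conn, animal):
    """Map one object to one row, using a placeholder per field."""
    marks = ", ".join("?" for _ in fields(animal))
    conn.execute(f"INSERT INTO zoo VALUES ({marks})", astuple(animal))


def load_all(conn):
    """Map every row back to an object."""
    rows = conn.execute("SELECT critter, count, damages FROM zoo ORDER BY critter")
    return [Zoo(*row) for row in rows]


conn = sqlite3.connect(":memory:")
create_table(conn)
for animal in (Zoo("duck", 10, 0.0), Zoo("bear", 2, 1000.0), Zoo("weasel", 1, 2000.0)):
    save(conn, animal)
conn.commit()

zoo_objects = load_all(conn)
print(zoo_objects[0])
conn.close()
```

A real ORM adds much more on top of this (sessions, change tracking, relationships, dialect handling), which is where both its convenience and its complexity come from.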
After reading this, decide which of the following levels would best fit your needs:
• Plain DB-API, as in the earlier SQLite section
• The SQLAlchemy engine room
• The SQLAlchemy Expression Language
• The SQLAlchemy ORM
It seems like a natural choice to use an ORM to avoid the complexities of SQL. Should you use one? Some people think ORMs should be avoided, but others think the criticism is overdone. Whoever’s right, an ORM is an abstraction, and all abstractions break down at some point; they’re leaky. When your ORM doesn’t do what you want, you must figure out both how it works and how to fix it in SQL. To borrow an Internet meme:
Some people, when confronted with a problem, think, “I know, I’ll use an ORM.” Now they have two problems.
Use ORMs sparingly, and mostly for simple applications. If the application is that simple, maybe you can just use straight SQL (or the SQL Expression Language), anyhow.
Or, you can try something simpler such as dataset. It’s built on SQLAlchemy and provides a simple ORM for SQL, JSON, and CSV storage.
——————————————
NoSQL Data Stores
Some databases are not relational and don’t support SQL. These were written to handle very large data sets, allow more flexible data definitions, or support custom data operations. They’ve been collectively labeled NoSQL (formerly meaning no SQL; now the less confrontational not only SQL).
——————————————
The dbm Family
The dbm formats were around long before NoSQL was coined. They’re key-value stores, often embedded in applications such as web browsers to maintain various settings. A dbm database is like a Python dictionary in the following ways:
• You can assign a value to a key, and it’s automatically saved to the database on disk.
• You can get a value from a key.
The following is a quick example.
The second argument to the following open() method is 'r' to read, 'w' to write, and 'c' for both, creating the file if it doesn’t exist:
>>> import dbm
>>> db = dbm.open('definitions', 'c')
To create key-value pairs, assign a value to a key just as you would with a dictionary:
>>> db['mustard'] = 'yellow'
>>> db['ketchup'] = 'red'
>>> db['pesto'] = 'green'
Let’s pause and check what we have so far:
>>> len(db)
3
>>> db['pesto']
b'green'
Now close, then reopen to see if it actually saved what we gave it:
>>> db.close()
>>> db = dbm.open('definitions', 'r')
>>> db['mustard']
b'yellow'
Keys and values are stored as bytes. You cannot iterate over the database object db, but you can get the number of keys by using len(). Note that get() and setdefault() work as they do for dictionaries.
——————————————
Memcached
memcached is a fast in-memory key-value cache server. It’s often put in front of a database, or used to store web server session data. You can download versions for Linux and OS X, and for Windows. If you want to try out this section, you’ll need a memcached server and Python driver.
There are many Python drivers; one that works with Python 3 is python-memcached, which you can install by using this command:
$ pip install python-memcached
To use it, connect to a memcached server, after which you can do the following:
• Set and get values for keys
• Increment or decrement a value
• Delete a key
Data is not persistent, and data that you wrote earlier might disappear. This is inherent in memcached, being that it’s a cache server. It avoids running out of memory by discarding old data.
You can connect to multiple memcached servers at the same time.
In this next example, we’re just talking to one on the same computer:
>>> import memcache
>>> db = memcache.Client(['127.0.0.1:11211'])
>>> db.set('marco', 'polo')
True
>>> db.get('marco')
'polo'
>>> db.set('ducks', 0)
True
>>> db.get('ducks')
0
>>> db.incr('ducks', 2)
2
>>> db.get('ducks')
2
——————————————
Redis is an open source (BSD licensed), in-memory data structure store, used as database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs and geospatial indexes with radius queries.
——————————————
Redis(0)
Redis is a data structure server. Like memcached, all of the data in a Redis server should fit in memory (although there is now an option to save the data to disk). Unlike memcached, Redis can do the following:
• Save data to disk for reliability and restarts
• Keep old data
• Provide more data structures than simple strings
The Redis data types are a close match to Python’s, and a Redis server can be a useful intermediary for one or more Python applications to share data. I’ve found it so useful that it’s worth a little extra coverage here.
The Python driver redis-py has its source code and tests on GitHub, as well as online documentation. You can install it by using this command:
$ pip install redis
The Redis server itself has good documentation. If you install and start the Redis server on your local computer (with the network nickname localhost), you can try the programs in the following sections.
——————————————
How to learn English:
Google translation.
NCE and VOA Special English.
Yuantiku and self correction.
https://textranch.com/grammar-checker/
——————————————
Redis(1)
Strings
A key with a single value is a Redis string. Simple Python data types are automatically converted.
Connect to a Redis server at some host (default is localhost) and port (default is 6379):
>>> import redis
>>> conn = redis.Redis()
redis.Redis('localhost') or redis.Redis('localhost', 6379) would have given the same result.
List all keys (none so far):
>>> conn.keys('*')
[]
Set a simple string (key 'secret'), integer (key 'carats'), and float (key 'fever'):
>>> conn.set('secret', 'ni!')
True
>>> conn.set('carats', 24)
True
>>> conn.set('fever', '101.5')
True
Get the values back by key:
>>> conn.get('secret')
b'ni!'
>>> conn.get('carats')
b'24'
>>> conn.get('fever')
b'101.5'
Here, the setnx() method sets a value only if the key does not exist:
>>> conn.setnx('secret', 'icky-icky-icky-ptang-zoop-boing!')
False
It failed because we had already defined 'secret':
>>> conn.get('secret')
b'ni!'
The getset() method returns the old value and sets it to a new one at the same time:
>>> conn.getset('secret', 'icky-icky-icky-ptang-zoop-boing!')
b'ni!'
Let’s not get too far ahead of ourselves. Did it work?
>>> conn.get('secret')
b'icky-icky-icky-ptang-zoop-boing!'
Now, get a substring by using getrange() (as in Python, offset 0=start, -1=end):
>>> conn.getrange('secret', -6, -1)
b'boing!'
Replace a substring by using setrange() (using a zero-based offset):
>>> conn.setrange('secret', 0, 'ICKY')
32
>>> conn.get('secret')
b'ICKY-icky-icky-ptang-zoop-boing!'
Next, set multiple keys at once by using mset():
>>> conn.mset({'pie': 'cherry', 'cordial': 'sherry'})
True
Get more than one value at once by using mget():
>>> conn.mget(['fever', 'carats'])
[b'101.5', b'24']
Delete a key by using delete():
>>> conn.delete('fever')
True
Increment by using the incr() or incrbyfloat() commands, and decrement with decr():
>>> conn.incr('carats')
25
>>> conn.incr('carats', 10)
35
>>> conn.decr('carats')
34
>>> conn.decr('carats', 15)
19
>>> conn.set('fever', '101.5')
True
>>> conn.incrbyfloat('fever')
102.5
>>> conn.incrbyfloat('fever', 0.5)
103.0
There’s no decrbyfloat().
Use a negative increment to reduce the fever:
>>> conn.incrbyfloat('fever', -2.0)
101.0
——————————————
Redis(2)
Lists
Redis lists can contain only strings. The list is created when you do your first insertion.
Insert at the beginning by using lpush():
>>> conn.lpush('zoo', 'bear')
1
Insert more than one item at the beginning:
>>> conn.lpush('zoo', 'alligator', 'duck')
3
Insert before or after a value by using linsert():
>>> conn.linsert('zoo', 'before', 'bear', 'beaver')
4
>>> conn.linsert('zoo', 'after', 'bear', 'cassowary')
5
Replace the value at an offset by using lset() (the list must exist already; here 'marmoset' overwrites 'beaver'):
>>> conn.lset('zoo', 2, 'marmoset')
True
Insert at the end by using rpush():
>>> conn.rpush('zoo', 'yak')
6
Get the value at an offset by using lindex():
>>> conn.lindex('zoo', 3)
b'bear'
Get the values in an offset range by using lrange() (0 to -1 for all):
>>> conn.lrange('zoo', 0, 2)
[b'duck', b'alligator', b'marmoset']
Trim the list with ltrim(), keeping only those in a range of offsets:
>>> conn.ltrim('zoo', 1, 4)
True
Get a range of values (use 0 to -1 for all) by using lrange():
>>> conn.lrange('zoo', 0, -1)
[b'alligator', b'marmoset', b'bear', b'cassowary']
Chapter 10 shows you how you can use Redis lists and publish-subscribe to implement job queues.
——————————————
hash
A Hash is a dictionary-like collection of unique keys and their values.
——————————————
Redis(3)
Hashes
Redis hashes are similar to Python dictionaries but can contain only strings. Thus, you can go only one level deep, not make deep-nested structures.
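Because a hash holds only flat string fields, one possible workaround for nested Python data is to encode each field value as JSON text before storing it and decode it after reading. The sketch below shows only that conversion, using the standard json module; it does not talk to a Redis server. The helper names to_hash_fields() and from_hash_fields() and the sample record are invented, and the bytes conversion merely imitates the fact that the sessions in these notes show values coming back as bytes.

```python
import json


def to_hash_fields(record):
    """Flatten a dict to {str: str}: every value becomes JSON text."""
    return {key: json.dumps(value) for key, value in record.items()}


def from_hash_fields(stored):
    """Rebuild the original dict from field names and JSON values held as bytes."""
    return {key.decode(): json.loads(value) for key, value in stored.items()}


# Hypothetical nested record that a one-level hash cannot hold directly.
record = {"title": "do re mi", "notes": ["do", "re", "mi"], "meta": {"year": 1959}}

flat = to_hash_fields(record)
print(flat["notes"])  # a plain string such as could be stored in one hash field

# Imitate what would come back from the server: bytes keys and bytes values.
stored = {key.encode(): value.encode() for key, value in flat.items()}
restored = from_hash_fields(stored)
print(restored == record)
```

The cost of this design is that the server sees each nested value as an opaque string, so it cannot update or query anything inside it.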
Here are examples that create and play with a Redis hash called song:
Set the fields do and re in hash song at once by using hmset() (recent redis-py releases deprecate hmset() in favor of hset() with a mapping argument):
>>> conn.hmset('song', {'do': 'a deer', 're': 'about a deer'})
True
Set a single field value in a hash by using hset():
>>> conn.hset('song', 'mi', 'a note to follow re')
1
Get one field’s value by using hget():
>>> conn.hget('song', 'mi')
b'a note to follow re'
Get multiple field values by using hmget():
>>> conn.hmget('song', 're', 'do')
[b'about a deer', b'a deer']
Get all field keys for the hash by using hkeys():
>>> conn.hkeys('song')
[b'do', b're', b'mi']
Get all field values for the hash by using hvals():
>>> conn.hvals('song')
[b'a deer', b'about a deer', b'a note to follow re']
Get the number of fields in the hash by using hlen():
>>> conn.hlen('song')
3
Get all field keys and values in the hash by using hgetall():
>>> conn.hgetall('song')
{b'do': b'a deer', b're': b'about a deer', b'mi': b'a note to follow re'}
Set a field if its key doesn’t exist by using hsetnx():
>>> conn.hsetnx('song', 'fa', 'a note that rhymes with la')
1
——————————————
Redis(4)
Sets
Redis sets are similar to Python sets, as you can see in the series of examples that follow. A Redis key holds one data type at a time, so if zoo still holds the list from the Lists section, remove it first with conn.delete('zoo'); otherwise sadd() fails with a wrong-type error.
Add one or more values to a set:
>>> conn.sadd('zoo', 'duck', 'goat', 'turkey')
3
Get the number of values from the set:
>>> conn.scard('zoo')
3
Get all the set’s values:
>>> conn.smembers('zoo')
{b'duck', b'goat', b'turkey'}
Remove a value from the set:
>>> conn.srem('zoo', 'turkey')
True
Let’s make a second set to show some set operations (sadd() returns the number of members actually added, so a new three-member set gives 3):
>>> conn.sadd('better_zoo', 'tiger', 'wolf', 'duck')
3
Intersect (get the common members of) the zoo and better_zoo sets:
>>> conn.sinter('zoo', 'better_zoo')
{b'duck'}
Get the intersection of zoo and better_zoo, and store the result in the set fowl_zoo:
>>> conn.sinterstore('fowl_zoo', 'zoo', 'better_zoo')
1
Who’s in there?
>>> conn.smembers('fowl_zoo')
{b'duck'}
Get the union (all members) of zoo and better_zoo:
>>> conn.sunion('zoo', 'better_zoo')
{b'duck', b'goat', b'wolf', b'tiger'}
Store that union result in the set fabulous_zoo:
>>> conn.sunionstore('fabulous_zoo', 'zoo', 'better_zoo')
4
>>> conn.smembers('fabulous_zoo')
{b'duck', b'goat', b'wolf', b'tiger'}
What does zoo have that better_zoo doesn’t? Use sdiff() to get the set difference, and sdiffstore() to save it in the zoo_sale set:
>>> conn.sdiff('zoo', 'better_zoo')
{b'goat'}
>>> conn.sdiffstore('zoo_sale', 'zoo', 'better_zoo')
1
>>> conn.smembers('zoo_sale')
{b'goat'}
——————————————
Redis(5)
Sorted sets
One of the most versatile Redis data types is the sorted set, or zset. It’s a set of unique values, but each value has an associated floating point score. You can access each item by its value or score. Sorted sets have many uses:
Leader boards
Secondary indexes
Timeseries, using timestamps as scores
We’ll show the last use case, tracking user logins via timestamps. We’re using the Unix epoch value (more on this in Chapter 10) that’s returned by the Python time() function:
>>> import time
>>> now = time.time()
>>> now
1361857057.576483
The zadd() calls below use the argument order of the redis-py release the book was written against. redis-py 3.0 and later expect a mapping instead, for example conn.zadd('logins', {'smeagol': now}).
Let’s add our first guest, looking nervous:
>>> conn.zadd('logins', 'smeagol', now)
1
Five minutes later, another guest:
>>> conn.zadd('logins', 'sauron', now+(5*60))
1
Two hours later:
>>> conn.zadd('logins', 'bilbo', now+(2*60*60))
1
One day later, not hasty:
>>> conn.zadd('logins', 'treebeard', now+(24*60*60))
1
In what order did bilbo arrive?
>>> conn.zrank('logins', 'bilbo')
2
When was that?
>>> conn.zscore('logins', 'bilbo') 1361864257.576483 Let’s see everyone in login order: >>> conn.zrange('logins', 0, -1) [b'smeagol', b'sauron', b'bilbo', b'treebeard'] With their times, please: >>> conn.zrange('logins', 0, -1, withscores=True) [(b'smeagol', 1361857057.576483), (b'sauron', 1361857357.576483), (b'bilbo', 1361864257.576483), (b'treebeard', 1361943457.576483)] —————————————— Redis(6) Bits This is a very space-efficient and fast way to deal with large sets of numbers. Suppose that you have a website with registered users. You’d like to track how often people log in, how many users visit on a particular day, how often the same user visits on following days, and so on. You could use Redis sets, but if you’ve assigned increasing numeric user IDs, bits are more compact and faster. Let’s begin by creating a bitset for each day. For this test, we’ll just use three days and a few user IDs: >>> days = ['2013-02-25', '2013-02-26', '2013-02-27'] >>> big_spender = 1089 >>> tire_kicker = 40459 >>> late_joiner = 550212 Each date is a separate key. Set the bit for a particular user ID for that date. For example, on the first date (2013-02-25), we had visits from big_spender (ID 1089) and tire_kicker (ID 40459): >>> conn.setbit(days[0], big_spender, 1) 0 >>> conn.setbit(days[0], tire_kicker, 1) 0 The next day, big_spender came back: >>> conn.setbit(days[1], big_spender, 1) 0 The next day had yet another visit from our friend, big_spender, and a new person whom we’re calling late_joiner: >>> conn.setbit(days[2], big_spender, 1) 0 >>> conn.setbit(days[2], late_joiner, 1) 0 Let’s get the daily visitor count for these three days: >>> for day in days: ... conn.bitcount(day) ... 2 1 2 Did a particular user visit on a particular day? >>> conn.getbit(days[1], tire_kicker) 0 So, tire_kicker did not visit on the second day. How many users visited every day? 
>>> conn.bitop('and', 'everyday', *days)
68777
>>> conn.bitcount('everyday')
1
I’ll give you three guesses who it was:
>>> conn.getbit('everyday', big_spender)
1
Finally, what was the number of total unique users in these three days?
>>> conn.bitop('or', 'alldays', *days)
68777
>>> conn.bitcount('alldays')
3
——————————————
Redis(7)
Caches and expiration
All Redis keys have a time-to-live, or expiration date. By default, this is forever. We can use the expire() function to instruct Redis how long to keep the key. As is demonstrated here, the value is a number of seconds:
>>> import time
>>> key = 'now you see it'
>>> conn.set(key, 'but not for long')
True
>>> conn.expire(key, 5)
True
>>> conn.ttl(key)
5
>>> conn.get(key)
b'but not for long'
>>> time.sleep(6)
>>> conn.get(key)
>>>
The expireat() command expires a key at a given epoch time. Key expiration is useful to keep caches fresh and to limit login sessions.
——————————————
Things to Do
8.1. Assign the string 'This is a test of the emergency text system' to the variable test1, and write test1 to a file called test.txt.
test1 = 'This is a test of the emergency text system'
fout = open('/sdcard/yuanfudao/test.txt', 'wt')
fout.write(test1)
fout.close()
——————————————
8.2. Open the file test.txt and read its contents into the string test2. Are test1 and test2 the same?
test1 = 'This is a test of the emergency text system'
with open('/sdcard/yuanfudao/test.txt', 'rt') as fin:
    test2 = fin.read()
print(test1 == test2)
——————————————
8.3. Save these text lines to a file called books.csv. Notice that if the fields are separated by commas, you need to surround a field with quotes if it contains a comma.
author,book
J R R Tolkien,The Hobbit
Lynne Truss,"Eats, Shoots & Leaves"

text = '''author,book
J R R Tolkien,The Hobbit
Lynne Truss,"Eats, Shoots & Leaves"
'''
with open('/sdcard/yuanfudao/books.csv', 'wt') as outfile:
    outfile.write(text)
——————————————
8.4.
Use the csv module and its DictReader method to read books.csv to the variable books. Print the values in books. Did DictReader handle the quotes and commas in the second book’s title?
import csv
with open('/sdcard/yuanfudao/books.csv', 'rt') as fin:
    cin = csv.DictReader(fin)
    books = [row for row in cin]
print(books)
——————————————
8.5. Create a CSV file called books.csv by using these lines:
title,author,year
The Weirdstone of Brisingamen,Alan Garner,1960
Perdido Street Station,China Miéville,2000
Thud!,Terry Pratchett,2005
The Spellman Files,Lisa Lutz,2007
Small Gods,Terry Pratchett,1992

text = '''title,author,year
The Weirdstone of Brisingamen,Alan Garner,1960
Perdido Street Station,China Miéville,2000
Thud!,Terry Pratchett,2005
The Spellman Files,Lisa Lutz,2007
Small Gods,Terry Pratchett,1992'''
with open('/sdcard/yuanfudao/books.csv', 'wt') as outfile:
    outfile.write(text)
——————————————
8.6. Use the sqlite3 module to create a SQLite database called books.db and a table called books with these fields: title (text), author (text), and year (integer).
import sqlite3
conn = sqlite3.connect('/sdcard/yuanfudao/books.db')
curs = conn.cursor()
curs.execute('''CREATE TABLE books(title TEXT,author TEXT,year INT)''')
conn.commit()
——————————————
8.7. Read books.csv and insert its data into the books table.
import csv
import sqlite3
conn = sqlite3.connect('/sdcard/yuanfudao/books.db')
curs = conn.cursor()
ins_str = 'insert into books values(?, ?, ?)'
with open('/sdcard/yuanfudao/books.csv', 'rt') as infile:
    books = csv.DictReader(infile)
    for book in books:
        curs.execute(ins_str, (book['title'], book['author'], book['year']))
conn.commit()
——————————————
alphabetical 字母顺序排列
——————————————
8.8. Select and print the title column from the books table in alphabetical order.
import sqlite3
db = sqlite3.connect('/sdcard/yuanfudao/books.db')
sql = 'SELECT title FROM books ORDER BY title asc'
for row in db.execute(sql):
    print(row)
——————————————
publication 出版
——————————————
8.9.
Select and print all columns from the books table in order of publication. (The table created in 8.6 is named books, so the queries here use that name; db is the connection opened in 8.8.)
>>> for row in db.execute('select * from books order by year'):
...     print(row)
...
('The Weirdstone of Brisingamen', 'Alan Garner', 1960)
('Small Gods', 'Terry Pratchett', 1992)
('Perdido Street Station', 'China Miéville', 2000)
('Thud!', 'Terry Pratchett', 2005)
('The Spellman Files', 'Lisa Lutz', 2007)
To print all the fields in each row, just separate with a comma and space:
>>> for row in db.execute('select * from books order by year'):
...     print(*row, sep=', ')
...
The Weirdstone of Brisingamen, Alan Garner, 1960
Small Gods, Terry Pratchett, 1992
Perdido Street Station, China Miéville, 2000
Thud!, Terry Pratchett, 2005
The Spellman Files, Lisa Lutz, 2007
——————————————
Just look it over. It doesn’t matter whether you remember it, because you can find it in this book when you really need it.
——————————————
8.10. Use the sqlalchemy module to connect to the sqlite3 database books.db that you just made in exercise 8.6. As in 8.8, select and print the title column from the books table in alphabetical order. (The URL below is relative, so run this from the directory that contains books.db. Calling execute() on the engine with a plain string works in SQLAlchemy releases before 2.0.)
>>> import sqlalchemy
>>> conn = sqlalchemy.create_engine('sqlite:///books.db')
>>> sql = 'select title from books order by title asc'
>>> rows = conn.execute(sql)
>>> for row in rows:
...     print(row)
...
('Perdido Street Station',)
('Small Gods',)
('The Spellman Files',)
('The Weirdstone of Brisingamen',)
('Thud!',)
——————————————
8.11. Install the Redis server (see Appendix D) and the Python redis library (pip install redis) on your machine. Create a Redis hash called test with the fields count (1) and name ('Fester Bestertester'). Print all the fields for test.
>>> import redis
>>> conn = redis.Redis()
>>> conn.delete('test')
1
>>> conn.hmset('test', {'count': 1, 'name': 'Fester Bestertester'})
True
>>> conn.hgetall('test')
{b'name': b'Fester Bestertester', b'count': b'1'}
——————————————
8.12. Increment the count field of test and print it.
>>> conn.hincrby('test', 'count', 3)
4
>>> conn.hget('test', 'count')
b'4'
——————————————
Untangled 解决 skeleton 骨架 Straddling 横跨 CERN 欧洲核子研究中心 lair 巢穴 villain 恶棍 quest 追求 domination 统治 prodigious 惊人的 amounts 量 circulated 散布 proposal 建议 disseminate 传播 distilled 蒸馏;提取精华 Hypertext 超文本 Protocol 协议 specification 规范 Markup 标记 presentation 陈述 Locator 定位器 term 术语 hiatus 中断 awareness 意识 Illinois 伊利诺斯州 released 发布 Mosaic 马赛克 Macintosh 麦金塔电脑 NCSA 美国国家超级计算应用中心 noncommercial 非商业性 founded 创立 frenzy 狂暴 occurring 发生 explosive 爆炸
——————————————
Chapter 9. The Web, Untangled
Straddling the French-Swiss border is CERN, a particle physics research institute that would seem a good lair for a Bond villain. Luckily, its quest is not world domination but to understand how the universe works. This has always led CERN to generate prodigious amounts of data, challenging physicists and computer scientists just to keep up.
In 1989, the English scientist Tim Berners-Lee first circulated a proposal to help disseminate information within CERN and the research community. He called it the World Wide Web, and soon distilled its design into three simple ideas:
HTTP (Hypertext Transfer Protocol)
A specification for web clients and servers to interchange requests and responses
HTML (Hypertext Markup Language)
A presentation format for results
URL (Uniform Resource Locator)
A way to uniquely represent a server and a resource on that server
In its simplest usage, a web client (I think Berners-Lee was the first to use the term browser) connected to a web server with HTTP, requested a URL, and received HTML.
He wrote the first web browser and server on a NeXT computer, built by a short-lived company that Steve Jobs founded during his hiatus from Apple Computer. Web awareness really expanded in 1993, when a group of students at the University of Illinois released the Mosaic web browser (for Windows, the Macintosh, and Unix) and NCSA httpd server.
When I downloaded these and started building sites, I had no idea that the Web and the Internet would soon become part of everyday life. At the time, the Internet was still officially noncommercial; there were about 500 known web servers in the world. By the end of 1994, the number of web servers had grown to 10,000. The Internet was opened to commercial use, and the authors of Mosaic founded Netscape to write commercial web software. Netscape went public as part of the Internet frenzy that was occurring at the time, and the Web’s explosive growth has never stopped.
Almost every computer language has been used to write web clients and web servers. The dynamic languages Perl, PHP, and Ruby have been especially popular. In this chapter, I’ll show why Python is a particularly good language for web work at every level:
Clients, to access remote sites
Servers, to provide data for websites and web APIs
Web APIs and services, to interchange data in other ways than viewable web pages
And while we’re at it, we’ll build an actual interactive website in the exercises at the end of this chapter.
——————————————
plumbing 管道 Transmission 传输 protocols 协议 initiate 发起 intended 目的 stateless 无状态的 simplifies 简化了 cart 车 Authentication 身份验证
——————————————
Web Clients
The low-level network plumbing of the Internet is called Transmission Control Protocol/Internet Protocol, or more commonly, simply TCP/IP (the TCP/IP section in Chapter 11 goes into more detail about this). It moves bytes among computers, but doesn’t care about what those bytes mean. That’s the job of higher-level protocols, which are syntax definitions for specific purposes. HTTP is the standard protocol for web data interchange.
The Web is a client-server system. The client makes a request to a server: it opens a TCP/IP connection, sends the URL and other information via HTTP, and receives a response.
The format of the response is also defined by HTTP. It includes the status of the request, and (if the request succeeded) the response’s data and format.
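The request and response cycle described above can be tried out without any network access. The following is a minimal sketch that is not part of the book: it assumes Python 3 and uses only the standard library, and HelloHandler and fetch are hypothetical names made up for this illustration. It starts a throwaway server on the local machine, opens a TCP/IP connection to it, sends a GET request, and reads back the status, the Content-Type header, and the data.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in web server so the demo needs no Internet."""

    def do_GET(self):
        body = b"hello from the server\n"
        self.send_response(200)                      # status line
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()                           # blank line ends headers
        self.wfile.write(body)                       # response data

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(path="/"):
    """Play the client: connect, send a GET request, read the response."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", path)
        resp = conn.getresponse()
        result = (resp.status, resp.getheader("Content-Type"), resp.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, content_type, data = fetch("/")
print(status, content_type, data)
```

The three printed values correspond to the three parts of a response named in the text: the status of the request, the format of the data, and the data itself.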
The most well-known web client is a web browser. It can make HTTP requests in a number of ways. You might initiate a request manually by typing a URL into the location bar or clicking on a link in a web page. Very often, the data returned is used to display a website—HTML documents, JavaScript files, CSS files, and images—but it can be any type of data, not just that intended for display. An important aspect of HTTP is that it’s stateless. Each HTTP connection that you make is independent of all the others. This simplifies basic web operations but complicates others. Here are just a few samples of the challenges: Caching Remote content that doesn’t change should be saved by the web client and used to avoid downloading from the server again. Sessions A shopping website should remember the contents of your shopping cart. Authentication Sites that require your username and password should remember them while you’re logged in. Solutions to statelessness include cookies, in which the server sends the client enough specific information to be able to identify it uniquely when the client sends the cookie back. —————————————— telnet 远程登录 reassuring 让人安心 cue 提示 retrieves 检索 trimmed 修剪 track 跟踪 stranded 被困 —————————————— Test with telnet HTTP is a text-based protocol, so you can actually type it yourself for web testing. The ancient telnet program lets you connect to any server and port and type commands. Let’s ask everyone’s favorite test site, Google, some basic information about its home page. Type this: $ telnet www.google.com 80 If there is a web server on port 80 at google.com (I think that’s a safe bet), telnet will print some reassuring information and then display a final blank line that’s your cue to type something else: Trying 74.125.225.177... Connected to www.google.com. Escape character is '^]'. Now, type an actual HTTP command for telnet to send to the Google web server. 
The most common HTTP command (the one your browser uses when you type a URL in its location bar) is GET. This retrieves the contents of the specified resource, such as an HTML file, and returns it to the client. For our first test, we’ll use the HTTP command HEAD, which just retrieves some basic information about the resource: HEAD / HTTP/1.1 That HEAD / sends the HTTP HEAD verb (command) to get information about the home page (/). Add an extra carriage return to send a blank line so the remote server knows you’re all done and want a response. You’ll receive a response such as this (we trimmed some of the long lines using … so they wouldn’t stick out of the book): HTTP/1.1 200 OK Date: Sat, 26 Oct 2013 17:05:17 GMT Expires: -1 Cache-Control: private, max-age=0 Content-Type: text/html; charset=ISO-8859-1 Set-Cookie: PREF=ID=962a70e9eb3db9d9:FF=0:TM=1382807117:LM=1382807117:S=y... expires=Mon, 26-Oct-2015 17:05:17 GMT; path=/; domain=.google.com Set-Cookie: NID=67=hTvtVC7dZJmZzGktimbwVbNZxPQnaDijCz716B1L56GM9qvsqqeIGb... expires=Sun, 27-Apr-2014 17:05:17 GMT path=/; domain=.google.com; HttpOnly P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts... Server: gws X-XSS-Protection: 1; mode=block X-Frame-Options: SAMEORIGIN Alternate-Protocol: 80:quic Transfer-Encoding: chunked These are HTTP response headers and their values. Some, like Date and Content-Type, are required. Others, such as Set-Cookie, are used to track your activity across multiple visits (we’ll talk about state management a little later in this chapter). When you make an HTTP HEAD request, you get back only headers. If you had used the HTTP GET or POST commands, you would also receive data from the home page (a mixture of HTML, CSS, JavaScript, and whatever else Google decided to throw into its home page). I don’t want to leave you stranded in telnet. 
To close telnet, type the following:
q
——————————————
scattered 分散 bundle 包;归拢 directory 目录 parse 解析 fortune 财富 chunk 块 peachy 极好的 conveys 传达了 generic 通用的 whoops 哎呦 gateway 网关 backend 后端 sheer 纯粹的 curiosity 好奇心 Compatible 兼容的 revalidate 重新验证 straightforward 直截了当的
——————————————
Python’s Standard Web Libraries
In Python 2, web client and server modules were a bit scattered. One of the Python 3 goals was to bundle these modules into two packages (remember from Chapter 5 that a package is just a directory containing module files):
■http manages all the client-server HTTP details:
    client does the client-side stuff
    server helps you write Python web servers
    cookies and cookiejar manage cookies, which save data between site visits
■urllib runs on top of http:
    request handles the client request
    response handles the server response
    parse cracks the parts of a URL
Let’s use the standard library to get something from a website. The URL in the following example returns a random text quote, similar to a fortune cookie:
>>> import urllib.request as ur
>>> url = 'http://www.iheartquotes.com/api/v1/random'
>>> conn = ur.urlopen(url)
>>> print(conn)
<http.client.HTTPResponse object at 0x...>
In the official documentation, we find that conn is an HTTPResponse object with a number of methods, and that its read() method will give us data from the web page:
>>> data = conn.read()
>>> print(data)
b'You will be surprised by a loud noise.\r\n\n[codehappy] http://iheartquotes.com/fortune/show/20447\n'
This little chunk of Python opened a TCP/IP connection to the remote quote server, made an HTTP request, and received an HTTP response. The response contained more than just the page data (the fortune). One of the most important parts of the response is the HTTP status code:
>>> print(conn.status)
200
A 200 means that everything was peachy. There are dozens of HTTP status codes, grouped into five ranges by their first (hundreds) digit:
1xx (information) The server received the request but has some extra information for the client.
2xx (success) It worked; every success code other than 200 conveys extra details. 3xx (redirection) The resource moved, so the response returns the new URL to the client. 4xx (client error) Some problem from the client side, such as the famous 404 (not found). 418 (I’m a teapot) was an April Fool’s joke. 5xx (server error) 500 is the generic whoops; you might see a 502 (bad gateway) if there’s some disconnect between a web server and a backend application server. Web servers can send data back to you in any format they like. It’s usually HTML (and usually some CSS and JavaScript), but in our fortune cookie example it’s plain text. The data format is specified by the HTTP response header value with the name Content-Type, which we also saw in our google.com example: >>> print(conn.getheader('Content-Type')) text/plain That text/plain string is a MIME type, and it means plain old text. The MIME type for HTML, which the google.com example sent, is text/html. I’ll show you more MIME types in this chapter. Out of sheer curiosity, what other HTTP headers were sent back to us? >>> for key, value in conn.getheaders(): ... print(key, value) ... Server nginx Date Sat, 24 Aug 2013 22:48:39 GMT Content-Type text/plain Transfer-Encoding chunked Connection close Etag "8477e32e6d053fcfdd6750f0c9c306d6" X-Ua-Compatible IE=Edge,chrome=1 X-Runtime 0.076496 Cache-Control max-age=0, private, must-revalidate Remember that telnet example a little earlier? Now, our Python library is parsing all those HTTP response headers and providing them in a dictionary. Date and Server seem straightforward; some of the others, less so. It’s helpful to know that HTTP has a set of standard headers such as Content-Type, and many optional ones. —————————————— API Application Programming Interface browse 浏览 wordy 冗长的 —————————————— Beyond the Standard Library: Requests At the beginning of Chapter 1, there’s a program that accesses a YouTube API by using the standard libraries urllib.request and json. 
Following that example is a version that uses the third-party module requests. The requests version is shorter and easier to understand.
For most purposes, I think web client development with requests is easier. You can browse the documentation (which is pretty good) for full details. I’ll show the basics of requests in this section and use it throughout this book for web client tasks.
First, install the requests library into your Python environment. From a terminal window (Windows users, type cmd to make one), type the following command to make the Python package installer pip download the latest version of the requests package and install it:
$ pip install requests
If you have trouble, read Appendix D for details on how to install and use pip.
Let’s redo our previous call to the quotes service with requests:
>>> import requests
>>> url = 'http://www.iheartquotes.com/api/v1/random'
>>> resp = requests.get(url)
>>> resp
<Response [200]>
>>> print(resp.text)
I know that there are people who do not love their fellow man, and I hate people like that!
-- Tom Lehrer, Satirist and Professor
[codehappy] http://iheartquotes.com/fortune/show/21465
It isn’t that different from using urllib.request.urlopen, but I think it feels a little less wordy.
——————————————
navigate 导航 templates 模板 inclusions 夹杂物
——————————————
Web Servers
Web developers have found Python to be an excellent language for writing web servers and server-side programs. This has led to such a variety of Python-based web frameworks that it can be hard to navigate among them and make choices, not to mention deciding what deserves to go into a book.
A web framework provides features with which you can build websites, so it does more than a simple web (HTTP) server. You’ll see features such as routing (URL to server function), templates (HTML with dynamic inclusions), debugging, and more.
I’m not going to cover all of the frameworks here; I cover only those that I’ve found to be relatively simple to use and suitable for real websites.
I’ll also show how to run the dynamic parts of a website with Python and other parts with a traditional web server. —————————————— Serving 服务 plumbing 管道 synonyms 同义词 interpret 解释 parameters 参数 —————————————— The Simplest Python Web Server You can run a simple web server by typing just one line of Python: $ python -m http.server This implements a bare-bones Python HTTP server. If there are no problems, this will print an initial status message: Serving HTTP on 0.0.0.0 port 8000 ... That 0.0.0.0 means any TCP address, so web clients can access it no matter what address the server has. There’s more low-level details on TCP and other network plumbing for you to read about in Chapter 11. You can now request files, with paths relative to your current directory, and they will be returned. If you type http://localhost:8000 in your web browser, you should see a directory listing there, and the server will print access log lines such as this: 127.0.0.1 - - [20/Feb/2013 22:02:37] "GET / HTTP/1.1" 200 - localhost and 127.0.0.1 are TCP synonyms for your local computer, so this works regardless of whether you’re connected to the Internet. You can interpret this line as follows: ■127.0.0.1 is the client’s IP address ■The first "-" is the remote username, if found ■The second "-" is the login username, if required ■[20/Feb/2013 22:02:37] is the access date and time ■"GET / HTTP/1.1" is the command sent to the web server: The HTTP method (GET) The resource requested (/, the top) The HTTP version (HTTP/1.1) ■The final 200 is the HTTP status code returned by the web server Click any file. If your browser can recognize the format (HTML, PNG, GIF, JPEG, and so on) it should display it, and the server will log the request. 
For instance, if you have the file oreilly.png in your current directory, a request for http://localhost:8000/oreilly.png should return the image of the unsettling fellow in Figure 7-1, and the log should show something such as this: 127.0.0.1 - - [20/Feb/2013 22:03:48] "GET /oreilly.png HTTP/1.1" 200 - If you have other files in the same directory on your computer, they should show up in a listing on your display, and you can click any one to download it. If your browser is configured to display that file’s format, you’ll see the results on your screen; otherwise, your browser will ask you if you want to download and save the file. The default port number used is 8000, but you can specify another: $ python -m http.server 9999 You should see this: Serving HTTP on 0.0.0.0 port 9999 ... This Python-only server is best suited for quick tests. You can stop it by killing its process; in most terminals, press Ctrl+C. You should not use this basic server for a busy production website. Traditional web servers such as Apache and Nginx are much faster for serving static files. In addition, this simple server has no way to handle dynamic content, which more extensive servers can do by accepting parameters. —————————————— allure 魅力 anew 重新 scale 规模 appreciable 可感知的 startup 启动 merging 合并 leap 飞跃 hood 罩 —————————————— Web Server Gateway Interface All too soon, the allure of serving simple files wears off, and we want a web server that can also run programs dynamically. In the early days of the Web, the Common Gateway Interface (CGI) was designed for clients to make web servers run external programs and return the results. CGI also handled getting input arguments from the client through the server to the external programs. However, the programs were started anew for each client access. This could not scale well, because even small programs have appreciable startup time. To avoid this startup delay, people began merging the language interpreter into the web server. 
Apache ran PHP within its mod_php module, Perl in mod_perl, and Python in mod_python. Then, code in these dynamic languages could be executed within the long-running Apache process itself rather than in external programs. An alternative method was to run the dynamic language within a separate long-running program and have it communicate with the web server. FastCGI and SCGI are examples. Python web development made a leap with the definition of Web Server Gateway Interface (WSGI), a universal API between Python web applications and web servers. All of the Python web frameworks and web servers in the rest of this chapter use WSGI. You don’t normally need to know how WSGI works (there really isn’t much to it), but it helps to know what some of the parts under the hood are called. —————————————— WSGI Web Server Gateway Interface PythonWeb服务器网关接口 Merge 合并 authorization 授权 permissions 权限 Sessions 会话 transient 短暂的 —————————————— Frameworks Web servers handle the HTTP and WSGI details, but you use web frameworks to actually write the Python code that powers the site. So, we’ll talk about frameworks for a while and then get back to alternative ways of actually serving sites that use them. If you want to write a website in Python, there are many Python web frameworks (some might say too many). A web framework handles, at a minimum, client requests and server responses. It might provide some or all of these features: Routes Interpret URLs and find the corresponding server files or Python server code Templates Merge server-side data into pages of HTML Authentication and authorization Handle usernames, passwords, permissions Sessions Maintain transient data storage during a user’s visit to the website In the coming sections, we’ll write example code for two frameworks (bottle and flask). Then, we’ll talk about alternatives, especially for database-backed websites. You can find a Python framework to power any site that you can think of. 
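To make the ideas of WSGI and routing a little more concrete, here is a minimal sketch that uses only the standard library’s wsgiref package. It is an illustration under assumptions, not how any particular framework is implemented: app, ROUTES, and call_app are hypothetical names. A WSGI application is a callable that receives the request environment and a start_response function, and returns the body as an iterable of bytes.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical route table: URL path -> text to return.
ROUTES = {
    "/": "It's my home page",
    "/about": "A tiny WSGI demo",
}


def app(environ, start_response):
    """A WSGI application: the web server calls it once per request."""
    path = environ.get("PATH_INFO", "/")
    if path in ROUTES:
        status, text = "200 OK", ROUTES[path]
    else:
        status, text = "404 Not Found", "Not found"
    body = text.encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def call_app(path):
    """Call the app the way a WSGI server would, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)  # fill in the keys WSGI requires
    environ["PATH_INFO"] = path
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status

    body = b"".join(app(environ, start_response))
    return seen["status"], body.decode("utf-8")


print(call_app("/"))
print(call_app("/missing"))
```

To serve the same app during development, wsgiref.simple_server.make_server('localhost', 9999, app).serve_forever() would run it behind the standard library’s reference server. A framework adds conveniences such as templates and sessions on top of this basic calling convention.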
——————————————
deploy 部署 route 路线;路径
——————————————
Bottle
Bottle consists of a single Python file, so it’s very easy to try out, and it’s easy to deploy later. Bottle isn’t part of standard Python, so to install it, type the following command:
$ pip install bottle
Here’s code that will run a test web server and return a line of text when your browser accesses the URL http://localhost:9999/. Save it as bottle1.py:
from bottle import route, run

@route('/')
def home():
    return "It isn't fancy, but it's my home page"

run(host='localhost', port=9999)
Bottle uses the route decorator to associate a URL with the following function; in this case, / (the home page) is handled by the home() function. Make Python run this server script by typing this:
$ python bottle1.py
You should see this on your browser when you access http://localhost:9999:
It isn't fancy, but it's my home page
The run() function executes bottle’s built-in Python test web server. You don’t need to use this for bottle programs, but it’s useful for initial development and testing.
Now, instead of creating text for the home page in code, let’s make a separate HTML file called index.html that contains this line of text:
My new and improved home page!!!
Make bottle return the contents of this file when the home page is requested. Save this script as bottle2.py:
from bottle import route, run, static_file

@route('/')
def main():
    return static_file('index.html', root='.')

run(host='localhost', port=9999)
In the call to static_file(), we want the file index.html in the directory indicated by root (in this case, '.', the current directory). If your previous server example code was still running, stop it. Now, run the new server:
$ python bottle2.py
When you ask your browser to get http://localhost:9999/, you should see:
My new and improved home page!!!
Let’s add one last example that shows how to pass arguments to a URL and use them.
Of course, this will be bottle3.py:

    from bottle import route, run, static_file

    @route('/')
    def home():
        return static_file('index.html', root='.')

    @route('/echo/<thing>')
    def echo(thing):
        return "Say hello to my little friend: %s!" % thing

    run(host='localhost', port=9999)

We have a new function called echo() and want to pass it a string argument in a URL. That’s what the line @route('/echo/<thing>') in the preceding example does. The <thing> in the route means that whatever was in the URL after /echo/ is assigned to the string argument thing, which is then passed to the echo function. To see what happens, stop the old server if it’s still running, and start it with the new code:

    $ python bottle3.py

Then, access http://localhost:9999/echo/Mothra in your web browser. You should see the following:

    Say hello to my little friend: Mothra!

Now, leave bottle3.py running for a minute so that we can try something else. You’ve been verifying that these examples work by typing URLs into your browser and looking at the displayed pages. You can also use client libraries such as requests to do your work for you. Save this as bottle_test.py:

    import requests

    resp = requests.get('http://localhost:9999/echo/Mothra')
    if resp.status_code == 200 and \
       resp.text == 'Say hello to my little friend: Mothra!':
        print('It worked! That almost never happens!')
    else:
        print('Argh, got this:', resp.text)

Great! Now, run it:

    $ python bottle_test.py

You should see this in your terminal:

    It worked! That almost never happens!

This is a little example of a unit test. Chapter 8 provides more details on why tests are good and how to write them in Python.

There’s more to bottle than I’ve shown here. In particular, you can try adding these arguments when you call run():
■debug=True creates a debugging page if you get an HTTP error;
■reloader=True restarts the test server automatically when you change any of the Python code.
It’s well documented at the developer site.
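To make the idea of a route decorator less magical, here is a toy sketch of how a URL pattern could be tied to a function using only the standard library. This is not how bottle is actually implemented; the names routes, route, and dispatch are invented for the illustration, and the pattern handling assumes paths made only of letters, digits, and slashes.

```python
import re

routes = []   # list of (compiled pattern, handler function) pairs

def route(pattern):
    """Register the decorated function for URL paths matching pattern.

    A segment written as <name> matches any text without a slash and is
    passed to the handler as a keyword argument of that name.
    """
    regex = re.sub(r'<(\w+)>', r'(?P<\1>[^/]+)', pattern)
    compiled = re.compile('^' + regex + '$')

    def decorator(func):
        routes.append((compiled, func))
        return func          # the function itself is left unchanged
    return decorator

@route('/')
def home():
    return "It isn't fancy, but it's my home page"

@route('/echo/<thing>')
def echo(thing):
    return "Say hello to my little friend: %s!" % thing

def dispatch(path):
    """Find the first registered route that matches path and call it."""
    for compiled, func in routes:
        match = compiled.match(path)
        if match:
            return func(**match.groupdict())
    return '404 Not Found'

print(dispatch('/echo/Mothra'))   # Say hello to my little friend: Mothra!
```

The decorator only records the association; the lookup happens later, once per request, which is the same division of labor the text describes for bottle.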
——————————————
ease 轻松
replicate 复写
prefix 前缀
intruders 入侵者
——————————————
__name__

__name__ is the name of the main module or package of the application.
——————————————
Flask(1)

Bottle is a good initial web framework. If you need a few more cowbells and whistles, try Flask. It started in 2010 as an April Fools’ joke, but enthusiastic response encouraged the author, Armin Ronacher, to make it a real framework. He named the result Flask as a wordplay on bottle.

Flask is about as simple to use as Bottle, but it supports many extensions that are useful in professional web development, such as Facebook authentication and database integration. It’s my personal favorite among Python web frameworks because it balances ease of use with a rich feature set.

The Flask package includes the werkzeug WSGI library and the jinja2 template library. You can install it from a terminal:

    $ pip install flask

Let’s replicate the final bottle example code in flask. First, though, we need to make a few changes:
■Flask’s default directory home for static files is static, and URLs for files there also begin with /static. We change the folder to '.' (current directory) and the URL prefix to '' (empty) to allow the URL / to map to the file index.html.
■In the run() function, setting debug=True also activates the automatic reloader; bottle used separate arguments for debugging and reloading.

Save this file to flask1.py:

    from flask import Flask

    app = Flask(__name__, static_folder='.', static_url_path='')

    @app.route('/')
    def home():
        return app.send_static_file('index.html')

    @app.route('/echo/<thing>')
    def echo(thing):
        return "Say hello to my little friend: %s" % thing

    app.run(port=9999, debug=True)

Then, run the server from a terminal or window:

    $ python flask1.py

Test the home page by typing this URL into your browser:

    http://localhost:9999/

You should see the following (as you did for bottle):

    My new and improved home page!!!
Try the /echo endpoint:

    http://localhost:9999/echo/Godzilla

You should see this:

    Say hello to my little friend: Godzilla

There’s another benefit to setting debug to True when calling run. If an exception occurs in the server code, Flask returns a specially formatted page with useful details about what went wrong, and where. Even better, you can type some commands to see the values of variables in the server program.

Warning
Do not set debug = True in production web servers. It exposes too much information about your server to potential intruders.
——————————————
templating 模板
grab 抓取
render 渲染
dropping 下降;减少
——————————————
Flask(2)

So far, the Flask example just replicates what we did with bottle. What can Flask do that bottle can’t? Flask includes jinja2, a more extensive templating system. Here’s a tiny example of how to use jinja2 and flask together.

Create a directory called templates, and a file within it called flask2.html:

    <html>
    <head>
    <title>Flask2 Example</title>
    </head>
    <body>
    Say hello to my little friend: {{ thing }}
    </body>
    </html>

Next, we’ll write the server code to grab this template, fill in the value of thing that we passed it, and render it as HTML (I’m dropping the home() function here to save space). Save this as flask2.py:

    from flask import Flask, render_template

    app = Flask(__name__)

    @app.route('/echo/<thing>')
    def echo(thing):
        return render_template('flask2.html', thing=thing)

    app.run(port=9999, debug=True)

That thing = thing argument means to pass a variable named thing to the template, with the value of the string thing. Ensure that flask1.py isn’t still running, and start flask2.py:

    $ python flask2.py

Now, type this URL:

    http://localhost:9999/echo/Gamera

You should see the following:

    Say hello to my little friend: Gamera

Let’s modify our template and save it in the templates directory as flask3.html:

    <html>
    <head>
    <title>Flask3 Example</title>
    </head>
    <body>
    Say hello to my little friend: {{ thing }}.
    Alas, it just destroyed {{ place }}!
    </body>
    </html>

You can pass this second argument to the echo URL in many ways.
Pass an argument as part of the URL path

Using this method, you simply extend the URL itself (save this as flask3a.py):

    from flask import Flask, render_template

    app = Flask(__name__)

    @app.route('/echo/<thing>/<place>')
    def echo(thing, place):
        return render_template('flask3.html', thing=thing, place=place)

    app.run(port=9999, debug=True)

As usual, stop the previous test server script if it’s still running and then try this new one:

    $ python flask3a.py

The URL would look like this:

    http://localhost:9999/echo/Rodan/McKeesport

And you should see the following:

    Say hello to my little friend: Rodan. Alas, it just destroyed McKeesport!
——————————————
Flask(3)

Or, you can provide the arguments as GET parameters (save this as flask3b.py):

    from flask import Flask, render_template, request

    app = Flask(__name__)

    @app.route('/echo/')
    def echo():
        thing = request.args.get('thing')
        place = request.args.get('place')
        return render_template('flask3.html', thing=thing, place=place)

    app.run(port=9999, debug=True)

Run the new server script:

    $ python flask3b.py

This time, use this URL:

    http://localhost:9999/echo?thing=Gorgo&place=Wilmerding

You should get back what you see here:

    Say hello to my little friend: Gorgo. Alas, it just destroyed Wilmerding!

When a GET command is used for a URL, any arguments are passed after a question mark in the form ?key1=val1&key2=val2&...

You can also use the dictionary ** operator to pass multiple arguments to a template from a single dictionary (call this flask3c.py):

    from flask import Flask, render_template, request

    app = Flask(__name__)

    @app.route('/echo/')
    def echo():
        kwargs = {}
        kwargs['thing'] = request.args.get('thing')
        kwargs['place'] = request.args.get('place')
        return render_template('flask3.html', **kwargs)

    app.run(port=9999, debug=True)

That **kwargs acts like thing=thing, place=place. It saves some typing if there are a lot of input arguments.

The jinja2 templating language does a lot more than this. If you’ve programmed in PHP, you’ll see many similarities.
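The two ideas in this section, GET parameters and the ** operator, can be tried without any web server. The sketch below uses the standard library's urllib.parse to split a query string into a dictionary and then unpacks that dictionary into keyword arguments. The render() function is only a stand-in for a template, invented for this example; it is not part of flask or jinja2.

```python
from urllib.parse import urlsplit, parse_qs

def render(thing, place):
    """Stand-in for a template: fills two named slots."""
    return ("Say hello to my little friend: %s. "
            "Alas, it just destroyed %s!" % (thing, place))

url = 'http://localhost:9999/echo?thing=Gorgo&place=Wilmerding'
query = urlsplit(url).query     # the part after the question mark
params = parse_qs(query)        # each key maps to a list of values

# A key may be repeated in a query string, so parse_qs returns lists;
# here we keep only the first value for each key.
kwargs = {key: values[0] for key, values in params.items()}

# These two calls are equivalent.
print(render(thing=kwargs['thing'], place=kwargs['place']))
print(render(**kwargs))
```

Both print calls produce the same sentence, which is the point of **kwargs: the dictionary's keys become the argument names.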
—————————————— stability 稳定 —————————————— What is route? Clients such as web browsers send requests to the web server, which in turn sends them to the Flask application instance. The application instance needs to know what code needs to run for each URL requested, so it keeps a mapping of URLs to Python functions. The association between a URL and the function that handles it is called a route. —————————————— if __name__ == '__main__': app.run(debug=True) The __name__ == '__main__' Python idiom is used here to ensure that the development web server is started only when the script is executed directly. When the script is imported by another script, it is assumed that the parent script will launch a different server, so the app.run() call is skipped. —————————————— Non-Python Web Servers So far, the web servers we’ve used have been simple: the standard library’s http.server or the debugging servers in Bottle and Flask. In production, you’ll want to run Python with a faster web server. The usual choices are the following: apache with the mod_wsgi module nginx with the uWSGI app server Both work well; apache is probably the most popular, and nginx has a reputation for stability and lower memory use. —————————————— For the following things, all you need to do is give it a glimpse. —————————————— preferred 首选 salamander 火蜥蜴 daemon 守护进程 directive 指令 —————————————— Apache The apache web server’s best WSGI module is mod_wsgi. This can run Python code within the Apache process or in separate processes that communicate with Apache. You should already have apache if your system is Linux or OS X. For Windows, you’ll need to install apache. Finally, install your preferred WSGI-based Python web framework. Let’s try bottle here. Almost all of the work involves configuring Apache, which can be a dark art. 
Create this test file and save it as /var/www/test/home.wsgi:

    import bottle

    application = bottle.default_app()

    @bottle.route('/')
    def home():
        return "apache and wsgi, sitting in a tree"

Do not call run() this time, because that starts the built-in Python web server. We need to assign to the variable application because that’s what mod_wsgi looks for to marry the web server and the Python code.

If apache and its mod_wsgi module are working correctly, we just need to connect them to our Python script. We want to add one line to the file that defines the default website for this apache server, but finding that file is a task in and of itself. It could be /etc/apache2/httpd.conf, or /etc/apache2/sites-available/default, or the Latin name of someone’s pet salamander.

Let’s assume for now that you understand apache and found that file. Add this line inside the <VirtualHost> section that governs the default website:

    WSGIScriptAlias / /var/www/test/home.wsgi

That section might then look like this:

    <VirtualHost *:80>
        DocumentRoot /var/www

        WSGIScriptAlias / /var/www/test/home.wsgi

        <Directory /var/www/test>
        Order allow,deny
        Allow from all
        </Directory>
    </VirtualHost>

Start apache, or restart it if it was running to make it use this new configuration. If you then browse to http://localhost/, you should see:

    apache and wsgi, sitting in a tree

This runs mod_wsgi in embedded mode, as part of apache itself. You can also run it in daemon mode: as one or more processes, separate from apache. To do this, add two new directive lines to your apache config file:

    WSGIDaemonProcess domain-name user=user-name group=group-name threads=25
    WSGIProcessGroup domain-name

In the preceding example, user-name and group-name are the operating system user and group names, and the domain-name is the name of your Internet domain.
A minimal apache config might look like this:

    <VirtualHost *:80>
        DocumentRoot /var/www

        WSGIScriptAlias / /var/www/test/home.wsgi

        WSGIDaemonProcess mydomain.com user=myuser group=mygroup threads=25
        WSGIProcessGroup mydomain.com

        <Directory /var/www/test>
        Order allow,deny
        Allow from all
        </Directory>
    </VirtualHost>
——————————————
configurable 可配置的
knobs 旋钮
——————————————
The nginx Web Server

The nginx web server does not have an embedded Python module. Instead, it communicates by using a separate WSGI server such as uWSGI. Together they make a very fast and configurable platform for Python web development. You can install nginx from its website. You also need to install uWSGI. uWSGI is a large system, with many levers and knobs to adjust. A short documentation page gives you instructions on how to combine Flask, nginx, and uWSGI.
——————————————
peanut 花生
jelly 果冻
grew 增长
scope 范围
optimized 优化
performance 性能
gloss over 掩盖
demanding 要求
——————————————
Other Frameworks

Websites and databases are like peanut butter and jelly: you see them together a lot. The smaller frameworks such as bottle and flask do not include direct support for databases, although some of their contributed add-ons do.

If you need to crank out database-backed websites, and the database design doesn’t change very often, it might be worth the effort to try one of the larger Python web frameworks. The current main contenders include:

django This is the most popular, especially for large sites. It’s worth learning for many reasons, among them the frequent requests for django experience in Python job ads. It includes ORM code (we talked about ORMs in The Object-Relational Mapper) to create automatic web pages for the typical database CRUD functions (create, read, update, delete) that I discussed in SQL. You don’t have to use django’s ORM if you prefer another, such as SQLAlchemy, or direct SQL queries.

web2py This covers much the same ground as django, with a different style.

pyramid This grew from the earlier pylons project, and is similar to django in scope.
turbogears This framework supports an ORM, many databases, and multiple template languages. wheezy.web This is a newer framework optimized for performance. It was faster than the others in a recent test. You can compare the frameworks by viewing this online table. If you want to build a website backed by a relational database, you don’t necessarily need one of these larger frameworks. You can use bottle, flask, and others directly with relational database modules, or use SQLAlchemy to help gloss over the differences. Then, you’re writing generic SQL instead of specific ORM code, and more developers know SQL than any particular ORM’s syntax. Also, there’s nothing written in stone demanding that your database must be a relational one. If your data schema varies significantly—columns that differ markedly across rows—it might be worthwhile to consider a schemaless database, such as one of the NoSQL databases discussed in NoSQL Data Stores. I once worked on a website that initially stored its data in a NoSQL database, switched to a relational one, on to another relational one, to a different NoSQL one, and then finally back to one of the relational ones. —————————————— simultaneous 同时 concurrency 并发性 —————————————— Other Python Web Servers Following are some of the independent Python-based WSGI servers that work like apache or nginx, using multiple processes and/or threads (see Concurrency) to handle simultaneous requests: uwsgi cherrypy pylons Here are some event-based servers, which use a single process but avoid blocking on any single request: tornado gevent gunicorn I have more to say about events in the discussion about concurrency in Chapter 11. —————————————— Automation 自动化 consuming 消耗 generating 生成 —————————————— Web Services and Automation We’ve just looked at traditional web client and server applications, consuming and generating HTML pages. Yet the Web has turned out to be a powerful way to glue applications and data in many more formats than HTML. 
——————————————
terminal 终端
enlightening 有启发性的
——————————————
The webbrowser Module

Let’s begin with a little surprise. Start a Python session in a terminal window and type the following:

    >>> import antigravity

This secretly calls the standard library’s webbrowser module and directs your browser to an enlightening Python link.

You can use this module directly. This program loads the main Python site’s page in your browser:

    >>> import webbrowser
    >>> url = 'http://www.python.org/'
    >>> webbrowser.open(url)
    True

This opens it in a new window:

    >>> webbrowser.open_new(url)
    True

And this opens it in a new tab, if your browser supports tabs:

    >>> webbrowser.open_new_tab('http://www.python.org/')
    True

The webbrowser module makes your browser do all the work.
——————————————
Representational 代表性的
Transfer 转移
consume 消费
doctoral 博士
thesis 论文
implies 暗示
retrieves 取回
——————————————
Web APIs and Representational State Transfer

Often, data is only available within web pages. If you want to access it, you need to access the pages through a web browser and read it. If the authors of the website made any changes since the last time you visited, the location and style of the data might have changed.

Instead of publishing web pages, you can provide data through a web application programming interface (API). Clients access your service by making requests to URLs and getting back responses containing status and data. Instead of HTML pages, the data is in formats that are easier for programs to consume, such as JSON or XML (refer to Chapter 8 for more about these formats).

Representational State Transfer (REST) was defined by Roy Fielding in his doctoral thesis. Many products claim to have a REST interface or a RESTful interface. In practice, this often only means that they have a web interface: definitions of URLs to access a web service.

A RESTful service uses the HTTP verbs in specific ways, as is described here:

HEAD Gets information about the resource, but not its data.
GET As its name implies, GET retrieves the resource’s data from the server. This is the standard method used by your browser. Any time you see a URL with a question mark (?) followed by a bunch of arguments, that’s a GET request. GET should not be used to create, change, or delete data.

POST This verb updates data on the server. It’s often used by HTML forms and web APIs.

PUT This verb creates a new resource.

DELETE This one speaks for itself: DELETE deletes. Truth in advertising!

A RESTful client can also request one or more content types from the server by using HTTP request headers. For example, a complex service with a REST interface might prefer its input and output to be JSON strings.
——————————————
JSON

JSON is especially well suited to web client-server data interchange. It’s especially popular in web-based APIs, such as OpenStack.
——————————————
Crawl 爬取
Scrape 刮析(切刮分析)
rating 评级
stock 股票
availability 可用性
extraneous 无关的
fetcher 取物者
unappealing 无吸引力的
haystack 干草堆
industrial 工业
——————————————
Crawl and Scrape

Sometimes, you might want a little bit of information (a movie rating, stock price, or product availability), but the information is available only in HTML pages, surrounded by ads and extraneous content.

You could extract what you’re looking for manually by doing the following:
1.Type the URL into your browser.
2.Wait for the remote page to load.
3.Look through the displayed page for the information you want.
4.Write it down somewhere.
5.Possibly repeat the process for related URLs.

However, it’s much more satisfying to automate some or all of these steps. An automated web fetcher is called a crawler or spider (unappealing terms to arachnophobes). After the contents have been retrieved from the remote web servers, a scraper parses it to find the needle in the haystack.

If you need an industrial-strength combined crawler and scraper, Scrapy is worth downloading:

    $ pip install scrapy

Scrapy is a framework, not a module such as BeautifulSoup.
It does more, but it’s more complex to set up. To learn more about Scrapy, read the online introduction. —————————————— complications 并发症;困难 destination 目的地 grunt work 枯燥工作 enumerate 列举 —————————————— Scrape HTML with BeautifulSoup If you already have the HTML data from a website and just want to extract data from it, BeautifulSoup is a good choice. HTML parsing is harder than it sounds. This is because much of the HTML on public web pages is technically invalid: unclosed tags, incorrect nesting, and other complications. If you try to write your own HTML parser by using regular expressions (discussed in Chapter 7) you’ll soon encounter these messes. To install BeautifulSoup, type the following command (don’t forget the final 4, or pip will try to install an older version and probably fail): $ pip install beautifulsoup4 Now, let’s use it to get all the links from a web page. The HTML a element represents a link, and href is its attribute representing the link destination. In the following example, we’ll define the function get_links() to do the grunt work, and a main program to get one or more URLs as command-line arguments: def get_links(url): import requests from bs4 import BeautifulSoup as soup result = requests.get(url) page = result.text doc = soup(page) links = [element.get('href') for element in doc.find_all('a')] return links if __name__ == '__main__': import sys for url in sys.argv[1:]: print('Links in', url) for num, link in enumerate(get_links(url), start=1): print(num, link) print() I saved this program as links.py and then ran this command: $ python links.py http://boingboing.net Here are the first few lines that it printed: Links in http://boingboing.net/ 1 http://boingboing.net/suggest.html 2 http://boingboing.net/category/feature/ 3 http://boingboing.net/category/review/ 4 http://boingboing.net/category/podcasts 5 http://boingboing.net/category/video/ 6 http://bbs.boingboing.net/ 7 javascript:void(0) 8 http://shop.boingboing.net/ 9 
http://boingboing.net/about 10 http://boingboing.net/contact —————————————— Things to Do 9.1. If you haven’t installed flask yet, do so now. This will also install werkzeug, jinja2, and possibly other packages. —————————————— skeleton 骨架 —————————————— 9.2. Build a skeleton website, using Flask’s debug/reload development web server. Ensure that the server starts up for hostname localhost on default port 5000. If your computer is already using port 5000 for something else, use another port number. Here’s flask1.py: from flask import Flask app = Flask(__name__) app.run(port=5000, debug=True) Gentlemen, start your engines: $ python flask1.py * Running on http://127.0.0.1:5000/ * Restarting with reloader —————————————— 9.3. Add a home() function to handle requests for the home page. Set it up to return the string It's alive! from flask import Flask app = Flask(__name__) @app.route('/') def home(): return "It's alive!" app.run(debug=True) Start the server: $ python flask2.py * Running on http://127.0.0.1:5000/ * Restarting with reloader —————————————— referring 指 —————————————— 9.4. Create a Jinja2 template file called home.html with the following contents: It's alive! I'm of course referring to {{thing}}, which is {{height}} feet tall and {{color}}. Make a directory called templates and create the file home.html with the contents just shown. If your Flask server is still running from the previous examples, it will detect the new content and restart itself. —————————————— 9.5. Modify your server’s home() function to use the home.html template. Provide it with three GET parameters: thing, height, and color. 
Here comes flask3.py:

    from flask import Flask, request, render_template

    app = Flask(__name__)

    @app.route('/')
    def home():
        thing = request.values.get('thing')
        height = request.values.get('height')
        color = request.values.get('color')
        return render_template('home.html',
            thing=thing, height=height, color=color)

    app.run(debug=True)

Go to this address in your web client:

    http://localhost:5000/?thing=Octothorpe&height=7&color=green

You should see the following:

    I'm of course referring to Octothorpe, which is 7 feet tall and green.
——————————————
One thing a computer can do that most humans can’t is be sealed up in a cardboard box and sit in a warehouse.
电脑可以做的一件事,大多数人不能被密封在一个纸板盒,坐在一个仓库。
insomnia 失眠
——————————————
Additional Chapter -1. Application Programming Interfaces

In recent years, there has been a trend in web applications to move more and more of the business logic to the client side, producing an architecture that is known as Rich Internet Application (RIA). In RIAs, the server’s main (and sometimes only) function is to provide the client application with data retrieval and storage services. In this model, the server becomes a web service or Application Programming Interface (API).

There are several protocols by which RIAs can communicate with a web service. Remote Procedure Call (RPC) protocols such as XML-RPC or its derivative Simple Object Access Protocol (SOAP) were popular choices a few years ago. More recently, the Representational State Transfer (REST) architecture has emerged as the favorite for web applications due to it being built on the familiar model of the World Wide Web.

Flask is an ideal framework to build RESTful web services due to its lightweight nature. In this chapter, you will learn how to implement a Flask-based RESTful API.
—————————————— architectural 建筑式的 characteristics 特征 cacheable Could be 缓存 noncacheable Couldn't be 缓存 intermediaries 中介 optimization 优化 purposes 目的 consistent 一致的 Layered 分层的 scalability 可伸缩性 —————————————— Introduction to REST Roy Fielding’s Ph.D. dissertation introduces the REST architectural style for web services by listing its six defining characteristics: Client-Server There must be a clear separation between the clients and the server. Stateless A client request must contain all the information that is necessary to carry it out. The server must not store any state about the client that persists from one request to the next. Cache Responses from the server can be labeled as cacheable or noncacheable so that clients (or intermediaries between clients and servers) can use a cache for optimization purposes. Uniform Interface The protocol by which clients access server resources must be consistent, well defined, and standardized. The commonly used uniform interface of REST web services is the HTTP protocol. Layered System Proxy servers, caches, or gateways can be inserted between clients and servers as necessary to improve performance, reliability, and scalability. Code-on-Demand Clients can optionally download code from the server to execute in their context. —————————————— represents 代表 identifier 标识符 treatment 治疗 reverse 反向 —————————————— Resources Are Everything The concept of resources is core to the REST architectural style. In this context, a resource is an item of interest in the domain of the application. For example, in the blogging application, users, blog posts, and comments are all resources. Each resource must have a unique URL that represents it. Continuing with the blogging example, a blog post could be represented by the URL /api/posts/12345, where 12345 is a unique identifier for the post such as the post’s database primary key. 
The format or contents of the URL do not really matter; all that matters is that each resource URL uniquely identifies a resource.

A collection of all the resources in a class also has an assigned URL. The URL for the collection of blog posts could be /api/posts/ and the URL for the collection of all comments could be /api/comments/.

An API can also define collection URLs that represent logical subsets of all the resources in a class. For example, the collection of all comments in blog post 12345 could be represented by the URL /api/posts/12345/comments/. It is a common practice to define URLs that represent collections of resources with a trailing slash, as this gives them a “folder” representation.

Tip
Be aware that Flask applies special treatment to routes that end with a slash. If a client requests a URL without a trailing slash and the only matching route has a slash at the end, then Flask will automatically respond with a redirect to the trailing slash URL. No redirects are issued for the reverse case.
——————————————
Request Methods

The client application sends requests to the server at the established resource URLs and uses the request method to indicate the desired operation. The following request methods are commonly used in RESTful APIs:
GET
POST
PUT
DELETE

If you have questions about any of them, search online for the details.
——————————————
forth 出来
negotiation 谈判
mechanisms 机制
ties 关系
——————————————
Request and Response Bodies

Resources are sent back and forth between client and server in the bodies of requests and responses, but REST does not specify the format to use to encode resources. The Content-Type header in requests and responses is used to indicate the format in which a resource is encoded in the body. The standard content negotiation mechanisms in the HTTP protocol can be used between client and server to agree on a format that both support.
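As a small illustration of a request method plus the Content-Type header, the sketch below builds (but does not send) a POST request with the standard library's urllib.request. The URL and the payload are invented for the example, and JSON is chosen as the body format only as an assumption. The Accept header is how a client states which response format it would like back.

```python
import json
from urllib.request import Request

# Hypothetical resource data to send in the request body.
payload = json.dumps({'title': 'Hello', 'body': 'First post'}).encode('utf-8')

req = Request(
    'http://www.example.com/api/posts/',      # collection URL (made up)
    data=payload,
    method='POST',
    headers={
        'Content-Type': 'application/json',   # format of the body we send
        'Accept': 'application/json',         # format we would like back
    },
)

print(req.get_method())                 # POST
# urllib stores header names with only the first letter capitalized.
print(req.get_header('Content-type'))   # application/json
print(req.get_header('Accept'))         # application/json
```

Nothing is sent over the network here; passing req to urllib.request.urlopen() would perform the actual request.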
The two formats commonly used with RESTful web services are JavaScript Object Notation (JSON) and Extensible Markup Language (XML). For web-based RIAs, JSON is attractive because of its close ties to JavaScript, the client-side scripting language used by web browsers. Returning to the blog example API, a blog post resource could be represented in JSON as follows: { "url": "http://www.example.com/api/posts/12345", "title": "Writing RESTful APIs in Python", "author": "http://www.example.com/api/users/2", "body": "... text of the article here ...", "comments": "http://www.example.com/api/posts/12345/comments" } Note how the url, author, and comments fields in the blog post above are fully qualified resource URLs. This is important because these URLs allow the client to discover new resources. In a well-designed RESTful API, the client just knows a short list of top-level resource URLs and then discovers the rest from links included in responses, similar to how you can discover new web pages while browsing the Web by clicking on links that appear in pages that you know. —————————————— tolerant 宽容 organized 有组织的 maintenance 维护 burden 负担 deployments 部署 —————————————— Versioning In a traditional server-centric web application, the server has full control of the application. When an application is updated, installing the new version in the server is enough to update all users because even the parts of the application that run in the user’s web browser are downloaded from the server. The situation with RIAs and web services is more complicated, because often clients are developed independently of the server—maybe even by different people. Consider the case of an application where the RESTful web service is used by a variety of clients including web browsers and native smartphone clients. The web browser client can be updated in the server at any time, but the smartphone apps cannot be updated by force; the smartphone owner needs to allow the update to happen. 
Even if the smartphone owner is willing to update, it is not possible to time the deployment of the updated smartphone applications to all the app stores to coincide exactly with the deployment of the new server.

For these reasons, web services need to be more tolerant than regular web applications and be able to work with old versions of their clients. A common way to address this problem is to version the URLs handled by the web service. For example, the first release of the blogging web service could expose the collection of blog posts at /api/v1.0/posts/.

Including the web service version in the URL helps keep old and new features organized so that the server can provide new features to new clients while continuing to support old clients. An update to the blogging service could change the JSON format of blog posts and now expose blog posts as /api/v1.1/posts/, while keeping the older JSON format for clients that connect to /api/v1.0/posts/. For a period of time, the server handles all the URLs in their v1.1 and v1.0 variations.

Although supporting multiple versions of the server can become a maintenance burden, there are situations in which this is the only way to allow the application to grow without causing problems to existing deployments.
——————————————
RESTful Web Services with Flask

Flask makes it very easy to create RESTful web services. The familiar route() decorator along with its methods optional argument can be used to declare the routes that handle the resource URLs exposed by the service. Working with JSON data is also simple, as JSON data included with a request is automatically exposed as a request.json Python dictionary and a response that needs to contain JSON can be easily generated from a Python dictionary using Flask’s jsonify() helper function.

The following sections show how Flasky can be extended with a RESTful web service that gives clients access to blog posts and related resources.
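The versioning idea above can be sketched with nothing but the standard library's json module: one function produces the JSON for a blog post in either the v1.0 or the v1.1 format. Everything specific here is an assumption made for illustration, including the POSTS data, the represent_post() name, and the choice that v1.1 differs from v1.0 by adding a comments link. In a Flask application, the same dictionaries would be passed to jsonify() instead of json.dumps().

```python
import json

BASE = 'http://www.example.com'

# In-memory stand-in for a database table of blog posts.
POSTS = {
    12345: {'title': 'Writing RESTful APIs in Python',
            'author_id': 2,
            'body': '... text of the article here ...'},
}

def represent_post(post_id, version):
    """Return the JSON text for one post in the given API version's format."""
    post = POSTS[post_id]
    prefix = '%s/api/v%s' % (BASE, version)
    doc = {
        'url': '%s/posts/%d' % (prefix, post_id),
        'title': post['title'],
        'author': '%s/users/%d' % (prefix, post['author_id']),
        'body': post['body'],
    }
    if version == '1.1':
        # Hypothetical v1.1 change: also link to the post's comments.
        doc['comments'] = '%s/posts/%d/comments/' % (prefix, post_id)
    return json.dumps(doc, indent=2, sort_keys=True)

print(represent_post(12345, '1.0'))   # old clients keep the old format
print(represent_post(12345, '1.1'))   # new clients get the extra link
```

Because every link in the response carries the same version prefix, a client that starts at a v1.0 URL keeps discovering v1.0 resources.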
—————————————— Including this chapter was a mistake on my part, so if you really want to learn how to build an API, please visit Flask’s official website: http://flask.pocoo.org/ —————————————— You may not be able to remember all of this material. Don’t worry; just treat it like a dictionary. All you need to remember is each section’s title and roughly what it says. What really matters is knowing how to use it and making it work in practice. Reading and understanding are just the first step. —————————————— Chapter 10. Systems In your everyday use of a computer, you do such things as list the contents of a folder or directory, create and remove files, and other housekeeping that’s necessary if not particularly exciting. You can also carry out these tasks, and more, within your own Python programs. Will this power drive you mad or cure your insomnia? We’ll see. Python provides many system functions through a module named os (for “operating system”), which we’ll import for all the programs in this chapter. —————————————— patterned after 仿照 —————————————— Files Python, like many other languages, patterned its file operations after Unix. Some functions, such as chown() and chmod(), have the same names, but there are a few new ones. —————————————— Create with open() The File Input/Output section introduced you to the open() function and explained how you can use it to open a file or create one if it doesn’t already exist. Let’s create a text file called oops.txt: >>> fout = open('oops.txt', 'wt') >>> print('Oops, I created a file.', file=fout) >>> fout.close() With that done, let’s perform some tests with it.
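The same file can also be created with a with block, which closes the file even if an error happens partway through. The sketch below is one way to write it; the create_oops() helper and the temporary directory are my additions (so the example leaves nothing behind), not part of the book's session.

```python
import os
import tempfile


def create_oops(directory):
    """Create oops.txt inside the given directory and return its path."""
    path = os.path.join(directory, 'oops.txt')
    # 'wt' means write, text mode; the file is created if it is missing.
    with open(path, 'wt') as fout:
        print('Oops, I created a file.', file=fout)
    return path


# Work in a throwaway directory that is deleted when the block ends.
with tempfile.TemporaryDirectory() as tmp:
    path = create_oops(tmp)
    with open(path, 'rt') as fin:
        contents = fin.read()

print(repr(contents))
```

Reading the file back shows that print() added a trailing newline to the text it wrote.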
—————————————— Check Existence with exists() To verify whether the file or directory is really there or you just imagined it, you can provide exists(), with a relative or absolute pathname, as demonstrated here: >>> import os >>> os.path.exists('oops.txt') True >>> os.path.exists('./oops.txt') True >>> os.path.exists('waffles') False >>> os.path.exists('.') True >>> os.path.exists('..') True —————————————— symbolic 象征性的 law-abiding file 守法的文件 shorthand 速记 fully qualified filenames 完全限定的文件名 —————————————— Check Type with isfile() The functions in this section check whether a name refers to a file, directory, or symbolic link (see the examples that follow for a discussion of links). The first function we’ll look at, isfile, asks a simple question: is it a plain old law-abiding file? >>> name = 'oops.txt' >>> os.path.isfile(name) True Here’s how you determine a directory: >>> os.path.isdir(name) False A single dot (.) is shorthand for the current directory, and two dots (..) stands for the parent directory. These always exist, so a statement such as the following will always report True: >>> os.path.isdir('.') True The os module contains many functions dealing with pathnames (fully qualified filenames, starting with / and including all parents). One such function, isabs(), determines whether its argument is an absolute pathname. The argument doesn’t need to be the name of a real file: >>> os.path.isabs(name) False >>> os.path.isabs('/big/fake/name') True >>> os.path.isabs('big/fake/name/without/a/leading/slash') False —————————————— Copy with copy() The copy() function comes from another module, shutil. This example copies the file oops.txt to the file ohno.txt: >>> import shutil >>> shutil.copy('oops.txt', 'ohno.txt') The shutil.move() function copies a file and then removes the original. —————————————— Change Name with rename() This function does exactly what it says. 
In the example here, it renames ohno.txt to ohwell.txt: >>> import os >>> os.rename('ohno.txt', 'ohwell.txt') —————————————— Link with link() or symlink() In Unix, a file exists in one place, but it can have multiple names, called links. In low-level hard links, it’s not easy to find all the names for a given file. A symbolic link is an alternative method that stores the new name as its own file, making it possible for you to get both the original and new names at once. The link() call creates a hard link, and symlink() makes a symbolic link. The islink() function checks whether the file is a symbolic link. Here’s how to make a hard link to the existing file oops.txt from the new file yikes.txt: >>> os.link('oops.txt', 'yikes.txt') >>> os.path.isfile('yikes.txt') True To create a symbolic link to the existing file oops.txt from the new file jeepers.txt, use the following: >>> os.path.islink('yikes.txt') False >>> os.symlink('oops.txt', 'jeepers.txt') >>> os.path.islink('jeepers.txt') True —————————————— intensely 强烈的 compress 压缩 octal 八进制 cryptic 神秘的 obscure 晦涩难懂的 constants 常量 —————————————— Change Permissions with chmod() On a Unix system, chmod() changes file permissions. There are read, write, and execute permissions for the user (that’s usually you, if you created the file), the main group that the user is in, and the rest of the world. The command takes an intensely compressed octal (base 8) value that combines user, group, and other permissions. For instance, to make oops.txt only readable by its owner, type the following: >>> os.chmod('oops.txt', 0o400) If you don’t want to deal with cryptic octal values and would rather deal with (slightly) obscure cryptic symbols, you can import some constants from the stat module and use a statement such as the following: >>> import stat >>> os.chmod('oops.txt', stat.S_IRUSR) —————————————— Ownership 所有权 numeric 数字 —————————————— Change Ownership with chown() This function is also Unix/Linux/Mac–specific.
You can change the owner and/or group ownership of a file by specifying the numeric user ID (uid) and group ID (gid): >>> uid = 5 >>> gid = 22 >>> os.chown('oops.txt', uid, gid) —————————————— Get a Pathname with abspath() This function expands a relative name to an absolute one. If your current directory is /usr/gaberlunzie and the file oops.txt is also there, you can type the following: >>> os.path.abspath('oops.txt') '/usr/gaberlunzie/oops.txt' —————————————— Delete a File with remove() In this snippet, we use the remove() function and say farewell to oops.txt: >>> os.remove('oops.txt') >>> os.path.exists('oops.txt') False —————————————— hierarchy 层次结构 —————————————— Directories In most operating systems, files exist in a hierarchy of directories (more often called folders these days). The container of all of these files and directories is a file system (sometimes called a volume). The standard os module deals with operating system specifics such as these and provides the following functions with which you can manipulate them. —————————————— Create with mkdir() This example shows how to create a directory called poems to store that precious verse: >>> os.mkdir('poems') >>> os.path.exists('poems') True —————————————— Delete with rmdir() Upon second thought, you decide you don’t need that directory after all. Here’s how to delete it: >>> os.rmdir('poems') >>> os.path.exists('poems') False —————————————— List Contents with listdir() Okay, take two; let’s make poems again, with some contents: >>> os.mkdir('poems') Now, get a list of its contents (none so far): >>> os.listdir('poems') [] Next, make a subdirectory: >>> os.mkdir('poems/mcintyre') >>> os.listdir('poems') ['mcintyre'] Create a file in this subdirectory (don’t type all these lines unless you really feel poetic; just make sure you begin and end with matching quotes, either single or tripled): >>> fout = open('poems/mcintyre/the_good_man', 'wt') >>> fout.write('''Cheerful and happy was his mood, ...
He to the poor was kind and good, ... And he oft' times did find them food, ... Also supplies of coal and wood, ... He never spake a word was rude, ... And cheer'd those did o'er sorrows brood, ... He passed away not understood, ... Because no poet in his lays ... Had penned a sonnet in his praise, ... 'Tis sad, but such is world's ways. ... ''') 344 >>> fout.close() Finally, let’s see what we have. It had better be there: >>> os.listdir('poems/mcintyre') ['the_good_man'] —————————————— Change Current Directory with chdir() With this function, you can go from one directory to another. Let’s leave the current directory and spend a little time in poems: >>> import os >>> os.chdir('poems') >>> os.listdir('.') ['mcintyre'] —————————————— List Matching Files with glob() The glob() function matches file or directory names by using Unix shell rules rather than the more complete regular expression syntax. Here are those rules: * matches everything (re would expect .*) ? matches a single character [abc] matches character a, b, or c [!abc] matches any character except a, b, or c Try getting all files or directories that begin with m: >>> import glob >>> glob.glob('m*') ['mcintyre'] How about any two-letter files or directories? >>> glob.glob('??') [] I’m thinking of an eight-letter word that begins with m and ends with e: >>> glob.glob('m??????e') ['mcintyre'] What about anything that begins with a k, l, or m, and ends with e? >>> glob.glob('[klm]*e') ['mcintyre'] —————————————— kernel 内核 interfere 影响 —————————————— Programs and Processes When you run an individual program, your operating system creates a single process. It uses system resources (CPU, memory, disk space) and data structures in the operating system’s kernel (file and network connections, usage statistics, and so on). A process is isolated from other processes—it can’t see what other processes are doing or interfere with them. 
The operating system keeps track of all the running processes, giving each a little time to run and then switching to another, with the twin goals of spreading the work around fairly and being responsive to the user. You can see the state of your processes with graphical interfaces such as the Mac’s Activity Monitor (OS X), or Task Manager on Windows-based computers. You can also access process data from your own programs. The standard library’s os module provides a common way of accessing some system information. For instance, the following functions get the process ID and the current working directory of the running Python interpreter: >>> import os >>> os.getpid() 76051 >>> os.getcwd() '/Users/williamlubanovic' And these get my user ID and group ID: >>> os.getuid() 501 >>> os.getgid() 20 —————————————— concurrency 并发性 variant 变体 —————————————— Create a Process with subprocess All of the programs that you’ve seen here so far have been individual processes. You can start and stop other existing programs from Python by using the standard library’s subprocess module. If you just want to run another program in a shell and grab whatever output it created (both standard output and standard error output), use the getoutput() function. Here, we’ll get the output of the Unix date program: >>> import subprocess >>> ret = subprocess.getoutput('date') >>> ret 'Sun Mar 30 22:54:37 CDT 2014' You won’t get anything back until the process ends. If you need to call something that might take a lot of time, see the discussion on concurrency in Concurrency. 
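Because nothing comes back until the child process ends, it can help to put an upper bound on the wait. The sketch below uses subprocess.run(), which is newer than the Python 3.3 these notes are based on (the capture_output and text arguments assume Python 3.7 or later). It runs the current Python interpreter instead of date, my substitution so that it also works on systems without a date program.

```python
import subprocess
import sys

# sys.executable is the path of the running Python interpreter.
cmd = [sys.executable, '-c', 'print("hello from a child process")']

try:
    # capture_output collects stdout and stderr; text decodes bytes to str;
    # timeout raises TimeoutExpired if the child runs longer than 10 seconds.
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    print(result.returncode)
    print(result.stdout.strip())
except subprocess.TimeoutExpired:
    print('child took longer than 10 seconds')
```

The command is passed as a list, so no shell is involved.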
Because the argument to getoutput() is a string representing a complete shell command, you can include arguments, pipes, < and > I/O redirection, and so on: >>> ret = subprocess.getoutput('date -u') >>> ret 'Mon Mar 31 03:55:01 UTC 2014' Piping that output string to the wc command counts one line, six “words,” and 29 characters: >>> ret = subprocess.getoutput('date -u | wc') >>> ret ' 1 6 29' A variant method called check_output() takes a list of the command and arguments. By default it only returns standard output as type bytes rather than a string and does not use the shell: >>> ret = subprocess.check_output(['date', '-u']) >>> ret b'Mon Mar 31 04:01:50 UTC 2014\n' To show the exit status of the other program, getstatusoutput() returns a tuple with the status code and output: >>> ret = subprocess.getstatusoutput('date') >>> ret (0, 'Sat Jan 18 21:36:23 CST 2014') If you don’t want to capture the output but might want to know its exit status, use call(): >>> ret = subprocess.call('date') Sat Jan 18 21:33:11 CST 2014 >>> ret 0 (In Unix-like systems, 0 is usually the exit status for success.) That date and time was printed to output but not captured within our program. So, we saved the return code as ret. You can run programs with arguments in two ways. The first is to specify them in a single string. Our sample command is date -u, which prints the current date and time in UTC (you’ll read more about UTC in a few pages): >>> ret = subprocess.call('date -u', shell=True) Tue Jan 21 04:40:04 UTC 2014 You need that shell=True to recognize the command line date -u, splitting it into separate strings and possibly expanding any wildcard characters such as * (we didn’t use any in this example). 
The second method makes a list of the arguments, so it doesn’t need to call the shell: >>> ret = subprocess.call(['date', '-u']) Tue Jan 21 04:41:59 UTC 2014 —————————————— spawned 催生了 bells 钟 clown 小丑 calliope 风琴 queue 队列 —————————————— Create a Process with multiprocessing You can run a Python function as a separate process or even run multiple independent processes in a single program with the multiprocessing module. Here’s a short example that does nothing useful; save it as mp.py and then run it by typing python mp.py:

import multiprocessing
import os

def do_this(what):
    whoami(what)

def whoami(what):
    print("Process %s says: %s" % (os.getpid(), what))

if __name__ == "__main__":
    whoami("I'm the main program")
    for n in range(4):
        p = multiprocessing.Process(target=do_this, args=("I'm function %s" % n,))
        p.start()

When I run this, my output looks like this:

Process 6224 says: I'm the main program
Process 6225 says: I'm function 0
Process 6226 says: I'm function 1
Process 6227 says: I'm function 2
Process 6228 says: I'm function 3

The Process() function spawned a new process and ran the do_this() function in it. Because we did this in a loop that had four passes, we generated four new processes that executed do_this() and then exited. The multiprocessing module has more bells and whistles than a clown on a calliope. It’s really intended for those times when you need to farm out some task to multiple processes to save overall time; for example, downloading web pages for scraping, resizing images, and so on. It includes ways to queue tasks, enable intercommunication among processes, and wait for all the processes to finish. —————————————— terminate 终止 stuck 卡住了 —————————————— Kill a Process with terminate() If you created one or more processes and want to terminate one for some reason (perhaps it’s stuck in a loop, or maybe you’re bored, or you want to be an evil overlord), use terminate().
In the example that follows, our process would count to a million, sleeping at each step for a second, and printing an irritating message. However, our main program runs out of patience in five seconds and nukes it from orbit:

import multiprocessing
import time
import os

def whoami(name):
    print("I'm %s, in process %s" % (name, os.getpid()))

def loopy(name):
    whoami(name)
    start = 1
    stop = 1000000
    for num in range(start, stop):
        print("\tNumber %s of %s. Honk!" % (num, stop))
        time.sleep(1)

if __name__ == "__main__":
    whoami("main")
    p = multiprocessing.Process(target=loopy, args=("loopy",))
    p.start()
    time.sleep(5)
    p.terminate()

When I run this program, I get the following:

I'm main, in process 97080
I'm loopy, in process 97081
	Number 1 of 1000000. Honk!
	Number 2 of 1000000. Honk!
	Number 3 of 1000000. Honk!
	Number 4 of 1000000. Honk!
	Number 5 of 1000000. Honk!

—————————————— Calendars 日历 ambiguous 模棱两可的 leap year 闰年 longitude 经度 hemisphere 半球 vice versa 反之亦然 overlap 重叠 —————————————— Calendars and Clocks Programmers devote a surprising amount of effort to dates and times. Let’s talk about some of the problems they encounter, and then get to some best practices and tricks to make the situation a little less messy. Dates can be represented in many ways; too many ways, actually. Even in English with the Roman calendar, you’ll see many variants of a simple date:
▪ July 29 1984
▪ 29 Jul 1984
▪ 29/7/1984
▪ 7/29/1984
Among other problems, date representations can be ambiguous. In the previous examples, it’s easy to determine that 7 stands for the month and 29 is the day of the month, largely because months don’t go to 29. But how about 1/6/2012? Is that referring to January 6 or June 1? The month name varies by language within the Roman calendar. Even the year and month can have a different definition in other cultures. Leap years are another wrinkle. You probably know that every four years is a leap year (and the summer Olympics and the American presidential election).
Did you also know that every 100 years is not a leap year, but that every 400 years is? Here’s code to test various years for leapiness: >>> import calendar >>> calendar.isleap(1900) False >>> calendar.isleap(1996) True >>> calendar.isleap(1999) False >>> calendar.isleap(2000) True >>> calendar.isleap(2002) False >>> calendar.isleap(2004) True Times have their own sources of grief, especially because of time zones and daylight saving time. If you look at a time zone map, the zones follow political and historic boundaries rather than every 15 degrees (360 degrees / 24) of longitude. And countries start and end daylight saving time on different days of the year. In fact, countries in the southern hemisphere advance their clocks when the northern hemisphere is winding them back, and vice versa. (If you think about it a bit, you will see why.) Python’s standard library has several date and time modules, including datetime, time, and calendar; the third-party dateutil package is also widely used. There’s some overlap, and it’s a bit confusing. —————————————— investigating 调查 astronomical 天文 microsecond 微秒 subsecond 次秒级 yank 猛地一拉 —————————————— The datetime Module(1) Let’s begin by investigating the standard datetime module. It defines four main objects, each with many methods: ▪date for years, months, and days ▪time for hours, minutes, seconds, and fractions ▪datetime for dates and times together ▪timedelta for date and/or time intervals You can make a date object by specifying a year, month, and day. Those values are then available as attributes: >>> from datetime import date >>> halloween = date(2014, 10, 31) >>> halloween datetime.date(2014, 10, 31) >>> halloween.day 31 >>> halloween.month 10 >>> halloween.year 2014 You can print a date with its isoformat() method: >>> halloween.isoformat() '2014-10-31' The iso refers to ISO 8601, an international standard for representing dates and times. It goes from most general (year) to most specific (day). It also sorts correctly: by year, then month, then day.
I usually pick this format for date representation in programs, and for filenames that save data by date. The next section describes the more complex strptime() and strftime() methods for parsing and formatting dates. This example uses the today() method to generate today’s date: >>> from datetime import date >>> now = date.today() >>> now datetime.date(2014, 2, 2) This one makes use of a timedelta object to add some time interval to a date: >>> from datetime import timedelta >>> one_day = timedelta(days=1) >>> tomorrow = now + one_day >>> tomorrow datetime.date(2014, 2, 3) >>> now + 17*one_day datetime.date(2014, 2, 19) >>> yesterday = now - one_day >>> yesterday datetime.date(2014, 2, 1) The range of date is from date.min (year=1, month=1, day=1) to date.max (year=9999, month=12, day=31). As a result, you can’t use it for historic or astronomical calculations. The datetime module’s time object is used to represent a time of day: >>> from datetime import time >>> noon = time(12, 0, 0) >>> noon datetime.time(12, 0) >>> noon.hour 12 >>> noon.minute 0 >>> noon.second 0 >>> noon.microsecond 0 The arguments go from the largest time unit (hours) to the smallest (microseconds). If you don’t provide all the arguments, time assumes all the rest are zero. By the way, just because you can store and retrieve microseconds doesn’t mean you can retrieve time from your computer to the exact microsecond. The accuracy of subsecond measurements depends on many factors in the hardware and operating system. —————————————— The datetime Module(2) The datetime object includes both the date and time of day. 
You can create one directly, such as the one that follows, which is for January 2, 2014, at 3:04 A.M., plus 5 seconds and 6 microseconds: >>> from datetime import datetime >>> some_day = datetime(2014, 1, 2, 3, 4, 5, 6) >>> some_day datetime.datetime(2014, 1, 2, 3, 4, 5, 6) The datetime object also has an isoformat() method: >>> some_day.isoformat() '2014-01-02T03:04:05.000006' That middle T separates the date and time parts. datetime has a now() method with which you can get the current date and time: >>> from datetime import datetime >>> now = datetime.now() >>> now datetime.datetime(2014, 2, 2, 23, 15, 34, 694988) >>> now.year 2014 >>> now.month 2 >>> now.day 2 >>> now.hour 23 >>> now.minute 15 >>> now.second 34 >>> now.microsecond 694988 You can merge a date object and a time object into a datetime object by using combine(): >>> from datetime import datetime, time, date >>> noon = time(12) >>> this_day = date.today() >>> noon_today = datetime.combine(this_day, noon) >>> noon_today datetime.datetime(2014, 2, 2, 12, 0) You can yank the date and time from a datetime by using the date() and time() methods: >>> noon_today.date() datetime.date(2014, 2, 2) >>> noon_today.time() datetime.time(12, 0) —————————————— denominator 分母 formerly 以前 omit 省略 mystified 迷惑 duplicates 重复的 dropouts 辍学 —————————————— Using the time Module It is confusing that Python has a datetime module with a time object, and a separate time module. Furthermore, the time module has a function called (wait for it) time(). One way to represent an absolute time is to count the number of seconds since some starting point. Unix time uses the number of seconds since midnight on January 1, 1970. This value is often called the epoch, and it is often the simplest way to exchange dates and times among systems.
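To make the epoch idea concrete, here is a small sketch that converts between epoch seconds and datetime objects. It uses datetime.fromtimestamp() with timezone.utc, which these notes do not cover, so take it as one possible approach rather than the book's. The variable names are mine.

```python
from datetime import datetime, timezone

# Epoch value 0 is midnight UTC on January 1, 1970.
epoch_start = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch_start.isoformat())  # 1970-01-01T00:00:00+00:00

# Round trip: aware datetime -> epoch seconds -> aware datetime.
some_day = datetime(2014, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
seconds = some_day.timestamp()
restored = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(seconds)
print(restored == some_day)
```

Attaching timezone.utc makes the conversion independent of the machine's local time zone, so two systems exchanging the number agree on the instant it names.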
The time module’s time() function returns the current time as an epoch value: >>> import time >>> now = time.time() >>> now 1391488263.664645 If you do the math, you’ll see that it has been over one billion seconds since New Year’s, 1970. Where did the time go? You can convert an epoch value to a string by using ctime(): >>> time.ctime(now) 'Mon Feb 3 22:31:03 2014' In the next section, you’ll see how to produce more attractive formats for dates and times. Epoch values are a useful least-common denominator for date and time exchange with different systems, such as JavaScript. Sometimes, though, you need actual days, hours, and so forth, which time provides as struct_time objects. localtime() provides the time in your system’s time zone, and gmtime() provides it in UTC: >>> time.localtime(now) time.struct_time(tm_year=2014, tm_mon=2, tm_mday=3, tm_hour=22, tm_min=31, tm_sec=3, tm_wday=0, tm_yday=34, tm_isdst=0) >>> time.gmtime(now) time.struct_time(tm_year=2014, tm_mon=2, tm_mday=4, tm_hour=4, tm_min=31, tm_sec=3, tm_wday=1, tm_yday=35, tm_isdst=0) In my (Central) time zone, 22:31 was 04:31 of the next day in UTC (formerly called Greenwich time or Zulu time). If you omit the argument to localtime() or gmtime(), they assume the current time. The opposite of these is mktime(), which converts a struct_time object to epoch seconds: >>> tm = time.localtime(now) >>> time.mktime(tm) 1391488263.0 This doesn’t exactly match our earlier epoch value of now() because the struct_time object preserves time only to the second. Some advice: wherever possible, use UTC instead of time zones. UTC is an absolute time, independent of time zones. If you have a server, set its time to UTC; do not use local time. Here’s some more advice (free of charge, no less): never use daylight savings time if you can avoid it. If you use daylight savings time, an hour disappears at one time of year (“spring ahead”) and occurs twice at another time (“fall back”). 
For some reason, many organizations use daylight saving time in their computer systems, but are mystified every year by data duplicates and dropouts. It all ends in tears. Note: Remember, your friends are UTC for times, and UTF-8 for strings. —————————————— abbreviation 缩写 elves 精灵 wonky 靠不住的 —————————————— Read and Write Dates and Times(1) isoformat() is not the only way to write dates and times. You already saw the ctime() function in the time module, which you can use to convert epochs to strings: >>> import time >>> now = time.time() >>> time.ctime(now) 'Mon Feb 3 21:14:36 2014' You can also convert dates and times to strings by using strftime(). This is provided as a method in the datetime, date, and time objects, and as a function in the time module. strftime() uses format strings to specify the output, as shown here:

Format string   Date/time unit    Range
%Y              year              1900-…
%m              month             01-12
%B              month name        January, …
%b              month abbrev      Jan, …
%d              day of month      01-31
%A              weekday name      Sunday, …
%a              weekday abbrev    Sun, …
%H              hour (24 hr)      00-23
%I              hour (12 hr)      01-12
%p              AM/PM             AM, PM
%M              minute            00-59
%S              second            00-59

Numbers are zero-padded on the left. Here’s the strftime() function provided by the time module. It converts a struct_time object to a string.
We’ll first define the format string fmt and use it again later: >>> import time >>> fmt = "It's %A, %B %d, %Y, local time %I:%M:%S%p" >>> t = time.localtime() >>> t time.struct_time(tm_year=2014, tm_mon=2, tm_mday=4, tm_hour=19, tm_min=28, tm_sec=38, tm_wday=1, tm_yday=35, tm_isdst=0) >>> time.strftime(fmt, t) "It's Tuesday, February 04, 2014, local time 07:28:38PM" If we try this with a date object, only the date parts will work, and the time defaults to midnight: >>> from datetime import date >>> some_day = date(2014, 7, 4) >>> fmt = "It's %A, %B %d, %Y, local time %I:%M:%S%p" >>> some_day.strftime(fmt) "It's Friday, July 04, 2014, local time 12:00:00AM" For a time object, only the time parts are converted: >>> from datetime import time >>> some_time = time(10, 35) >>> some_time.strftime(fmt) "It's Monday, January 01, 1900, local time 10:35:00AM" Clearly, you won’t want to use the day parts from a time object, because they’re meaningless. —————————————— Read and Write Dates and Times(2) To go the other way and convert a string to a date or time, use strptime() with the same format string. There’s no regular expression pattern matching; the nonformat parts of the string (without %) need to match exactly. Let’s specify a format that matches year-month-day, such as 2012-01-29. What happens if the date string you want to parse has spaces instead of dashes? >>> import time >>> fmt = "%Y-%m-%d" >>> time.strptime("2012 01 29", fmt) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/_strptime.py", line 494, in _strptime_time tt = _strptime(data_string, format)[0] File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/_strptime.py", line 337, in _strptime (data_string, format)) ValueError: time data '2012 01 29' does not match format '%Y-%m-%d' If we feed strptime() some dashes, is it happy now?
>>> time.strptime("2012-01-29", fmt) time.struct_time(tm_year=2012, tm_mon=1, tm_mday=29, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=6, tm_yday=29, tm_isdst=-1) Yes. Even if the string seems to match its format, an exception is raised if a value is out of range: >>> time.strptime("2012-13-29", fmt) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/_strptime.py", line 494, in _strptime_time tt = _strptime(data_string, format)[0] File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/_strptime.py", line 337, in _strptime (data_string, format)) ValueError: time data '2012-13-29' does not match format '%Y-%m-%d' Names are specific to your locale (the internationalization settings for your operating system). To print different month and day names, change your locale by using setlocale(); its first argument is locale.LC_TIME for dates and times, and the second is a string combining the language and country abbreviation. Let’s invite some international friends to a Halloween party. We’ll print the month, day, and day of week in US English, French, German, Spanish, and Icelandic. (What? You think Icelanders don’t enjoy a good party as much as anyone else? They even have real elves.) >>> import locale >>> from datetime import date >>> halloween = date(2014, 10, 31) >>> for lang_country in ['en_us', 'fr_fr', 'de_de', 'es_es', 'is_is',]: ... locale.setlocale(locale.LC_TIME, lang_country) ... halloween.strftime('%A, %B %d') ... 'en_us' 'Friday, October 31' 'fr_fr' 'Vendredi, octobre 31' 'de_de' 'Freitag, Oktober 31' 'es_es' 'viernes, octubre 31' 'is_is' 'föstudagur, október 31' >>> —————————————— Read and Write Dates and Times(3) Where do you find these magic values for lang_country?
This is a bit wonky, but you can try this to get all of them (there are a few hundred): >>> import locale >>> names = locale.locale_alias.keys() From names, let’s get just locale names that seem to work with setlocale(), such as the ones we used in the preceding example: a two-character language code followed by an underscore and a two-character country code: >>> good_names = [name for name in names if \ len(name) == 5 and name[2] == '_'] What do the first five look like? >>> good_names[:5] ['sr_cs', 'de_at', 'nl_nl', 'es_ni', 'sp_yu'] So, if you wanted all the German language locales, try this: >>> de = [name for name in good_names if name.startswith('de')] >>> de ['de_at', 'de_de', 'de_ch', 'de_lu', 'de_be'] —————————————— Alternative Modules If you find the standard library modules confusing, or lacking a particular conversion that you want, there are many third-party alternatives. Here are just a few of them:
arrow: This combines many date and time functions with a simple API.
dateutil: This module parses almost any date format and handles relative dates and times well.
iso8601: This fills in gaps in the standard library for the ISO8601 format.
fleming: This module offers many time zone functions.
—————————————— data 数据 date 日期 —————————————— Things to Do ——————————————
10.1 Write the current date as a string to the text file today.txt.

from datetime import datetime

now = datetime.now()
time_string = '{}-{}-{}'.format(now.year, now.month, now.day)
with open('today.txt', 'w') as f:
    f.write(time_string)

——————————————
10.2 Read the text file today.txt into the string today_string.

with open('today.txt', 'r') as f:
    today_string = f.read()
print(today_string)

——————————————
10.3 Parse the date from today_string.

import time

fmt = '%Y-%m-%d'
today_string = '2016-7-17'
print(time.strptime(today_string, fmt))

——————————————
10.4 List the files in your current directory.

import os

for content in os.listdir('.'):
    print(content)

——————————————
10.5 List the files in your parent directory.

import os

for content in os.listdir('..'):
    print(content)

—————————————— random 随机 range 范围 ——————————————
10.6 Use multiprocessing to create three separate processes. Make each one wait a random number of seconds between one and five, print the current time, and then exit.

import multiprocessing
import random
import time
from datetime import datetime

def do_this():
    # randint(1, 5) includes both ends; randrange(1, 5) would stop at 4.
    time.sleep(random.randint(1, 5))
    print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))

if __name__ == '__main__':
    for n in range(3):
        p = multiprocessing.Process(target=do_this)
        p.start()

——————————————
10.7 Create a date object of your day of birth.

from datetime import date

birth = date(1998, 3, 29)
print(type(birth))

——————————————
10.8 What day of the week was your day of birth?

from datetime import date

birth = date(1998, 3, 29)
fmt = 'My birthday is %A.'
print(birth.strftime(fmt))

——————————————
10.9 When will you be (or when were you) 10,000 days old?

from datetime import date
from datetime import timedelta

birth = date(1998, 3, 29)
interval = timedelta(days=10000)
print(birth + interval)

—————————————— sequential 顺序 concurrency 并发性 distributed 分布式 robustness 稳健性;健壮性 duplicate 重复的 Simplicity 简单 footloose 自由自在的 coroutines 协同程序 —————————————— Chapter 11. Concurrency and Networks Time is nature’s way of keeping everything from happening at once. Space is what prevents everything from happening to me. (Quotes about Time) So far, most of the programs that you’ve written run in one place (a single machine) and one line at a time (sequential). But, we can do more than one thing at a time (concurrency) and in more than one place (distributed computing or networking). There are many good reasons to challenge time and space: Performance Your goal is to keep fast components busy, not waiting for slow ones.
Robustness There’s safety in numbers, so you want to duplicate tasks to work around hardware and software failures. Simplicity It’s best practice to break complex tasks into many little ones that are easier to create, understand, and fix. Communication It’s just plain fun to send your footloose bytes to distant places, and bring friends back with them. We’ll start with concurrency, first building on the non-networking techniques that are described in Chapter 10—processes and threads. Then we’ll look at other approaches, such as callbacks, green threads, and coroutines. Finally, we’ll arrive at networking, initially as a concurrency technique, and then spreading outward. Note Some Python packages discussed in this chapter were not yet ported to Python 3 when this was written. In many cases, I’ll show example code that would need to be run with a Python 2 interpreter, which we’re calling python2. —————————————— synchronous 同步 asynchronous 异步 bound 约束 crunching 处理 invoking 调用 bottlenecks 瓶颈 odds 几率 —————————————— Concurrency The official Python site discusses concurrency in general and in the standard library. Those pages have many links to various packages and techniques; we’ll show the most useful ones in this chapter. In computers, if you’re waiting for something, it’s usually for one of two reasons: I/O bound This is by far more common. Computer CPUs are ridiculously fast—hundreds of times faster than computer memory and many thousands of times faster than disks or networks. CPU bound This happens with number crunching tasks such as scientific or graphic calculations. Two more terms are related to concurrency: synchronous One thing follows the other, like a funeral procession. asynchronous Tasks are independent, like party-goers dropping in and tearing off in separate cars. As you progress from simple systems and tasks to real-life problems, you’ll need at some point to deal with concurrency. Consider a website, for example. 
You can usually provide static and dynamic pages to web clients fairly quickly. A fraction of a second is considered interactive, but if the display or interaction takes longer, people become impatient. Tests by companies such as Google and Amazon showed that traffic drops off quickly if the page loads even a little slower. But what if you can’t help it when something takes a long time, such as uploading a file, resizing an image, or querying a database? You can’t do it within your synchronous web server code anymore, because someone’s waiting. On a single machine, if you want to perform multiple tasks as fast as possible, you want to make them independent. Slow tasks shouldn’t block all the others. Programs and Processes demonstrates how multiprocessing can be used to overlap work on a single machine. If you needed to resize an image, your web server code could call a separate, dedicated image resizing process to run asynchronously and concurrently. It could scale your application horizontally by invoking multiple resizing processes. The trick is getting them all to work with one another. Any shared control or state means that there will be bottlenecks. An even bigger trick is dealing with failures, because concurrent computing is harder than regular computing. Many more things can go wrong, and your odds of end-to-end success are lower. All right. What methods can help you to deal with these complexities? Let’s begin with a good way to manage multiple tasks: queues. —————————————— Queues 队列 stuck 卡住了 batch 批处理 accumulate 积累 overall 整体 barn 谷仓 —————————————— Queues A queue is like a list: things are added at one end and taken away from the other. The most common is referred to as FIFO (first in, first out). Suppose that you’re washing dishes. If you’re stuck with the entire job, you need to wash each dish, dry it, and put it away. You can do this in a number of ways. You might wash the first dish, dry it, and then put it away. 
You then repeat with the second dish, and so on. Or, you might batch operations and wash all the dishes, dry them all, and then put them away; this assumes you have space in your sink and drainer for all the dishes that accumulate at each step. These are all synchronous approaches—one worker, one thing at a time. As an alternative, you could get a helper or two. If you’re the washer, you can hand each cleaned dish to the dryer, who hands each dried dish to the put-away-er (look it up; it’s absolutely a real word!). As long as each of you works at the same pace, you should finish much faster than by yourself. However, what if you wash faster than the dryer dries? Wet dishes either fall on the floor, or you pile them up between you and the dryer, or you just whistle off-key until the dryer is ready. And if the last person is slower than the dryer, dry dishes can end up falling on the floor, or piling up, or the dryer does the whistling. You have multiple workers, but the overall task is still synchronous and can proceed only as fast as the slowest worker. Many hands make light work, goes the old saying (I always thought it was Amish, because it makes me think of barn building). Adding workers can build a barn, or do the dishes, faster. This involves queues. In general, queues transport messages, which can be any kind of information. In this case, we’re interested in queues for distributed task management, also known as work queues, job queues, or task queues. Each dish in the sink is given to an available washer, who washes and hands it off to the first available dryer, who dries and hands it to a put-away-er. This can be synchronous (workers wait for a dish to handle and another worker to whom to give it), or asynchronous (dishes are stacked between workers with different paces). As long as you have enough workers, and they keep up with the dishes, things move a lot faster. 
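As a rough sketch of the dish pipeline described above, the following uses collections.deque from the standard library as a FIFO queue between stages. The names run_pipeline, sink, washed, dried, and shelf are made up for illustration, and a single worker does every step in batches, so this is the synchronous variant rather than a real multi-worker system.

```python
from collections import deque

def run_pipeline(dishes):
    """Move each dish through wash -> dry -> put away, in FIFO order."""
    sink = deque(dishes)   # dishes waiting to be washed
    washed = deque()       # queue between the washer and the dryer
    dried = deque()        # queue between the dryer and the put-away-er
    shelf = []             # where finished dishes end up

    while sink:
        # take from the front of one queue, add to the back of the next
        washed.append(sink.popleft())
    while washed:
        dried.append(washed.popleft())
    while dried:
        shelf.append(dried.popleft())
    return shelf

if __name__ == '__main__':
    print(run_pipeline(['salad', 'bread', 'entree', 'dessert']))
```

Because every stage removes from the front and adds to the back, the dishes reach the shelf in the same order they entered the sink.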
——————————————
simulate: to imitate the behavior of something
intermediate: in between
——————————————
Processes
You can implement queues in many ways. For a single machine, the standard library’s multiprocessing module (which you can see in Programs and Processes) contains a Queue function. Let’s simulate a single washer and a separate dryer process (someone can put the dishes away later) with an intermediate dish_queue. Call this program dishes.py:

import multiprocessing as mp

def washer(dishes, output):
    for dish in dishes:
        print('Washing', dish, 'dish')
        output.put(dish)

def dryer(input):
    while True:
        dish = input.get()
        print('Drying', dish, 'dish')
        input.task_done()

# The guard keeps child processes from re-running this block
# on platforms that start them by importing the module again.
if __name__ == '__main__':
    dish_queue = mp.JoinableQueue()
    dryer_proc = mp.Process(target=dryer, args=(dish_queue,))
    dryer_proc.daemon = True
    dryer_proc.start()
    dishes = ['salad', 'bread', 'entree', 'dessert']
    washer(dishes, dish_queue)
    dish_queue.join()

Run your new program thusly:

$ python dishes.py
Washing salad dish
Washing bread dish
Washing entree dish
Washing dessert dish
Drying salad dish
Drying bread dish
Drying entree dish
Drying dessert dish

This queue looked a lot like a simple Python iterator, producing a series of dishes. It actually started up separate processes along with the communication between the washer and dryer. I used a JoinableQueue and the final join() method to let the washer know that all the dishes have been dried. There are other queue types in the multiprocessing module, and you can read the documentation for more examples.
If you can't understand the code shown above, it doesn't matter, because I couldn't understand it either.
——————————————
counterpart: a matching partner
paranormal: supernatural
investigator: someone who looks into things
roam: to wander
apprehensively: anxiously
candlestick: a holder for a candle
entity: a thing that exists
marbles: small glass balls
ablaze: on fire, blazing
despite: in spite of
brandy: a distilled spirit
evaporation: turning from liquid into vapor
Ghostbuster: a ghost catcher (from the film Ghostbusters)
notoriously: famously, in a bad way
recommendations: suggestions
——————————————
Threads(1)
A thread runs within a process with access to everything in the process, similar to a multiple personality.
The multiprocessing module has a cousin called threading that uses threads instead of processes (actually, multiprocessing was designed later as its process-based counterpart). Let’s redo our process example with threads:

import threading

def do_this(what):
    whoami(what)

def whoami(what):
    print("Thread %s says: %s" % (threading.current_thread(), what))

if __name__ == "__main__":
    whoami("I'm the main program")
    for n in range(4):
        p = threading.Thread(target=do_this, args=("I'm function %s" % n,))
        p.start()

Here’s what prints for me:

Thread <_MainThread(MainThread, started 140735207346960)> says: I'm the main program
Thread says: I'm function 0
Thread says: I'm function 1
Thread says: I'm function 2
Thread says: I'm function 3

We can reproduce our process-based dish example by using threads:

import threading, queue
import time

def washer(dishes, dish_queue):
    for dish in dishes:
        print("Washing", dish)
        time.sleep(5)
        dish_queue.put(dish)

def dryer(dish_queue):
    while True:
        dish = dish_queue.get()
        print("Drying", dish)
        time.sleep(10)
        dish_queue.task_done()

dish_queue = queue.Queue()
for n in range(2):
    # daemon=True lets the program exit once join() returns;
    # otherwise the endless dryer loops would keep it alive forever
    dryer_thread = threading.Thread(target=dryer, args=(dish_queue,), daemon=True)
    dryer_thread.start()

dishes = ['salad', 'bread', 'entree', 'dessert']
washer(dishes, dish_queue)
dish_queue.join()

One difference between multiprocessing and threading is that threading does not have a terminate() function. There’s no easy way to terminate a running thread, because it can cause all sorts of problems in your code, and possibly in the space-time continuum itself.
Threads can be dangerous. Like manual memory management in languages such as C and C++, they can cause bugs that are extremely hard to find, let alone fix. To use threads, all the code in the program (and in external libraries that it uses) must be thread-safe. In the preceding example code, the threads didn’t share any global variables, so they could run independently without breaking anything.
Imagine that you’re a paranormal investigator in a haunted house. Ghosts roam the halls, but none are aware of the others, and at any time, any of them can view, add, remove, or move any of the house’s contents. You’re walking apprehensively through the house, taking readings with your impressive instruments. Suddenly you notice that the candlestick you passed seconds ago is now missing. The contents of the house are like the variables in a program. The ghosts are threads in a process (the house). If the ghosts only looked at the house’s contents, there would be no problem. It’s like a thread reading the value of a constant or variable without trying to change it. Yet, some unseen entity could grab your flashlight, blow cold air down your neck, put marbles on the stairs, or make the fireplace come ablaze. The really subtle ghosts would change things in other rooms that you might never notice. Despite your fancy instruments, you’d have a very hard time figuring out who did it, and how, and when. —————————————— Threads(2) If you used multiple processes instead of threads, it would be like having a number of houses but with only one (living) person in each. If you put your brandy in front of the fireplace, it would still be there an hour later. Some lost to evaporation, perhaps, but in the same place. Threads can be useful and safe when global data is not involved. In particular, threads are useful for saving time while waiting for some I/O operation to complete. In these cases, they don’t have to fight over data, because each has completely separate variables. But threads do sometimes have good reasons to change global data. In fact, one common reason to launch multiple threads is to let them divide up the work on some data, so a certain degree of change to the data is expected. The usual way to share data safely is to apply a software lock before modifying a variable in a thread. This keeps the other threads out while the change is made. 
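A minimal sketch of such a lock, using threading.Lock from the standard library. The names counter, counter_lock, add_many, and run are invented for this illustration; the with statement acquires the lock on entry and releases it on exit, even if the body raises an exception.

```python
import threading

counter = 0                       # global data shared by all threads
counter_lock = threading.Lock()   # guards every change to counter

def add_many(times):
    """Increment the shared counter, holding the lock for each change."""
    global counter
    for _ in range(times):
        with counter_lock:        # other threads wait here until we are done
            counter += 1

def run(workers=4, times=10000):
    """Start several threads that all update the same counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == '__main__':
    print(run())
```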
It’s like having a Ghostbuster guard the room you want to remain unhaunted. The trick, though, is that you need to remember to unlock it. Plus, locks can be nested—what if another Ghostbuster is also watching the same room, or the house itself? The use of locks is traditional but notoriously hard to get right. Note In Python, threads do not speed up CPU-bound tasks because of an implementation detail in the standard Python system called the Global Interpreter Lock (GIL). This exists to avoid threading problems in the Python interpreter, and can actually make a multithreaded program slower than its single-threaded counterpart, or even a multi-process version. So for Python, the recommendations are as follows: ■Use threads for I/O bound problems ■Use processes, networking, or events (discussed in the next section) for CPU-bound problems —————————————— doles out 发放 imperative 必要的 variation 变异 —————————————— Don't forget an important thing, that is, this book is just an introduction. —————————————— Green Threads and gevent(1) As you’ve seen, developers traditionally avoid slow spots in programs by running them in separate threads or processes. The Apache web server is an example of this design. One alternative is event-based programming. An event-based program runs a central event loop, doles out any tasks, and repeats the loop. The nginx web server follows this design, and is generally faster than Apache. The gevent library is event-based and accomplishes a cool trick: you write normal imperative code, and it magically converts pieces to coroutines. These are like generators that can communicate with one another and keep track of where they are. gevent modifies many of Python’s standard objects such as socket to use its mechanism instead of blocking. This does not work with Python add-in code that was written in C, as some database drivers are. 
Note As of this writing, gevent was not completely ported to Python 3, so these code examples use the Python 2 tools pip2 and python2. You install gevent by using the Python 2 version of pip: $ pip2 install gevent Here’s a variation of sample code at the gevent website. You’ll see the socket module’s gethostbyname() function in the upcoming DNS section. This function is synchronous, so you wait (possibly many seconds) while it chases name servers around the world to look up that address. But you could use the gevent version to look up multiple sites independently. Save this as gevent_test.py: import gevent from gevent import socket hosts = ['www.crappytaxidermy.com', 'www.walterpottertaxidermy.com', 'www.antique-taxidermy.com'] jobs = [gevent.spawn(gevent.socket.gethostbyname, host) for host in hosts] gevent.joinall(jobs, timeout=5) for job in jobs: print(job.value) There’s a one-line for-loop in the preceding example. Each hostname is submitted in turn to a gethostbyname() call, but they can run asynchronously because it’s the gevent version of gethostbyname(). Run gevent_test.py with Python 2 by typing the following (in bold): $ python2 gevent_test.py 66.6.44.4 74.125.142.121 78.136.12.50 gevent.spawn() creates a greenlet (also known sometimes as a green thread or a microthread) to execute each gevent.socket.gethostbyname(url). The difference from a normal thread is that it doesn’t block. If something occurred that would have blocked a normal thread, gevent switches control to one of the other greenlets. —————————————— twisted twisted is an asynchronous, event-driven networking framework. You connect functions to events such as data received or connection closed, and those functions are called when those events occur. This is a callback design, and if you’ve written anything in JavaScript, it might seem familiar. If it’s new to you, it can seem backwards. For some developers, callback-based code becomes harder to manage as the application grows. 
Like gevent, twisted has not yet been ported to Python 3. We’ll use the Python 2 installer and interpreter for this section. Type the following to install it: $ pip2 install twisted twisted is a large package, with support for many Internet protocols on top of TCP and UDP. To be short and simple, we’ll show a little knock-knock server and client, adapted from twisted examples. First, let’s look at the server, knock_server.py (notice the Python 2 syntax for print()): from twisted.internet import protocol, reactor class Knock(protocol.Protocol): def dataReceived(self, data): print 'Client:', data if data.startswith("Knock knock"): response = "Who's there?" else: response = data + " who?" print 'Server:', response self.transport.write(response) class KnockFactory(protocol.Factory): def buildProtocol(self, addr): return Knock() reactor.listenTCP(8000, KnockFactory()) reactor.run() Now, let’s take a glance at its trusty companion, knock_client.py: from twisted.internet import reactor, protocol class KnockClient(protocol.Protocol): def connectionMade(self): self.transport.write("Knock knock") def dataReceived(self, data): if data.startswith("Who's there?"): response = "Disappearing client" self.transport.write(response) else: self.transport.loseConnection() reactor.stop() class KnockFactory(protocol.ClientFactory): protocol = KnockClient def main(): f = KnockFactory() reactor.connectTCP("localhost", 8000, f) reactor.run() if __name__ == '__main__': main() Start the server first: $ python2 knock_server.py Then start the client: $ python2 knock_client.py The server and client exchange messages, and the server prints the conversation: Client: Knock knock Server: Who's there? Client: Disappearing client Server: Disappearing client who? Our trickster client then ends, keeping the server waiting for the punch line. If you’d like to enter the twisted passages, try some of the other examples from its documentation. 
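To make the callback idea concrete without installing anything, here is a toy sketch that uses only the standard library. It is not twisted's API: MiniLoop, on(), emit(), and the event names server_data and client_data are all hypothetical names made up for this illustration. Functions are registered for events, and a small loop calls them when those events arrive, replaying the knock-knock conversation in memory instead of over TCP.

```python
from collections import deque

class MiniLoop:
    """Hypothetical toy event loop: registered callbacks run when events arrive."""

    def __init__(self):
        self.handlers = {}       # event name -> list of callbacks
        self.pending = deque()   # events waiting to be dispatched

    def on(self, event, callback):
        self.handlers.setdefault(event, []).append(callback)

    def emit(self, event, data):
        self.pending.append((event, data))

    def run(self):
        # keep dispatching until no callback has emitted anything new
        while self.pending:
            event, data = self.pending.popleft()
            for callback in self.handlers.get(event, []):
                callback(data)

def demo():
    loop = MiniLoop()
    log = []

    def server_got(data):
        log.append('Client: ' + data)
        if data.startswith('Knock knock'):
            loop.emit('client_data', "Who's there?")
        else:
            loop.emit('client_data', data + ' who?')

    def client_got(data):
        log.append('Server: ' + data)
        if data.startswith("Who's there?"):
            loop.emit('server_data', 'Disappearing client')
        # any other reply is ignored, so the conversation ends here

    loop.on('server_data', server_got)
    loop.on('client_data', client_got)
    loop.emit('server_data', 'Knock knock')
    loop.run()
    return log

if __name__ == '__main__':
    for line in demo():
        print(line)
```

Neither function calls the other directly; each only reacts to an event and emits a new one, which is what can make callback code feel backwards at first.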
——————————————
reconcile: to make different things work together
proposed: put forward
——————————————
asyncio
Recently, Guido van Rossum (remember him?) became involved with the Python concurrency issue. Many packages had their own event loop, and each event loop kind of likes to be the only one. How could he reconcile mechanisms such as callbacks, greenlets, and others? After many discussions and visits, he proposed Asynchronous IO Support Rebooted: the “asyncio” Module, code-named Tulip. This first appeared in Python 3.4 as the asyncio module. For now, it offers a common event loop that could be compatible with twisted, gevent, and other asynchronous methods. The goal is to provide a standard, clean, well-performing asynchronous API. Watch it expand in future releases of Python.
——————————————
mingled: mixed together
sentinel: a guard; here, a special value that marks the end
——————————————
The following part on concurrency is very interesting.
——————————————
Redis(1)
Our earlier dishwashing code examples, using processes or threads, were run on a single machine. Let’s take another approach to queues that can run on a single machine or across a network. Even with multiple singing processes and dancing threads, sometimes one machine isn’t enough. You can treat this section as a bridge between single-box (one machine) and multiple-box concurrency.
To try the examples in this section, you’ll need a Redis server and its Python module. You can see where to get them in Redis. In that chapter, Redis’s role is that of a database. Here, we’re featuring its concurrency personality.
A quick way to make a queue is with a Redis list. A Redis server runs on one machine; this can be the same one as its clients, or another that the clients can access through a network. In either case, clients talk to the server via TCP, so they’re networking. One or more provider clients push messages onto one end of the list. One or more client workers watch this list with a blocking pop operation. If the list is empty, they all just sit around playing cards.
As soon as a message arrives, the first eager worker gets it.
Like our earlier process- and thread-based examples, redis_washer.py generates a sequence of dishes:

import redis

conn = redis.Redis()
print('Washer is starting')
dishes = ['salad', 'bread', 'entree', 'dessert']
for dish in dishes:
    msg = dish.encode('utf-8')
    conn.rpush('dishes', msg)
    print('Washed', dish)
conn.rpush('dishes', 'quit')
print('Washer is done')

The loop generates four messages containing a dish name, followed by a final message that says “quit.” It appends each message to a list called dishes in the Redis server, similar to appending to a Python list.
And as soon as the first dish is ready, redis_dryer.py does its work:

import redis

conn = redis.Redis()
print('Dryer is starting')
while True:
    # blpop() blocks until a message arrives and returns (list name, value)
    msg = conn.blpop('dishes')
    if not msg:
        break
    val = msg[1].decode('utf-8')
    if val == 'quit':
        break
    print('Dried', val)
print('Dishes are dried')

This code waits for messages on the list named dishes; each result is a pair whose first item is the list name and whose second item is the message. It prints that each dish is dried, and obeys the quit message by ending the loop.
——————————————
Redis(2)
Start the dryer, and then the washer. Using the & at the end puts the first program in the background; it keeps running, but doesn’t listen to the keyboard anymore. This works in Linux and OS X shells, although you might see different output on the next line; in cmd.exe on Windows a trailing & does not put the program in the background, so there you can run the dryer in a second window instead. In this case (OS X), it’s some information about the background dryer process. Then, we start the washer process normally (in the foreground). You’ll see the mingled output of the two processes:

$ python redis_dryer.py &
[2] 81691
Dryer is starting
$ python redis_washer.py
Washer is starting
Washed salad
Dried salad
Washed bread
Dried bread
Washed entree
Dried entree
Washed dessert
Washer is done
Dried dessert
Dishes are dried
[2]+ Done python redis_dryer.py

As soon as dishes started arriving at Redis from the washer process, our hard-working dryer process started pulling them back out.
Each message was a dish name, except the final sentinel value, the string 'quit'. When the dryer process read that quit message, it quit, and some more background process information printed to the terminal (also system-dependent). You can use a sentinel (an otherwise invalid value) to indicate something special from the data stream itself; in this case, that we’re done. Otherwise, we’d need to add a lot more program logic, such as the following:
■Agreeing ahead of time on some maximum dish number, which would kind of be a sentinel anyway.
■Doing some special out-of-band (not in the data stream) interprocess communication.
■Timing out after some interval with no new data.
Let’s make a few last changes:
■Create multiple dryer processes.
■Add a timeout to each dryer rather than relying only on a sentinel.
The new redis_dryer2.py:

import multiprocessing
import os
import time

import redis

def dryer():
    conn = redis.Redis()
    pid = os.getpid()
    timeout = 20
    print('Dryer process %s is starting' % pid)
    while True:
        # blpop() returns None if nothing arrives within timeout seconds
        msg = conn.blpop('dishes', timeout)
        if not msg:
            break
        val = msg[1].decode('utf-8')
        if val == 'quit':
            break
        print('%s: dried %s' % (pid, val))
        time.sleep(0.1)
    print('Dryer process %s is done' % pid)

DRYERS = 3

# The guard keeps child processes from starting dryers of their own
# on platforms that launch them by importing this module again.
if __name__ == '__main__':
    for num in range(DRYERS):
        p = multiprocessing.Process(target=dryer)
        p.start()

Start the dryer processes in the background, and then the washer process in the foreground:

$ python redis_dryer2.py &
Dryer process 44447 is starting
Dryer process 44448 is starting
Dryer process 44446 is starting
$ python redis_washer.py
Washer is starting
Washed salad
44447: dried salad
Washed bread
44448: dried bread
Washed entree
44446: dried entree
Washed dessert
Washer is done
44447: dried dessert

One dryer process reads the quit message and quits:

Dryer process 44448 is done

After 20 seconds, the other dryer processes get a return value of None from their blpop calls, indicating that they’ve timed out.
They say their last words and exit: Dryer process 44447 is done Dryer process 44446 is done After the last dryer subprocess quits, the main dryer program ends: [1]+ Done python redis_dryer2.py —————————————— assembly 组装 banquet 宴会 pending 等待 —————————————— Beyond Queues With more moving parts, there are more possibilities for our lovely assembly lines to be disrupted. If we need to wash the dishes from a banquet, do we have enough workers? What if the dryers get drunk? What if the sink clogs? Worries, worries! How will you cope with it all? Fortunately, there are some techniques available that you can apply. They include the following: Fire and forget Just pass things on and don’t worry about the consequences, even if no one is there. That’s the dishes-on-the-floor approach. Request-reply The washer receives an acknowledgement from the dryer, and the dryer from the put-away-er, for each dish in the pipeline. Back pressure or throttling This technique directs a fast worker to take it easy if someone downstream can’t keep up. In real systems, you need to be careful that workers are keeping up with the demand; otherwise, you hear the dishes hitting the floor. You might add new tasks to a pending list, while some worker process pops the latest message and adds it to a working list. When the message is done, it’s removed from the working list and added to a completed list. This lets you know what tasks have failed or are taking too long. You can do this with Redis yourself, or use a system that someone else has already written and tested. Some Python-based queue packages that add this extra level of management—some of which use Redis—include: celery This particular package is well worth a look. It can execute distributed tasks synchronously or asynchronously, using the methods we’ve discussed: multiprocessing, gevent, and others. thoonk This package builds on Redis to provide job queues and pub-sub (coming in the next section). 
rq This is a Python library for job queues, also based on Redis. Queues This site offers a discussion of queuing software, Python-based and otherwise. —————————————— span 跨度 distributing 分发 —————————————— Networks In our discussion of concurrency, we talked mostly about time: single-machine solutions (processes, threads, green threads). We also briefly touched upon some solutions that can span networks (Redis, ZeroMQ). Now, we’ll look at networking in its own right, distributing computing across space. —————————————— fanout 扇出 fanin 扇入 subscribe 订阅 —————————————— Patterns You can build networking applications from some basic patterns. The most common pattern is request-reply, also known as client-server. This pattern is synchronous: the client waits until the server responds. You’ve seen many examples of request-reply in this book. Your web browser is also a client, making an HTTP request to a web server, which returns a reply. Another common pattern is push, or fanout: you send data to any available worker in a pool of processes. An example is a web server behind a load balancer. The opposite of push is pull, or fanin: you accept data from one or more sources. An example would be a logger that takes text messages from multiple processes and writes them to a single log file. One pattern is similar to radio or television broadcasting: publish-subscribe, or pub-sub. With this pattern, a publisher sends out data. In a simple pub-sub system, all subscribers would receive a copy. More often, subscribers can indicate that they’re interested only in certain types of data (often called a topic), and the publisher will send just those. So, unlike the push pattern, more than one subscriber might receive a given piece of data. If there’s no subscriber for a topic, the data is ignored. —————————————— The Publish-Subscribe Model Publish-subscribe is not a queue but a broadcast. One or more processes publish messages. 
Each subscriber process indicates what type of messages it would like to receive. A copy of each message is sent to each subscriber that matched its type. Thus, a given message might be processed once, more than once, or not at all. Each publisher is just broadcasting and doesn’t know who—if anyone—is listening. —————————————— emits 发出 breed 品种 accompanying 伴随 criteria 标准 —————————————— Redis You can build a quick pub-sub system by using Redis. The publisher emits messages with a topic and a value, and subscribers say which topics they want to receive. Here’s the publisher, redis_pub.py: import redis import random conn = redis.Redis() cats = ['siamese', 'persian', 'maine coon', 'norwegian forest'] hats = ['stovepipe', 'bowler', 'tam-o-shanter', 'fedora'] for msg in range(10): cat = random.choice(cats) hat = random.choice(hats) print('Publish: %s wears a %s' % (cat, hat)) conn.publish(cat, hat) Each topic is a breed of cat, and the accompanying message is a type of hat. Here’s a single subscriber, redis_sub.py: import redis conn = redis.Redis() topics = ['maine coon', 'persian'] sub = conn.pubsub() sub.subscribe(topics) for msg in sub.listen(): if msg['type'] == 'message': cat = msg['channel'] hat = msg['data'] print('Subscribe: %s wears a %s' % (cat, hat)) The subscriber just shown wants all messages for cat types 'maine coon' and 'persian', and no others. The listen() method returns a dictionary. If its type is 'message', it was sent by the publisher and matches our criteria. The 'channel' key is the topic (cat), and the 'data' key contains the message (hat). If you start the publisher first and no one is listening, it’s like a mime falling in the forest (does he make a sound?), so start the subscriber first: $ python redis_sub.py Next, start the publisher. 
It will send 10 messages, and then quit: $ python redis_pub.py Publish: maine coon wears a stovepipe Publish: norwegian forest wears a stovepipe Publish: norwegian forest wears a tam-o-shanter Publish: maine coon wears a bowler Publish: siamese wears a stovepipe Publish: norwegian forest wears a tam-o-shanter Publish: maine coon wears a bowler Publish: persian wears a bowler Publish: norwegian forest wears a bowler Publish: maine coon wears a stovepipe The subscriber cares about only two types of cat: $ python redis_sub.py Subscribe: maine coon wears a stovepipe Subscribe: maine coon wears a bowler Subscribe: maine coon wears a bowler Subscribe: persian wears a bowler Subscribe: maine coon wears a stovepipe We didn’t tell the subscriber to quit, so it’s still waiting for messages. If you restart the publisher, the subscriber will grab a few more messages and print them. You can have as many subscribers (and publishers) as you want. If there’s no subscriber for a message, it disappears from the Redis server. However, if there are subscribers, the messages stay in the server until all subscribers have retrieved them. —————————————— bind 绑定 —————————————— ZeroMQ Remember those ZeroMQ PUB and SUB sockets from a few pages ago? This is what they’re for. ZeroMQ has no central server, so each publisher writes to all subscribers. Let’s rewrite the cat-hat pub-sub for ZeroMQ. 
The publisher, zmq_pub.py, looks like this: import zmq import random import time host = '*' port = 6789 ctx = zmq.Context() pub = ctx.socket(zmq.PUB) pub.bind('tcp://%s:%s' % (host, port)) cats = ['siamese', 'persian', 'maine coon', 'norwegian forest'] hats = ['stovepipe', 'bowler', 'tam-o-shanter', 'fedora'] time.sleep(1) for msg in range(10): cat = random.choice(cats) cat_bytes = cat.encode('utf-8') hat = random.choice(hats) hat_bytes = hat.encode('utf-8') print('Publish: %s wears a %s' % (cat, hat)) pub.send_multipart([cat_bytes, hat_bytes]) Notice how this code uses UTF-8 encoding for the topic and value strings. The file for the subscriber is zmq_sub.py: import zmq host = '127.0.0.1' port = 6789 ctx = zmq.Context() sub = ctx.socket(zmq.SUB) sub.connect('tcp://%s:%s' % (host, port)) topics = ['maine coon', 'persian'] for topic in topics: sub.setsockopt(zmq.SUBSCRIBE, topic.encode('utf-8')) while True: cat_bytes, hat_bytes = sub.recv_multipart() cat = cat_bytes.decode('utf-8') hat = hat_bytes.decode('utf-8') print('Subscribe: %s wears a %s' % (cat, hat)) In this code, we subscribe to two different byte values: the two strings in topics, encoded as UTF-8. Note It seems a little backward, but if you want all topics, you need to subscribe to the empty bytestring b''; if you don’t, you’ll get nothing. Notice that we call send_multipart() in the publisher and recv_multipart() in the subscriber. This makes it possible for us to send multipart messages, and use the first part as the topic. We could also send the topic and message as a single string or bytestring, but it seems cleaner to keep cats and hats separate. Start the subscriber: $ python zmq_sub.py Start the publisher. 
It immediately sends 10 messages, and then quits: $ python zmq_pub.py Publish: norwegian forest wears a stovepipe Publish: siamese wears a bowler Publish: persian wears a stovepipe Publish: norwegian forest wears a fedora Publish: maine coon wears a tam-o-shanter Publish: maine coon wears a stovepipe Publish: persian wears a stovepipe Publish: norwegian forest wears a fedora Publish: norwegian forest wears a bowler Publish: maine coon wears a bowler The subscriber prints what it requested and received: Subscribe: persian wears a stovepipe Subscribe: maine coon wears a tam-o-shanter Subscribe: maine coon wears a stovepipe Subscribe: persian wears a stovepipe Subscribe: maine coon wears a bowler —————————————— broker 代理 mellifluous 流畅的 —————————————— Other Pub-sub Tools You might like to explore some of these other Python pub-sub links: RabbitMQ This is a well-known messaging broker, and pika is a Python API for it. See the pika documentation and a pub-sub tutorial. pypi.python.org Go to the upper-right corner of the search window and type pubsub to find Python packages like pypubsub. pubsubhubbub This mellifluous protocol enables subscribers to register callbacks with publishers. —————————————— terminate 终止 innovation 创新 conventions 约定 flow 流 Datagram 数据报 Transmission 传输 duplication 重复 acknowledged 告知已收到 handshake 握手 cable 电缆 router 路由器 atop 在 brevity 简洁 —————————————— TCP/IP We’ve been walking through the networking house, taking for granted that whatever’s in the basement works correctly. Now, let’s actually visit the basement and look at the wires and pipes that keep everything running above ground. The Internet is based on rules about how to make connections, exchange data, terminate connections, handle timeouts, and so on. These are called protocols, and they are arranged in layers. 
The purpose of layers is to allow innovation and alternative ways of doing things; you can do anything you want on one layer as long as you follow the conventions in dealing with the layers above and below you. The very lowest layer governs aspects such as electrical signals; each higher layer builds on those below. In the middle, more or less, is the IP (Internet Protocol) layer, which specifies how network locations are addressed and how packets (chunks) of data flow. In the layer above that, two protocols describe how to move bytes between locations: UDP (User Datagram Protocol) This is used for short exchanges. A datagram is a tiny message sent in a single burst, like a note on a postcard. TCP (Transmission Control Protocol) This protocol is used for longer-lived connections. It sends streams of bytes and ensures that they arrive in order without duplication. UDP messages are not acknowledged, so you’re never sure if they arrive at their destination. If you wanted to tell a joke over UDP: Here's a UDP joke. Get it? TCP sets up a secret handshake between sender and receiver to ensure a good connection. A TCP joke would start like this: Do you want to hear a TCP joke? Yes, I want to hear a TCP joke. Okay, I'll tell you a TCP joke. Okay, I'll hear a TCP joke. Okay, I'll send you a TCP joke now. Okay, I'll receive the TCP joke now. ... (and so on) Your local machine always has the IP address 127.0.0.1 and the name localhost. You might see this called the loopback interface. If it’s connected to the Internet, your machine will also have a public IP. If you’re just using a home computer, it’s behind equipment such as a cable modem or router. You can run Internet protocols even between processes on the same machine. Most of the Internet with which we interact—the Web, database servers, and so on—is based on the TCP protocol running atop the IP protocol; for brevity, TCP/IP. Let’s first look at some basic Internet services. 
After that, we’ll explore general networking patterns. —————————————— tedious 单调乏味的 eerie 怪异的 cope 应对 reassemble 重新组装 —————————————— Sockets(1) We’ve saved this topic until now because you don’t need to know all the low-level details to use the higher levels of the Internet. But if you like to know how things work, this is for you. The lowest level of network programming uses a socket, borrowed from the C language and the Unix operating system. Socket-level coding is tedious. You’ll have more fun using something like ZeroMQ, but it’s useful to see what lies beneath. For instance, messages about sockets often turn up when networking errors take place. Let’s write a very simple client-server exchange. The client sends a string in a UDP datagram to a server, and the server returns a packet of data containing a string. The server needs to listen at a particular address and port—like a post office and a post office box. The client needs to know these two values to deliver its message, and receive any reply. In the following client and server code, address is a tuple of (address, port). The address is a string, which can be a name or an IP address. When your programs are just talking to one another on the same machine, you can use the name 'localhost' or the equivalent address '127.0.0.1'. First, let’s send a little data from one process to another and return a little data back to the originator. The first program is the client and the second is the server. In each program, we’ll print the time and open a socket. The server will listen for connections to its socket, and the client will write to its socket, which transmits a message to the server. 
Here’s the first program, udp_server.py:

from datetime import datetime
import socket

server_address = ('localhost', 6789)
max_size = 4096

print('Starting the server at', datetime.now())
print('Waiting for a client to call.')
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(server_address)

data, client = server.recvfrom(max_size)

print('At', datetime.now(), client, 'said', data)
server.sendto(b'Are you talking to me?', client)
server.close()

The server has to set up networking through two calls from the socket package. The first, socket.socket(), creates a socket, and the second, bind(), binds the socket to an address (so that it receives any data arriving at that IP address and port). AF_INET means we’ll create an Internet (IP) socket. (There’s another type for Unix domain sockets, but those work only on the local machine.) SOCK_DGRAM means we’ll send and receive datagrams; in other words, we’ll use UDP.

At this point, the server sits and waits for a datagram to come in (recvfrom). When one arrives, the server wakes up and gets both the data and information about the client. The client variable contains the address and port combination needed to reach the client. The server ends by sending a reply and closing its socket.

Let’s take a look at udp_client.py:

import socket
from datetime import datetime

server_address = ('localhost', 6789)
max_size = 4096

print('Starting the client at', datetime.now())
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b'Hey!', server_address)
data, server = client.recvfrom(max_size)
print('At', datetime.now(), server, 'said', data)
client.close()

The client has most of the same methods as the server (with the exception of bind()). The client sends and then receives, whereas the server receives first.

Start the server first, in its own window.
It will print its greeting and then wait with an eerie calm until a client sends it some data:

$ python udp_server.py
Starting the server at 2014-02-05 21:17:41.945649
Waiting for a client to call.

Next, start the client in another window. It will print its greeting, send data to the server, print the reply, and then exit:

$ python udp_client.py
Starting the client at 2014-02-05 21:24:56.509682
At 2014-02-05 21:24:56.518670 ('127.0.0.1', 6789) said b'Are you talking to me?'

Finally, the server will print something like this, and then exit:

At 2014-02-05 21:24:56.518473 ('127.0.0.1', 56267) said b'Hey!'

The client needed to know the server’s address and port number but didn’t need to specify a port number for itself. That was automatically assigned by the system; in this case, it was 56267.

Note: UDP sends data in single chunks. It does not guarantee delivery. If you send multiple messages via UDP, they can arrive out of order, or not at all. It’s fast, light, connectionless, and unreliable.
——————————————
Sockets(2)

Which brings us to TCP (Transmission Control Protocol). TCP is used for longer-lived connections, such as the Web. TCP delivers data in the order in which you send it. If there are any problems, it tries to send the data again. Let’s shoot a few packets from client to server and back with TCP.

tcp_client.py acts like the previous UDP client, sending only one string to the server, but there are small differences in the socket calls, illustrated here:

import socket
from datetime import datetime

address = ('localhost', 6789)
max_size = 1000

print('Starting the client at', datetime.now())
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(address)
client.sendall(b'Hey!')
data = client.recv(max_size)
print('At', datetime.now(), 'someone replied', data)
client.close()

We’ve replaced SOCK_DGRAM with SOCK_STREAM to get the streaming protocol, TCP. We also added a connect() call to set up the stream.
We didn’t need that for UDP because each datagram was on its own in the wild, wooly Internet.

tcp_server.py also differs from its UDP cousin:

from datetime import datetime
import socket

address = ('localhost', 6789)
max_size = 1000

print('Starting the server at', datetime.now())
print('Waiting for a client to call.')
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(address)
server.listen(5)

client, addr = server.accept()
data = client.recv(max_size)

print('At', datetime.now(), client, 'said', data)
client.sendall(b'Are you talking to me?')
client.close()
server.close()

server.listen(5) is configured to queue up to five client connections before refusing new ones. server.accept() waits for the first client connection and returns a new socket for that connection, along with the client’s address. The client.recv(1000) sets a maximum acceptable message length of 1,000 bytes.

As you did earlier, start the server and then the client, and watch the fun. First, the server:

$ python tcp_server.py
Starting the server at 2014-02-06 22:45:13.306971
Waiting for a client to call.

Now, start the client. It will send its message to the server, receive a response, and then exit:

$ python tcp_client.py
Starting the client at 2014-02-06 22:45:16.038642
At 2014-02-06 22:45:16.049078 someone replied b'Are you talking to me?'

The server collects the message, prints it, responds, and then quits (the representation of the client socket object, which the code prints between the timestamp and the word said, is not shown here):

At 2014-02-06 22:45:16.048865 said b'Hey!'

Notice that the TCP server called client.sendall() to respond, whereas the earlier UDP server called server.sendto() and had to pass the client’s address. TCP maintains the client-server connection across multiple socket calls and remembers the client’s IP address.

This didn’t look so bad, but if you try to write anything more complex, you’ll see how low-level sockets really are. Here are some of the complications with which you need to cope:

UDP sends messages, but their size is limited, and they’re not guaranteed to reach their destination.

TCP sends streams of bytes, not messages.
You don’t know how many bytes the system will send or receive with each call. To exchange entire messages with TCP, you need some extra information to reassemble the full message from its segments: a fixed message size (bytes), or the size of the full message, or some delimiting character. Because messages are bytes, not Unicode text strings, you need to use the Python bytes type. For more information on that, see Chapter 7. After all of this, if you find yourself fascinated by socket programming, check out the Python socket programming HOWTO for more details. —————————————— Lego 乐高 impose 强加 —————————————— ZeroMQ(1) We’ve already seen ZeroMQ sockets used for pub-sub. ZeroMQ is a library. Sometimes described as sockets on steroids, ZeroMQ sockets do the things that you sort of expected plain sockets to do: ■Exchange entire messages ■Retry connections ■Buffer data to preserve it when the timing between senders and receivers doesn’t line up The online guide is well written and witty, and it presents the best description of networking patterns that I’ve seen. The printed version (ZeroMQ: Messaging for Many Applications, by Pieter Hintjens, from that animal house, O’Reilly) has that good code smell and a big fish on the cover, rather than the other way around. All the examples in the printed guide are in the C language, but the online version lets you pick from multiple languages for each code example. The Python examples are also viewable. In this chapter, I’ll show you some basic uses for ZeroMQ in Python. ZeroMQ is like a Lego set, and we all know that you can build an amazing variety of things from a few Lego shapes. In this case, you construct networks from a few socket types and patterns. 
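For comparison with what ZeroMQ does for you, the "exchange entire messages" job on a plain TCP socket usually means framing each message yourself. The sketch below uses one of the approaches mentioned earlier, sending the size of the full message first, here as a 4-byte big-endian length prefix. The helper names recv_exact, send_msg, and recv_msg are made up for this illustration, and socket.socketpair() stands in for a real client-server connection so that the example runs in a single process.

```python
import socket
import struct

def recv_exact(sock, size):
    """Keep calling recv() until exactly `size` bytes have arrived."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError('socket closed before the full message arrived')
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)

def send_msg(sock, payload):
    """Send a 4-byte big-endian length, then the payload bytes."""
    sock.sendall(struct.pack('!I', len(payload)) + payload)

def recv_msg(sock):
    """Read the 4-byte length, then read exactly that many payload bytes."""
    (size,) = struct.unpack('!I', recv_exact(sock, 4))
    return recv_exact(sock, size)

if __name__ == '__main__':
    # socketpair() gives two connected stream sockets (an assumption made
    # here only to avoid needing a separate server process).
    left, right = socket.socketpair()
    send_msg(left, b'Hey!')
    send_msg(left, b'Are you talking to me?')
    print(recv_msg(right))
    print(recv_msg(right))
    left.close()
    right.close()
```

Both messages are written back to back into the same stream, and the receiver still gets them one at a time because it reads the length before the payload. A delimiter-based scheme would work too, but then the payload must never contain the delimiter.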
The basic “Lego pieces” presented in the following list are the ZeroMQ socket types, which by some twist of fate look like the network patterns we’ve already discussed:
■ REQ (synchronous request)
■ REP (synchronous reply)
■ DEALER (asynchronous request)
■ ROUTER (asynchronous reply)
■ PUB (publish)
■ SUB (subscribe)
■ PUSH (fanout)
■ PULL (fanin)

To try these yourself, you’ll need to install the Python ZeroMQ library by typing this command:

$ pip install pyzmq

The simplest pattern is a single request-reply pair. This is synchronous: one socket makes a request and then the other replies. First, the code for the reply (server), zmq_server.py:

import zmq

host = '127.0.0.1'
port = 6789
context = zmq.Context()
server = context.socket(zmq.REP)
server.bind("tcp://%s:%s" % (host, port))
while True:
    # Wait for next request from client
    request_bytes = server.recv()
    request_str = request_bytes.decode('utf-8')
    print("That voice in my head says: %s" % request_str)
    reply_str = "Stop saying: %s" % request_str
    reply_bytes = bytes(reply_str, 'utf-8')
    server.send(reply_bytes)

We create a Context object: this is a ZeroMQ object that maintains state. Then, we make a ZeroMQ socket of type REP (for REPly). We call bind() to make it listen on a particular IP address and port. Notice that they’re specified in a string such as 'tcp://localhost:6789' rather than a tuple, as in the plain socket examples.

This example keeps receiving requests from a sender and sending a response. The messages can be very long; ZeroMQ takes care of the details.

Following is the code for the corresponding request (client), zmq_client.py. Its type is REQ (for REQuest), and it calls connect() rather than bind().
import zmq

host = '127.0.0.1'
port = 6789
context = zmq.Context()
client = context.socket(zmq.REQ)
client.connect("tcp://%s:%s" % (host, port))
for num in range(1, 6):
    request_str = "message #%s" % num
    request_bytes = request_str.encode('utf-8')
    client.send(request_bytes)
    reply_bytes = client.recv()
    reply_str = reply_bytes.decode('utf-8')
    print("Sent %s, received %s" % (request_str, reply_str))

Now it’s time to start them. One interesting difference from the plain socket examples is that you can start the server and client in either order. Go ahead and start the server in one window in the background:

$ python zmq_server.py &

Start the client in the same window:

$ python zmq_client.py

You’ll see these alternating output lines from the client and server:

That voice in my head says: message #1
Sent message #1, received Stop saying: message #1
That voice in my head says: message #2
Sent message #2, received Stop saying: message #2
That voice in my head says: message #3
Sent message #3, received Stop saying: message #3
That voice in my head says: message #4
Sent message #4, received Stop saying: message #4
That voice in my head says: message #5
Sent message #5, received Stop saying: message #5

Our client ends after sending its fifth message, but we didn’t tell the server to quit, so it sits by the phone, waiting for another message. If you run the client again, it will print the same five lines, and the server will print its five also.
If you don’t kill the zmq_server.py process and try to run another one, Python will complain that the address is already in use:

$ python zmq_server.py &
[2] 356
Traceback (most recent call last):
  File "zmq_server.py", line 7, in <module>
    server.bind("tcp://%s:%s" % (host, port))
  File "socket.pyx", line 444, in zmq.backend.cython.socket.Socket.bind (zmq/backend/cython/socket.c:4076)
  File "checkrc.pxd", line 21, in zmq.backend.cython.checkrc._check_rc (zmq/backend/cython/socket.c:6032)
zmq.error.ZMQError: Address already in use
——————————————
ZeroMQ(2)

The messages need to be sent as byte strings, so we encoded our example’s text strings in UTF-8 format. You can send any kind of message you like, as long as you convert it to bytes. We used simple text strings as the source of our messages, so encode() and decode() were enough to convert to and from byte strings. If your messages have other data types, you can use a library such as MessagePack.

Even this basic REQ-REP pattern allows for some fancy communication patterns, because any number of REQ clients can connect() to a single REP server. The server handles requests one at a time, synchronously, but doesn’t drop other requests that are arriving in the meantime. ZeroMQ buffers messages, up to some specified limit, until they can get through; that’s where it earns the Q in its name. The Q stands for Queue, the M stands for Message, and the Zero means there doesn’t need to be any broker.

Although ZeroMQ doesn’t impose any central brokers (intermediaries), you can build them where needed. For example, use DEALER and ROUTER sockets to connect multiple sources and/or destinations asynchronously.

Multiple REQ sockets connect to a single ROUTER, which passes each request to a DEALER, which then contacts any REP sockets that have connected to it. This is similar to a bunch of browsers contacting a proxy server in front of a web server farm. It lets you add multiple clients and servers as needed.
The REQ sockets connect only to the ROUTER socket; the DEALER connects to the multiple REP sockets behind it. ZeroMQ takes care of the nasty details, ensuring that the requests are load balanced and that the replies go back to the right place. Another networking pattern called the ventilator uses PUSH sockets to farm out asynchronous tasks, and PULL sockets to gather the results. The last notable feature of ZeroMQ is that it scales up and down, just by changing the connection type of the socket when it’s created: tcp between processes, on one or more machines ipc between processes on one machine inproc between threads in a single process That last one, inproc, is a way to pass data between threads without locks, and an alternative to the threading example in Threads. After using ZeroMQ, you might never want to write raw socket code again. Note ZeroMQ is certainly not the only message-passing library that Python supports. Message passing is one of the most popular ideas in networking, and Python keeps up with other languages. The Apache project, whose web server we saw in Apache, also maintains the ActiveMQ project, including several Python interfaces using the simple-text STOMP protocol. RabbitMQ is also popular, and has useful online Python tutorials. —————————————— investigation 调查 intimidating 令人生畏的 inclined 倾向于 —————————————— Scapy Sometimes you need to dip into the networking stream and see the bytes swimming by. You might want to debug a web API, or track down some security issue. The scapy library is an excellent Python tool for packet investigation, and much easier than writing and debugging C programs. It’s actually a little language for constructing and analyzing packets. I planned to include some example code here but changed my mind for two reasons: ■ scapy hasn’t been ported to Python 3 yet. 
That hasn’t stopped us before, when we’ve used pip2 and python2, but … ■ The installation instructions for scapy are, I think, too intimidating for an introductory book. If you’re so inclined, take a look at the examples in the main documentation site. They might encourage you to brave an installation on your machine. Finally, don’t confuse scapy with scrapy, which is covered in Crawl and Scrape. —————————————— automate 自动化 —————————————— Internet Services Python has an extensive networking toolset. In the following sections, we’ll look at ways to automate some of the most popular Internet services. The official, comprehensive documentation is available online. —————————————— critical 至关重要的 clue 线索 —————————————— Domain Name System Computers have numeric IP addresses such as 85.2.101.94, but we remember names better than numbers. The Domain Name System (DNS) is a critical Internet service that converts IP addresses to and from names via a distributed database. Whenever you’re using a web browser and suddenly see a message like “looking up host,” you’ve probably lost your Internet connection, and your first clue is a DNS failure. Some DNS functions are found in the low-level socket module. gethostbyname() returns the IP address for a domain name, and the extended edition gethostbyname_ex() returns the name, a list of alternative names, and a list of addresses: >>> import socket >>> socket.gethostbyname('www.crappytaxidermy.com') '66.6.44.4' >>> socket.gethostbyname_ex('www.crappytaxidermy.com') ('crappytaxidermy.com', ['www.crappytaxidermy.com'], ['66.6.44.4']) The getaddrinfo() method looks up the IP address, but it also returns enough information to create a socket to connect to it: >>> socket.getaddrinfo('www.crappytaxidermy.com', 80) [(2, 2, 17, '', ('66.6.44.4', 80)), (2, 1, 6, '', ('66.6.44.4', 80))] The preceding call returned two tuples, the first for UDP, and the second for TCP (the 6 in the 2, 1, 6 is the value for TCP). 
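Each tuple returned by getaddrinfo() holds, in order, the address family, the socket type, the protocol number, a canonical name, and the socket address. As a minimal sketch of how those fields can be passed straight to socket.socket(), the code below picks the TCP entry out of the results. It uses the numeric address 127.0.0.1 and port 6789 as stand-ins so that no DNS lookup is needed, and the helper name first_tcp_endpoint is made up for this example.

```python
import socket

def first_tcp_endpoint(host, port):
    """Return (family, socktype, proto, sockaddr) for the first TCP result."""
    for family, socktype, proto, canonname, sockaddr in socket.getaddrinfo(host, port):
        if socktype == socket.SOCK_STREAM:
            return family, socktype, proto, sockaddr
    raise LookupError('no TCP entry found for %s:%s' % (host, port))

if __name__ == '__main__':
    family, socktype, proto, sockaddr = first_tcp_endpoint('127.0.0.1', 6789)
    # The first three fields are exactly the arguments socket.socket() expects.
    sock = socket.socket(family, socktype, proto)
    print('Created', sock.family, 'socket for', sockaddr)
    # sock.connect(sockaddr) would open the connection if a server were listening.
    sock.close()
```

Writing client code this way means it does not need to hardcode AF_INET, which helps when a name resolves to a different address family.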
You can ask for TCP or UDP information only:

>>> socket.getaddrinfo('www.crappytaxidermy.com', 80, socket.AF_INET, socket.SOCK_STREAM)
[(2, 1, 6, '', ('66.6.44.4', 80))]

Some TCP and UDP port numbers are reserved for certain services by IANA, and are associated with service names. For example, HTTP is named http and is assigned TCP port 80. These functions convert between service names and port numbers:

>>> import socket
>>> socket.getservbyname('http')
80
>>> socket.getservbyport(80)
'http'
——————————————
Python Email Modules

The standard library contains these email modules:
■ smtplib for sending email messages via Simple Mail Transfer Protocol (SMTP)
■ email for creating and parsing email messages
■ poplib for reading email via Post Office Protocol 3 (POP3)
■ imaplib for reading email via Internet Message Access Protocol (IMAP)

The official documentation contains sample code for all of these libraries. If you want to write your own Python SMTP server, try smtpd. A pure-Python SMTP server called Lamson allows you to store messages in databases, and you can even block spam.
——————————————
Other protocols

Using the standard ftplib module, you can push bytes around by using the File Transfer Protocol (FTP). Although it’s an old protocol, FTP still performs very well. You’ve seen many of these modules in various places in this book, but also try the documentation for standard library support of Internet protocols.
——————————————
targeted: aimed at a particular audience
mashups: combinations of data from different sources
minimal: as small as possible
fledged: fully developed
Representational: relating to representation
outlet: a way out
——————————————
Web Services and APIs

Information providers always have a website, but those are targeted for human eyes, not automation. If data is published only on a website, anyone who wants to access and structure the data needs to write scrapers (as shown in Crawl and Scrape), and rewrite them each time a page format changes. This is usually tedious. In contrast, if a website offers an API to its data, the data becomes directly available to client programs.
APIs change less often than web page layouts, so client rewrites are less common. A fast, clean data pipeline also makes it easier to build mashups: combinations that might not have been foreseen but can be useful and even profitable.

In many ways, the easiest API is a web interface, but one that provides data in a structured format such as JSON or XML rather than plain text or HTML. The API might be minimal or a full-fledged RESTful API (defined in Web APIs and Representational State Transfer), but it provides another outlet for those restless bytes.

At the very beginning of this book, you saw a web API: it picks up the most popular videos from YouTube. This next example might make more sense now that you’ve read about web requests, JSON, dictionaries, lists, and slices:

import requests

url = "https://gdata.youtube.com/feeds/api/standardfeeds/top_rated?alt=json"
response = requests.get(url)
data = response.json()
for video in data['feed']['entry'][0:6]:
    print(video['title']['$t'])

APIs are especially useful for mining well-known social media sites such as Twitter, Facebook, and LinkedIn. All these sites provide APIs that are free to use, but they require you to register and get a key (a long, generated text string, sometimes also known as a token) to use when connecting. The key lets a site determine who’s accessing its data. It can also serve as a way to limit request traffic to servers. The YouTube example you just looked at did not require an API key for searching, but it would if you made calls that updated data at YouTube.

Here are some interesting service APIs:
■ New York Times
■ YouTube
■ Twitter
■ Facebook
■ Weather Underground
■ Marvel Comics http://developer.marvel.com
——————————————
Remote Processing

Most of the examples in this book have demonstrated how to call Python code on the same machine, and usually in the same process. Thanks to Python’s expressiveness, you can also call code on other machines as though it were local.
In advanced settings, if you run out of space on your single machine, you can expand beyond it. A network of machines gives you access to more processes and/or threads.
——————————————
Procedure: a routine or function
serializing: converting data into a sequence of bytes
——————————————
Remote Procedure Calls

Remote Procedure Calls (RPCs) look like normal functions but execute on remote machines across a network. Instead of calling a RESTful API with arguments encoded in the URL or request body, you call an RPC function on your own machine. Here’s what happens under the hood of the RPC client:
1. It converts your function arguments into bytes (sometimes this is called marshalling, or serializing, or just encoding).
2. It sends the encoded bytes to the remote machine.

And here’s what happens on the remote machine, where the RPC server runs:
1. It receives the encoded request bytes.
2. After receiving the bytes, the RPC server decodes them back to the original data structures (or equivalent ones, if the hardware and software differ between the two machines).
3. The server then finds and calls the local function with the decoded data.
4. Next, it encodes the function results.
5. Last, the server sends the encoded bytes back to the caller.

And finally, the machine that started it all decodes the bytes to return values.

RPC is a popular technique, and people have implemented it in many ways. On the server side, you start a server program, connect it with some byte transport and encoding/decoding method, define some service functions, and light up your “RPC is open for business” sign. The client connects to the server and calls one of its functions via RPC.

The standard library includes one RPC implementation that uses XML as the exchange format: xmlrpc. You define and register functions on the server, and the client calls them as though they were imported.
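The encode and decode steps described above can be seen in isolation with the standard library’s xmlrpc.client module, without any network traffic. This is a sketch of the marshalling only, not of a full RPC exchange; the method name double and the argument 7 are just sample values, and the multiplication stands in for whatever function the remote side would really call.

```python
import xmlrpc.client

# Caller side: turn a method name and its arguments into an XML request.
request_xml = xmlrpc.client.dumps((7,), methodname='double')

# Remote side: decode the request back into arguments and a method name.
params, method = xmlrpc.client.loads(request_xml)
result = params[0] * 2  # stand-in for looking up and calling the real function

# Remote side: encode the result as an XML response.
response_xml = xmlrpc.client.dumps((result,), methodresponse=True)

# Caller side: decode the response into Python values.
values, _ = xmlrpc.client.loads(response_xml)
print(method, params, '->', values[0])
```

Printing request_xml and response_xml shows the XML text that would travel over the wire in each direction.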
First, let’s explore the file xmlrpc_server.py:

from xmlrpc.server import SimpleXMLRPCServer

def double(num):
    return num * 2

server = SimpleXMLRPCServer(("localhost", 6789))
server.register_function(double, "double")
server.serve_forever()

The function we’re providing on the server is called double(). It expects a number as an argument and returns the value of that number times two. The server starts up on an address and port. We need to register the function to make it available to clients via RPC. Finally, start serving and carry on.

Now, you guessed it, xmlrpc_client.py:

import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:6789/")
num = 7
result = proxy.double(num)
print("Double %s is %s" % (num, result))

The client connects to the server by using ServerProxy(). Then, it calls the function proxy.double(). Where did that come from? The server registered the function under that name, and the proxy object resolves the attribute dynamically. The RPC machinery magically hooks this function name into a call to the remote server.

Give it a try: start the server and then run the client:

$ python xmlrpc_server.py

Next, run the client:

$ python xmlrpc_client.py
Double 7 is 14

The server then prints the following:

127.0.0.1 - - [13/Feb/2014 20:16:23] "POST / HTTP/1.1" 200 -

Popular transport methods are HTTP and ZeroMQ. Common encodings besides XML include JSON, Protocol Buffers, and MessagePack. There are many Python packages for JSON-based RPC, but many of them either don’t support Python 3 or seem a bit tangled. Let’s look at something different: MessagePack’s own Python RPC implementation. Here’s how to install it:

$ pip install msgpack-rpc-python

This will also install tornado, a Python event-based web server that this library uses as a transport.
As usual, the server comes first (msgpack_server.py):

from msgpackrpc import Server, Address

class Services():
    def double(self, num):
        return num * 2

server = Server(Services())
server.listen(Address("localhost", 6789))
server.start()

The Services class exposes its methods as RPC services. Go ahead and start the client, msgpack_client.py:

from msgpackrpc import Client, Address

client = Client(Address("localhost", 6789))
num = 8
result = client.call('double', num)
print("Double %s is %s" % (num, result))

To run these, follow the usual drill: start the server, start the client, see the results:

$ python msgpack_server.py
$ python msgpack_client.py
Double 8 is 16
——————————————
tribute: praise
——————————————
Salt

Salt started as a way to implement remote execution, but it grew to a full-fledged systems management platform. Based on ZeroMQ rather than SSH, it can scale to thousands of servers. Salt has not yet been ported to Python 3. In this case, I won’t show Python 2 examples. If you’re interested in this area, read the documents, and watch for announcements when they do complete the port.
——————————————
vinyl: a vinyl record
consecutive: following one after another
exceeds: goes beyond
batch: processed as a group
rival: competitor
Alas: an expression of regret
parallel: happening side by side
——————————————
Big Fat Data and MapReduce

As Google and other Internet companies grew, they found that traditional computing solutions didn’t scale. Software that worked for single machines, or even a few dozen, could not keep up with thousands. Disk storage for databases and files involved too much seeking, which requires mechanical movement of disk heads. (Think of a vinyl record, and the time it takes to move the needle from one track to another manually. And think of the screeching sound it makes when you drop it too hard, not to mention the sounds made by the record’s owner.) But you could stream consecutive segments of the disk more quickly. Developers found that it was faster to distribute and analyze data on many networked machines than on individual ones.
They could use algorithms that sounded simplistic, but actually worked better overall with massively distributed data. One of these is MapReduce, which spreads a calculation across many machines and then gathers the results. It’s similar to working with queues. After Google published its results in a paper, Yahoo followed with an open source Java-based package named Hadoop (named after the toy stuffed elephant of the lead programmer’s son). The phrase big data applies here. Often it just means “data too big to fit on my machine”: data that exceeds the disk, memory, CPU time, or all of the above. To some organizations, if big data is mentioned somewhere in a question, the answer is always Hadoop. Hadoop copies data among machines, running them through map and reduce programs, and saving the results on disk at each step. This batch process can be slow. A quicker method called Hadoop streaming works like Unix pipes, streaming the data through programs without requiring disk writes at each step. You can write Hadoop streaming programs in any language, including Python. Many Python modules have been written for Hadoop, and some are discussed in the blog post “A Guide to Python Frameworks for Hadoop”. The Spotify company, known for streaming music, open sourced its Python component for Hadoop streaming, Luigi. The Python 3 port is still incomplete. A rival named Spark was designed to run ten to a hundred times faster than Hadoop. It can read and process any Hadoop data source and format. Spark includes APIs for Python and other languages. You can find the installation documents online. Another alternative to Hadoop is Disco, which uses Python for MapReduce processing and Erlang for communication. Alas, you can’t install it with pip; see the documentation. See Appendix C for related examples of parallel programming, in which a large structured calculation is distributed among many machines. 
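To make the map and reduce steps described in this section concrete, here is a minimal single-process sketch of a word count in plain Python. It is not Hadoop code: the function names map_words, shuffle, reduce_counts, and word_count and the sample documents are made up for illustration, and a real framework would run the map and reduce phases on many machines and move the intermediate pairs between them.

```python
from collections import defaultdict

def map_words(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group the values emitted by all mappers under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(grouped):
    """Reduce step: combine each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(documents):
    pairs = []
    for document in documents:
        # On a cluster, each document could be mapped on a different machine.
        pairs.extend(map_words(document))
    return reduce_counts(shuffle(pairs))

if __name__ == '__main__':
    docs = ['maine coon wears a bowler', 'persian wears a bowler']
    print(word_count(docs))
```

The map step looks at one document at a time and the reduce step looks at one key at a time, which is what lets a framework spread both phases across machines.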
——————————————
novelty: something new; fee: charge; leased: rented; redundantly: with duplicate copies; fallacies: false beliefs; latency: delay; bandwidth: data-carrying capacity; infinite: unlimited; topology: layout of a network; homogeneous: uniform; livestock: farm animals; maintenance: upkeep; dashboards: control panels; elastic: flexible; alerted: notified; exceeds: goes beyond; threshold: limit value; corporate: business; spiked: rose sharply
——————————————
Working in the Clouds
Not so long ago, you would buy your own servers, bolt them into racks in data centers, and install layers of software on them: operating systems, device drivers, file systems, databases, web servers, email servers, name servers, load balancers, monitors, and more. Any initial novelty wore off as you tried to keep multiple systems alive and responsive. And you worried constantly about security.
Many hosting services offered to take care of your servers for a fee, but you still leased the physical devices and had to pay for your peak load configuration at all times.
With more individual machines, failures are no longer infrequent: they’re very common. You need to scale services horizontally and store data redundantly. You can’t assume that the network operates like a single machine. The eight fallacies of distributed computing, according to Peter Deutsch, are as follows:
■ The network is reliable.
■ Latency is zero.
■ Bandwidth is infinite.
■ The network is secure.
■ Topology doesn’t change.
■ There is one administrator.
■ Transport cost is zero.
■ The network is homogeneous.
You can try to build these complex distributed systems, but it’s a lot of work, and a different toolset is needed. To borrow an analogy, when you have a handful of servers, you treat them like pets: you give them names, know their personalities, and nurse them back to health when needed. But at scale, you treat servers more like livestock: they look alike, have numbers, and are just replaced if they have any problems.
Instead of building, you can rent servers in the cloud.
With this model, maintenance is someone else’s problem, and you can concentrate on your service, or blog, or whatever you want to show the world. Using web dashboards and APIs, you can spin up servers with whatever configuration you need, quickly and easily; they’re elastic. You can monitor their status, and be alerted if some metric exceeds a given threshold. Clouds are currently a pretty hot topic, and corporate spending on cloud components has spiked.
Let’s see how Python interacts with some popular clouds.
——————————————
deploy: put into service
——————————————
Google
Google uses Python a lot internally, and it employs some prominent Python developers (even Guido van Rossum himself, for some time).
Go to the App Engine site and then, under “Choose a Language,” click in the Python box. You can type Python code into the Cloud Playground and see results just below. Just after that are links and directions to download the Python SDK to your machine. This allows you to develop against Google’s cloud APIs on your own hardware. Following this are details on how to deploy your application to App Engine itself.
From Google’s main cloud page, you can find details on its services, including these:
App Engine: A high-level platform, including Python tools such as flask and django.
Compute Engine: Create clusters of virtual machines for large distributed computing tasks.
Cloud Storage: Object storage (objects are files, but there are no directory hierarchies).
Cloud Datastore: A large NoSQL database.
Cloud SQL: A large SQL database.
Cloud Endpoints: RESTful access to applications.
BigQuery: Hadoop-like big data.
Google services compete with Amazon and OpenStack, a segue if there ever was one.
——————————————
thereabouts: approximately; henceforth: from now on; memo: memorandum; elastic: flexible
——————————————
Amazon
As Amazon was growing from hundreds to thousands to millions of servers, developers ran into all the nasty problems of distributed systems.
One day in 2002 or thereabouts, CEO Jeff Bezos declared to Amazon employees that, henceforth, all data and functionality needed to be exposed only via network service interfaces, not files, or databases, or local function calls. They had to design these interfaces as though they were being offered to the public. The memo ended with a motivational nugget: “Anyone who doesn’t do this will be fired.”
Not surprisingly, developers got to work, and over time built a very large service-oriented architecture. They borrowed or innovated many solutions, evolving into Amazon Web Services (AWS), which now dominates the market. It now contains dozens of services, but the most relevant are the following:
Elastic Beanstalk: High-level application platform
EC2 (Elastic Compute): Distributed computing
S3 (Simple Storage Service): Object storage
RDS: Relational databases (MySQL, PostgreSQL, Oracle, MSSQL)
DynamoDB: NoSQL database
Redshift: Data warehouse
EMR: Hadoop
For details on these and other AWS services, download the Amazon Python SDK and read the help section.
The official Python AWS library, boto, is another footdragger, not yet fully ported to Python 3. You’ll need to use Python 2, or try an alternative, which you can do by searching the Python Package Index for “aws” or “amazon.”
——————————————
telemetry: remote measurement; metrics: measured values; metering: measuring usage; incubation: early development period; explanatory: serving to explain; dashboard: control panel; vendors: suppliers; accelerating: speeding up; proprietary: privately owned
——————————————
OpenStack
The second most popular cloud service provider has been Rackspace. In 2010, it formed an unusual partnership with NASA to merge some of their cloud infrastructure into OpenStack. This is a freely available open source platform to build public, private, and hybrid clouds. A new release is made every six months, the most recent containing over 1.25 million lines of Python from many contributors. OpenStack is used in production by a growing number of organizations, including CERN and PayPal.
OpenStack’s main APIs are RESTful, with Python modules providing programmatic interfaces, and command-line Python programs for shell automation. Here are some of the standard services in the current release:
Keystone: Identity service, providing authentication (for example, user/password), authorization (capabilities), and service discovery.
Nova: Compute service, distributing work across networked servers.
Swift: Object storage, similar to Amazon’s S3. It’s used by Rackspace’s Cloud Files service.
Glance: Mid-level image storage service.
Cinder: Low-level block storage service.
Horizon: Web-based dashboard for all the services.
Neutron: Network management service.
Heat: Orchestration (multicloud) service.
Ceilometer: Telemetry (metrics, monitoring, and metering) service.
Other services are proposed from time to time, which then go through an incubation process and might become part of the standard OpenStack platform.
OpenStack runs on Linux or within a Linux virtual machine (VM). The installation of its core services is still somewhat involved. The fastest way to install OpenStack on Linux is to use Devstack and watch all the explanatory text flying by as it runs. You’ll end up with a web dashboard that can view and control the other services.
If you want to install some or all of OpenStack manually, use your Linux distribution’s package manager. All of the major Linux vendors support OpenStack and are providing official packages on their download servers. Browse the main OpenStack site for installation documents, news, and related information.
OpenStack development and corporate support are accelerating. It’s been compared to Linux when it was disrupting the proprietary Unix versions.
——————————————
Things to Do
——————————————
11.1 Use a plain socket to implement a current-time-service. When a client sends the string time to the server, return the current date and time as an ISO string.
udp_time_server.py:

import socket
from datetime import datetime

address = ('localhost', 6789)
max_size = 4096

print('Starting the server at', datetime.now())
print('Waiting for a client to call.')
# AF_INET means we'll create an Internet (IP) socket.
# SOCK_DGRAM means we'll send and receive datagrams; in other words, we'll use UDP.
# UDP (User Datagram Protocol) is used for short exchanges. A datagram is a tiny
# message sent in a single burst, like a note on a postcard.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(address)

data, client = server.recvfrom(max_size)
if data == b'time':
    # isoformat() produces the ISO string that the exercise asks for
    server.sendto(datetime.utcnow().isoformat().encode('utf-8'), client)
server.close()

udp_client.py:

import socket
from datetime import datetime

server_address = ('localhost', 6789)
max_size = 4096

print('Starting the client at', datetime.now())
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b'time', server_address)
date, server = client.recvfrom(max_size)
# The reply arrives as bytes, so decode it before printing
print('Now time is', date.decode('utf-8'))
client.close()
——————————————
11.2 Use ZeroMQ REQ and REP sockets to do the same thing.
First, install the Python ZeroMQ library with these commands:

$ python -m pip install --upgrade pip
$ pip install pyzmq

zmq_time_server.py:

import zmq
from datetime import datetime

print('Starting the server at', datetime.now())
print('Waiting for a client to call.')
server = zmq.Context().socket(zmq.REP)
server.bind('tcp://{host}:{port}'.format(host='127.0.0.1', port=6789))

data = server.recv()
if data == b'time':
    server.send(datetime.utcnow().isoformat().encode('utf-8'))

zmq_client.py:

import zmq
from datetime import datetime

print('Starting the client at', datetime.now())
client = zmq.Context().socket(zmq.REQ)
client.connect('tcp://{host}:{port}'.format(host='127.0.0.1', port=6789))
client.send(b'time')
date = client.recv()
print('Now time is', date.decode('utf-8'))
——————————————
11.3 Try the same with XMLRPC.
xmlrpc_time_server.py:

from datetime import datetime
from xmlrpc.server import SimpleXMLRPCServer

def a_function(text):
    if text == 'time':
        return datetime.utcnow().isoformat()
    # XML-RPC cannot send None by default, so answer other requests with a string
    return 'unknown request'

server = SimpleXMLRPCServer(('127.0.0.1', 6789))
server.register_function(a_function, 'a_function')
server.serve_forever()

xmlrpc_client.py:

import xmlrpc.client

proxy = xmlrpc.client.ServerProxy('http://127.0.0.1:6789/')
result = proxy.a_function('time')
print(result)
——————————————
Chapter 12. Be a Pythonista
Always wanted to travel back in time to try fighting a younger version of yourself? Software development is the career for you!
-- Elliot Loh
This chapter is devoted to the art and science of Python development, with “best practice” recommendations. Absorb them, and you too can be a card-carrying Pythonista.
——————————————
About Programming
First, a few notes about programming, based on personal experience.
My original career path was science, and I taught myself programming to analyze and display experimental data. I expected computer programming to be like my impression of accounting: precise but dull. I was surprised to find that I enjoyed it. Part of the fun was its logical aspects (like solving puzzles), but part was creative. You had to write your program correctly to get the right results, but you had the freedom to write it any way you wanted. It was an unusual balance of right-brain and left-brain thinking.
After I wandered off into a career in programming, I also learned that the field had many niches, with very different tasks and types of people. You could delve into computer graphics, operating systems, business applications, even science.
If you’re a programmer, you might have had a similar experience yourself. If you’re not, you might try programming a bit to see if it fits your personality, or at least helps you to get something done. As I may have mentioned much earlier in this book, math skills are not so important.
It seems that the ability to think logically is most important, and that an aptitude for languages seems to help. Finally, patience helps, especially when you’re tracking down an elusive bug in your code. —————————————— Find Python Code When you need to develop some code, the fastest solution is to steal it. Well…that is, from a source from which you’re allowed to steal code. The Python standard library is wide, deep, and mostly clear. Dive in and look for those pearls. Like the halls of fame for various sports, it takes time for a module to get into the standard library. New packages are appearing outside constantly, and throughout this book I’ve highlighted some that either do something new or do something old better. Python is advertised as batteries included, but you might need a new kind of battery. So where, outside the standard library, should you look for good Python code? The first place to look is the Python Package Index (PyPI). Formerly named the Cheese Shop after a Monty Python skit, this site is constantly updated with Python packages—over 39,000 as I write this. When you use pip (see the next section), it searches PyPI. The main PyPI page shows the most recently added packages. You can also conduct a direct search. Another popular repository is GitHub. See what Python packages are currently popular. Popular Python recipes has over four thousand short Python programs on every subject. —————————————— Install Packages There are three ways to install Python packages: ■ Use pip if you can. You can install most of the Python packages you’re likely to encounter with pip. ■ Sometimes, you can use a package manager for your operating system. ■ Install from source. If you’re interested in several packages in the same area, you might find a Python distribution that already includes them. 
For instance, in Appendix C, you can try out a number of numeric and scientific programs that would be tedious to install individually but are included with distributions such as Anaconda.
——————————————
Use pip
Python packaging has had some limitations. An earlier installation tool called easy_install has been replaced by one called pip, but neither had been in the standard Python installation. If you’re supposed to install things by using pip, from where did you get pip? Starting with Python 3.4, pip will finally be included with the rest of Python to avoid such existential crises. If you’re using an earlier version of Python 3 and don’t have pip, you can get it from http://www.pip-installer.org.
The simplest use of pip is to install the latest version of a single package by using the following command:

$ pip install flask

You will see details on what it’s doing, just so you don’t think it’s goofing off: downloading, running setup.py, installing files on your disk, and other details.
You can also ask pip to install a specific version:

$ pip install flask==0.9.0

Or, a minimum version (this is useful when some feature that you can’t live without turns up in a particular version):

$ pip install 'flask>=0.9.0'

In the preceding example, those single quotes prevent the > from being interpreted by the shell to redirect output to a file called =0.9.0.
If you want to install more than one Python package, you can use a requirements file. Although it has many options, the simplest use is a list of packages, one per line, optionally with a specific or relative version:

$ pip install -r requirements.txt

Your sample requirements.txt file might contain this:

flask==0.9.0
django
psycopg2
——————————————
Use a Package Manager
On Apple’s OS X, the third-party packagers homebrew (brew) and ports are available. They work a little like pip, but aren’t restricted to Python packages.
Linux has a different manager for each distribution. The most popular are apt-get, yum, dpkg, and zypper.
Windows has the Windows Installer and package files with a .msi suffix. If you installed Python for Windows, it was probably in the MSI format.
——————————————
Install from Source
Occasionally, a Python package is new, or the author hasn’t managed to make it available with pip. To build the package, you generally do the following:
1. Download the code.
2. Extract the files by using zip, tar, or another appropriate tool if they’re archived or compressed.
3. Run python setup.py install in the directory containing a setup.py file.
Note
As always, be careful what you download and install. It’s a little harder to hide malware in Python programs, which are readable text, but it has happened.
——————————————
Integrated Development Environments
I’ve used a plain-text interface for programs in this book, but that doesn’t mean that you need to run everything in a console or text window. There are many free and commercial integrated development environments (IDEs), which are GUIs with support for such tools as text editors, debuggers, library searching, and so on.
IDLE
IDLE is the only Python IDE that’s included with the standard distribution. It’s based on tkinter, and its GUI is plain.
PyCharm
PyCharm is a recent graphic IDE with many features. The community edition is free, and you can get a free license for the professional edition to use in a classroom or an open source project. Figure 12-1 shows its initial display.
IPython
IPython, which you can see in Appendix C, is a publishing platform as well as an extensive IDE.
——————————————
Name and Document
You won’t remember what you wrote. There are times when I look at code I wrote even recently and wonder where on earth it came from. That’s why it helps to document your code. Documentation can include comments and docstrings, but it can also incorporate informative naming of variables, functions, modules, and classes. Don’t be obsessive, as in this example:

>>> # I'm going to assign 10 to the variable "num" here:
... num = 10
>>> # I hope that worked
... print(num)
10
>>> # Whew.

Instead, say why you assigned the value 10. Point out why you called the variable num. If you were writing the venerable Fahrenheit to Celsius converter, you might name variables to explain what they do, rather than a lump of magic code. And a little test code wouldn’t hurt:

def ftoc(f_temp):
    "Convert Fahrenheit temperature to Celsius and return it."
    f_boil_temp = 212.0
    f_freeze_temp = 32.0
    c_boil_temp = 100.0
    c_freeze_temp = 0.0
    f_range = f_boil_temp - f_freeze_temp
    c_range = c_boil_temp - c_freeze_temp
    f_c_ratio = c_range / f_range
    c_temp = (f_temp - f_freeze_temp) * f_c_ratio + c_freeze_temp
    return c_temp

if __name__ == '__main__':
    for f_temp in [-40.0, 0.0, 32.0, 100.0, 212.0]:
        c_temp = ftoc(f_temp)
        print('%f F => %f C' % (f_temp, c_temp))

Let’s run the tests:

$ python ftoc1.py
-40.000000 F => -40.000000 C
0.000000 F => -17.777778 C
32.000000 F => 0.000000 C
100.000000 F => 37.777778 C
212.000000 F => 100.000000 C

We can make (at least) two improvements:
■ Python doesn’t have constants, but the PEP8 stylesheet recommends using capital letters and underscores (e.g., ALL_CAPS) when naming variables that should be considered constants. Let’s rename those constant-y variables in our example.
■ Because we precompute values based on constant values, let’s move them to the top level of the module. Then, they’ll only be calculated once rather than in every call to the ftoc() function.
Here’s the result of our rework:

F_BOIL_TEMP = 212.0
F_FREEZE_TEMP = 32.0
C_BOIL_TEMP = 100.0
C_FREEZE_TEMP = 0.0
F_RANGE = F_BOIL_TEMP - F_FREEZE_TEMP
C_RANGE = C_BOIL_TEMP - C_FREEZE_TEMP
F_C_RATIO = C_RANGE / F_RANGE

def ftoc(f_temp):
    "Convert Fahrenheit temperature to Celsius and return it."
    c_temp = (f_temp - F_FREEZE_TEMP) * F_C_RATIO + C_FREEZE_TEMP
    return c_temp

if __name__ == '__main__':
    for f_temp in [-40.0, 0.0, 32.0, 100.0, 212.0]:
        c_temp = ftoc(f_temp)
        print('%f F => %f C' % (f_temp, c_temp))
——————————————
Testing Your Code
Once in a while, I’ll make some trivial code change and say to myself, “Looks good, ship it.” And then it breaks. Oops. Every time I do this (thankfully, less and less over time) I feel like a doofus, and I swear to write even more tests next time.
The very simplest way to test Python programs is to add print() statements. The Python interactive interpreter’s Read-Evaluate-Print Loop (REPL) lets you edit and test changes quickly. However, you probably don’t want print() statements in production code, so you need to remember to take them all out. Furthermore, cut-and-paste errors are really easy to make.
——————————————
Check with pylint, pyflakes, and pep8
The next step, before creating actual test programs, is to run a Python code checker. The most popular are pylint and pyflakes. You can install either or both by using pip:

$ pip install pylint
$ pip install pyflakes

These check for actual code errors (such as referring to a variable before assigning it a value) and style faux pas (the code equivalent of wearing plaids and stripes). Here’s a fairly meaningless program with a bug and style issue:

a = 1
b = 2
print(a)
print(b)
print(c)

Here’s the initial output of pylint:

$ pylint style1.py
No config file found, using default configuration
************* Module style1
C: 1,0: Missing docstring
C: 1,0: Invalid name "a" for type constant (should match (([A-Z_][A-Z0-9_]*)|(__.*__))$)
C: 2,0: Invalid name "b" for type constant (should match (([A-Z_][A-Z0-9_]*)|(__.*__))$)
E: 5,6: Undefined variable 'c'

Much further down, under Global evaluation, is our score (10.0 is perfect):

Your code has been rated at -3.33/10

Ouch. Let’s fix the bug first.
That pylint output line starting with an E indicates an Error, which occurred because we didn’t assign a value to c before we printed it. Let’s fix that:

a = 1
b = 2
c = 3
print(a)
print(b)
print(c)

$ pylint style2.py
No config file found, using default configuration
************* Module style2
C: 1,0: Missing docstring
C: 1,0: Invalid name "a" for type constant (should match (([A-Z_][A-Z0-9_]*)|(__.*__))$)
C: 2,0: Invalid name "b" for type constant (should match (([A-Z_][A-Z0-9_]*)|(__.*__))$)
C: 3,0: Invalid name "c" for type constant (should match (([A-Z_][A-Z0-9_]*)|(__.*__))$)

Good, no more E lines. And our score jumped from -3.33 to 4.29:

Your code has been rated at 4.29/10

pylint wants a docstring (a short text at the top of a module or function, describing the code), and it thinks short variable names such as a, b, and c are tacky. Let’s make pylint happier and improve style2.py to style3.py:

"Module docstring goes here"

def func():
    "Function docstring goes here. Hi, Mom!"
    first = 1
    second = 2
    third = 3
    print(first)
    print(second)
    print(third)

func()

$ pylint style3.py
No config file found, using default configuration

Hey, no complaints. And our score?

Your code has been rated at 10.00/10

Not too shabby at all, right?
Another style checker is pep8, which you can install in the usual way:

$ pip install pep8

What does it say about our style makeover?

$ pep8 style3.py
style3.py:3:1: E302 expected 2 blank lines, found 1

To be really stylish, it’s recommending that I add a blank line after the initial module docstring.
——————————————
Test with unittest(1)
We’ve verified that we’re no longer insulting the style senses of the code gods, so let’s move on to actual tests of the logic in your program.
It’s a good practice to write independent test programs first, to ensure that they all pass before you commit your code to any source control system.
Writing tests can seem tedious at first, but they really do help you find problems faster, especially regressions (breaking something that used to work). Painful experience teaches all developers that even the teeniest change, which they swear could not possibly affect anything else, actually does. If you look at well-written Python packages, they always include a test suite.
The standard library contains not one, but two test packages. Let’s start with unittest. We’ll write a module that capitalizes words. Our first version just uses the standard string function capitalize(), with some unexpected results as we’ll see. Save this as cap.py:

def just_do_it(text):
    return text.capitalize()

The basis of testing is to decide what outcome you want from a certain input (here, you want the capitalized version of whatever text you input), submit the input to the function you’re testing, and then check whether it returned the expected results. The check that the result matches what you expected is called an assertion, so in unittest you check your results by using methods with names that begin with assert, like the assertEqual method shown in the following example.
Save this test script as test_cap.py:

import unittest
import cap

class TestCap(unittest.TestCase):

    def setUp(self):
        pass

    def tearDown(self):
        pass

    def test_one_word(self):
        text = 'duck'
        result = cap.just_do_it(text)
        self.assertEqual(result, 'Duck')

    def test_multiple_words(self):
        text = 'a veritable flock of ducks'
        result = cap.just_do_it(text)
        self.assertEqual(result, 'A Veritable Flock Of Ducks')

if __name__ == '__main__':
    unittest.main()

The setUp() method is called before each test method, and the tearDown() method is called after each. Their purpose is to allocate and free external resources needed by the tests, such as a database connection or some test data. In this case, our tests are self-contained, and we wouldn’t even need to define setUp() and tearDown(), but it doesn’t hurt to have empty versions there.
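For contrast with the empty versions above, here is a minimal sketch of setUp() and tearDown() doing real work. Everything in it (the TestWithTempFile class, the scratch file, the run_tests() helper) is hypothetical and made up for this note; it only assumes the standard unittest, tempfile, and os modules.

```python
import os
import tempfile
import unittest


class TestWithTempFile(unittest.TestCase):
    """Hypothetical test case; the names here are illustrative only."""

    def setUp(self):
        # Runs before each test method: create a scratch file holding test data.
        fd, self.path = tempfile.mkstemp(suffix='.txt')
        with os.fdopen(fd, 'w') as handle:
            handle.write('duck\n')

    def tearDown(self):
        # Runs after each test method, pass or fail: release the resource.
        os.remove(self.path)

    def test_file_has_test_data(self):
        with open(self.path) as handle:
            self.assertEqual(handle.read().strip(), 'duck')

    def test_file_is_fresh_each_time(self):
        # setUp() made a new file for this test, so only our own change shows up.
        with open(self.path, 'a') as handle:
            handle.write('goose\n')
        with open(self.path) as handle:
            self.assertEqual(handle.read().splitlines(), ['duck', 'goose'])


def run_tests():
    """Run the test case programmatically and return the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestWithTempFile)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == '__main__':
    run_tests()
```

Because setUp() runs again before every test method, each test gets its own file, and no test depends on what another one left behind.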
The heart of our test is the two functions named test_one_word() and test_multiple_words(). Each runs the just_do_it() function we defined with different input and checks whether we got back what we expect.
Okay, let’s run it. This will call our two test methods:

$ python test_cap.py
F.
======================================================================
FAIL: test_multiple_words (__main__.TestCap)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_cap.py", line 20, in test_multiple_words
    self.assertEqual(result, 'A Veritable Flock Of Ducks')
AssertionError: 'A veritable flock of ducks' != 'A Veritable Flock Of Ducks'
- A veritable flock of ducks
?   ^         ^     ^  ^
+ A Veritable Flock Of Ducks
?   ^         ^     ^  ^

----------------------------------------------------------------------
Ran 2 tests in 0.001s

FAILED (failures=1)

It liked the first test (test_one_word) but not the second (test_multiple_words). The up arrows (^) show where the strings actually differed.
What’s special about multiple words? Reading the documentation for the string capitalize function yields an important clue: it capitalizes only the first letter of the first word. Maybe we should have read that first.
——————————————
Test with unittest(2)
Consequently, we need another function. Gazing down that page a bit, we find title(). So, let’s change cap.py to use title() instead of capitalize():

def just_do_it(text):
    return text.title()

Rerun the tests, and let’s see what happens:

$ python test_cap.py
..
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK

Everything is great. Well, actually, it’s not.
We need to add at least one more method to test_cap.py:

    def test_words_with_apostrophes(self):
        text = "I'm fresh out of ideas"
        result = cap.just_do_it(text)
        self.assertEqual(result, "I'm Fresh Out Of Ideas")

Go ahead and try it again:

$ python test_cap.py
..F
======================================================================
FAIL: test_words_with_apostrophes (__main__.TestCap)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_cap.py", line 25, in test_words_with_apostrophes
    self.assertEqual(result, "I'm Fresh Out Of Ideas")
AssertionError: "I'M Fresh Out Of Ideas" != "I'm Fresh Out Of Ideas"
- I'M Fresh Out Of Ideas
?   ^
+ I'm Fresh Out Of Ideas
?   ^

----------------------------------------------------------------------
Ran 3 tests in 0.001s

FAILED (failures=1)

Our function capitalized the m in I'm. A quick run back to the documentation for title() shows that it doesn’t handle apostrophes well. We really should have read the entire text first.
At the bottom of the standard library’s string documentation is another candidate: a helper function called capwords(). Let’s use it in cap.py:

def just_do_it(text):
    from string import capwords
    return capwords(text)

$ python test_cap.py
...
----------------------------------------------------------------------
Ran 3 tests in 0.004s

OK

At last, we’re finally done! Uh, no. One more test to add to test_cap.py:

    def test_words_with_quotes(self):
        text = "\"You're despicable,\" said Daffy Duck"
        result = cap.just_do_it(text)
        self.assertEqual(result, "\"You're Despicable,\" Said Daffy Duck")

Did it work?
$ python test_cap.py
...F
======================================================================
FAIL: test_words_with_quotes (__main__.TestCap)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_cap.py", line 30, in test_words_with_quotes
    self.assertEqual(result, "\"You're Despicable,\" Said Daffy Duck")
AssertionError: '"you\'re Despicable," Said Daffy Duck' != '"You\'re Despicable," Said Daffy Duck'
- "you're Despicable," Said Daffy Duck
?  ^
+ "You're Despicable," Said Daffy Duck
?  ^

----------------------------------------------------------------------
Ran 4 tests in 0.004s

FAILED (failures=1)

It looks like that first double quote confused even capwords, our favorite capitalizer thus far. It tried to capitalize the ", and lowercased the rest (You're). We should have also tested that our capitalizer left the rest of the string untouched.
People who do testing for a living have a knack for spotting these edge cases, but developers often have blind spots when it comes to their own code.
unittest provides a small but powerful set of assertions, letting you check values, confirm whether you have the class you want, determine whether an error was raised, and so on.
——————————————
Test with doctest
The second test package in the standard library is doctest. With this package, you can write tests within the docstring itself, also serving as documentation. It looks like the interactive interpreter: the characters >>>, followed by the call, and then the results on the following line. You can run some tests in the interactive interpreter and just paste the results into your test file.
We’ll modify cap.py (without that troublesome last test with quotes):

def just_do_it(text):
    """
    >>> just_do_it('duck')
    'Duck'
    >>> just_do_it('a veritable flock of ducks')
    'A Veritable Flock Of Ducks'
    >>> just_do_it("I'm fresh out of ideas")
    "I'm Fresh Out Of Ideas"
    """
    from string import capwords
    return capwords(text)

if __name__ == '__main__':
    import doctest
    doctest.testmod()

When you run it, it doesn’t print anything if all tests passed:

$ python cap.py

Give it the verbose (-v) option to see what actually happened:

$ python cap.py -v
Trying:
    just_do_it('duck')
Expecting:
    'Duck'
ok
Trying:
    just_do_it('a veritable flock of ducks')
Expecting:
    'A Veritable Flock Of Ducks'
ok
Trying:
    just_do_it("I'm fresh out of ideas")
Expecting:
    "I'm Fresh Out Of Ideas"
ok
1 items had no tests:
    __main__
1 items passed all tests:
   3 tests in __main__.just_do_it
3 tests in 2 items.
3 passed and 0 failed.
Test passed.
——————————————
Test with nose
The third-party package called nose is another alternative to unittest. Here’s the command to install it:

$ pip install nose

You don’t need to create a class that includes test methods, as we did with unittest. Any function whose name matches test somewhere will be run.
Let’s modify our last version of our unittest tester and save it as test_cap_nose.py:

import cap

from nose.tools import eq_

def test_one_word():
    text = 'duck'
    result = cap.just_do_it(text)
    eq_(result, 'Duck')

def test_multiple_words():
    text = 'a veritable flock of ducks'
    result = cap.just_do_it(text)
    eq_(result, 'A Veritable Flock Of Ducks')

def test_words_with_apostrophes():
    text = "I'm fresh out of ideas"
    result = cap.just_do_it(text)
    eq_(result, "I'm Fresh Out Of Ideas")

def test_words_with_quotes():
    text = "\"You're despicable,\" said Daffy Duck"
    result = cap.just_do_it(text)
    eq_(result, "\"You're Despicable,\" Said Daffy Duck")

Run the tests:

$ nosetests test_cap_nose.py
...F
======================================================================
FAIL: test_cap_nose.test_words_with_quotes
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/.../site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/.../book/test_cap_nose.py", line 23, in test_words_with_quotes
    eq_(result, "\"You're Despicable,\" Said Daffy Duck")
AssertionError: '"you\'re Despicable," Said Daffy Duck' != '"You\'re Despicable," Said Daffy Duck'

----------------------------------------------------------------------
Ran 4 tests in 0.005s

FAILED (failures=1)

This is the same bug we found when we used unittest for testing; fortunately, there’s an exercise to fix it at the end of this chapter.
——————————————
Other Test Frameworks
For some reason, people like to write Python test frameworks. If you’re curious, you can check out some other popular ones, including tox and py.test.
——————————————
Continuous Integration
When your group is cranking out a lot of code daily, it helps to automate tests as soon as changes arrive. You can automate source control systems to run tests on all code as it’s checked in. This way, everyone knows if someone broke the build and just disappeared for an early lunch.
These are big systems, and I’m not going into installation and usage details here. In case you need them someday, you’ll know where to find them:

buildbot
    Written in Python, this continuous integration system automates building, testing, and releasing.
jenkins
    This is written in Java and seems to be the preferred CI tool of the moment.
travis-ci
    This automates projects hosted at GitHub, and it’s free for open source projects.
——————————————
Debugging Python Code

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. (Brian Kernighan)

Test first. The better your tests are, the less you’ll have to fix later. Yet, bugs happen and need to be fixed when they’re found later. Again, the simplest way to debug in Python is to print out strings. Some useful things to print include vars(), which extracts the values of your local variables, including function arguments:

>>> def func(*args, **kwargs):
...     print(vars())
...
>>> func(1, 2, 3)
{'args': (1, 2, 3), 'kwargs': {}}
>>> func(['a', 'b', 'argh'])
{'args': (['a', 'b', 'argh'],), 'kwargs': {}}

As you read in Decorators, a decorator can call code before or after a function without modifying the code within the function itself. This means that you can use a decorator to do something before or after any Python function, not just ones that you wrote. Let’s define the decorator dump to print the input arguments and output values of any function as it’s called (designers know that a dump often needs decorating). Save it as dump1.py, the module name that the next example imports from:

def dump(func):
    "Print input arguments and output value(s)"
    def wrapped(*args, **kwargs):
        # Report the call before running the wrapped function
        print("Function name: %s" % func.__name__)
        print("Input arguments: %s" % ' '.join(map(str, args)))
        print("Input keyword arguments: %s" % kwargs.items())
        output = func(*args, **kwargs)
        # Report the result, then pass it through unchanged
        print("Output:", output)
        return output
    return wrapped

Now the decoratee.
This is a function called double() that expects numeric arguments, either named or unnamed, and returns them in a list with their values doubled: from dump1 import dump @dump def double(*args, **kwargs): "Double every argument" output_list = [ 2 * arg for arg in args ] output_dict = { k:2*v for k,v in kwargs.items() } return output_list, output_dict if __name__ == '__main__': output = double(3, 5, first=100, next=98.6, last=-40) Take a moment to run it: $ python test_dump.py Function name: double Input arguments: 3 5 Input keyword arguments: dict_items([('last', -40), ('first', 100), ('next', 98.6)]) Output: ([6, 10], {'last': -80, 'first': 200, 'next': 197.2}) —————————————— Debug with pdb(1) These techniques help, but sometimes there’s no substitute for a real debugger. Most IDEs include a debugger, with varying features and user interfaces. Here, I’ll describe use of the standard Python debugger, pdb. Note If you run your program with the -i flag, Python will drop you into its interactive interpreter if the program fails. Here’s a program with a bug that depends on data—the kind of bug that can be particularly hard to find. This is a real bug from the early days of computing, and it baffled programmers for quite a while. We’re going to read a file of countries and their capital cities, separated by a comma, and write them out as capital, country. They might be capitalized incorrectly, so we should fix that also when we print. Oh, and there might be extra spaces here and there, and you’ll want to get rid of those, too. Finally, although it would make sense for the program to just read to the end of the file, for some reason our manager told us to stop when we encounter the word quit (in any mixture of uppercase and lowercase characters). Here’s a sample data file: France, Paris venuzuela,caracas LithuniA,vilnius quit Let’s design our algorithm (method for solving the problem). 
This is pseudocode—it looks like a program, but is just a way to explain the logic in normal language before converting it to an actual program. One reason programmers like Python is because it looks a lot like pseudocode, so there’s less work involved to convert it to a working program: for each line in the text file: read the line strip leading and trailing spaces if `quit` occurs in the lower-case copy of the line: stop else: split the country and capital by the comma character trim any leading and trailing spaces convert the country and capital to titlecase print the capital, a comma, and the country We need to strip initial and trailing spaces from the names because that was a requirement. Likewise for the lowercase comparison with quit and converting the city and country names to title case. That being the case, let’s whip out capitals.py, which is sure to work perfectly: def process_cities(filename): with open(filename, 'rt') as file: for line in file: line = line.strip() if 'quit' in line.lower(): return country, city = line.split(',') city = city.strip() country = country.strip() print(city.title(), country.title(), sep=',') if __name__ == '__main__': import sys process_cities(sys.argv[1]) Let’s try it with that sample data file we made earlier. Ready, fire, aim: $ python capitals.py cities1.csv Paris,France Caracas,Venuzuela Vilnius,Lithunia Looks great! 
It passed one test, so let’s put it in production, processing capitals and countries from around the world, until it fails, but only for this data file:

argentina,buenos aires
bolivia,la paz
brazil,brasilia
chile,santiago
colombia,Bogotá
ecuador,quito
falkland islands,stanley
french guiana,cayenne
guyana,georgetown
paraguay,Asunción
peru,lima
suriname,paramaribo
uruguay,montevideo
venezuela,caracas
quit

The program ends after printing only 5 lines of the 15 in the data file, as demonstrated here:

$ python capitals.py cities2.csv
Buenos Aires,Argentina
La Paz,Bolivia
Brasilia,Brazil
Santiago,Chile
Bogotá,Colombia

What happened? We can keep editing capitals.py, putting print() statements in likely places, but let’s see if the debugger can help us.
——————————————
Debug with pdb(2)

To use the debugger, import the pdb module from the command line by typing -m pdb, like so:

$ python -m pdb capitals.py cities2.csv
> /Users/williamlubanovic/book/capitals.py(1)<module>()
-> def process_cities(filename):
(Pdb)

This starts the program and places you at the first line. If you type c (continue), the program will run until it ends, either normally or with an error:

(Pdb) c
Buenos Aires,Argentina
La Paz,Bolivia
Brasilia,Brazil
Santiago,Chile
Bogotá,Colombia
The program finished and will be restarted
> /Users/williamlubanovic/book/capitals.py(1)<module>()
-> def process_cities(filename):

It completed normally, just as it did when we ran it earlier outside of the debugger. Let’s try again, using some commands to narrow down where the problem lies. It seems to be a logic error rather than a syntax problem or exception (which would have printed error messages).

Type s (step) to single-step through Python lines. This steps through all Python code lines: yours, the standard library’s, and any other modules you might be using. When you use s, you also go into functions and single-step within them.
Type n (next) to single-step but not to go inside functions; when you get to a function, a single n causes the entire function to execute and take you to the next line of your program. Thus, use s when you’re not sure where the problem is; use n when you’re sure that a particular function isn’t the cause, especially if it’s a long function. Often you’ll single-step through your own code and step over library code, which is presumably well tested. We’ll use s to step from the beginning of the program, into the function process_cities():

(Pdb) s
> /Users/williamlubanovic/book/capitals.py(12)<module>()
-> if __name__ == '__main__':
(Pdb) s
> /Users/williamlubanovic/book/capitals.py(13)<module>()
-> import sys
(Pdb) s
> /Users/williamlubanovic/book/capitals.py(14)<module>()
-> process_cities(sys.argv[1])
(Pdb) s
--Call--
> /Users/williamlubanovic/book/capitals.py(1)process_cities()
-> def process_cities(filename):
(Pdb) s
> /Users/williamlubanovic/book/capitals.py(2)process_cities()
-> with open(filename, 'rt') as file:

Type l (list) to see the next few lines of your program:

(Pdb) l
  1     def process_cities(filename):
  2  ->     with open(filename, 'rt') as file:
  3             for line in file:
  4                 line = line.strip()
  5                 if 'quit' in line.lower():
  6                     return
  7                 country, city = line.split(',')
  8                 city = city.strip()
  9                 country = country.strip()
 10                 print(city.title(), country.title(), sep=',')
 11
(Pdb)

The arrow (->) denotes the current line.
——————————————
Debug with pdb(3)

We could continue using s or n, hoping to spot something, but let’s use one of the main features of a debugger: breakpoints. A breakpoint stops execution at the line you indicate. In our case, we want to know why process_cities() bails out before it’s read all of the input lines. Line 3 (for line in file:) will read every line in the input file, so that seems innocent. The only other place where we could return from the function before reading all of the data is at line 6 (return).
Let’s set a breakpoint on line 6: (Pdb) b 6 Breakpoint 1 at /Users/williamlubanovic/book/capitals.py:6 Next, let’s continue the program until it either hits the breakpoint or reads all of the input lines and finishes normally: (Pdb) c Buenos Aires,Argentina La Paz,Bolivia Brasilia,Brazil Santiago,Chile Bogotá,Colombia > /Users/williamlubanovic/book/capitals.py(6)process_cities() -> return Aha, it stopped at our line 6 breakpoint. This indicates that the program wants to return early after reading the country after Colombia. Let’s print the value of line to see what we just read: (Pdb) p line 'ecuador,quito' What’s so special about—oh, never mind. Really? *quit*o? Our manager never expected the string quit to turn up inside normal data, so using it as a sentinel (end indicator) value like this was a boneheaded idea. You march right in there and tell him that, while I wait here. If at this point you still have a job, you can see all your breakpoints by using a plain b command: (Pdb) b Num Type Disp Enb Where 1 breakpoint keep yes at /Users/williamlubanovic/book/capitals.py:6 breakpoint already hit 1 time An l will show your code lines, the current line (->), and any breakpoints (B). 
A plain l will start listing from the end of your previous call to l, so include the optional starting line (here, let’s start from line 1): (Pdb) l 1 1 def process_cities(filename): 2 with open(filename, 'rt') as file: 3 for line in file: 4 line = line.strip() 5 if 'quit' in line.lower(): 6 B-> return 7 country, city = line.split(',') 8 city = city.strip() 9 country = country.strip() 10 print(city.title(), country.title(), sep=',') 11 Okay, let’s fix that quit test to only match the full line, not within other characters: def process_cities(filename): with open(filename, 'rt') as file: for line in file: line = line.strip() if 'quit' == line.lower(): return country, city = line.split(',') city = city.strip() country = country.strip() print(city.title(), country.title(), sep=',') if __name__ == '__main__': import sys process_cities(sys.argv[1]) Once more, with feeling: $ python capitals2.py cities2.csv Buenos Aires,Argentina La Paz,Bolivia Brasilia,Brazil Santiago,Chile Bogotá,Colombia Quito,Ecuador Stanley,Falkland Islands Cayenne,French Guiana Georgetown,Guyana Asunción,Paraguay Lima,Peru Paramaribo,Suriname Montevideo,Uruguay Caracas,Venezuela That was a skimpy overview of the debugger—just enough to show you what it can do and what commands you’d use most of the time. Remember: more tests, less debugging. —————————————— Logging Error Messages At some point you might need to graduate from using print() statements to logging messages. A log is usually a system file that accumulates messages, often inserting useful information such as a timestamp or the name of the user who’s running the program. Often logs are rotated (renamed) daily and compressed; by doing so, they don’t fill up your disk and cause problems themselves. When something goes wrong with your program, you can look at the appropriate log file to see what happened. The contents of exceptions are especially useful in logs because they show you the actual line at which your program croaked, and why. 
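To make that last point concrete, here is a minimal sketch of recording an exception in a log with the standard logging module (described next). The logger name 'capitals', the function risky_divide(), and the in-memory buffer are assumptions made for illustration; a real program would more likely log to a file.

```python
import io
import logging

# Hypothetical setup: send log records to an in-memory buffer so the
# result can be inspected; a real program would usually use a file.
log_buffer = io.StringIO()
handler = logging.StreamHandler(log_buffer)
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))

logger = logging.getLogger('capitals')  # illustrative logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep this sketch's records out of the root logger


def risky_divide(a, b):
    """Hypothetical function that may raise ZeroDivisionError."""
    return a / b


try:
    risky_divide(1, 0)
except ZeroDivisionError:
    # logger.exception() logs at ERROR level and appends the traceback,
    # which names the file and line where the error was raised.
    logger.exception('division failed')

log_text = log_buffer.getvalue()
print(log_text)
```

The logged text starts with a timestamped ERROR line and is followed by the traceback, so the log alone shows where the failure happened.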
The standard Python library module is logging. I’ve found most descriptions of it somewhat confusing. After a while it makes more sense, but it does seem overly complicated at first. The logging module includes these concepts:

■ The message that you want to save to the log
■ Ranked priority levels and matching functions: debug(), info(), warning(), error(), and critical() (older code uses warn(), a deprecated alias for warning())
■ One or more logger objects as the main connection with the module
■ Handlers that direct the message to your terminal, a file, a database, or somewhere else
■ Formatters that create the output
■ Filters that make decisions based on the input

For the simplest logging example, just import the module and use some of its functions:

>>> import logging
>>> logging.debug("Looks like rain")
>>> logging.info("And hail")
>>> logging.warning("Did I hear thunder?")
WARNING:root:Did I hear thunder?
>>> logging.error("Was that lightning?")
ERROR:root:Was that lightning?
>>> logging.critical("Stop fencing and get inside!")
CRITICAL:root:Stop fencing and get inside!

Did you notice that debug() and info() didn’t do anything, and the other three printed LEVEL:root: before each message? So far, it’s like a print() statement with multiple personalities, some of them hostile. But it is useful. You can scan for a particular value of LEVEL in a log file to find particular messages, compare timestamps to see what happened before your server crashed, and so on.

A lot of digging through the documentation answers the first mystery (we’ll get to the second one shortly): the default priority level is WARNING, and that got locked in as soon as we called the first function (logging.debug()). We can set the default level by using basicConfig().
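The ranking behind those level names can be inspected directly. The sketch below, which uses an illustrative logger name 'weather' that is not part of the original example, prints the numeric value of each standard level and checks which levels a logger set to WARNING would let through.

```python
import logging

# The standard levels are plain integers; higher means more severe.
levels = ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
values = {name: getattr(logging, name) for name in levels}
for name in levels:
    print(name, values[name])

# 'weather' is an illustrative logger name chosen for this sketch.
logger = logging.getLogger('weather')
logger.setLevel(logging.WARNING)

# isEnabledFor() reports whether a message at that level would be handled.
passed = [name for name in levels if logger.isEnabledFor(values[name])]
print(passed)
```

With the threshold at WARNING, only WARNING, ERROR, and CRITICAL pass, which matches the silent debug() and info() calls above.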
DEBUG is the lowest level, so this enables it and all the higher levels to flow through:

>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> logging.debug("It's raining again")
DEBUG:root:It's raining again
>>> logging.info("With hail the size of hailstones")
INFO:root:With hail the size of hailstones

We did all that with the default logging functions, without actually creating a logger object. Each logger has a name. Let’s make one called bunyan (start a fresh interpreter session for each of these examples, because basicConfig() does nothing once the root logger already has handlers):

>>> import logging
>>> logging.basicConfig(level='DEBUG')
>>> logger = logging.getLogger('bunyan')
>>> logger.debug('Timber!')
DEBUG:bunyan:Timber!

If the logger name contains any dot characters, they separate levels of a hierarchy of loggers, each with potentially different properties. This means that a logger named quark is higher than one named quark.charmed. The special root logger is at the top, and is called ''.

So far, we’ve just printed messages, which is not a great improvement over print(). We use handlers to direct the messages to different places. The most common is a log file, and here’s how you do it:

>>> import logging
>>> logging.basicConfig(level='DEBUG', filename='blue_ox.log')
>>> logger = logging.getLogger('bunyan')
>>> logger.debug("Where's my axe?")
>>> logger.warning("I need my axe")
>>>

Aha, the lines aren’t on the screen anymore; instead, they’re in the file named blue_ox.log:

DEBUG:bunyan:Where's my axe?
WARNING:bunyan:I need my axe

Calling basicConfig() with a filename argument created a FileHandler for you and made it available to your logger. The logging module includes at least 15 handlers to send messages to places such as email and web servers as well as the screen and files.

Finally, you can control the format of your logged messages. In our first example, our default gave us something similar to this:

WARNING:root:Message...
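That default layout corresponds to the format string stored in logging.BASIC_FORMAT. As a small sketch, the record below is built by hand (the file name 'example.py' and the line number are placeholders made up for illustration) and rendered with a Formatter that uses that string:

```python
import logging

# logging.BASIC_FORMAT is the format basicConfig() uses when none is given.
print(logging.BASIC_FORMAT)

# Build a record by hand; the pathname and line number are placeholders.
record = logging.LogRecord(
    name='root',
    level=logging.WARNING,
    pathname='example.py',
    lineno=1,
    msg='Message...',
    args=(),
    exc_info=None,
)

formatter = logging.Formatter(logging.BASIC_FORMAT)
line = formatter.format(record)
print(line)
```

The three fields separated by colons are the level name, the logger name, and the message, which answers where the LEVEL:root: prefix comes from.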
If you provide a format string to basicConfig(), you can change to the format of your preference:

>>> import logging
>>> fmt = '%(asctime)s %(levelname)s %(lineno)s %(message)s'
>>> logging.basicConfig(level='DEBUG', format=fmt)
>>> logger = logging.getLogger('bunyan')
>>> logger.error("Where's my other plaid shirt?")
2014-04-08 23:13:59,899 ERROR 1 Where's my other plaid shirt?

We let the logger send output to the screen again, but changed the format. The logging module recognizes a number of variable names in the fmt format string. We used asctime (date and time in an ISO 8601-like form), levelname, lineno (line number), and the message itself. There are other built-ins, and you can provide your own variables, as well.

There’s much more to logging than this little overview can provide. You can log to more than one place at the same time, with different priorities and formats. The package has a lot of flexibility, but sometimes at the cost of simplicity.
——————————————
Optimize Your Code

Python is usually fast enough, until it isn’t. In many cases, you can gain speed by using a better algorithm or data structure. The trick is knowing where to do this. Even experienced programmers guess wrong surprisingly often. You need to be like the careful quiltmaker, and measure before you cut. And this leads us to timers.
——————————————
Measure Timing

You’ve seen that the time function in the time module returns the current epoch time as a floating-point number of seconds. A quick way of timing something is to get the current time, do something, get the new time, and then subtract the original time from the new time. Let’s write this up and call it time1.py:

from time import time

t1 = time()
num = 5
num *= 2
print(time() - t1)

In this example, we’re measuring the time it takes to assign the value 5 to the name num and multiply it by 2. This is not a realistic benchmark, just an example of how to measure some arbitrary Python code.
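The same subtract-two-readings pattern can be wrapped in a small helper. The sketch below is an illustration rather than part of time1.py: the helper name elapsed() and the function double_five() are made up, and it uses time.perf_counter(), a standard-library clock intended for measuring short intervals, in place of time().

```python
from time import perf_counter, sleep


def elapsed(func, *args, **kwargs):
    """Call func and return (result, seconds taken).

    Hypothetical helper for illustration only.
    """
    start = perf_counter()
    result = func(*args, **kwargs)
    return result, perf_counter() - start


def double_five():
    """The same tiny workload that time1.py measures."""
    num = 5
    num *= 2
    return num


value, seconds = elapsed(double_five)
print(value, seconds)

# A short sleep should take at least roughly as long as requested.
_, nap = elapsed(sleep, 0.05)
print(nap)
```

Returning the result alongside the duration lets the helper time a call without changing what the caller receives.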
Try running time1.py a few times, just to see how much it can vary:

$ python time1.py
2.1457672119140625e-06
$ python time1.py
2.1457672119140625e-06
$ python time1.py
2.1457672119140625e-06
$ python time1.py
1.9073486328125e-06
$ python time1.py
3.0994415283203125e-06

That was about two or three millionths of a second. Let’s try something slower, such as sleep. If we sleep for a second, our timer should take a tiny bit more than a second. Save this as time2.py:

from time import time, sleep

t1 = time()
sleep(1.0)
print(time() - t1)

Let’s be certain of our results, so run it a few times:

$ python time2.py
1.000797986984253
$ python time2.py
1.0010130405426025
$ python time2.py
1.0010390281677246

As expected, it takes about a second to run. If it didn’t, either our timer or sleep() should be embarrassed.

There’s a handier way to measure code snippets like this: the standard module timeit. It has a function called (you guessed it) timeit(), which runs your test code count times and returns the total elapsed time in seconds. The basic call is timeit.timeit(code, number=count); if you omit number, it defaults to 1,000,000. In the examples in this section, the code needs to be within quotes so that it is not executed after you press the Return key but is executed inside timeit(). (In the next section, you’ll see how to time a function by passing its name to timeit().) Let’s run our previous example just once and time it. Call this file timeit1.py:

from timeit import timeit

print(timeit('num = 5; num *= 2', number=1))

Run it a few times:

$ python timeit1.py
2.5600020308047533e-06
$ python timeit1.py
1.9020008039660752e-06
$ python timeit1.py
1.7380007193423808e-06

Again, these two code lines ran in about two millionths of a second. We can use the repeat argument of the timeit module’s repeat() function to run more sets.
Save this as timeit2.py: from timeit import repeat print(repeat('num = 5; num *= 2', number=1, repeat=3)) Try running it to see what transpires: $ python timeit2.py [1.691998477326706e-06, 4.070025170221925e-07, 2.4700057110749185e-07] The first run took two millionths of a second, and the second and third runs were faster. Why? There could be many reasons. For one thing, we’re testing a very small piece of code, and its speed could depend on what else the computer was doing in those instants, how the Python system optimizes calculations, and many other things. Or, it could be just chance. Let’s try something more realistic than variable assignment and sleep. We’ll measure some code to help compare the efficiency of a few algorithms (program logic) and data structures (storage mechanisms). —————————————— Algorithms and Data Structures The Zen of Python declares that There should be one—and preferably only one—obvious way to do it. Unfortunately, sometimes it isn’t obvious, and you need to compare alternatives. For example, is it better to use a for loop or a list comprehension to build a list? And what do we mean by better? Is it faster, easier to understand, using less memory, or more “Pythonic”? In this next exercise, we’ll build a list in different ways, comparing speed, readability, and Python style. Here’s time_lists.py: from timeit import timeit def make_list_1(): result = [] for value in range(1000): result.append(value) return result def make_list_2(): result = [value for value in range(1000)] return result print('make_list_1 takes', timeit(make_list_1, number=1000), 'seconds') print('make_list_2 takes', timeit(make_list_2, number=1000), 'seconds') In each function, we add 1,000 items to a list, and we call each function 1,000 times. Notice that in this test we called timeit() with the function name as the first argument rather than code as a string. 
Let’s run it: $ python time_lists.py make_list_1 takes 0.14117428699682932 seconds make_list_2 takes 0.06174145900149597 seconds The list comprehension is at least twice as fast as adding items to the list by using append(). In general, comprehensions are faster than manual construction. Use these ideas to make your own code faster. —————————————— Cython, NumPy, and C Extensions If you’re pushing Python as hard as you can and still can’t get the performance you want, you have yet more options. Cython is a hybrid of Python and C, designed to translate Python with some performance annotations to compiled C code. These annotations are fairly small, like declaring the types of some variables, function arguments, or function returns. For scientific-style loops of numeric calculations, adding these hints will make them much faster—as much as a thousand times faster. See the Cython wiki for documentation and examples. You can read much more about NumPy in Appendix C. It’s a Python math library, written in C for speed. Many parts of Python and its standard library are written in C for speed and wrapped in Python for convenience. These hooks are available to you for your applications. If you know C and Python and really want to make your code fly, writing a C extension is harder but the improvements can be worth the trouble. —————————————— PyPy When Java first appeared about 20 years ago, it was as slow as an arthritic schnauzer. When it started to mean real money to Sun and other companies, though, they put millions into optimizing the Java interpreter and the underlying Java virtual machine (JVM), borrowing techniques from earlier languages like Smalltalk and LISP. Microsoft likewise put great effort into optimizing its rival C# language and .NET VM. No one owns Python, so no one has pushed that hard to make it faster. You’re probably using the standard Python implementation. It’s written in C, and often called CPython (not the same as Cython). 
Like PHP, Perl, and even Java, Python is not compiled to machine language, but translated to an intermediate language (with names such as bytecode or p-code) which is then interpreted in a virtual machine. PyPy is a new Python interpreter that applies some of the tricks that sped up Java. Its benchmarks show that PyPy is faster than CPython in every test—over 6 times faster on average, and up to 20 times faster in some cases. It works with Python 2 and 3. You can download it and use it instead of CPython. PyPy is constantly being improved, and it might even replace CPython some day. Read the latest release notes on the site to see if it could work for your purposes. —————————————— Source Control When you’re working on a small group of programs, you can usually keep track of your changes—until you make a boneheaded mistake and clobber a few days of work. Source control systems help protect your code from dangerous forces, like you. If you work with a group of developers, source control becomes a necessity. There are many commercial and open source packages in this area. The most popular in the open source world where Python lives are Mercurial and Git. Both are examples of distributed version control systems, which produce multiple copies of code repositories. Earlier systems such as Subversion run on a single server. —————————————— Mercurial Mercurial is written in Python. It’s fairly easy to learn, with a handful of subcommands to download code from a Mercurial repository, add files, check in changes, and merge changes from different sources. bitbucket and other sites offer free or commercial hosting. —————————————— Git(1) Git was originally written for Linux kernel development, but now dominates open source in general. It’s similar to Mercurial, although some find it slightly trickier to master. GitHub is the largest git host, with over a million repositories, but there are many other hosts. 
The standalone program examples in this book are available in a public git repository at GitHub. If you have the git program on your computer, you can download these programs by using this command:

$ git clone https://github.com/madscheme/introducing-python

You can also download the code by pressing the following buttons on the GitHub page:

■ Click “Clone in Desktop” to open your computer’s version of git, if it’s been installed.
■ Click “Download ZIP” to get a zipped archive of the programs.

If you don’t have git but would like to try it, read the installation guide. I’ll talk about the command-line version here, but you might be interested in sites such as GitHub that have extra services and might be easier to use in some cases; git has many features, but is not always intuitive.

Let’s take git for a test drive. We won’t go far, but the ride will show a few commands and their output. Make a new directory and change to it:

$ mkdir newdir
$ cd newdir

Create a local git repository in your current directory newdir:

$ git init
Initialized empty Git repository in /Users/williamlubanovic/newdir/.git/

Create a Python file called test.py with these contents in newdir:

print('Oops')

Add the file to the git repository:

$ git add test.py

What do you think of that, Mr. Git?

$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

    new file:   test.py

This means that test.py has been staged in the local repository, but its changes have not yet been committed. Let’s commit it:

$ git commit -m "simple print program"
[master (root-commit) 52d60d7] simple print program
 1 file changed, 1 insertion(+)
 create mode 100644 test.py

That -m "simple print program" was your commit message. If you omitted that, git would pop you into an editor and coax you to enter the message that way. This becomes a part of the git change history for that file.
Let’s see what our current status is:

$ git status
On branch master
nothing to commit, working directory clean
——————————————
Git(2)

Okay, all current changes have been committed. This means that we can change things and not worry about losing the original version. Make an adjustment now to test.py: change Oops to Ops! and save the file:

print('Ops!')

Let’s check to see what git thinks now:

$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

    modified:   test.py

no changes added to commit (use "git add" and/or "git commit -a")

Use git diff to see what lines have changed since the last commit:

$ git diff
diff --git a/test.py b/test.py
index 76b8c39..62782b2 100644
--- a/test.py
+++ b/test.py
@@ -1 +1 @@
-print('Oops')
+print('Ops!')

If you try to commit this change now, git complains:

$ git commit -m "change the print string"
On branch master
Changes not staged for commit:
    modified:   test.py

no changes added to commit

That staged for commit phrase means you need to add the file, which roughly translated means hey git, look over here:

$ git add test.py

You could have also typed git add . to add all changed files in the current directory; that’s handy when you actually have edited multiple files and want to ensure that you check in all their changes.
Now we can commit the change:

$ git commit -m "change the print string"
[master e1e11ec] change the print string
 1 file changed, 1 insertion(+), 1 deletion(-)

If you’d like to see all the terrible things that you’ve done to test.py, most recent first, use git log:

$ git log test.py
commit e1e11ecf802ae1a78debe6193c552dcd15ca160a
Author: William Lubanovic
Date:   Tue May 13 23:34:59 2014 -0500

    change the print string

commit 52d60d76594a62299f6fd561b2446c8b1227cfe1
Author: William Lubanovic
Date:   Tue May 13 23:26:14 2014 -0500

    simple print program
——————————————
Clone This Book

You can get a copy of all the programs in this book. Visit the git repository and follow the directions to copy it to your local machine. If you have git, run the command git clone https://github.com/madscheme/introducing-python to make a git repository on your computer. You can also download the files in zip format.
——————————————
How You Can Learn More

This is an introduction. It almost certainly says too much about some things that you don’t care about and not enough about some things that you do. Let me recommend some Python resources that I’ve found helpful.
——————————————
Books

I’ve found the books in the list that follows to be especially useful. These range from introductory to advanced, with mixtures of Python 2 and 3.

Barry, Paul. Head First Python. O’Reilly, 2010.
Beazley, David M. Python Essential Reference (4th Edition). Addison-Wesley, 2009.
Beazley, David M. and Brian K. Jones. Python Cookbook (3rd Edition). O’Reilly, 2013.
Chun, Wesley. Core Python Applications Programming (3rd Edition). Prentice Hall, 2012.
McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O’Reilly, 2012.
Summerfield, Mark. Python in Practice: Create Better Programs Using Concurrency, Libraries, and Patterns. Addison-Wesley, 2013.

Of course, there are many more.
——————————————
Websites

Here are some websites where you can find helpful tutorials:

■ Learn Python the Hard Way by Zed Shaw.
■ Dive Into Python 3 by Mark Pilgrim. ■ Mouse Vs. Python by Michael Driscoll. If you’re interested in keeping up with what’s going on in the Pythonic world, check out these news websites: ■ comp.lang.python ■ comp.lang.python.announce ■ python subreddit ■ Planet Python Finally, here are some good websites for downloading code: ■ The Python Package Index ■ stackoverflow Python questions ■ ActiveState Python recipes ■ Python packages trending on GitHub —————————————— Groups Computing communities have varied personalities: enthusiastic, argumentative, dull, hipster, button-down, and many others across a broad range. The Python community is friendly and civil. You can find Python groups based on location—meetups and local user groups around the world. Other groups are distributed and based on common interests. For instance, PyLadies is a support network for women who are interested in Python and open source. —————————————— Conferences Of the many conferences and workshops around the world, the largest are held annually in North America and Europe. —————————————— Coming Attractions But wait, there’s more! Appendixes A, B, and C offer tours of Python in the arts, business, and science. You’ll find at least one package that you’ll want to explore. Bright and shiny objects abound on the net. Only you can tell which are costume jewelry and which are silver bullets. And even if you’re not currently pestered by werewolves, you might want some of those silver bullets in your pocket. Just in case. Finally, we have answers to those annoying end-of-chapter exercises, details on installation of Python and friends, and a few cheat sheets for things that I always need to look up. Your brain is almost certainly better tuned, but they’re there if you need them. ——————————————