Advanced Python or Understanding Python


Uploaded by GoogleTechTalks on 08.10.2007

Transcript:
>>
Hello. Yes, hi. Welcome to the latest in our series of talks on advanced topics in programming languages. Today we're very lucky to have Thomas Wouters, who's here to talk to us about features that are currently available in Python but are advanced features. Last week, we were very lucky to have Guido give an awesome talk about upcoming features; Thomas will be talking about existing features, right? Yes? As always, this is the latest
in our series of talks. I always want to make a pitch so that people will actually give
talks of their own. It's a really useful thing to do, if you have specialized or interesting knowledge, to share that knowledge with as many Googlers as possible and potentially even the rest of the world. So please come and see me and we can set up a talk, and you can give yet another in our series of talks here. And with that, I will turn it over to Thomas, who shall speak on a very interesting topic. Thank you.
>> WOUTERS: Is this on? Right. Although there's an echo. So I'm going to give a talk about Advanced Python, which is not an easy subject in that there's a lot of Python that can be considered advanced. I'm not sure what your level of advancement is, so I'm going to cover basically everything from the start until the end, and we'll see how far we get. These are the subjects I'll be covering. I'll certainly be explaining the Python design, objects, iterators and generators, and hopefully decorators. Everything after that is a bonus.
I'm going to keep the pace pretty high, so if there's any questions, just wave your arms
and I'll stop and explain. But if everyone can keep up then we can probably cover the
more interesting topics at the end as well. And if there's specific interest in any of
the topics especially, for instance, Unicode, I can skip over the advanced stuff and jump
right to Unicode. So, first about Python. Python was developed by Guido as a middle language between shell scripting and systems programming. It was intended to be easy to use by normal programmers but still allow more complex structures than a shell scripting language. It turns out that it's also convenient for library programmers, people doing the actual programming for end users, because you can hide a lot of smarts inside objects or modules and just expose a very simple API to the end users. This has gotten
progressively better with later releases of Python. There are multiple implementations
of Python; CPython is the one everyone uses except for those that are developing the others.
Jython has been around for quite a while and is actually in use by a lot of people, but not as many as CPython. IronPython and PyPy are still actively being developed, although IronPython is apparently very usable. Python is designed by implementation, in that if you don't come up with an implementation of a feature you want, it's not going to happen. But it's not a slave to the implementation, and it doesn't mean that if you implement an idea, it will get in. It's still a very designed language. We have feature
releases every one to two years. It used to be every year or so and now, the last one
was in the works for two years, and there are a lot of bugfix releases as well. An important note about the feature releases: if you do not get any warnings using a particular minor release, upgrading to the minor release following it will almost certainly work. Anything that changes semantics or possibly breaks code will warn for at least one release. And bugfix releases have a similar requirement: you can always upgrade to a newer bugfix release of the same minor version and nothing will break that wasn't already broken. Important in realizing how Python works is that everything
is runtime, even compiletime is actually runtime. As you may know, there are cached files, .pyc
files that are cached bytecode, that's basically borrowed runtime, someone else's runtime that
you can skip. Also important is that all execution happens in a namespace and that namespace
is actually a dict, it's a dictionary object with keys and values. Modules, functions and
classes all have their own little namespace and they don't affect other namespaces unless
they explicitly do so. And modules are executed top to bottom. Scripts start running at line one and run all the way down to the end, or until loops or whatever make them wait. So in that regard, it's very much like a script. When you import a module, it does the same thing: it just starts executing at line one and runs on down to the end, and the import is done. Another important aspect is that the def and class statements don't define a function or class at compile time; it's a runtime thing. It actually happens when that statement is reached. So, some more about functions and classes as well, because they're important
features. At compile time, which is runtime, the code for the function is compiled into a code object; it's a separate object from whatever the rest of the code is compiled into. Then at def time, when the def statement is executed, the code object is wrapped into a function object, along with whatever arguments it takes and any defaults for those arguments. Essentially, def is an assignment operation. If you look at the
bytecode, you'll see that it just uses the same bytecodes as normal assignment operations.
So here's an example of a function. The red part is the actual def statement that gets evaluated when it's reached, including the part in blue, which is an argument default. That part gets evaluated at the moment of def. So the Foo object is constructed, or the Foo function is called if it's a function, at the moment the def is being executed. The green part has been compiled at that time, because compile time has been in the past, but it's not being executed until later, which is when you call the function. There's an inner function in there, so what happens is that the green lines you see here have been compiled into their own code object, and when you call func, inner_func gets created at that time, and again the default for the third argument is evaluated at the moment you call func. If you look after the definition of inner_func, you'll see that arg1 and arg2 are reassigned. That means that whatever arg2 was will be assigned to inner_func's arg3, and not whatever it is at the moment you call inner_func. However, inside inner_func we also use arg1, which is using nested scope, which I'll be mentioning later as well. That means it will be using the outer function's arg1, and that will be whatever arg1 is at the moment you call inner_func. So there's a very big difference between those two uses.
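The slide itself isn't reproduced in the transcript; a rough sketch of what it might have looked like (the names func, inner_func and foo, and the arithmetic, are guesses):

    def foo():
        return 10                      # evaluated once, at def time below

    def func(arg1, arg2=foo()):        # foo() runs when this def executes
        def inner_func(arg3=arg2):     # arg2's value *now* becomes the default
            return arg1 + arg3         # arg1 is looked up via the nested scope
        arg1 = arg1 * 2                # inner_func will see this new arg1...
        arg2 = None                    # ...but arg3's default is unaffected
        return inner_func              # just returns the function object

    func(1)()                          # 12: (1 * 2) + 10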
>> Question? >> WOUTERS: Yes?
>> That very final line, was that evaluating inner func or [INDISTINCT] function?
>> WOUTERS: No, it's just returning the function object.
>> Okay, thanks. >> WOUTERS: So it's not calling anything,
only the parentheses do actual calling. Yes? >> [INDISTINCT] could you repeat the question...
>> WOUTERS: Sorry? Yes. >> [INDISTINCT]
>> WOUTERS: The question was: is the last line actually evaluating inner_func? It is not, it's just returning the function object that is inner_func. So class statements are very
similar but also very different. A class statement is followed by a block of code, a suite, that's
executed right away when the class statement is reached. It is, however, executed in a separate namespace, a dict again. And then, after that block of code has been executed, the namespace is used to create the actual class object. Any functions that you
define inside the code block then get turned into methods, by magic, basically. And, as people who have already programmed in Python will know, the self argument is, as usual for Python, passed implicitly when you call a method, but your method still has to receive it explicitly as the first argument. An important thing to realize is that the code block in a class is actually a normal code block. You can do whatever you want in there. For instance, at the top I create a class attribute, which is just something stored in the class, the result of calling func. I define a method, which is just a normal function except that it takes self as the first argument, and it assigns to an attribute of self when it is called. I also define a helper function that just does some silly arithmetic, and I call that helper function inside the class block itself to calculate or recalculate the class attribute. At the end I delete helper, because otherwise it would end up in the class, and since it doesn't take self as the first argument, it wouldn't do anything very helpful.
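Roughly what that class example might look like (reconstructed; the names and arithmetic are guesses):

    def func():
        return 6                          # something to call at class time

    class Example(object):
        class_attr = func()               # runs while the class block executes

        def method(self, arg):            # a normal function; becomes a method
            self.attr = arg               # assigns an instance attribute

        def helper(value):                # no self: just a helper for below
            return value * 2 + 1

        class_attr = helper(class_attr)   # recalculate the class attribute
        del helper                        # keep helper out of the class

Other important parts about Python: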
variables aren't containers like in other languages. They're not allocated memory that
you can use to store data. People often say about Python that everything is an object
and they also sometimes say everything is a reference. That's both true but both are
not true when you apply them to variables because variables are not objects and variables
are not references. Variables are only labels. Everything concrete, everything that allocates
memory, everything you can toss around is an object and whenever you hold an object,
whenever you store it anywhere, you're actually storing a reference. You never actually own an object, you only own references to it. Variables don't have type information; they don't have information on the scope of the variable, or how many objects live around, or when they were created. The only thing they do is refer to an object, and that's it. If
you want to look at it from an implementation standpoint, it's a dict. It's a dictionary
mapping names to objects, that's it. So, scopes are also related to namespaces. Python has a very simple two-scope rule, which is actually three scopes: names are either local, or they are global or builtin. Local means it exists in the current function or class or module, but it doesn't exist outside it. Global means it exists in the current module; it doesn't exist everywhere, just in the current module. And builtin is a special namespace that is used for the builtin functions, and that you can actually modify. It's just a special module that you can modify yourself if you want to. When the Python compiler examines a function and compiles
a function, it keeps track of whatever names you assign to, and assumes, correctly by definition, that those names are local. You can change this by using the global declaration, which is the only actual declaration Python has. But don't do it too often; it's usually a sign that your code is incorrect. All other names that the compiler finds are
either global or builtin, and the lookup is: it looks in the globals and then it looks in the builtins. So if you have a global of the same name as a builtin, all the code in the module will find the global of that name instead of the builtin. You can actually use this on other modules as well. If you import a module and then assign to an attribute of the module, you'll be assigning to a global name in that module, and you can mask whatever it thinks is a builtin name by assigning to that attribute.
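A minimal sketch of both tricks (the module name othermod is assumed):

    import othermod
    othermod.len = lambda obj: 42     # all code in othermod now finds this
                                      # len instead of the builtin

    counter = 0
    def increment():
        global counter                # without this, assigning would make
        counter += 1                  # counter a local name

There's also a trick with nested scopes, which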
were added in Python 2.1, I think, where, as I pointed out before, you can refer in an inner function to a variable in an outer function, but that is read-only. You cannot assign to a name in an outer function. This isn't really a mechanical problem in Python; it would be possible to allow the assignment, but there's no syntax to say "I want to assign to an outer-scope variable." Global only assigns to a global name, not to outer scopes. Apparently, Python 3.0 might be getting syntax for this. So, I mentioned modules. Modules are there to organize code. They're very convenient
because they have their own namespace with global names. They also keep the rest of the
module tidy. They're always executed on first import and then cached in sys.modules. This
cache holds just about everything that you import in your program, and it's also the only thing that keeps track of what you have imported. So if you toss something out of sys.modules and import it again, it will get executed again. Import is just syntactic sugar, like basically everything in Python is syntactic sugar: it calls the __import__ builtin function. If you want to import a module whose name you have in a string object, you use __import__. If you want to replace __import__, you can just write your own function and replace the one in the builtin namespace. Another trick is that sys.modules does not have to contain modules. It's a mapping of names to whatever object you want import to return. So you can toss any old object in there, and then importing that name will return your object. Storing None in sys.modules just means this module couldn't be found. So, if you want to prevent some module from being imported, you can insert None in sys.modules under that name, and then it'll raise ImportError whenever you try to import it.
Python objects in general are described in various terms. Mutability is a common one; it means the object is changeable. Lists, for instance, are mutable; tuples are not. Mutability is very important in Python because everything is a reference. If you have a mutable object and you end up changing it by accident, you're changing it for everyone, and that's rather inconvenient. So you have to keep in mind whether objects are mutable or not. Fortunately, this mostly goes right by accident anyway. A related concept is hashability, whether you can take the hash of an object. In normal operation, mutable objects are not hashable, and most immutable objects are hashable. For instance, tuples are hashable, but only if all their values are also hashable. Hashes are used just for dicts and sets right now, so any dict key has to be hashable, and any set item, anything you want to store in a set, has to be hashable as well. But it's not inconceivable that more stuff will use hashes.
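A quick illustration of those rules (not from the talk's slides):

    d = {}
    d[(1, 2)] = 'fine'        # tuples are immutable and hashable
    d[[1, 2]] = 'boom'        # TypeError: lists are mutable, not hashable
    hash((1, [2]))            # TypeError: a tuple is hashable only if
                              # all of its values are hashable

And then there are the imaginary abstract base classes that are often referred to,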
saying that an object is file-like or it's a sequence or it's a mapping. Those are abstract
concepts that are somewhat like protocols or interfaces or whatever you want in another
language, but they're just informal. They just say that whenever an object is a sequence, it has the sequence operations implemented: you can index into it, you can iterate over it, etcetera. Some of them overlap; for instance, sequences, mappings and files are all iterable, so the iterable interface applies to all of them. And actually, for all of the objects
you define, all that you're defining is implementation of syntax. Python defines syntax and you can
plug in whatever implementation you want into that syntax. In other languages you say you implement the less-than operator; in Python you say you implement the less-than operation, so that it can be used even when you're not actually using the less-than operator. Here's a short list of hooks that Python supports. I'm not going to go over all of them; most should be common enough. The one thing that's worth mentioning is conversions. You can define a few hooks that define how your object gets converted to other types. That just works for builtin types of course, because the hooks don't exist for arbitrary types. But most arbitrary types that want to do something like this can just say: if you have a __foo__ method, I'll call it to turn you into a foo object if I want to. But it's not very common.
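For example, implementing the less-than operation rather than just the operator (my own sketch, not from the slides):

    class Version(object):
        def __init__(self, parts):
            self.parts = parts
        def __lt__(self, other):                  # the less-than *operation*
            return self.parts < other.parts

    Version([1, 2]) < Version([1, 10])            # True, via the operator
    sorted([Version([2, 0]), Version([1, 5])])    # also calls __lt__, with
                                                  # no '<' operator in sight

These things, on the other hand, are not Python hooks; these things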
you cannot configure on your object. Assignment. Assignment is an operation that changes the
namespace and not the object. Since a variable is just a reference to an object, you have no control over where your references are going. Another thing you cannot control is type checks. You cannot control when people are comparing your type or doing instance checks, and you cannot control what they get as a result. Related is identity comparison, the "is" operator: it checks whether two references are pointing to, or referring to, the same object, and you have no control over that. "And", "or" and "not" are Boolean operations; they just call the one truth-value test that your object can implement, and you have no control over what they actually return. As some may know, "and" and "or" in Python return one of the two values, whereas "not" always returns a Boolean. And method calls: you cannot define in a single operation what will happen when someone calls a method on you, because a method call is getting an attribute followed by calling the resulting object. They're two separate steps. So in order to implement method calls on your object, you would have to implement getattr to return some go-between object and then have that go-between object do something when it's called. So, on to some implementation details in C. This just applies
to CPython, of course. Python objects are implemented as a struct that holds various bookkeeping information, like the refcount and the actual type of the object, as well as arbitrary C data. That can be pointers, it can be an int, it can be an array, whatever you want. Types are what describe what an object is and how it behaves. They are a separate struct, which is also a PyObject struct; the arbitrary data in the PyType struct is the function pointers and various other things that describe the type. The PyObject structs are not relocatable. You cannot move them around once you've given them out to anyone else; it's a blob of memory that's allocated, and that's it. If you want to move it around, you have to destroy the object, which means you have to get rid of all the references to it. That also means that variable-sized objects like lists, which have a portion that needs to be reallocated and might move around, just use an extra layer of indirection: it's just a pointer stored somewhere inside. And of course, because it's C, it doesn't have refcounting by nature, so refcounting is done manually: whenever you receive a reference you incref, and whenever you toss one out you decref. It sounds simple, and it can get quite complicated, but the Python C API is mostly clear enough that it's not too hard once you get used to it. Another
feature that is done reasonably manually is weak references. Weak references are references
that get informed when their object goes away so that they can do some cleaning up and not
crash. The weak references are just callbacks basically in the PyObject struct arbitrary
data. One of the major problems with refcounting is reference cycles, that is, two objects that refer to each other, causing them both never to be cleaned up. Two objects referring to each other is of course the simple problem; the complex problem is a huge loop of objects referring to objects and everything. Python has a cyclic garbage collector which
keeps track of objects that might participate in cycles, for instance, lists and dicts and
walks them every now and then to find object groups that are not reachable through anything
but themselves and then it cleans up the entire cycle. This is all done in C, if you write
a C extension or a C type, you don't usually have to bother with it too much. There are
some methods you have to implement if you want to participate in cyclic-GC, when you
think you might be participating in cycles. So in Python, when you have an object, what
do you have? Well, you have an object that has two special attributes: the __dict__ attribute, which is a dictionary holding per-object data, and the __class__ attribute, which refers to the class. And like in C, the class defines behavior; in newer Pythons the class is actually a PyType object and it all matches. The way to define the behavior is not through function pointers, because Python doesn't have function pointers; it's with specially named methods. Methods that start and end with __ are special to Python. They're not special because they get allocated differently; they're just special because some of them get called by Python automatically in some situations. You can define your own __ methods and use them whenever you want; there's nothing special in that regard. It's just that some of them signal to Python: this needs to be called whenever that happens. In general, you should not be calling another object's __ methods yourself, not directly. You should always use them through whatever mechanism wraps them. So, for instance, if you want the string representation of an object, you shouldn't be calling obj.__str__(), you should just be calling str(obj). Another feature of Python is that class attributes,
that is attributes of the class object, are also reachable through the instance. If you
do instance.attr and it's not defined in the instance's __dict__, it will be fetched from the class, and the class might have parent classes, so those get searched as well. And of course, the whole point of Python is that you don't have to do stuff that Python can do for you, so refcounting, weak references and cyclic GC are all done automatically; you never have to worry about them. Typical Python code does not use type checks, partly because it was never possible, until a couple of years ago, to subclass builtin types. So if you were pretending to be a builtin type, other people must not use type checks, or you could never pretend to be a builtin type. It's also very convenient: it just means you can pretend to be whatever type you want, implement the right methods, and it'll just work. The C layer sometimes needs specific types: if you want to multiply
a string by an integer, it needs to actually have a string and an integer, or there won't be any multiplication in C. So the C layer has conversions: when it wants an integer, there are special methods that say "I need an integer for this argument", and it will convert whatever gets passed in to an integer or a string and do the multiplication that way. If you really must use type checks, for instance because you're faking C code or wrapping C code or whatever, use isinstance instead of type. Checking type for a specific value means you only accept objects of exactly that type, whereas isinstance does proper instance checks, so that someone who subclasses whatever type you expect gets the right behavior. Functions become methods by magic in Python; it happens
when you get an attribute off of an object or rather when you get an attribute of a class
through an object. Whenever you get an attribute of a class that is a function, it gets turned into an unbound method, which is a method that knows it's a method and knows which class it belongs to, but doesn't know which instance. So when you call that method, it knows that the first argument you pass must be an instance of that class, and it'll complain if it's not an instance of the right type. Of course that type check uses isinstance, so if you have an unbound method, you can pass a subclass of whatever class it expects and it works. Bound methods, on the other hand, are methods that know their instance, and they will be passing that argument automatically; you call them without the first argument, you start at the second argument, and it all looks normal. So, any questions so far? No? All right, on to iterators then.
[pause] >> WOUTERS: So iterators in Python are really
simple, they are helper objects for iteration; they encapsulate, if you will, the iteration
state and they're created and used automatically. If you don't want to bother with them, you
don't have to and it all just happens by magic. If you do want to bother with them, you can
do a lot of fun stuff with them, even more so if you combine them with generators which
I'll be talking about next. Iterators are two methods: __iter__ and next. Notice that there's no __ around next, because next is actually intended to be called directly sometimes; if it had the __, people would think they shouldn't be calling it. Because they're simple, iterators in Python are not rewindable, not reversible, not copyable, none of that; they're the lowest common denominator of iterators. There are, however, ways to write reversible iterators if you want: you can just write your own iterator class and add methods to rewind or reverse or whatever. You can also just nest iterators. And iterators themselves are iterable; an iterator is actually required, in its own __iter__ method, to return itself, or it wouldn't be an actual iterator.
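A minimal hand-written iterator, as a sketch (Python 2 spelling, with next rather than __next__):

    class CountDown(object):
        def __init__(self, start):
            self.current = start
        def __iter__(self):          # an iterator must return itself here
            return self
        def next(self):              # no __ around next: it's meant to be
            if self.current <= 0:    # called directly sometimes
                raise StopIteration
            self.current -= 1
            return self.current + 1

    list(CountDown(3))               # [3, 2, 1]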
So in the example I have here, I explicitly create an iterator over range, which is a list of numbers from 0 to 10, not including 10, and then I call zip on it. Zip takes a number of iterables, takes one element of each iterable, wraps them in a tuple, and stores that in a list, which it returns. So I create a single iterator and pass the same iterator to zip twice, and as you can see, zip takes one element off each argument, consuming the iterator as it goes, and ends up with two elements at a time, basically, from the original list of zero through nine.
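Roughly, as an interpreter session:

    >>> it = iter(range(10))     # one iterator over [0, 1, ..., 9]
    >>> zip(it, it)              # both arguments consume the *same* iterator
    [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]

So generators are a convenient way to write iterators: they are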
lazy functions that get executed as you iterate over the result, basically. Generators use the yield keyword instead of return; it works very much the same way, except that after a yield statement, execution continues there the next time you call it, or the next time you iterate. A function with yield in it will return a generator, an iterator that you can loop over, and this is terribly convenient for __iter__ methods, because you can just write what you would expect and it'll just work. In Python 2.5, generators were complexified: they can now also receive data, and you can use them to build co-routines very easily; nothing you cannot also do with 2.4 and earlier iterators, just more conveniently and with extra magic. There's a lot of generator-like iterators in the itertools module, which is, I think, new in Python 2.2 or 2.3; there's a whole bunch of stuff in there you can use to chain and filter and whatever with iterators, really convenient for chaining
their various combinations. So here's a generator example. The function is map: as you may know, it takes a function and, in this version, just one iterable, and it creates a list of results from applying the function to every item in the iterable. Then there's a two-line function that is the generator version, and a one-line function that is the original map implemented in terms of the generator. As you can see, you basically lose half the function if you use a generator, because you generate each item on the fly and return a generator instead of an actual list.
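The slide probably looked something like this (reconstructed; imap is a guessed name):

    def map(func, iterable):             # the classic version: builds a list
        result = []
        for item in iterable:
            result.append(func(item))
        return result

    def imap(func, iterable):            # the generator version: two lines
        for item in iterable:
            yield func(item)

    def map(func, iterable):             # the original, via the generator
        return list(imap(func, iterable))

Any questions?
>> [INDISTINCT] now you can do it like that?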
>> WOUTERS: Can I define what a co-routine is, and how would I do it in Python? A co-routine is a routine that is basically like a generator: it stops execution, passing data to someone else. But where a generator returns results to its caller, a co-routine returns results to someone else, another function. So you can have two routines that both consume the output of the other, and the end result is the handled data. How would you do them in Python? Well, as I said, you can do them in Python 2.5 with the new sending-data-to-a-generator thing. Before 2.5, you would write a class that was itself an iterator, and just write it in bits and pieces; it wouldn't be as convenient as co-routines in actual languages that have co-routines, because you don't have a single block of code, you have a whole bunch of separate pieces of code that get called at the right time. I wouldn't bother implementing them in Python right now. Maybe with 2.5 out, and people getting used to the new stuff you can do with generators, we'll get actual coroutines in Python. Yes?
>> [INDISTINCT] >> WOUTERS: The question was if generators
are lazy, can you write a generator that loops infinitely and just keeps on yielding results
as long as you ask for it? Yes, yes, you can...
>> Then you just ask for a finite number?
>> WOUTERS: Do you ask for a finite number? You can: if you use the itertools module, you can slice iterators. You can say "up to so many items in this iterator", and it'll return an iterator that starts consuming the original iterator until the slice ends. But you don't have to do that. An iterator is something you loop over, so if you have a loop that loops over two things at once and stops whenever some condition is met, you can just loop over an infinite object and rely on the fact that your loop will end itself for other reasons than the iterator stopping. There's actually an infinite number of infinite iterators in the itertools module, like itertools.count, which starts counting at zero and just keeps on counting forever and ever, unless you stop for some reason.
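For example (my sketch):

    import itertools
    naturals = itertools.count()                 # 0, 1, 2, ... forever
    print list(itertools.islice(naturals, 5))    # [0, 1, 2, 3, 4]

    for n in itertools.count():                  # or let the loop end itself
        if n * n > 50:
            break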
[pause] >> WOUTERS: So on to decorators. Decorators
are syntax in Python 2.4 for a feature that's been around forever, which is function wrapping. Decorators are just functions, or callables rather, that take a function as an argument and return a replacement callable, or they can return the original function if they want. However, the syntax means that it can get confusing if you have a decorator that also has to accept its own arguments, because now you have a function that should return a callable that should accept a callable and then return a replacement callable; we'll see some examples of that. Another problem is that because functions are special (when they're attributes of a class, they become methods), if you have a decorator that returns something that's not an actual Python function, but something that acts like a Python function in most regards, it won't get turned into a method unless you implement the same magic that functions have that turns them into methods, which is __get__, which I'll maybe explain later. So if you write decorators that will be used on methods, make sure they return actual functions, or they won't get turned into methods. Anyway, as I said: simple concept, but the devil is in the details, and we'll see about that. Here's a simple
said simple concept but the devil is in detail and we'll see about that. Here's a simple
decorator, it's a spam decorator that says whenever the function that you decorate, the
same function at the bottom is called, it actually calls the wrapper function at the
top which loops for ten times calling dif--original function. So in this piece of code the original
function get a called--gets called ten times and there's no return value of the actual
function call which is [INDISTINCT] then the original either. Here's an example of a decorator
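A reconstruction of that slide (sing is my guess at the function name):

    def spam(func):
        def wrapper(*args, **kwargs):
            for i in range(10):          # call the decorated function ten
                func(*args, **kwargs)    # times; return values are discarded
        return wrapper

    @spam
    def sing(line):
        print line

Here's an example of a decorator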
with an argument. The decorator is spam(10); spam is no longer the decorator, it's rather the decorator generator, and it takes a number, which is the number of times to repeat. Then inside spam, we define the decorator function, which accepts the function as an argument and has a wrapper function that calls the original function; we return wrapper, and then of course return the decorator from the decorator generator. That looks okay, I mean, it takes some getting used to, the nesting and everything.
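Reconstructed, the argument-taking version might look like this:

    def spam(repeats):                    # the decorator *generator*
        def decorator(func):              # the actual decorator
            def wrapper(*args, **kwargs):
                for i in range(repeats):
                    func(*args, **kwargs)
            return wrapper
        return decorator

    @spam(10)
    def sing(line):
        print line

But there's another problem: what about introspection? Maybe Python users don't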
care about the name of their function, but some tools do, like pydoc for instance, which is the Python documentation tool: it actually looks at function objects, at their name and their signature and their docstring and whatever. And because we replaced the original function, when you ask for the documentation of sing, it'll actually say it's a function called wrapper, with a signature of *args and **kwargs and no docstring. That's probably not what you want. Another thing is that if you chain another decorator in there, one that changes attributes of the function, those changes will go away, because you're not doing anything with the attributes of the function. So, some people write code like this, which is the original spam with the repeats argument,
with the decorator function in there, with the wrapper function in there. And then, after defining wrapper, we assign the __name__, the docstring, the __module__ and the __dict__ of the original function back to wrapper, so that all those things will actually be the same for the new function and the old one. And assigning __dict__ like that actually works; it's not a copy, it's a reference assignment: all of the original function's attributes will be accessible on the wrapper function, and when you assign to either one of them, it'll appear in the other one as well; they just share their attribute space. Now, this is not the easiest way to write a decorator, so in Python 2.5, in the functools module, there is a decorator for decorators that does this for you. So you have a decorator that you apply to decorators or to decorator generators, and then that decorator-generated decorator gets applied to whatever function you have.
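Sketches of both approaches: copying the attributes by hand, and the Python 2.5 functools.wraps decorator that does it for you:

    def spam(repeats):
        def decorator(func):
            def wrapper(*args, **kwargs):
                for i in range(repeats):
                    func(*args, **kwargs)
            wrapper.__name__ = func.__name__      # copy the metadata over
            wrapper.__doc__ = func.__doc__
            wrapper.__module__ = func.__module__
            wrapper.__dict__ = func.__dict__      # shared, not copied
            return wrapper
        return decorator

    import functools

    def spam(repeats):
        def decorator(func):
            @functools.wraps(func)                # does the copying for you
            def wrapper(*args, **kwargs):
                for i in range(repeats):
                    func(*args, **kwargs)
            return wrapper
        return decorator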
So as you can see, the devil is in the details; decorators can get confusing somewhat quickly. Any
questions? All right, how are we for time? >> You got [INDISTINCT].
>> WOUTERS: All right. New-style classes. When I say new-style classes, when anyone says new-style classes, they actually mean old new-style classes, because they were added in Python 2.2, which was released I think six or seven years ago, something like that; they're old. It's a unification of the C types that I explained and the Python classes, because before, or actually still in classic classes, instances and classes are distinct C types. There is a C type called classobj that implements all the magic I talked about, turning functions into methods, and there's the C type instance, which makes sure that instances work as expected, with the __dict__ and everything. So they're separate types, and if you ask for the type of an int, it will say it's an int; but if you ask about the type of an instance of any class that tries to be an int, it'll say "it's an instance". So you have no way of checking any of that. And another
problem with the original approach was that you cannot subclass builtin types. So Guido worked, in Python 2.2, on unifying C types and Python classes, and the end result is pretty good. You can mix and match them and everything; it works well. But it only works because a lot of new general mechanisms were added. They were necessary to bridge the divide between C objects and Python types, where things that you assign from Python have to be inserted as a C data type in a C struct rather than as a Python object pointer. Classic classes are still the default, so if you write a new class and don't specifically make it a new-style class, it'll still be a classic class. That was for compatibility reasons, because there are a couple of things that are slightly different between classic classes and new-style classes, mostly around multiple inheritance. And you can check whether any class, or actually any instance of a class, is an instance of a new-style class, because it inherits from object instead of from nothing. So you can actually do isinstance(mything, object) and you'll know whether it's an instance of a new-style class. The first of the mechanisms that I am going to explain
is descriptors. [pause]
>> WOUTERS: Descriptors are a generalization of the magic that happens with functions to turn them into methods. A descriptor is any object that lives in the class namespace, the class attribute space (so it's an attribute of the class), and that has __get__, __set__ or __delete__ methods. It doesn't have to have all of them, I think; you can define just __get__ or __set__ or __delete__ for specific operations. Whenever an attribute is retrieved, or attempted to be retrieved, from an object whose class has a descriptor under that attribute name, those methods on the descriptor will be called, and the result of those calls will be passed back. Same for setting: it'll call the __set__ method, with no result there; and for deleting, it calls the __delete__ method. The delete method is not called __del__ because that was already taken for some other hook, apparently. This is now the mechanism behind methods: if you want a function-like object that behaves the same way functions do, becoming a method, you can do that by defining __get__. It's also the mechanism behind property, which is a trick for hiding accessors behind normal attribute access. So here's
an example of properties. I'm not going to give an example of actual descriptors, because it's too boring and you won't be using it anyway, but here's a property. We have a class, and we define the function get_prop. It takes a self argument even though it's not going to be a method, and it returns whatever the value of the property is. Then we wrap it in property and store it in a local name that'll eventually be a class attribute. Oh, I see I have an error right there: I should have had foo instead of x there. So we instantiate the class, and then we do foo.prop. foo.prop calls get_prop, because it's a property: even though get_prop is not a method, because it's just a function inside the class block, the property type knows that it needs to pass the instance it's called on to the function it wraps. If you look at this you can say, "Oh, this can be a decorator, too," and it's true: you can just say @property at the top of def get_prop. Except that property takes multiple arguments; you can also pass a set_prop and a del_prop if you want, which I didn't do in this example for brevity. But if you just have a getter, you can just say @property at the top instead of prop = property(get_prop) at the bottom.
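Reconstructed, with the typo fixed (_prop is my guess at the stored attribute):

    class Foo(object):
        def get_prop(self):
            return self._prop           # takes self, though it's no method
        prop = property(get_prop)       # prop becomes a class attribute

    foo = Foo()
    foo._prop = 42
    foo.prop                            # 42: property passes foo to get_prop

Any questions about this? All right. So the other general mechanisms, kind of related; they're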
also descriptors, classmethods and staticmethods. Before Python 2.2, Python only had "instance
methods" that is normal methods, methods that take self as the first argument, they got
called in an instance and if you try to call them on a class without having an instance,
you get an error. So in the Python 2.2, we got classmethods and staticmethods. Classmethods
take class as the actual class objects as the first argument and that's terribly useful,
I'll show why in a moment. Staticmethods take no magic argument and their not very useful
even though Java and C++ programmers come into Python often say, "Oh I need a staticmethod
for this." Generally not, they're only useful for one particular thing and I'll show that
in a minute. So here's a classmethod. Again it's--you can--if you're using Python 2.4,
you can use @classmethod at the top for the decorator syntax; if not, you'll have to apply classmethod at the bottom. So say we have a FancyDict, a subclass of dict, and we define a method to create a dict from a set of keys with a single value, so we don't have to generate a list of key-value pairs; we can just say "generate it from keys". What I have here is: we create a list of key-value pairs and pass that to the class, and because it's a classmethod and gets the actual class passed, we can call it on any subclass of FancyDict without anything in particular happening in the subclass, and it'll create an instance of that subclass instead of a FancyDict itself. So whenever you think, "Oh, I should have a staticmethod and I'll do something with my own class," you should actually use a classmethod and do something with the first argument. Now, this is a rather silly example, because dict already has this exact thing implemented: there's already a fromkeys method that is a classmethod on the dict type, and it's very useful whenever you subclass dict, which is not too often. Anyway, at the bottom it's shown what happens when you use it.
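A reconstruction of the FancyDict slide:

    class FancyDict(dict):
        def fromkeys(cls, keys, value=None):
            pairs = [(key, value) for key in keys]
            return cls(pairs)                 # cls is whatever class was used
        fromkeys = classmethod(fromkeys)      # or @classmethod in Python 2.4

    class FancierDict(FancyDict):
        pass

    FancierDict.fromkeys(['a', 'b'])          # a FancierDict, not a FancyDict

So staticmethods: they're not very useful; the main use is protecting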
dependencies from becoming methods. When you use dependency injection as I do here in the
example, you don't know what you're actually injecting into your class. If it happens to be a function, or something that does something magical when used as an attribute of a class, this won't do what you want it to do; calling self.sendmail won't do the same thing as calling sendmail.sendmail. So you can wrap it in a staticmethod to prevent it from becoming an actual method. That's the only use I've ever seen that makes sense for staticmethod.
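A sketch of that dependency-injection use (the sendmail module here is assumed):

    import sendmail

    class Notifier(object):
        # without staticmethod, the injected function would become a
        # method of Notifier and receive a bogus self argument
        sendmail = staticmethod(sendmail.sendmail)

        def notify(self, message):
            self.sendmail(message)   # behaves like sendmail.sendmail(message)

Although, as we'll see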
later, Python actually has a staticmethod itself which is a good example of something
that should have been a classmethod. Another new feature: __slots__, which is for omitting the __dict__ attribute that holds arbitrary attributes; it basically prevents you from assigning arbitrary attributes to an object. It reduces memory use, because the dict isn't allocated, and slots are a more compact way to store the same number of attributes, but it's not going to be much faster than a dict, even for a few attributes. The main reason to have it is when you want to emulate, or actually implement, immutable Python classes, like we have immutable types, where you don't want attributes assigned arbitrarily. There's a tiny, tiny, tiny class showing slots right there. If you actually try to assign to something other than self.value, either in __init__ or anywhere else, it'll actually throw an exception; except for stuff that's already in the class, I mean, the def statement for __init__ won't be throwing an exception of course, because Python knows that it's already there.
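The tiny class might look like this (reconstructed):

    class Slotted(object):
        __slots__ = ['value']        # no __dict__ allocated for instances

        def __init__(self, value):
            self.value = value       # fine: 'value' is in __slots__

    s = Slotted(1)
    s.other = 2                      # AttributeError: not in __slots__

Another new thing in Python 2.2 is the actual constructor. Before Python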
2.2 there was just __init__, which is an initializer, not a constructor: it gets called after the object has been created, and it's your job to initialize it and set the right attributes and whatever, but the data is already there, the object is already there. __new__, on the other hand, is called to actually construct the object, to allocate memory for it and make sure it's all right. In Python, it's actually used just for implementing immutable types, because if you use __init__ to set attributes, it's too late: the object has already been created, and it can't be immutable if you can still assign to it in __init__. So you need to do it in __new__.
And __new__ is the example of a staticmethod that shouldn't actually be a staticmethod. It cannot be an instance method, because its job is to create the instance, so there's no instance for it to be a method of. So Guido made it a staticmethod, before he added classmethods in the development cycle of Python 2.2, I think. It could have been a classmethod, but, well, it's too late now. It's a staticmethod that takes the actual class as the first argument, and you need to pass that along explicitly whenever you call the superclass's __new__. When you actually implement __new__, you generally always call object.__new__ or your superclass's __new__ to do the actual allocation, because there is no other way to allocate memory in Python. However, your __new__ method, or staticmethod, can also return existing objects: you can return any object you want from __new__, whereas __init__ has to return None; you can't actually change what instantiating your class returns from inside __init__. From __new__ you can return whatever you want, and that's the result of instantiating your class. There's one caveat there: when you return an instance of your own class, whether an actual instance of your class or of a subclass of your class, __init__ is still called, even if it's an already-created object. That's because Python can't know whether your __new__ is returning an old object or a new object, so it always calls __init__. Of course, if you return something that's not your type, and not a subclass of your type, it doesn't call __init__.
So here's an example of a __new__: WrappingInt, which is an immutable type in Python. We set __slots__ to the empty list so it doesn't get arbitrary attributes. And then in __new__ we take the value, which is whatever our constructor was called with, and we take it modulo 256 so it doesn't go past 255. And then we actually create self by calling the parent class's method. As you can see here, it's a staticmethod, because even though we're calling it on a class, we're passing along the class we were passed, explicitly.
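Reconstructed:

    class WrappingInt(int):
        __slots__ = []                        # no arbitrary attributes

        def __new__(cls, value=0):
            value = value % 256               # wrap before the int exists
            return int.__new__(cls, value)    # pass cls along explicitly

    WrappingInt(260)                          # 4

Any questions so far? Yes.
>> How do you make an object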
[INDISTINCT]. How do you define a class [INDISTINCT] make sure it is immutable?
>> WOUTERS: How do you create a class and make sure it's immutable? By not providing any of the things that mutate the object. So, for instance, this is an easy example, because an int is its own value; you're not storing anything besides the value. We don't accept arbitrary attributes, and we let our parent create the object, and it's done. If you want an immutable type without subclassing an existing immutable type, you have to do a lot more work, because you need somewhere to store the data, and then provide properties that read the data but don't write the data. That's basically how you create one: you do the same thing as here, and you have some magic that sets a hidden variable, basically, that the properties can get at but nobody else can get write access to. It's not easy, and it's usually not worth it. Mostly, Python classes are just implemented in terms of existing types, and if you want an immutable type, you either want something that is int-like, string-like or tuple-like, and you can just subclass int or string or tuple and be done with it. Any other questions? All right, is there any interest in metaclasses? All right.
So, metaclasses are this theoretically satisfying and mind-boggling concept where you can draw these graphs between what the class is and its metaclass, and what the class of the metaclass is, and what everything is an instance of. The general idea is that the metaclass is the class of a class. It's the type of classes; it's whatever implements the class. And of course the metaclass is an instance of something, so the metaclass has a metaclass. In Python, the base metaclass is called type. And type's type is type. And type is actually an object: the parent class of type is object. All objects have a base class that's object, so you can see how it gets confusing. Of course, the type of object is type, so you can draw very funny graphs that way. But it all, you know, doesn't matter, because in Python everything is simple and you can just say: the metaclass is the class that creates the class. Of course it doesn't apply to type or object, because they are created secretly in Python by magic, but every other class you define has a metaclass whose job it is to create the class from the namespace that is the code block of the class. So, we go [INDISTINCT], yes, well, if we go back all the way up to the class here: this is all done before the metaclass is even looked at. When this piece of code, the blue parts and the green parts, is all compiled and executed, nicely wrapped up in the namespace, then that dict is passed to the metaclass along with the name and the parents, whatever you want to subclass. And it says, you know, "Create me a class and return it." And then the result of that instantiation is your new class. So, it's practically simple. And whatever
you want to use it for in Python: the stuff that you normally define in a class, to define how an instance of the class behaves, you can do the very same thing with a metaclass, and it'll define how the class behaves. So the magic that, for instance, turns functions into methods, which is hidden in the class, and the stuff that calls descriptors, which is hidden in the class, is actually called by __getattr__ or __getattribute__, which I probably should have mentioned. __new__ and __init__ get called to construct the class and to initialize the class; that all happens the same way you would expect. So you can override them and do as many things as you want. The thing they're most useful for is post-processing a class: doing some wonderful renaming or mangling or other stuff with a class after it's been defined, before it's visible to any other Python code, without requiring an explicit post-processing step. As I said, you can do evil stuff in __getattr__ or __getattribute__ if you want; it's probably not worth it, it'll just make your code much more complex. So here's a metaclass example in Python. We subclass type, because
it's convenient, and I think it's necessary as well. You define __new__, which is a staticmethod as usual. I forgot to mention: you don't actually have to explicitly make __new__ a staticmethod, but you can if you want to. The metaclass's __new__ gets passed the class, which is actually ourselves, because it's a staticmethod (or classmethod) of ourselves; the name of the class that needs to be created; the bases, which is the tuple of the bases named in the class statement; and the dict, which is the namespace of the class's code block, just a dict with attributes. So what we do here is go over all the attributes. We skip any that start with __, because you don't want to accidentally do the wrong thing with the magic methods. Then we name-mangle whatever attributes are left over, by prepending or appending something to them to make them look funny, and we delete the original attribute. At the end, we call the base class's __new__ with the same arguments but with the modified attributes. And then to use the metaclass, you have to explicitly tell Python to use a different metaclass than the default, which is type. You do __metaclass__ = MangledType in the class body or, if you want, at the top of the module. Now, metaclasses get inherited, so if you subclass a mangled class, you automatically get the mangling type as its metaclass. If you want to subclass and have a different metaclass, your metaclass has to be a subclass of your superclass's metaclass. So if I wanted a more-mangled class with a more-mangling metaclass, I'd have to subclass the mangling type to get a more-mangling type and have that as my metaclass. So, any questions there?
>> Are MangledType [INDISTINCT]?
>> WOUTERS: Yes, sorry, that's a typo. It should say ManglingType at the bottom and not MangledType. Yes.
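A reconstruction of the example, with that typo fixed (the exact mangling scheme is a guess):

    class ManglingType(type):
        def __new__(cls, name, bases, namespace):
            for attr in namespace.keys():          # a list copy in Python 2,
                if attr.startswith('__'):          # so deleting below is safe
                    continue                       # leave magic names alone
                namespace['Foo' + attr] = namespace[attr]
                del namespace[attr]                # drop the original name
            return type.__new__(cls, name, bases, namespace)

    class Mangled(object):
        __metaclass__ = ManglingType
        def method(self):
            return 'hi'

    Mangled().Foomethod()                          # 'hi'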
>> I think I remember that django uses a metaclass at the bottom [INDISTINCT], is that true?
>> WOUTERS: Yes.
>> Do you know how it does it?
>> WOUTERS: Yes. I don't have an example right now; I have some django code and it's very interesting how it works in django. Sorry, the question was: django uses metaclasses, and how is that done? Django has an API where you define a class with various attributes to describe your data model. And then you can have some extra classes inside the class to describe admin options and display options and whatever. What it does is, just like this, it goes over all the attributes that were the result of executing the code inside the class statement and examines them. The order doesn't matter to django for the most part, but where order does matter, it uses the fact that when executing a class, it executes top to bottom. A field is a particular type, and each field is an instantiation of a class, so you say something like "field1 = IntegerField()" and "field2 = CharField()". And it keeps track of what order those instances were created in by keeping a global counter, for instance, so it knows which order the fields are in the class statement. That's about the only magic thing the django code does. For the rest, it's just examining whatever's in the dict that gets passed to the metaclass, to write the SQL for whatever the database backend needs to store, and whatever options there are, etcetera. Does that answer your question?
>> Yes.
>> WOUTERS: All right. Any other questions? All right. So, I'm going to cover multiple inheritance, unless everyone here goes, "No, no, don't ever use multiple inheritance," all right? So, multiple inheritance, in Python and in other languages, is something about which it's frequently argued whether it's sane or insane. Well, it's generally not a good idea, but occasionally, especially in Python, it can make stuff a lot easier. New-style objects have a different way of doing multiple inheritance, in that they use the C3 method resolution order, an algorithm described, I think, in a Dylan paper, that says what to do when you have multiple inheritance, in particular involving diamonds, where multiple superclasses inherit from the same super-superclass, and how to handle that correctly. The algorithm is pretty simple: it does a depth-first, left-to-right search through all the classes, but then removes all duplicates except the last one. So if a class appears two times, it'll be visited not the first time it appears, but the last time it appears. And in Python we also have a convenience object called super, which can help you continue method resolution. All your parent classes are visited after you're visited, and your subclasses are visited before you're visited. But they might not be adjacent: your superclass might not be the next class to be visited, and your subclasses might not have been visited right before you. That's rather important to realize. So calling your
base class method directly, saying MyParent.foo(self) or whatever, is never right, because there's always going to be some situation where that will do the wrong thing and skip classes when visiting methods. So the way to do it in Python is: you have a single base class with the full interface of whatever your tree is going to implement, so that any object can always call whatever method it wants to call within that tree, on any other class within that tree. In the usual case, those functions will probably be do-nothing functions. They shouldn't raise errors, because then you couldn't safely call them all the time; if anything, they should do nothing. The signature of those methods should never change, because you cannot know which order the classes will be called in. If you have to change the signature of a method in a particular part of the multiple inheritance tree, you basically need a second base class for that, and you should make sure it's a separate section of the multiple inheritance tree. And then you should use super everywhere you want anything from a base class, anywhere. That can be annoying, because all of your code has to follow this, in all of the classes, and you're usually not the only one developing all those classes, so you have to convince
everyone to use super everywhere. And as I show here, using super is not actually convenient right now. You call super, passing the current class, the class you're coding in, and the instance (or the class, if you're in a classmethod) you were called on. That returns a proxy object, and then you can use that as if it were self, to call the original method. It's not too convenient, and I hope we'll have syntax for doing super calls in Python 3.0, maybe sooner, but I'm not holding my breath.
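A sketch of cooperative super calls in a diamond:

    class Base(object):
        def save(self):
            pass                          # do-nothing default, never raises

    class Left(Base):
        def save(self):
            super(Left, self).save()      # continue along the MRO

    class Right(Base):
        def save(self):
            super(Right, self).save()

    class Both(Left, Right):
        def save(self):
            super(Both, self).save()      # visits Left, then Right, then Base

    Both().save()

Any questions about these? All right. I'm going to cover Unicode then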
if we have time. >> [INDISTINCT]
>> WOUTERS: This [INDISTINCT] >> [INDISTINCT]
>> WOUTERS: So, Unicode; it's a somewhat longer topic, though.
>> [INDISTINCT] >> WOUTERS: No, just twice as long as the
previous topic or whatever. So, Unicode is a standard for describing characters. Where
byte strings describe bytes, and you say, "A is represented by byte 65", in Unicode you say,
"A is code point 65", and there's no relation between Unicode as such and bytes on the disk.
In Python that means the old str type holds bytes; it's a byte string, and we
call it a byte string nowadays. And Unicode objects, Unicode strings, hold characters,
which for ASCII is actually the same thing. But for non-ASCII, it's entirely different.
Unicode has no on-disk representation as such, so if you want to store Unicode on disk,
or even in memory, you need to use an encoding. Python internally uses either the UTF-16 or UCS-4
encoding, or UCS-2 and UCS-4, depending on how exactly you define what Python does. But
it uses either 2 or 4 bytes to represent each character, whereas byte strings always
use 1 byte for every character, or byte. When you have a byte string and you want
a Unicode string, a Unicode object, you have to decode the byte string into Unicode. And
if you have a Unicode string and you want to store it somewhere, pass
it over the network, or write it to disk, you have to encode the Unicode into bytes.
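Here's a minimal sketch of that round trip, assuming the bytes happen to be UTF-8:

    raw = 'caf\xc3\xa9'            # a byte string: the UTF-8 bytes of 'cafe' plus accent
    text = raw.decode('utf-8')     # decode: byte string -> Unicode object
    print len(raw), len(text)      # 5 bytes, but 4 characters
    data = text.encode('utf-8')    # encode: Unicode object -> byte string
    assert data == raw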
A convenient encoding is UTF-8, and some people confuse UTF-8 with Unicode; for
instance, Postgres, the database, has an encoding called UNICODE which is actually UTF-8 and not
Unicode. It's one of those mistakes many people make, but UTF-8 is not Unicode. UTF-8 is a
Unicode encoding, and it's a Unicode encoding that looks like ASCII to all Americans, that is,
people who don't care about accents or funny characters. But it can actually encode
all of Unicode, and it does so by basically screwing Chinese and Japanese people, by having
all of their characters take three or four bytes each.
So, in Python, Unicode is pretty convenient, except where it mixes with byte strings. You can have
Unicode literals, which look just like string literals except you have a couple more
escapes besides the normal backslash-x and backslash-0, octal, escapes. You can have backslash-u, which is
a short Unicode escape, and a backslash capital U, which is a long Unicode escape. The short
u takes, as you can see, 2 bytes, 4 hex digits, and the long U 4 bytes, 8 hex digits. And the long U isn't really
necessary until you start working with the higher code planes that were added to Unicode
last. Also, instead of chr, the builtin to create a single-character string, you
have unichr, which creates any Unicode character from a number. And
we support Unicode names in the compiler, at compile time: you can use backslash N and then,
in curly braces, the name of any Unicode code point. The Unicode standard defines all these
names, and we have them all in the interpreter at compile time; that example actually results in
a single-character Unicode string with a euro sign in there. It's a single Unicode character,
but when you encode it in an encoding that supports the euro sign, it may actually look like
an entirely different character, or multiple bytes, or whatever.
If you want to do these
name lookups in reverse, if you have a character and you want to look up its Unicode name, the
unicodedata module does that for you. And if you have the name and you want the actual
character, unicodedata does that, too.
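Putting those together in a short sketch, using the euro sign as the example character:

    import unicodedata

    s1 = u'\u20ac'            # short escape: 4 hex digits
    s2 = u'\U000020AC'        # long escape: 8 hex digits
    s3 = u'\N{EURO SIGN}'     # name, resolved at compile time
    s4 = unichr(0x20AC)       # from a number, at run time
    assert s1 == s2 == s3 == s4

    print unicodedata.name(s1)                    # EURO SIGN
    print unicodedata.lookup('EURO SIGN') == s1   # True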
So Unicode objects behave exactly like
strings, which is very convenient: you can slice them, and you're actually slicing characters instead
of encoded data; the length is right; you can iterate over every character; everything is
great, except when they mix with non-ASCII byte strings. When they mix with ASCII byte
strings, Python will automatically upgrade the byte string into a Unicode object
using the ASCII encoding. So that works for ASCII byte strings, but if there's a
funny character in the byte string that's not ASCII, it'll blow up, because Python tries
to interpret it as ASCII, sees that it's actually not ASCII, and doesn't know
what you want done with it. So don't do that.
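A short sketch of both cases:

    print repr(u'caf\xe9' + '!')      # works: '!' is ASCII, silently upgraded
    try:
        u'caf\xe9' + 'caf\xc3\xa9'    # non-ASCII byte string: blows up
    except UnicodeDecodeError, exc:
        print 'mixing failed:', exc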
Another problem with the Python approach is
that the decode and encode methods of strings and Unicode objects are generalized. They
don't just encode to a byte string or decode to a Unicode object; you can actually convert
strings into strings, and byte strings into byte strings, and integers into whatever you
want, or to integers. It's inconvenient, and I'm not entirely sure whether that should be
fixed or not, but it is inconvenient when you only care about Unicode. On the other hand,
they do have convenient features.
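For instance, here are a couple of byte-string-to-byte-string conversions that go through
the exact same encode and decode interface:

    print 'abc'.encode('hex')       # '616263'
    print '616263'.decode('hex')    # 'abc'
    print 'abc'.encode('base64')    # 'YWJj' plus a newline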
So, using Unicode in Python is very simple: never mix
Unicode objects and byte strings. It's a very simple rule; if you follow it, everything will
be great, except of course it's not always easy not to mix byte strings and Unicode. If
you write a library, you might get passed a byte string when you don't expect it, or you might
get passed a Unicode object when you don't expect it. If you write
an application, you have to make sure that anywhere you get a string, you either get it as a Unicode
object, or you get it as a byte string and translate it yourself. So the
best way to do it is: decode byte strings into Unicode when you get them, and encode Unicode
into whatever output encoding you have when you're outputting. And of course you have to remember
to use the right encoding, so you have to know what the encoding is when
you get input, and what it should be when you produce output, and there's no way to detect an encoding, because it's
just bytes and there's no marker in there that says "this is UTF-8" or "this is Latin-1"
or whatever, or UTF-16 for that matter. It might all look vaguely plausible when you're
actually looking at the bytes, but that doesn't mean it's the correct encoding to decode
with. Fortunately, if you can figure out which encoding to use, Python does
have some convenience functions and modules, the codecs module in particular. The codecs module
has an open function that behaves like the builtin open function, except it takes an
encoding, and it'll automatically decode the data as you read from it. So when
you read from a codecs.open file object, you're actually reading Unicode, and when you write Unicode
to it, it'll automatically encode it into whatever encoding you passed in. If you want to wrap
existing streams, like sockets or files you've partially read, you can use codecs.getreader
or codecs.getwriter to transform the stream from byte strings
to Unicode, or the other way around.
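A sketch of both, assuming a UTF-8 encoded file named example.txt (the filename is made up):

    import codecs, sys

    f = codecs.open('example.txt', 'r', encoding='utf-8')
    text = f.read()                  # already a Unicode object, not bytes
    f.close()

    # Wrapping an existing byte stream instead:
    out = codecs.getwriter('utf-8')(sys.stdout)
    out.write(u'caf\xe9\n')          # encoded to UTF-8 on the way out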
And lastly, when you do write stuff like this and you're
debugging your code and there's some problem with mixing Unicode and byte strings, pay
attention to the exceptions, because there are two exceptions you can get: there's
UnicodeDecodeError, which you get when decoding a byte string into Unicode goes wrong, and there's
UnicodeEncodeError, which is the other way around. And if you use str.encode, so you're trying
to encode a byte string into a Unicode encoding, what it'll actually do is silently
decode the string into Unicode first and then encode it with the encoding you gave it. So
that actually applies the default encoding, which is ASCII, to the str, and even though you
called str.encode, you will get a decode error if it turns out that the str is not
an ASCII string.
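A minimal sketch of that trap:

    s = 'caf\xc3\xa9'                # a non-ASCII byte string
    try:
        s.encode('utf-8')            # implicitly: s.decode('ascii').encode('utf-8')
    except UnicodeDecodeError, exc:
        print 'a decode error from an encode call:', exc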
So, that was it; those were all my topics, and I'm glad we got through them all.
Here's some more information if you want it: descriptors, metaclasses and super are all described
in Guido's original descrintro tutorial for Python 2.2, which is still very relevant. Iterators
and generators are well described in A.M. Kuchling's tutorial on functional programming;
you don't have to follow the whole thing if you don't like functional programming, but
the parts about iterators and generators are very good. And if you want to learn about
writing C extension types, the standard documentation is a great source, as well as the Python
source itself; the actual Python modules all use the same API and they're very readable
source, even if I do say so myself, and it's highly recommended. And as always, the Python
standard library and the Python C code are all great sources. That was it. Any more questions?
>> [INDISTINCT] somewhere, that we can get up?
>> WOUTERS: I can put it up somewhere. >> How about the previous presentation about
the upcoming [INDISTINCT]. >> WOUTERS: Sorry?
>> The previous presentation, I guess it was [INDISTINCT], about the future of
Python? Is there any record of that somewhere that we can look at?
>> WOUTERS: There's like four or five different movies, sorry.
>> It's on my laptop you can upload it. >> Okay great.
>> WOUTERS: Any other questions? >> What's
a [INDISTINCT], what's a good resource for, sort of, module import resolution? You
know, like, when you're moving Python [INDISTINCT] from one place
to another. [INDISTINCT] and is there, like, a standard way of how you do all that?
>> WOUTERS: So, you mean from one string to another?
>> From, you know, one string to another or what [INDISTINCT] code base [INDISTINCT].
You start mixing it [INDISTINCT] and things like that or in like [INDISTINCT] libraries.
It's got to be like when you do [INDISTINCT]... >> WOUTERS: Usually a byte store...