Google Python Class Day 2 Part 2


Uploaded by GoogleDevelopers on 01.03.2010

Transcript:
>> PARLANTE: So, in this section, I want to play up this idea of using modules of existing code
to solve common problems. In this case, I'm going to show you some file system interface stuff,
and also how you call an external process and capture its output. So you can imagine using
Python where you might otherwise use Bash: it's sort of a better Bash for gluing together some
sort of utility. So I'll start off in the interpreter here and fire up Python. The first module
I want to talk about is the os module, since we're talking about the operating system, and I'm
just going to do a dir() on it. So I import the os module.
I'm going to look inside of there, and you can see there are all sorts of functions in there:
"setpgid" and "nice," for example. These are operating-system-oriented utilities with a very
UNIX-y feeling. In theory, they try to be platform-independent. So if you wrote a Python
program and it's running on Windows, some of these are stubbed out, so that if you call one to,
say, get the current time, it's going to translate that for Windows. I don't believe it's done
perfectly, but it tries, so there's at least theoretically a degree of platform independence.
Obviously, there's tons of stuff in here. The one I'd like to show you for starters is
listdir. That one.
So, actually, I could do help on it, just to show you how that works: I say
"help(os.listdir)". Okay, nice summary: it takes a path and gives me back a list of strings. So
I give it a path to a directory, it figures out what all the filenames are in that directory,
and it returns them to me as a list of Python strings. So let me go to the interpreter here. To
demonstrate this, I'll modify the long-suffering hello.py example to just list files. So I'll
say "import os" here. I'll rename this function List, with an upper-case L, and say it takes a
directory. So let's see, I'll say "filenames = os.listdir(dir)", and, as I've been encouraging
you to do for the exercises, I'll just print what that gives me for starters. Then down here in
main, I'll call List.
I'll just leave main the way it is and assume that there's one command-line argument, which I
list. Hopefully, it's in the same directory. So if I say "hello.py ." you can see it found the
files, and I'll do an ls so we can compare. So there's this .DS_Store thing that the Macintosh,
like, pathologically puts everywhere, and other than that, you'll see there are just regular
filenames.
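A minimal Python 3 sketch of the List() function from this demo (the lecture itself uses Python 2). The parameter is named dirname rather than dir so it doesn't shadow the built-in dir(); wiring it into main() is left out.

```python
import os


def List(dirname):
    """Print and return the entries of dirname, as given by os.listdir."""
    # Bare names such as 'hello.py' or '.DS_Store', not full paths.
    filenames = os.listdir(dirname)
    print(filenames)
    return filenames
```

From a script's main() this would be called as List(sys.argv[1]). Note that os.listdir returns the names in arbitrary order and leaves out the special entries '.' and '..'.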
So let me make this code do something a little more interesting. At least I printed that the
list is there. So here, let's loop through them: "for filename in filenames:", the typical kind
of thing. Now, say I want to make a path out of each name. What's important to understand is
that when you do a listdir to get filenames out of a directory, a filename on its own, just out
in space, is not a valid path, right? It needs to be connected to the directory it came from to
make a valid path.
As soon as you call listdir, you're disconnecting the filename from its path, and you have to
realize that right away. With the current directory as dot or something, you might be able to
fudge around some of this, but then you'd have a bug if someone ran the program in a different
directory. So in the os package, there's a subpart called os.path.
And inside of os.path, there are utilities for manipulating file paths: taking them apart,
putting them together, that kind of stuff. And again, these are platform-independent, so on
Windows or whatever, this would definitely work. So os.path.join takes a directory and a
filename and puts them back together in a platform-valid way, so that makes a valid path. So
I'll print that. And then there's also os.path.abspath, which I'll call on the path. What that
does is kind of like a pwd. Oh, I'm sorry, it's "path" there.
It takes the path and fills it out to be complete. So let's try that. If I run that on dot, I
get an error about the module. Okay, what did I do wrong there? Oh, I typed "absbath" instead of
"abspath". All right. Well, let it be said, my demos do not lack for realism. All right.
So here, I'm running it on the directory dot, so the first line is just the path put back
together with a slash. Let's try "./" instead: notice, it's smart about not doubling up the
slash there, whereas if you had just put it together with a plus, you would have done the wrong
thing; it would have said ".//". It's a nicety of going through the real utility to do it
right. And then here is the absolute path; this is on my Google box or whatever.
That's the full path of the thing. And I'm sort of cheating by using dot; I could give it,
like, "/tmp", and, whatever, God knows what that is.
It's some source thing or whatever. Anyway, as an argument, I can give it any directory and it's
going to list it. All right, so, mundane yet useful, all right? You want to be able to
manipulate lists and do stuff with directory paths: take them apart, put them together. I've
shown you just a few of the utilities.
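A Python 3 sketch of the loop described above: each bare name from os.listdir is reconnected to its directory with os.path.join, then expanded with os.path.abspath. The helper name list_paths is made up for illustration.

```python
import os


def list_paths(dirname):
    """Return (joined_path, absolute_path) pairs for each entry in dirname."""
    pairs = []
    for filename in os.listdir(dirname):
        # A bare name is only meaningful relative to dirname, so reconnect the
        # two with os.path.join rather than string concatenation.
        path = os.path.join(dirname, filename)
        # abspath fills the path out using the current working directory.
        pairs.append((path, os.path.abspath(path)))
    return pairs


for path, full in list_paths('.'):
    print(path)
    print(full)
```

On a POSIX system, os.path.join('./', 'hello.py') gives './hello.py', while './' + '/' + 'hello.py' gives './/hello.py', which is the doubled slash mentioned in the demo.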
You can look in there; there are all the utilities you could imagine wanting for manipulating
that kind of stuff. Let me drop back into Python here to show you just one other one. There's
os.path.exists('/tmp/foo'). I don't actually know if that exists. Oh, it does. All right, of
course.
How about "baz"? Oh, okay, that's not there. There's also os.--I'm not going to run this
one--"os.mikdir" you get a path if you want to make something. And then finally, one that
you would never in a million years would it occur to you to find, but there's a module
called, "s-h-u-til" of which I think, historically, was sort of like shell utilities. And in s-h-u-til,
there is a ".copy" and what this does is file copying for you. So you give it a source path
and a dest path and it just kind of like goes right in there. Obviously, you could do it
manually by reading the bytes of the file or whatever, but if--yeah, it's--you just--yeah,
as I was saying, living higher on the food chain, yeah, you just want to call the thing
that does that. I think the name of s-h-u-til--it also shows how, I think, Python has grown
sort of organically, right? It's not like a committee got together and said, well, for
a job, I feel like it's a much more a top-down design with the names and stuff, where, you
know--it's not that a committee got together and said, "Well, I think we should have a
file copying utility" and, you know, "Here's what the names should be done." Instead--I'm
just guessing--like, some guy said, "Oh, here, I've made this s-h-u-til thing, you know,
didn't really give a lot of thought into the name and it was just kind of useful and it's
open source so it just kind of gotten picked up." And so now, by historical accident, like
that's the slightly obscure name for that utility is now, so typical kind of community-driven
open source, you know? It's kind of lovable and powerful, but yet like a little bit undisciplined.
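A small Python 3 sketch combining os.path.exists, os.mkdir, and shutil.copy. The helper copy_into and the temp-directory setup are hypothetical, there only to keep the example self-contained.

```python
import os
import shutil
import tempfile


def copy_into(src, dest_dir):
    """Copy the file src into dest_dir, creating dest_dir first if it's missing."""
    if not os.path.exists(dest_dir):
        os.mkdir(dest_dir)  # os.makedirs would also create missing parents
    # When the destination is a directory, shutil.copy keeps the file's
    # basename and returns the path of the new copy.
    return shutil.copy(src, dest_dir)


# Throwaway demo files; the names are illustrative only.
work = tempfile.mkdtemp()
src = os.path.join(work, 'notes.txt')
with open(src, 'w') as f:
    f.write('hello\n')

copied = copy_into(src, os.path.join(work, 'backup'))
print(copied)
```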
All righty, so that's the stuff I wanted to show you with os. I'm going to stick with doing
stuff in the interpreter just to reinforce it. The other thing I wanted to show you is how you
launch an external process and wait for it to finish, which is a very common get-things-done
sort of thing to do. There are a bizarrely large number of Python modules that do this. If you
only knew one, I think this is the most useful one: there's a module called "commands", and
inside of commands there's a function called getstatusoutput. I'll do help on it. Oh, boy, the
help is pretty short. It runs that command: it shells out as an external process, runs the
command, and you block, waiting for that process to exit. The standard out and standard error
of that subprocess are captured; they're not just written onto your own standard out or
standard error, so it's kind of sealed. Once the thing exits, getstatusoutput returns a tuple
of length 2.
Returning a tuple is kind of the Python way of saying, "Well look, I wanted to return two
things," or three things or whatever. The first element of the tuple is the int exit status, in
a very typical UNIX-y kind of way, so you can recover the exit code out of there. The second is
a big string, which is all of the output of the command. In this case, I think it's the standard
output and the standard error merged together. There are a bunch of variants of this if you
want to capture the standard error separately; all sorts of permutations are covered, but this
is the one we're going to use today.
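The commands module exists only in Python 2. In Python 3, the same function lives in the subprocess module as subprocess.getstatusoutput, with the same (status, output) tuple shape; there, the status is the command's exit code, stderr is merged into the output, and one trailing newline is stripped. A minimal sketch, assuming a POSIX shell:

```python
import subprocess

# Runs the string through the shell and blocks until the process exits.
# Both of the shell's output streams land in the single output string.
status, output = subprocess.getstatusoutput('echo hello; echo oops 1>&2')
print(status)
print(output)

# A failing command: the non-zero exit code comes back as the first element.
bad_status, bad_output = subprocess.getstatusoutput('exit 3')
print(bad_status)
```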
So I'll get out of here and modify my program. Well, we'll leave this as List, but I'm going
to have it work differently now. Let's make a command: I'll say "cmd = 'ls -l ' + dir". It's
kind of weird, right? As a string, I'm putting together, "Oh, here's the thing I'd like to shell
out, launching the ls program." Then the way you call it is "(status, output) =
commands.getstatusoutput(cmd)": status and output are the tuple, and I just pass in the command
I want to run. And then here we'll just print the output. I'll get rid of all these other
lines. Normally I would forget to do the import and go through that, but since we're short of
time, I'll go ahead and do it correctly: "import commands". All right, I think that might work.
So I'll run it in Python and give it a dot again. Oh, there we go. So what that did is it put
together the "ls -l" command, went through the commands module, and launched it. My Python
program waits, blocks. Eventually the thing ran and produced the typical "ls -l" output, and
then I'm done. All right, now I'm going to fix this up in a couple of ways, because I regard it
as not quite right. One thing I want to do is notice if the command failed. The simplest way is
to say "if status:", because status comes through as an int: if it's zero, that counts as false,
and any other value counts as true. That's the most primitive way of detecting an error here. So
then I'm going to print something to "sys.stderr", like "there was an error", along with the
output. Now, I'm being a little picky here because we captured the standard error of a
subprocess, and if I were to squelch it, just try to eat it and hide it, that makes the system
undebuggable. If you think about a big software system with a lot of parts, the key piece of
information when it's used incorrectly, which of course it is, is that whatever lowest level ran
into the error reports it. It raises some kind of message like, "Hey, this didn't work," and you
are really dependent on that low level letting you know. Put the other way, if the low level
fails and remains silent, it's very, very difficult to debug. I'm pointing this out because this
is the rare case where we are capturing the standard error of that thing, so we are responsible
for making sure that it gets reported. So I'll just say something like that, and then I'll say
"sys.exit(1)": I'm giving up, I'm terminating. So that's one thing I would want to do. Now, the
other thing: suppose you have a bug in your babynames code, like you did the regular expression
wrong. What are the consequences of that? Well, some of the baby name data is a little bit
incorrect, or you missed something. An error in that code just gets you slightly bad data, which
is not that bad, I'm going to say. Now, what if I have a bug here in the string where I'm putting
together a command that I'm about to shell out and run as me? The ramifications of doing that
wrong are potentially much worse.
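A hedged Python 3 sketch of the "if status" error check described above. The name run_or_die is hypothetical; it uses subprocess.getstatusoutput as the Python 3 stand-in for commands.getstatusoutput, and sys.stderr.write in place of Python 2's print >> sys.stderr form.

```python
import subprocess
import sys


def run_or_die(cmd):
    """Run cmd in a shell; on failure, report its output on stderr and exit(1)."""
    status, output = subprocess.getstatusoutput(cmd)
    if status:  # 0 is false, so any non-zero exit status counts as an error
        # We captured the child's stderr, so passing it along is our job;
        # swallowing it here would make the failure very hard to debug.
        sys.stderr.write('there was an error: ' + output + '\n')
        sys.exit(1)
    return output
```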
Whenever I write a command like this, I'm immediately in a slightly heightened state of paying
attention. I'm like, "Okay, well, yeah, I could really delete everything or whatever." So just to
demonstrate that, what if I were to change this to say "rm -rf", or let's say, why stop there,
"/*", right? Oh, I'm sorry, the directory is already there; it's an argument. Okay, there, all
right? I'm going to save it. Now, if you're anything like me, you're like, "Oh, okay, sounds
good." So here's what I recommend: when I'm writing this kind of stuff, I'll say "print 'about to
do this:', cmd" so it shows the command, and then I'll just return, so the program doesn't get
to the stuff below. That way you can debug the rest of your program, all the reading of
directories or whatever, and it just prints the command it's going to run. It's more pleasing, I
think, to debug it that way. So let's just try this. I'm going to save it, and that definitely
returns, right? Oh, hey, you know, the snapshot directory will be out here. It's unscrapable. All
right, sorry, I just got on the wrong part. What I meant to do is go down here and run
"hello.py". There was a... what's the problem with that? Did I forget an "if"? It's the "if
status: print" line. Oh, oh, all right, okay, never mind that. I'm not used to the syntax for
that, so let me just get rid of it for now. All right, okay, so it's about to do "rm -rf .", and
I'll be like, "Oh, wait a minute, I didn't mean rm -rf, I meant 'ls -l'," so that's what we're
going to do. I mean, in your next exercise I'm going to ask you to shell out, so I'm just saying
it.
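A sketch of the print-before-you-run habit, with a dry_run flag standing in for the lecture's early return. The names build_cmd and run are hypothetical, and the shlex.quote call is an extra precaution not shown in the lecture: it quotes the directory so that spaces or shell metacharacters in it can't change what the command does.

```python
import shlex
import subprocess


def build_cmd(dirname):
    """Assemble the shell command; shlex.quote keeps odd characters in dirname inert."""
    return 'ls -l ' + shlex.quote(dirname)


def run(cmd, dry_run=True):
    """Always print cmd first; only execute it when dry_run is False."""
    print('about to do this:', cmd)
    if dry_run:
        return None  # bail out here while debugging: nothing gets executed
    return subprocess.getstatusoutput(cmd)
```

Once the printed command looks right, flipping dry_run to False is the only change needed to actually run it.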
Now, for this error, I'll try to do it the other way. Normally, when we say "print", it just goes
to standard out, but the print syntax for writing to another file handle is sort of terrible. I
think I can just call sys.stderr.write right there, and I'll put the message together with a
plus. I think that's better. So let's see. Now it's doing "ls -l", all right. Anyway, that is the
better syntax for that. All righty, so those are the two modules I want you to work with for this
next bit. So let me show you our next exercise. I'm going to go into day 2 here, and the next one
I want to work on is "copyspecial". As before, there's a printed form of the description. I'm
just going to demo through it, but you really want to look at the printed directions. It's going
to have a part A and a part B. This one's a little smaller, so I want to spend a little bit less
time on it. If you don't get to part B, that's okay, because the third assignment, the last one,
is I think the most interesting, and it's one of the bigger ones, so I want to make sure we save
time for that. Okay, so here's the idea. In the file system, there are certain filenames which
are special.
In particular, I'm going to say that a filename is special if somewhere in it there are two
underbars, then one or more word characters, followed by two underbars. So for example, in this
directory, there are two special files: there's the "hello" one and the "something" one, and then
the solution directory and copyspecial.py, well, those aren't special. So this is sort of a
Google admin kind of thing: you've got all these directories scattered all over the place and you
want to move files around and stuff. Now, if you run the command with no arguments, it always
tells you what the arguments are. In this case, I can run it with just a directory, so here I'm
going to run it on "dot", the current directory. It takes a directory as an argument, and what I
want you to do is find all the special files and just list them, and in particular, list them by
their absolute paths. An absolute path is nice if you're going to write out a path to a file,
because it's independent of the process that produced it. It doesn't depend on the notion of
current directory; it really is where that file is. So that's the simplest case: just find them,
list them. The next more complicated thing I want you to do is handle a --todir argument. So I'll
say "--todir /tmp/" and, I'm thinking of some random word I haven't used, what day is it today?
Thursday. I'll say Thursday.
All right, so in that case, what I want it to do is find all the special files, create that
directory if it doesn't exist, and copy all the special files to it. So I'll do "cd
/tmp/thursday" and... oh, somebody checked it out over there. All right, I'll go back. That's
part B, and I'm pretty happy if you get that. But if you have enough time, then I also want you
to handle --tozip, which is very similar to --todir, but instead I want to be able to say
"blah.zip", and what I want it to do is find all the special files and invoke the zip utility to
zip them all up into the zip file named there. So if I call that, you can see my debugging is
actually still in here, right? "Command I'm about to do," and then here is the zip command. Zip,
incidentally, has I think the worst man page ever written. I defy you to find one less useful. It
talks about all the stuff you would never want to do, and it never talks about the thing you do
want to do, in my personal experience. It turns out the command you want is "zip -j", then the
name of the zip file, and then just all the paths. Now, in this case I used absolute paths;
really, zip is going to have the same current directory as me, so you could use the shorthand,
depending on your tolerance for that kind of fragility. It's fine. So that will zip it up. Okay,
so that's the next exercise. I'd like you to go ahead and get started on that, and I'll pull you
guys back here a little before 2, and then we'll do the next exercises. All right, go.
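One possible Python 3 starting point for copyspecial. The helper names (get_special_paths, copy_to, zip_cmd) are hypothetical, command-line parsing for --todir and --tozip is left out, and the printed handout remains the authority on what the program should actually do. The regex encodes the rule described above: two underbars, one or more word characters, two underbars.

```python
import os
import re
import shlex
import shutil

# Two underbars, one or more word characters, two underbars, anywhere in the name.
SPECIAL_RE = re.compile(r'__\w+__')


def get_special_paths(dirname):
    """Part A: return absolute paths of the special files directly inside dirname."""
    paths = []
    for filename in os.listdir(dirname):
        if SPECIAL_RE.search(filename):
            # Reconnect the bare name to its directory, then make it absolute.
            paths.append(os.path.abspath(os.path.join(dirname, filename)))
    return paths


def copy_to(paths, todir):
    """Part B (--todir): create todir if it's missing, then copy each file into it."""
    if not os.path.exists(todir):
        os.makedirs(todir)  # unlike os.mkdir, also creates missing parents
    for path in paths:
        shutil.copy(path, todir)


def zip_cmd(paths, zippath):
    """--tozip: build 'zip -j zipfile path ...'; -j stores names without directories."""
    args = [zippath] + paths
    return 'zip -j ' + ' '.join(shlex.quote(arg) for arg in args)
```

The string from zip_cmd is meant to be printed first and then handed to a status-checking runner like the ones sketched earlier, rather than executed blindly.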