A Possible Future of Software Development


Uploaded by GoogleTechTalks on 20.05.2008

Transcript:
>> PAWLIGER: Good afternoon. >> Hello.
>> PAWLIGER: Hi. So my name is Marc Pawliger. I'm engineering director for client applications.
And I'm here today with Sean Parent from Adobe Systems. Sean and I have known each other
since '93, '94 or something like that, long, long time, when we worked together at Adobe
on the Photoshop Team. And Sean has done a tremendous amount primarily in and around
Photoshop up until recent times for the work that he's going to be talking about today
put amongst other things; automation and UI layout and just a tremendous amount of capability
into Photoshop. And of--I guess, the last couple of years has moved on into more of
the research side of Adobe and has concentrated on areas primarily centering around STL and
Adobe editions to them and also sort of looking at the whole problem of user interfaces and
their complexity and behavior and why a lot of the software that we have works maybe despite
itself and despite the way that it's built. So with that, I'll turn it over to Sean.
>> PARENT: Okay. So, is my mic on? Yeah. So, that was a nice introduction. Thanks, Mark.
I've been at Adobe now for something like 13 years and about the last four years working
in either our advanced technology group or--now I run our software technology lab which is
a small team trying to rethink how Adobe goes about building software. Now this talk came
about because Bjarne Stroustrup, who created C++, invited me to give a talk down at Texas
A&M and said, "You're in the industry, why don't you come and give a talk to the students
about the future of software development?" And I said, "I don't know about the future
of software development. I know about what I'm building which is hopefully the future
of software development. So, I'll give a talk on a possible future of software development."
For those of you who don't know much about Adobe, a little background information. We're
a big and growing company about two and a half billion dollars of revenue, not growing
nearly as fast as Google or as big as Google, but we've got about 6,000 employees worldwide,
very distributed. We do a number of products; some of these you've probably heard of. So,
internally, we're structured loosely like this. It varies from business unit to business
unit, but we have product lines. So, Photoshop is not just a product; it's a product line.
There's Photoshop Elements, Photoshop Extended Edition. And we have product teams where we
typically have 20 developers working on a product team, about 30 testers which surprises
most people more QE than engineering. Typically one or two UI designers working with the team,
and then we have about 20 shared technology groups which all have similar structure to
the product teams dealing with libraries for vector graphics, typography, color, help,
localization, sure things like that. So development process, we strive towards an incremental
development process but the business we're in tends to push things into more of a waterfall
process. There's a huge lag between the times that you have to reserve press to print the
manuals, and the manuals have to be done between you--when you can actually ship the products
since we do physical boxes, getting everything done in the quantities we demand takes a manufacturing
ramp. And that tends to push the development cycles into more of a waterfall model where
you sacrifice quality early to get the features in, so you can write the manual and ship later.
We ship mostly Macintosh and Windows, simultaneously, English, French, German, Japanese, our tier
one languages. And by the time we're done, usually about three months after the first
ship, we've covered about 24 languages with the products. So, if you're going to talk
about the future of software development, I thought I'd do a little research on what
do the analysts say. You've got lots of buzzwords. I spent sometime going through analysts' reports.
You hear lots about Java, and C#, and XML, and web services, and open source and some
of these things I agree are important and some I don't think are very important. Best
practices and methodologies is listed first up there because that's where most people
think the problems are. I pulled some quotes from analysts that I just thought were interesting.
The second one being the most interesting, "Microsoft rolling out a new development process,"
this is Directions on Microsoft which is an analyst publication that's usually very friendly
to Microsoft. And the fact that they would go through the trouble of putting together
a whole development process when analysts say they don't think it will necessary help
the company ship its products on time or with fewer bugs. The top one is talking about the
problem of security. And the number of analysts who believe that the solution to security
is more and more testing, which I think has very, very diminishing return. We already
have 30 QE people to 20 engineers. If you double that to 60 QE people testing for security
holes, you might find, you know, another quarter or half of the security holes out there. So,
you're not going to close the holes that way. Ultimately though, this is why I think the
status quo in software development fails. This is a quote from Jon Bentley from his
book "Programming Pearls" published in '86. John went around teaching classes on how to
develop software and gave a test to mostly senior engineers at IBM in Bell Labs and asked
them to write the binary search problem. Given a specification, write a correct binary search.
Ninety percent of the programmers failed finding bugs in their solutions and of the solutions
where they couldn't find bugs, many of them, they still weren't convinced that they were
correct. Okay. To put it in perspective, this is a correct solution to Jon Bentley's binary
search problem. Okay. And they were not given just like a few minutes to do this. They were
several hours to try to do this. Photoshop uses the same problem and has, for many, many
years, as a take home test. And more than 90% of the candidates just fail miserably
at coming up with anything remotely resembling this piece of code. I think the worst case
is probably 48 pages or something like that of code. And once he figured out what the
48 pages of code did, it was wrong. >> Man.
>> PARENT: So, that's a problem. Now Jon Bentley's own solution is considerably more complicated
than this solution, okay, and considerably less efficient. So, our experience at Adobe
in teaching programmers is that the same rule holds and you can take, you know, PhDs from
Stanford which comprises a large section of Adobe's advanced technology group, and 90%
of the PhDs from Stanford can't write a correct binary search. So, how in the world do we
ever ship a product? Okay. If you can't write that much code correctly, how do you get Photoshop
out the door? How do you get Word out the door? How do you get Google Desktop out the
door? Well, these are actual bug graphs from product cycles inside of Adobe and this is
a clue. What's going on here is we're not solving problems; we're approximating solutions.
And some of these curves, this one in particular, I don't believe. What happened here is there
was a projected ship date out here. The little peak here is because QE stayed over a weekend
to finish their regression tests while engineers went home so you had a little peak which caused
a panic, which led to a nearly linear slope. The linear slope is not a fixing bugs, it's
deferring bugs. Okay? So, we never hit a zero bug count, even pretended to. Here, you'll
see we're still off zero. But in reality, what's not shown here is the number of bugs
that were just plain deferred off the bottom. So, that's what our current programming methodologies
do. They allow us to iterate and refine and approximate a solution, but never actually
come up with a correct solution. So, one thing my team is working on is studying how do we
go about writing correct software? Okay. If you--if you want to write it correct to begin
with, how would you go and do that? And this leads us to generic programming, where my
team spends a lot our time. The idea with generic programming is you start with a concrete
algorithm. Okay. You refine the algorithm, reducing it to its minimal requirements. Clusters
of related requirements are known as concepts, and we have a little more formal definition
that we'll get to in a minute. Then you define the algorithm in terms of the concept supporting
maximum reuse. So if you have an algorithm and you can then prove that it's a correct
algorithm, now what you want to do is lift out that algorithm so it's written in terms
of its minimal requirements, so that you can reuse it in the most number of areas. Now
there's a parallel here between Mathematics and generic programming. Programming is Mathematics.
Okay. In generic programming, we have what we call Semantic Requirements. In Mathematics,
they're called Axioms. Okay? What we call a concept in generic programming is an Algebraic
structure like a ring or a monoid in Mathematics. Models, we use the same terminology. In computer
science and in generic programming what we call--what we call an algorithm is just a
theorem. Okay? So an algorithm is written in terms of concept just as a theorem is written
in terms of an algebraic structure. Both rely on axioms for the correct definition. So a
regular function, a function with no side effects, corresponds to just a mathematical
function. And the one thing we have as computer scientists that mathematicians don't have
to worry about is complexity, right? We have to worry about, how long does it actually
take to compute the answer here? Now, we can refine a concept just as mathematicians can
refine an algebraic structure. So a mathematician would that a monoid is a semi-group with an
identity element, if you ever took any abstract algebra. Okay. That's an equivalent statement
to saying--roughly equivalent--a bidirectional iterator is a forward iterator with constant
complexity decrement. Okay. So you can take a concept and add additional axioms, semantics
requirements, okay, in order to refine it and that will give you something which is,
in some sense, more powerful in that you should have algorithms that you can execute more
quickly on it. More algorithms, but you'll have less models. Okay? Less types that you
can actually use with this thing. We also refine algorithms, which is something mathematicians
tend not to do. Mathematicians typically do not refine a theorem. A refined algorithm
is saying if you have a concept which is itself refined, then you can perform the same operation
that you could on the unrefined concept on the refined concept but potentially, you can
do it with better complexity or reduced space requirements. Okay. So, in C++, this is the
way concepts appear in the existing C++ standard. They are tables that appear in the standard
itself. They are not expressed in code. And, in fact, the tables are very, very weak on
their semantic requirements on this end. So, my group takes it a bit more seriously and
applies much stronger semantic requirement to what each of the axioms are. So, we're
here by back up. We defined the notion of copy to say that T is equivalent to U. The
reason why the standard uses equivalence instead of equals is because it doesn't require that
equal be defined in order to define a copy. Okay. But if you want to define what it means
to have a copy, then you have to require a quality. Now, concepts enable equational reasoning
about your code just as axioms and algebraic structures enable equational reasoning about
Mathematics. Through concepts, you can prove algorithm correctness. Okay. And you can then
reason about your system in its entirety. Generic programming often gets confused with,
you know, angle bracket templates in C++. Generic programming is not a programming style
as object-oriented programming is a programming style. Generic Programming is a way to reason
about code. I can reason about code through axioms defined through generic programming
even if the program is written in an object-oriented style. Okay? The same axioms hold, so the
axioms have to exist and the theorems have to be correct if the system functions correctly
and works. And where it doesn't, it's because something's been violated. Okay. So you can
use the same ideas from generic programming to--programming to analyze an existing piece
of code as well as write a correct piece of code. Now, in C++, it's pretty horrible right
now. So, we can't do generic programming in the language. So, what we do is we approximate
it with templates which are really just macros, and we have comments. So, when I say "type
name I" or an STL that might "type name input iterator" or, you know, "random access iterator,"
in here, I say, "I models random access iterator" just to be a little more explicit. Random
access iterator is naming the concept that you can formally define. Regular is naming
the concept that you can formally define. In fact, that was the definition of regular
a couple of slides back. Okay. So in C++0x, if anybody's been following the progression
of the language, much of what's written here--up here in comments, you can now write in code
in your C++. So, this is a big part of what my team is working on. Now, this brings us
to a small pause in the talk. Okay. So, great. So, we can write sort. We can prove sort correct
and we can write reverse. And maybe we can write some nice image processing algorithms
going this way. So, how do you get from there? How do you get from forward iterator to Photoshop?
It just seems like this very large chasm. Okay. How do you get from STL to Microsoft
Word? How do you span that gap? So, we're going to change gears a little bit here. I
have a conjecture, which is that all systems eventually grow and turn into some kind of
network problem. Okay. So if you have any problem of scale, okay, what that usually
means by a problem of scale is that you have to be able to reason about the system as a
whole by being able to reason about it locally, okay? Such systems become graphs with sets
of rules that apply to the graphs and those describe an awful lot of the systems that
we build. So if you go and you look at Photoshop, what you'll find internally is it's become
this mass of network of objects. These objects are all interconnected. They form some giant,
implicit data structure. There is messaging between the objects. So those messages do
something, hopefully, that's an algorithm. Okay. We use tools like design tool or like
design patterns, Scott Meyers', you know, 50 rules writing the C++ program to come up
to come with local rules that, if we apply them, hopefully, we get a correct system out
the back. And then the--once we assemble all these objects, we iteratively refine and refine
and refine and refine until we can ship the product, until it's good enough. So, take
a moment and define a generic algorithm for solving a large system. If you're--if you're
just, you know, building word, how are you going to go do it? Well, identify the components
and how they connect. Okay. This is my generic algorithm. Organize the system into a directed
acyclic graph. Okay. Everybody know what a DAG is? Hopefully engineers here. So, the
reason why a DAG is important, okay, if you have--a DAG is just a directed graph with
no cycles. So if you have a cycle in your system, it makes reasoning about the correctness
of that system very difficult. Okay? You can no longer reason about the correctness of
the system with only local knowledge. You have to have complete knowledge, okay, and
know what the effect of that data feeding back on itself is. So, you want to define
your system in terms of a DIG. Now, of course, you're going to have loops. You're going have
things that do iterate. And what you want to do is encapsulate those, figure out where
the loops in your graph are, and figure out what the algorithm is that's done by that
loop and write that algorithm and make that algorithm be just a node in your graph. It
becomes just processing node in your graph. And then, ensure that for all possible states
in the system, you know--this DAG is going to be changing, things are conditional, stuff
coming on and off--that cycles don't pop up, okay, that this thing stays a DAG. And then
make it generic which means once you have this for Word, figure out how you describe
similar structures to build other applications. Now, this all sounds very complicated but
the idea actually, for me, came from this. One of the things I worked on at Adobe was
a layout engine which is used in most of the Adobe products at this point to lay out user
interfaces. And it's a constraint system of sorts but it always forms DAGs. And I had
this realization one day that I could solve a spreadsheet inside of the layout system.
And that I could turn the layout system into a spreadsheet if I wanted to. And this is
why; they both form DAGs. So, this is just a graphical representation of this. We're
just summing up three numbers while I can draw it like this. Okay? I've got three cells
feeding into the plus operator and coming out with one result. When you start to talk
about constraint systems and declarative programming, which is what we're talking about here, people
have a panic attack mostly because in school they might have been exposed to the wonders
of Prolog. Prolog is a rule-based system. In some sense, a spreadsheet is a rule-based
system. But Prolog's was striving to be a Turing complete system, a complete programming
language where you could write anything. You can sort in Prolog. You really don't want
to sort in Prolog, okay? So, sorting is an iterative loop. If you have a rule-based system
that's iterating, it becomes very difficult to reason about. So, what we're talking about
here is more the long--along the lines of something like HTML or a spreadsheet or SQL
or Lex & YACC. Those are also declarative rule-based system of sorts, okay, that strive
not to be Turing complete. Some of them are accidentally, but in general, you stay from
the scary edges. So part of what my team has been working on is something called the property
model library, which is to apply these ideas to a real system. And this is the domain we
went after. It's a piece of a user interface. So, this is the dataflow of a small section
of an actual user interface in the product that Marc Pawliger is very, very familiar
with. The symbology here are basically these four items in the middle are widgets, UI widgets.
You can think about them that way. So, the top two editing numbers, the bottom two are
check boxes, just Booleans. And we have some code that sets up the dialogue, so it populates
these widgets with stuff. And then, we have a bunch of event handlers, so if I enter a
value into one thing, then it percolates and changes the values in the UI. And I literally
went through the code and drew a connection, drew an arrow anytime one piece of code was
connected to another piece of code; data could flow from there to there. And the complete
section of this dialogue has 14 widgets, I think, in the middle. I stopped after four
and said, "Screw it, it's just a big ball of spaghetti." I did add this one box down
here, which is script validation. So what's happening here is, you know, we have a UI.
This is setting an image size for a particular width and height. And you can also drive it
without a UI attached by executing a script and all the validation logic that's contained
in the spaghetti code of event handlers has to be replicated into the script validation
code so that you make sure that, you know, what ever I record up here, I can play back
without the UI and not end up with an error. So, in fact, if you look at Adobe's code base,
a third of the code in our products ends up being event handling code of this sort, part
of the big ball of spaghetti, and if you survey our bug database, you'll find that half the
bugs during the new development cycle are in this body of code. Now, when I first saw
this, I thought cherry picking. Go after the--this problem, I come up with a solution and I'm
the hero. I eliminated a third of the code and half the bugs. What I didn't realize was
this is indicative that it's a very hard problem. So--and here's why. If writing a correct algorithm
itself is difficult, then writing a correct algorithm when it's implicit, okay, when you
don't actually have the algorithm as loop in your--on one page, what you've got is a
message to another object that eventually loops back to you or not and algorithm is
buried in there, gets to be almost impossible. So, start to break down this problem. How
would we define it? Well, you have to start and say, "What is this thing?" You know, what's
this event handling logic? How can we even define this? And we backed all the way up
and said we don't even have a good definition of what it means to have a user interface.
So, we came up with these as working definitions for what it means to have a user interface.
And it's just a system designed to assist a user in selecting a function and providing
a valid set of parameters to that function. So, what distinguishes a UI and a product
from, say, consuming a movie is the fact that the user is trying to do something. He's selecting
a function and then executing it. Now, a graphical user interface is just a user interface that's
visual and interactive. A big part of the code we're talking about here is not the code
that deals with selecting the function, right? You pick the menu item, you've got the dialogue
box or you're on the palette and the function is implied that's going to set the set of
properties on your document. But what does it mean to assist the user in providing a
valid set of arguments to that function? So, we have two loaded terms there. One is valid,
you know, we know what arguments to a function are. What does it mean that they're valid?
And what does it mean to assist? So, valid is pretty easy. It's just a predicate that
corresponds to the preconditions of the function. Assist, there are many, many, many, many probably
an infinite number of ways that you can assist a user but if you go and look at the applications,
we have general patterns and you can start to categories these patterns. So, here are
four that we're dealing explicitly with. One is validation, which is given that you have
said predicate to know whether or not your preconditions are satisfied, you can simply--validation
is simply rejecting that input, saying no, you can't do that. Okay. correction is the
idea that, let's say I have a valid set of input and now I poke a value and now the system
is invalid, how do I get it back to a valid state? That's correction. So, prediction is
letting the user specify the values for the function in terms of the desired result. Okay.
So, if I want to take a file and split it--split it into 4k chunks, then depending on the size
of the file, I'll get some number files out the back so the user could specify the file
that they want and how many pieces they want out the back and you can calculate how big
the chunk needs to be. So, prediction is specifying the parameters for the function in terms of
the desired result. Then, related values. So, if a user is specifying, we'll use our
example here, the size of a document in pixels which might be the only thing the application
cares about, that might not be a natural user interface. There's a related value there of
inches, which is related to pixels through resolution. So, you end up with related pixels
or related values through which you can define the arguments for the function. So, let's
make this a little more concrete. Flip to a demo here. Are there questions at this point?
People following? Think I'm nuts? Okay. So, this is a little picture of the UI that we
had the big ball of spaghetti about before, and this is fully functioning here. Okay.
So, we can change values, their constraining proportions to the original document within
height just up there. We can toggle our display between percent and pixels. We can unconstraint
and change the value. And then we can re-constrain and the other one updates appropriately, and
if you put this in percent, they'll always be equal. So, that's the basic idea for this
UI. Over there on the left, you see the complete description for the underlying model for that.
And we also have a description through our layout language of what the visual portion
of the UI looks like. So, this is--looks very much like a box model, the solver is a little
more general; it automatically does things line up colons and has a little more smarts
underneath it, but things are described in terms of a basic box model. And you'll see
these bind statements against things. You know, there's bind here, bind here. The bind
statements are what connect the UI to the model. So, the way most people tend to write
an application is the application logic reaches up into the UI and mucks with it. And if your
application is doing that, your application knows all about the UI; you can't have an
alternate UI on it, you can't re-share that logic for scripting. So here, what we've done
is we've factored out the description of the model which is how these properties are connected
together independently from what UI is bound to it. And we can demo that by opening up
another UI. So, this is another UI that we just attached to the same model. We can see
the original document with the height down there and you'll see these are actually attached
to same instance. So, if I change the UI or if I change the values up above, they're reflected
down below and it's running non-modal so I can change it here and it will change up above
and un-constrain and change and re-constrain. Okay. So, two alternate UIs. Or I can do a
third UI and here, I wanted to represent my percentages, sliders. So, I've got two sliders
and it will rule through the numbers and all the other views. And it's pretty traditional
model view controller stuff that you probably could have seen demoed on small talk systems
at Xerox many years ago but somehow, we've forgotten how to do this. What we've forgotten
how to do is define models. Okay? So, that's a basic idea of what we're talking about here.
Let's flip back. Now, when I came up with that piece of code, I went and found one of
the--I think one of the best engineers at Adobe, a guy named Mark Hamburg. He was chief
architect on Photoshop for many years. He's responsible for Photoshop Lightroom which
is recently shipping. And I said, "Hey Mark, this little mini image size is just a portion
of the Photoshop image size. It's just got a couple of connections fully specified. I'll
tell you exactly what the specification for this dialogue is." Write the code in the language
of your choice and the framework of your choice. I don't care about how your UI is laid out.
I just want the logic for this dialogue. This, you can zoom in on it if you want or get really
close to the screen, is his resulting code. It's written in Objective C to Cocoa on the
Mac only. The gray areas here roughly correspond to the logic that's in the model, and the
rest of the code is code that, given a change in the model, reaches up into the UI and manipulates
it. It's all the event handlers going on there. >> This isn't the full program, right? The
full program would also include an .nib file. >> PARENT: Yeah, there is no .nib file here.
There's no description of the UI. This is just the logic for it.
>> Is this showing you're not using any bindings [INDISTINCT]
>> PARENT: What's that? >> Since it's normally not using Apple's binding...
>> PARENT: Oh, no. It is--there is a .nib file and there are bindings. So...
>> It's just not showing it? >> PARENT: This is just not showing it. Okay.
>> If there are any complications using bindings. >> PARENT: Yeah, yeah. This is not like trying
to recreate the world. This is as small and tight as he could get it. Okay? This is my
solution in comparison. >> Now, excuse me.
>> PARENT: Okay. Yeah? >> A huge chunk of code is [INDISTINCT] to
interpret it. >> PARENT: There's--yeah, so the amount of
code to interpret that is 2,000 lines. Hopefully, the next time I get around to rewriting it
will be about 1,500 lines. In comparison, the code for the complete version of image
size in Photoshop is 6,000 lines. So, my complete system for the UI layout engine, the solver
for the models and both the parsers is smaller than 6,000 lines. Okay? So, this is the resulting
structure that, you know, if I were to draw that, this is what we end up drawing. Okay.
So, here, the notation here is a solid arrow with a solid point is a fixed connection.
Okay? That's a directive connection. The soft arrows here can go either way, they can flow
around. The doted line here, this is a conditional connection, so there's a predicate attached
to that which can toggle it on or off. And that describes the complete system. So, a
little more about the structure, it's always expressed as a bipartite graph. It's very
similar to spreadsheets except directionality can be determined at runtime, so you can kind
of flip things the way they're going. Data is flowed from higher priority cells out towards
dependent lower priority cells. So, there's a link reversal algorithm that's employed
to reserve soft cycles which there are systems where if I tried to pin two values, it becomes
over-constrained. What if I only pinned one? Then it's not over-constrained, so that's
a soft cycle. Hard cycles are very easy to detect. So, we just detect and reject them.
And soft cycles, we just solve them. Now, the full notation here, we have sources which
just have N output, syncs just have a single input. A cell can have zero or one input,
meaning it can be derived or self can be--can be a source and the remaining are outputs
on it. And then, relationships have N-inputs and one output always. And a conditional relationship
is the same with just a predicate on it. So, those are the basic computational structures
for the system. Now, we're attacking 30% of the problem. We're getting pretty good traction
right now. Last time I looked, we had--this is going in as part of--part of future of
Photoshop and several other apps. And last time I looked, we've got 20, maybe 25 dialogues
re-expressed this way and all the thousands of lines of code underneath them ripped out.
So, it's--took a long time to get traction but now that I have traction, the roll is
actually going pretty quickly. My estimate is that 85% of the existing code base can
be replaced with small declarative descriptions and a small library of generic algorithms
if you go and look at a product. Now, that's not these small declarative descriptions.
There are other problems that manifest themselves in the applications and they need to be solved.
But that's my gut feeling looking at the application code. In fact, if you would look at all of
it, I think we're about two or just a magnitude off from the minimal expression for any large
application which Photoshop is roughly 3 million lines of code last time I counted, which means
I think you ought to be able to do express all of Photoshop in about 300,000 lines of
code. Am I off there? Sorry, 30,000 lines of code. I'll let you guys read the rest.
I'll pretty much close it in. So, if you guys want more information, I've got some links
up here. We've got our open source site, all the example here, the example application
that I'm running this through, and I can give examples all day of the system. It's available
on opensource.adobe.com. You can download it; all the code is there. Stepanovpapers.com,
Alex Stepanov, if people know who he is, the guy who created STL. He works for me on my
team and we have a website just for his collected works. One of the things we're working on
is a book, kind of bottom to top here. These are the drafts of the book. The first failed
draft, second failed draft and the draft we're on, so we will get this thing out one of these
days. But please read them, give us feedback. We're specifically most interested in feedback
on the talk which is our current attempt. This is the book that I think Alex should
have written when he release STL, well, almost 15 years ago now. People look at STL and think
it's a nice container and algorithm library and not it's a different way to think about
code. So the point of this book is to explain what means to do generic programming, not
to teach you about how to use STL. That's it. Adobe's tag line. Question.
>> What concept that here's long programming that [INDISTINCT] address much here and also
that currency? I mean, does it work in [INDISTINCT]? Are there any problems?
>> PARENT: So... >> Repeat the question or use the microphone?
>> PARENT: Sure. I can--I can repeat the question. Yeah, so the problem of concurrency, we've
done a little work on concurrency usually when we're stuck on something else. Our two
side projects have been--have been threading issues and Unicode issues. I think the same
basic techniques apply to concurrency meaning that successful systems that I've seen that
are concurrent are written basically in one of two ways, either in a structure similar
to a Unix pipe system which is basically flowing data out into a dag and at each node, you
have a cue and the cue provides enough elasticity at the join so that you can wait until you
have enough data to progress along the dag. The other systems are ones where they are
algorithmically parallel where I can do, you know, a reduction algorithm, things of that
nature. So if you look on kind of the algorithm parallelization side, there's a lot of work
that's been done on, you know, there's the staple project at take--Texas A&M which is
parallelizing STL algorithms. I'm drawing blank on an example but there's many out there,
especially in the domains of linear algebra, image processing and we have a fair amount
of work going on at Adobe in those domains. On the how do you structure an application
into a sequence of processes, you know, basically even though maybe they're threaded, they're
not full processes but how do you go about building an application like that and describing
the structure much as you would, say, on the command line of a Unix shell to stream together
a bunch of--a bunch of Unix tools. How would you build complete applications, piping a
bunch of components together? And that's where my team is most interested, and I think there's
some good work to be done there. So I don't know if we'll ever get to it seriously but
that's my thought on it currently. >> I see you're trying to mold STL as the
generic code right now, [INDISTINCT] ask you then. You didn't actually use any [INDISTINCT]
to your solution. I'm just wondering if there's a reason for that.
>> PARENT: Yeah, the code under the hood certainly uses templates and the generic programming
language itself is. >> [INDISTINCT] language in terms of a bunch
of [INDISTINCT] things, the C++. >> PARENT: Oh, do it as like a metaprogramming
language? It's not part of our goals. One of the guys at Adobe has done that for the
layout engine. There's a--there's a domain specific language done as metaprogramming
in C++. Our goal is to get the mathematics correct to describe what the computational
structure of these systems are. And give a generic description of what it means to solve
one of these systems; not to do C++ metaprogramming. >> Well, if you recall, the [INDISTINCT] if
you--like are you trying to convince people that C++ is a good generic programming language
if you're not even using that in your model library?
>> PARENT: Well, all the libraries are implemented in C++. Do I think C++ is a good generic programming
language? I think C++ is the closest approximation that we have to a good generic programming
language and getting better. So, you know, Alex wrote STL in the Ada long before he wrote
it in C++. And it's been written in--portions of it in Scheme and Java and C and we still
tend to product type codes, sometimes in Scheme. So, it's the ideas of generic programming,
supersede what language you express them in. You know, typically, what we do is we work
out what's the generic description of this. What are the axioms and what is the algorithms?
And then you go through the painful process of how do you map that into C++, you know.
What I want is to just be able to write a type function and what I have to do is write
traits classes and partial specializations to get them to dispatch in C++ because I can't
just write a function in the natural terminology that manipulates types. So, yeah. So--and
you had one more point there which was--which is are we trying to convince people of this?
I've got nothing to sell. I mean, we're a very small team. Everything we do is open
source. The reason why we open sourced it is because it's interesting problems and I'm
interested in collaborating with other people. We do a lot of work with Indiana University
and Texas A&M. I see a general trend in this direction. I think eventually, this approach
wins out, you know. By this approach, I just mean a mathematical approach to programming.
I don't think we can continue to treat programming as an approximation. Our systems are just
getting too big and too complex. And without perfect approximation, it just means more
and more and more failures. So... >> What kind of bug rates have you seen with
this approach and are the--are the bugs easier to solve or are they harder?
>> PARENT: So, we're seeing pretty much zero bugs so far. The ones you have, your description
out there. The challenge is getting the model written in the first place and the system
right now does a fair amount of checking, not quite as much as I would like, but it
does enough that it tends to bitch at you. So, the whole reason for coming up with a
way to draw the picture, and eventually we're going to wire that in with graphics and use
dots so you can--the system can automatically draw the pictures for you, is that--is that,
you know, if the system bitches and says there's a cycle here because you over-constrained
the system to finding 200 relationships; you need some way to figure out well, what did
I do wrong and how do I get it right? So, there tends to be more thought process up
front and it's more difficult to get kind of anything working. You know, once you get
it working, you're done. Here it comes. >> So, back to the question whether C++ is
a good language for generic programming. >> PARENT: Yeah?
>> You talked about the inadequacies in C++'s support for concepts and especially about
the goal of expressing concepts at run time not just at compile time. What is missing
from Haskell-type classes that would make them useful as a representation of concepts?
>> PARENT: I would go and read--let's see, what is that? Doug Injako and that crew did
a survey paper not too long ago that covers that in far more depth than I could give it
justice here because I'm not much of a Haskell programmer. My opinion is I would program
in Haskell if I had a reason or some--you know, whatever language worked the best if
I had a path to deploy that in products. To fund my team, I have to ship software. And
right now, that means I ship in C++ so I'm not a religious zealot when it comes to C++.
I do think it's the best language out there for concept-based programming but...
>> So when you're making UIs, like, as a programmer, you sort of already naturally think of it
as a single when you're designing, you know, like it's [INDISTINCT]
>> PARENT: Correct. >> So [INDISTINCT] to you and your newer model
language for describing it. But have you tried complying it to, like, [INDISTINCT] that aren't
already compliant? >> PARENT: We've done somewhat wacky things
with it. We've used the modeling system to describe overload sets for functions in C++.
So you described the relations on a set of parameters and the system will go through
and iterate what all the possible overloads for that function are that have any meaning.
We've used it to, well, the layout engine which I didn't talk too much about here. We've
used the layout engine which usually does 2D UI layouts to describe 3D worlds and the
same thing that connects baselines, made sure that the you had paths between rooms that,
you know, you didn't end up with completely walled in spaces. The solver here has been
used for doing dependency tracking for installers. What else? Somebody at Apple used it for doing
music mixing. And the graph solver itself is fairly independent of what the equation
is on the node; it doesn't--it doesn't care what the function is on the little circles,
just that things are connected. So you can plug whatever you want in there. Some people
have used it for music processing. You know, how far--how far it goes? I don't know. I
mean, we're-- from the--from the UI standpoint, we solved a good chunk of the problem right
now but we still have little pieces that are hanging out that we're focused on, not figuring
out more domains for it. >> Well, does this approach lend itself to
maintainability? I mean, you said the main problem is coming up with the model
>> PARENT: Yeah. >> Now when we want something else or changes
a little bit, do you already think about that? >> PARENT: Yeah, it lends itself pretty well.
The interesting thing is developers have a hard time changing the way they think. Okay?
So what tends to happen is frequently, the request comes in and says, well, the UI designer
says they want it to work this way. And that might just might be a view behavior on the
outside that has nothing to do with the model, the model itself doesn't change. And getting
developers in the mindset where if it does come and impact the model, I mean, this is
a model of parameters to some function and you--and the goal is to try to make the model
as complete as possible and that gives the UI designers much flexibility as they want
to put a big UI on it or a little UI on it. You know, you don't have to expose everything;
you can expose whatever sets that you want so long as the model is consistent and correct.
So the system tends to be fairly easy to maintain. What tends to be hard is recasting the problem
into what is--what is the relationship here that is being asked for. How do I express
that? And since what I'm modeling is typically a set of parameters to a function or a set
of properties in a document, going from that relationship to say, well, does that imply
about what the function has to do? These things just don't exist externally. Okay? So you
have to, you know--I'm going to go change the behavior of this function and what it
does and then I'm going to go express that in my model and that's--it's just a different
way to think about software. I mean, my experience so far has been the more years somebody has
spent doing object-oriented programming, you know, myself included, you know, over decades
worth of object-oriented programming, the more trouble they have expressing things in
terms of the relationships within the model as opposed to how events are flowing through
the system. So you know, one typical example, people will say, well, I want this checkbox--when
this checkbox is checked, I want this thing to be disabled. Okay? Well, what it means,
when something is disabled in the system is that it's disconnected from the output, that
no matter what value you put into that cell in the model, it could have no effect on the
result. Okay? So there isn't any way to say disable this thing when I click this checkbox.
What's implied by that request is that there's some relationship to that checkbox that tends
to disconnect that cell from the output. What is that relationship? Write that down and
the UI will just work. Okay? So, are there questions? I've got, I don't know, three minutes.
I can give more demos or ask--answer a question or...
>> Have you had any interests in your QA team using a model as sort of a [INDISTINCT] that
they can work off? >> PARENT: Yeah, the QA team likes the models
because it gives them, you know, a very concise spec of what's going on. The other thing is
we have a set of just command line tools that you can check the models independent of the
app. And it's also nice. You can, you know, we've got a little sample app here, so you
can bring up a model, you know, they'll function attached on the back end so I can bring up,
like, the complete image size from Photoshop with all the 14 widgets and all the connections
between them and fully test the thing without launching Photoshop. Okay? And from a UI designer
standpoint, it means that I can play with my UI and say, oh, I want three panels instead
of two panels and I want a reveally widget and I want a slider here and never have to
touch the code. So they like that. The one thing that--where we do have tension with
QE is QE is used to working in kind of a puppet string world where, you know, a lot of the
automation test tools will, like, reach into the UI and say what's the state of this checkbox?
And in our system, our checkbox don't have any real, intrinsic states, they're simply
a view of the underlying model. So you can ask the model what's the state of this cell
but it's--it doesn't make any sense to ask the checkbox what its state is. And our automation
QE team has a hard time getting past that hurdle. They're like, well, you know, what
if the model is correct but the connection to the UI is wrong? How do we test that? So
I think that's just friction that will fold out over time.
>> The--90% of the computer scientists that [INDISTINCT] how do they fit into this new
model with a program? You think this is going to be easier to teach them or you think it's
not as important as you [INDISTINCT] >> PARENT: Yeah, so I think eventually you
end up with a shift in what it means, you know, for people to be programmers. I think
assembling collections of, say, generic algorithms is an easier task than writing the generic
algorithms to begin with. I think even, you know, constructing these models for a system
is somewhat easier. I can show you. It's a hundred lines for the image size in Photoshop
and it's way easier to write that hundred lines than it is to write the 6,000 lines
of code in Photoshop, and Mark can probably attest to it. And the 6,000 lines is also
wrong and I can prove to you it's wrong. >> I didn't write that code.
>> PARENT: I wrote one version of it. So, I do think it makes it easier, but I think
what you want is a shift. I mean, at this--at this year, at this point in history, I think
it would be great, you know, if you could get a PhD by trimming a couple of cycles out
of one of the STL algorithms. That should be, you know, get you a PhD. Coming up with
a new algorithm that's generally useful certainly should be worth a PhD, and yet, right now,
very few people--you know, we don't have a massive library of this stuff. Very small
libraries and everything is very primitive. And my team spent the last week and a half
working on a formal definition of integers because math--the mathematic definition of
integers doesn't hold-up real well when you integers of a fixed range. Right? We can define
Z but we can't define what Z over range of zero to N is. Because N is Z if you...
>> [INDISTINCT] >> PARENT: What's that?
>> [INDISTINCT] >> PARENT: Z sub N which doesn't quite capture,
say, signed arithmetic. It works pretty good for twos complement unsigned. So, also, Pno
axioms which is how mathematics are defined are in terms of successor. So, you start with
zero over one and then you add you one and you can build up all of mathematics with just
successor with basic increment. From a computability standpoint, that's very unsatisfactory. Right?
We don't--we don't add two numbers by one, two, three, four, okay? It just doesn't work
that way. So, coming up with kind of re-casting mathematics into axioms that apply to computer
science is a part of the work and, you know, eventually, these should be textbooks and,
you know, we need our Oilers and our Euclids and those folks. Any more questions? Is it
on? >> Yeah, this is back on. So, thank you, Sean.