Uploaded by GoogleTechTalks on 14.08.2009

Transcript:

>> HOLT: Welcome everybody. My name is Jason Holt, and I'm very pleased to be able to introduce

you to Rich Schroeppel. His--he and his wife Hilarie are long time friends of mine from

when I was back in grad school. They actually introduced me to people at Google that helped

me get my job here. Rich's experience in computers goes way, way back. Look at HAKMEM sometime in Wikipedia if you want to see some of his early work. But he's here to talk to us about their entry in the SHA-3 competition, called SANDstorm, and also some recreational mathematics and an interesting space proposal. So here's Rich Schroeppel.

>> SCHROEPPEL: Hi, I'm Rich Schroeppel. I'm presently working for Sandia Labs in Albuquerque.

And the first part of the talk is about the SANDstorm Hash Function. I found out this

morning that this is now a theory talk because the list of [INDISTINCT] cut down candidates

has just come out and we were not lucky. But I'll present the Hash Function anyway. A bunch

of people helped with this. This is what a Hash Function is. The basic notion is a digital fingerprint. You put in

your digital object of whatever size and you get out some fixed size called the Hash Value.
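The fixed-size-fingerprint idea is easy to demonstrate with any standard hash; here's a minimal sketch using SHA-256 from Python's hashlib as a stand-in (SANDstorm itself isn't in any standard library):

```python
import hashlib

# A hash gives a fixed-size "fingerprint" no matter how big the input is.
short_msg = b"hello"
long_msg = b"x" * 10_000_000  # ten megabytes

h1 = hashlib.sha256(short_msg).hexdigest()
h2 = hashlib.sha256(long_msg).hexdigest()

# Both digests are 256 bits (64 hex characters), and any change to the
# input gives a completely different value.
print(h1)
print(len(h1), len(h2))
```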

And you want that to be unique and it can't really be unique because the input space is

bigger than the output space so you fall back and you say, "Well, I want it to be extremely

hard to come up with two objects that have the same fingerprint. I don't want it to happen

by accident and I don't want somebody to be able to cook it up." So we're in the case

of having engineering uniqueness. Okay. There are some standard requirements. One of them is Collision Resistance, which says a bad guy should not be able to come up with two things that have the

same Hash Value. Moreover, if you give him something, he shouldn't be able to come up with a forgery with the same Hash Value. And finally, if you just give him the Hash Value, he shouldn't be able to come up with anything that has that Hash

Value. And I emphasize those are for people of finite computational means because mathematically

you can always solve these problems if you have a suitably big computer. The Hash Function that people used for a long

time is called MD5, and people actually still use it a whole lot, which is maybe not

such a good idea anymore. Collisions for MD5 have been known for about 5 years. What do

we have [INDISTINCT] there. All right. Boy, I love windows. I should be talking here.

Okay, Collisions for MD5 have been known for a few years. Theoretical Collisions for SHA-1

have been known. We're actually expecting SHA-1 collision to appear real soon now because

people keep improving the attacks and they're down to the point where somebody with a whole

lot of computer resources could actually do it. A couple of years ago, they came out with SHA-2, sort of in anticipation of problems like this. And SHA-2 has actually four Hash Functions

with four different output lengths--of lengths 224 up to 512 bits so the fingerprints keep

getting bigger. They're all based on the same design. They're all derived from MD4 originally, and they kept beefing it up. MD5 was an improved version of MD4, SHA-1 sort of took some steroids, and then they added some growth hormone and got to SHA-2. The feeling was it might be a good idea to have something that was not a derivative

of the same design. So they started the contest. There were originally 64 submissions by last Halloween. Fifty-one made the first cut, which was that the code compiles and the spec was sort of readable. About a third of the originals have no significant problems, or I guess I should say, not even complaints aimed at them. Some of them have actually been successfully attacked. They don't meet one of the criteria. It's just people are, "Ehh," you know, "Whatever."

But, you know, 22 have come through completely clean so far. And in late August, which apparently

is right now, NIST is cutting to 15 candidates. They apparently just announced their cut list

last night. People have been very imaginative in names. This is the SHA-3 Zoo website that

the Europeans are running. Red means that there's an actual attack that produced some

violation of the conditions; usually a collision. Orange is there's reason to believe that a

collision can be computed in less work than it should be. Yellow is there's some sort of preimage in theory, although that doesn't mean they've actually come up with one. And green means that some reduced version of the Hash Function has been attacked, which doesn't

necessarily threaten the big one at all. >> Can you back up a second?

>> SCHROEPPEL: Sure. >> We have an unfortunate ordering of alphabetical.

[INDISTINCT]. >> SCHROEPPEL: [INDISTINCT] looking forward.

>> [INDISTINCT] repeat the question. >> SCHROEPPEL: Kevin is asking why we didn't

order by speed, and the answer is this is basically taken directly off the Zoo website,

which is alphabetical. The other thing about speed though is, of course, on what machine.

Okay. SANDstorm is slightly unusual in a couple of respects. One of them is that we designed

for parallelism from the beginning. So our Parallel SANDstorm is the same as the Serial

SANDstorm. And I believe that's different from all the other Hash Functions. On the

other Hash Functions, if you want to benefit from parallelism, you have to tweak something

and the Hash values are going to be different. In our case, they'll be the same. We've done

that with what's called the Truncated Tree. We'll see a picture of that later. We have

a very simple padding scheme. We stick a one bit on the end of whatever the message

is. We have a finishing step which is necessary to prevent something called the length extension

attack that everything through SHA-1 has been susceptible to. These are some of the ways we differ from

the main standard Hash Function submission. Our state variable--our internal state is

four times the size of the output; that's called wide-pipe. We have a serious commitment to parallelism.

There are about twenty different places you can choose to put your parallelism all the

way down from the gate level up to high level control on the message. So whatever is easy

for you, you can do. And if you can do all of them, then that's wonderful. You'll get 10,000

[INDISTINCT] or something. If you can do one or two of them, you can get up to 1,000 [INDISTINCT].

One of the other decisions we made is that if you have a super-size message like you're

Hashing a movie for some reason and you want to do it because you're selling personalized

copies, you want personalized Hashes, you want to put a wrapper around that movie--something

at the front, maybe in the middle, something at the end--we can redo the Hash without redoing

most of the work. We just redo the changed parts and where they meet up next to the movie.
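The redo-only-the-changed-parts idea can be sketched with a generic two-level tree that caches per-chunk hashes; this is an illustrative Merkle-style sketch, not SANDstorm's actual tree:

```python
import hashlib

def leaf_hash(chunk: bytes) -> bytes:
    # Hash one chunk of the message independently.
    return hashlib.sha256(b"leaf" + chunk).digest()

def root_hash(leaves: list) -> bytes:
    # Combine the per-chunk hashes into one digest.
    h = hashlib.sha256(b"root")
    for leaf in leaves:
        h.update(leaf)
    return h.digest()

# Hash a big message once, caching the per-chunk hashes.
chunks = [b"header-v1", b"movie-part-1", b"movie-part-2", b"trailer-v1"]
cache = [leaf_hash(c) for c in chunks]
digest1 = root_hash(cache)

# Personalize only the wrapper: redo just the changed chunk, reuse the rest.
chunks[0] = b"header-for-alice"
cache[0] = leaf_hash(chunks[0])
digest2 = root_hash(cache)
```

The movie chunks never get rehashed; only the changed wrapper chunk and the top-level combination are recomputed.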

This is our tree mode. The idea in the tree mode is that various parts of the message

get hashed and they forward something to the next level in the tree and then you Hash

that and so on. The straight tree mode has a couple of problems. One of them is that

if you have a collision somewhere that allows you to patch, you can usually fit it in to

any leaf in the tree, which gives an attacker flexibility. I prefer not to. Our overall

philosophy is to try to be defense in depth, so even when we know that Part A is preventing a problem, we'd like to have Part B also prevent the problem where we can. So

our tree mode is different in a couple of ways. We have a first block and that block

influences every other block in the message. So that first block Hash has to be known--the

value from that has to be known by every other computer that's cooperating to hash the message.

You either give it another copy of block one or forward the Hash Values. See if my mouse

works here. Doesn't seem to. Do we have a pointer? No? Okay. All right. Well, I'll wave

my hands. Okay. Level zero is that first block. In level one, subsequent blocks of the message

are grouped together in 10-block chunks and those are hashed as you would in ordinary hashing, block after block after block. Then the results of that Hash are forwarded to

level two and there they're grouped in units of 100 blocks; same deal. Those get forwarded

to level three, which is unbounded. Something has to be unbounded in order to process super

long messages. We made it level three. And then finally level four takes the level three

result and does a finishing step on it and, again, that prevents various kinds of attacks.
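The level structure just described (level 0 is the first block, 10-block chunks at level 1, groups of 100 at level 2, an unbounded chain at level 3, and a finishing step) can be sketched like this, with SHA-256 standing in for the real compression function and padding, block numbers, and double-width forwarding all omitted:

```python
import hashlib

def compress(state: bytes, block: bytes) -> bytes:
    # Stand-in for SANDstorm's real compression function.
    return hashlib.sha256(state + block).digest()

def tree_hash(blocks: list) -> bytes:
    # Level 0: the first block, whose result seeds every other computation.
    b0 = compress(b"\x00" * 32, blocks[0])

    # Level 1: remaining blocks in chunks of 10, each chunk hashed serially.
    lvl1 = []
    rest = blocks[1:]
    for i in range(0, len(rest), 10):
        state = b0
        for blk in rest[i:i + 10]:
            state = compress(state, blk)
        lvl1.append(state)

    # Level 2: level-1 results in groups of 100 (drops out when empty).
    lvl2 = []
    for i in range(0, len(lvl1), 100):
        state = b0
        for v in lvl1[i:i + 100]:
            state = compress(state, v)
        lvl2.append(state)

    # Level 3: unbounded serial chain over the level-2 results.
    state = b0
    for v in lvl2:
        state = compress(state, v)

    # Level 4: finishing step (prevents length-extension-style attacks).
    return compress(state, b"finish")
```

For a one-block message the loops run zero times, so only level 0 and the finishing step do any work, matching the talk's "levels drop out" behavior.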

Now we haven't shown it here but the block numbers get involved in a lot of places so

that an opponent can't swap a block's place on you. The other thing is if a level isn't

needed, it drops out. So in the very simple case of Hashing a message up to one full block

long, you will only use level zero and the finishing step. So you're just going to compress

twice. And then as you get up another 10 blocks coming in, you'll use level one and then after

that you'll use level two a little bit. You won't actually use level three unless you

have a 1,000-block message. Okay. One thing is we have about a 10% overhead for the tree, so it's, you know, not a big deal, but it is present. One of the advantages

of having the levels drop out is that if you're only hashing small messages, like on a smart card

or something, you don't need a lot of memory to remember those intermediate levels so that's

less temporary memory to use. An ordinary tree mode just passes Hash Values up to the

next level. That's not really as big as you'd like. We pass double-sized Hash Values to

the next level. So we pass more of the internal state up. Again, that's the safety thing.

This is the internal compression function. The M1s represent data derived from the first

block of the message. The M2s are from the next block of the message. We run the--the

message schedule produces different M1s, M2s and so on. This is what happens to the internal

state [INDISTINCT]. It starts out with this initial C zero value, which comes from various

places depending on your level and it gets passed through five rounds. The result of each

round is forwarded back to the previous round in the next block so that eventually the result

of round four if you move ahead four blocks will be influencing the input into round one.

And that happens to be a very nice way if you're doing hardware pipelining to arrange

things. If you have a whole lot of gates available, you can get them all working so you'd actually

have five blocks being processed at the same time. We've re-used the constant tables from

SHA-2. The importance of the constant tables is they prevent something called the slide

attack where you pretend that round three in one case is doing what round four is in

another case. To do that, you have some constants you bring in. You know, they're supposed to

be random numbers [INDISTINCT] of primes--the low-order bits, of course. Anyway, we re-use

those same constants because we figured it's actually quite likely that SHA-2 is going

to be in the same software package as whatever the backup Hash turns out to be. This is one

step in the round function so you can see the details on how the data gets forwarded

to the next--the next block process and so on. Our individual mixing makes good use of

the multiply instruction, which is an excellent mixer. We use the AES sbox a little bit. And

we mix up arithmetic and logical bit operations because you don't want to just have all of

one or all of the other. NIST asked for a Tunable Security Parameter so that you could

do things like run more rounds. So we have that. Our overall position is you want the

least possible amount of flexibility in your final standard. You know, you're not producing

a multi-tool Swiss army knife for people to use. You're producing something where people

can say, "I'm using SHA-3," and the other guy knows just from that what you're doing.

Now they've already said, "We can't really do that. You got to have four sizes." And

they also want this Tunable Security Parameter so that the next time around if there's a

break, they can just say, "Everybody set your Tunable Security Parameter to ten now and

we'll move on." These are the arithmetic details. There's not a lot to mention here except the

small-sized Hashes use 32X32 multiply inside and the big size Hashes use 64X64. We want

the entire result, which means it's a little hard to work with in C unfortunately, because

C won't give you the whole result of the multiply without a bunch of special hacking. And that

does mean that our portable version, which is in [INDISTINCT] standard C, runs slow. We

have fast versions which have different amounts of assembly language put in to cope with the

multiply problem. This is an unfortunate slide. It really should be a simple one. There are four words of internal

state and the last thing the round function does as part of the mixing is that the bits

are moved between the four words in various ways and this is the logic function that does

that. This is more of our round function. We have four words of internal state; four 64-bit words

for the small Hash. And each of those words is recomputed based on the other three words.

And we do that on each of the four words. The result of that is that after you get down

to--after the bit mix, everything depends on every bit of the input so you get what

we call a full mix. This is the message schedule. The message will come in for the small Hash

as eight words of 64 bits. And what we do is we have a function that computes a 9th

word from the previous eight. And this is a schematic of that. And we run that I think

25 times. And the first eight words are [INDISTINCT] together and that becomes the first M1 value

used in round zero. And then after that, all the other values are made by picking words--blocks

of words out of the message schedule. But we have some gaps in there so even if you

know what went into one block of four words, you don't know the next block automatically.
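The shape of the schedule--eight input words, each new word computed from the previous eight--can be sketched like this; the mixing rule here is a made-up stand-in, not SANDstorm's actual function:

```python
MASK = (1 << 64) - 1  # keep everything to 64 bits

def next_word(prev8):
    # Hypothetical mixing rule (an illustrative stand-in): multiply two of
    # the previous eight words, fold in a third, keep 64 bits, then fold
    # the top half down over the bottom half.
    x = (prev8[7] * (prev8[0] | 1) + prev8[3]) & MASK
    return x ^ (x >> 32)

def message_schedule(words8, extra=25):
    # Start from the eight 64-bit message words and repeatedly append a
    # new word computed from the previous eight, as the talk describes.
    w = list(words8)
    for _ in range(extra):
        w.append(next_word(w[-8:]))
    return w

sched = message_schedule([0x0123456789ABCDEF * (i + 1) & MASK for i in range(8)])
```

The round inputs would then be taken as blocks of words out of `sched`, with gaps between the blocks so knowing one block doesn't give you the next.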

This is a little bit about the NIST standard machine and you can see why they picked that

and at the same time the--this machine is--it's derived from [INDISTINCT] basically and it's

a mess. It's got 64-bit words in most places except sometimes it's a 128-bit words. They

have different sets of registers. They have different applicable instructions to them

and so on and so on. The C compiler, of course, hides all this. But if you're going to do

[INDISTINCT] version, you need to know it. And in fact we're able to take advantage of

this because one of our parallelism things is the way the arithmetic is organized so

we're able to use most of, well, you know, a lot of the special features of this 64-bit

x86 descendant. These are our most recent performance results. Our assembly language

version on a single core of this machine is getting 15 clocks per byte. If we do it on

a dual core, which in this standard machine has two cores, and we run it in C, then we

get 10 clocks per byte. The 512-bit output version is a little more than twice as slow

in some respects. We've checked out the parallelism. Most of the submissions haven't talked about

parallelism much. MD6 actually has run on several different machines though. On a dual

quad core machine, which has eight processors, we get 2.1 clocks per byte. And a machine

made by SUN called the Niagara, which is designed as some sort of server machine, you can get up to 128 threads, though running on 16 processors. The 16 processors give a linear speed-up from one. And then as you get extra threads coming in without extra cores, the speed-up is

only about one and a half for each doubling of the thread number. I emphasize that this produced

the same Hash Value as the serial machine. I have a slide on how much memory we need.

This isn't very interesting except if you make smart cards. But Niels Ferguson gave

us a markdown for using too much internal memory. So I [INDISTINCT] him last

week when I was visiting Microsoft and he said, "Oh, well, please send me your correction."

The basic deal is that we use a couple of hundred bytes, and we're assuming usually you know

what sort of message you're going to be sending from a smart card because most applications

have a limited vocabulary. Okay. This is our features list. I actually covered most of

these already. What do I need to add? Oh. One feature I didn't mention, most of the

Hash functions we've seen so far, MD5 down to SHA-2, do what I call dribbling in the

message. The Hash function runs a little bit of a round--maybe a full round and then it

brings in a word of message and then the round happens another time and it brings in another

word of message. We think this is a mistake; you should actually bring in as much of the message as you can whenever you bring it in, and then you can cogitate for quite

a while before you bring in your next blob of message. So we've taken that approach.

What that means is when the attacker is trying to break your Hash Function, he can do all

he wants on that one blob of message. But because the connection between that and the

next part of the message that comes in has a whole lot of mixing in between, he's kind

of stuck. If he wants to do something like compensate for one big change, he can cause

it to happen at any given place, but he can't cause some compensating change later on to

make up for it. And if you look at the attacks people have actually made against MD5, SHA-1

and so on, they all use this feature. Okay. This is a summary of the whole parallelism stuff.

>> [INDISTINCT]. >> SCHROEPPEL: Yes. I believe every other

entry has that feature. >> [INDISTINCT].

>> SCHROEPPEL: Oh, okay. So every entry except SANDstorm will produce a different Hash in

parallel mode. In most cases, people haven't actually even defined a parallel mode. They

just say you can use a tree and leave it at that. MD6 has carefully defined a parallel

mode because it's designed to run in parallel. But even there, they've left some of the tree

parameters open. One of the annoying things is that a lot of the submissions have a whole

list of recommended parameters but you can tweak them if you want. Now that's actually

a good thing if what you need is a Swiss Army Knife for one reason or another. You know,

you have all these options like if you're--you know, the [INDISTINCT] function has all kinds

of things you want to parameterize. But for a standard Hash Function, it's probably not

a good idea. It's an invitation to security problems because the other guy can say, "Well,

I can't do that. [INDISTINCT] these other parameters," you know, or, "Gosh, I can only

use DES instead of AES," and you get what's called a down grade attack.

[PAUSE] >> SCHROEPPEL: Okay. One of the things we've

been questioned about is actually using multiplication. Multiplication has pluses and minuses. The

real justification for this is it's an excellent mixer in terms of bit dependencies. And as

a result of a multiply, the output bit depends on pretty much all the bits beneath it in

the two inputs which means that the top half of the multiply is depending on everything.

The bottom half's dependence is so-so, but it's still a lot better than either XOR or [INDISTINCT].
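The full-product point is easy to see in a language with big integers; in portable C you only get the low 64 bits of a 64x64 product, while the sketch below splits out both halves:

```python
MASK64 = (1 << 64) - 1

def full_mul(a: int, b: int):
    # Full 64x64 -> 128-bit product, split into high and low 64-bit halves.
    # Portable C only gives you the low half; the high half needs __int128,
    # _umul128, or a few lines of assembly.
    p = (a & MASK64) * (b & MASK64)
    return p >> 64, p & MASK64

a, b = 0xDEADBEEFCAFEF00D, 0x0123456789ABCDEF
hi, lo = full_mul(a, b)

# The top half depends on essentially every bit of both inputs, which is
# what makes the full product such a good mixer.
print(hex(hi), hex(lo))
```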

The drawback to multiply is your parallel version--no, your portable versions, which are written in C, have to go through special tricks in order to get the result of the whole

product. And that slows you down. There are a few machines where the speed of the multiply

actually varies depending on the inputs. These are no longer common [INDISTINCT] because

it's--the designers are not going to [INDISTINCT] with their timing just to make multiply run

a tiny bit faster sometimes. There are all sorts of other timing constraints that they

don't want to play with. The argument for multiply is it does a lot of mixing. It's

a little slower than add. But if you look at the total work going on, most of what the

machine is doing is getting your inputs in there. You know, you've got half a billion

gates working--struggling to provide roughly 1,000 bits of arithmetic per cycle. So you're

spending a whole lot of work assembling and gathering the data, doing a little bit of

actual data processing and then another whole lot of work sending it out towards [INDISTINCT].

So we're saying the right balance is to do more arithmetic where you can so that's why

the multiply is in there. We feel parallelism has not been sufficiently emphasized in this

computation. It looks like the way things are headed is parallel, period. If you bought

a computer recently, you may notice the advertised clock speed is not any faster than it was last year or two years ago. What they've done instead is they're giving you more cores, and those guys have to work in parallel. Now in a Microsoft system, you have about 50 processes running all the time doing God knows what, so those extra cores can be independent.

But if you actually want to hash a movie fast, then you'd like these things all to be working

on your task. Another thing that we're discovering is that simplicity in a Hash Function is pretty

and it means maybe you can remember and understand the whole thing in one sitting, looking at it for a few minutes, but it's also risky. Everything that's been attacked has been beautiful

and simple. And we have a couple of minutes for questions then I'll move on to the next part of the

talk. >> [INDISTINCT].

>> SCHROEPPEL: Yes. >> [INDISTINCT].

>> SCHROEPPEL: Yes. I sent them a note a couple of months ago.

>> Okay. [INDISTINCT]. >> SCHROEPPEL: Oh. Yes. I was asked, "Is NIST

aware of the speed and memory requirements?" and I said, "Yes, I sent them an email a couple

of months ago." We haven't made public announcements on those because we're waiting to get the

code certified and that's a slow process where I come from. They actually want to make

sure you're not issuing something that's going to cause embarrassment or legal obligation

and so on. >> You kind of sold me on your Hash and I

looked at this website and they didn't say why they had picked the Hashes that they

did for round two. What's your guess as to what their biases were in what they selected?

>> SCHROEPPEL: I don't know. I haven't seen the list. What they did the last time during

the [INDISTINCT] competition 10 years ago, they picked the five fastest on, you know,

whatever 386, 486 was--maybe it was even a Pentium at the time. And, you know, they [INDISTINCT].

They said they weren't going to do that this time. So they may not have done it. I don't know.

>> Sorry if this is a naïve question. You talked about how multiplication is discrete.

What if one of your factors is zero, aren't you creating collisions?

>> SCHROEPPEL: Absolutely. In the case that one of the factors in multiplication is zero,

then the output loses all information from the other factor. So in that case, you lose

32 bits of information. But that same piece of data is used in a whole bunch of other

places and we have a special thing in there to prevent a series of multiplies by zero.

Of course I've left out a lot of our internal details because it's a short talk, but we

definitely are aware of that. Another version of that is if one of the factors has a few

low order zero bits and then the result loses a few bits of low order information. And we

have special stuff in there to prevent that from happening in a chain.
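The zero-factor hazard and one common style of countermeasure (forcing the multiplier odd--a generic trick, not necessarily SANDstorm's exact fix) can be sketched like this:

```python
MASK = (1 << 32) - 1

def bad_mix(x, y):
    # If y == 0 the product is 0 regardless of x: x's information is lost.
    return (x * y) & MASK

def safer_mix(x, y):
    # Forcing the multiplier odd (a generic countermeasure, hypothetical
    # here) means it's never zero and never has low-order zero bits.
    return (x * (y | 1)) & MASK

# Zero factor: everything collides.
assert bad_mix(123, 0) == bad_mix(456, 0) == 0

# Low-order zero bits in the factor shift information off the top:
# multiplying by 8 makes x and x + 2**29 collide mod 2**32.
assert bad_mix(5, 8) == bad_mix(5 + 2**29, 8)

# With the odd-forced multiplier, both collisions disappear.
assert safer_mix(123, 0) != safer_mix(456, 0)
assert safer_mix(5, 8) != safer_mix(5 + 2**29, 8)
```

An odd multiplier is invertible mod 2^32, so the multiply step by itself loses no information about the other factor.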

>> So here is the other thing that seemed a little weird when I briefly looked at the

NIST site is they said something about they're going to allow the submissions to tweak their

algorithms after round one which--yes. >> SCHROEPPEL: Yes.

>> [INDISTINCT]. >> SCHROEPPEL: Oh. Sure. Go ahead

>> [INDISTINCT]. Did you try implementing it on an FPGA to see how fast you can get in hardware?

>> SCHROEPPEL: On what sort of machine? >> FPGA like a hardware [INDISTINCT].

>> SCHROEPPEL: No FPGA's. No. >> Because...

>> SCHROEPPEL: We've implemented it on basically just processors in your computers.

>> Because eventually you can expect a card that does it for everybody.

>> SCHROEPPEL: Yes. Certainly. >> Tell us about bytes per second [INDISTINCT].

>> SCHROEPPEL: Bytes per second, hundreds of millions but--I guess I'd work it from

clocks per bit or clocks per byte. >> [INDISTINCT] 30 megabytes per second [INDISTINCT].

>> SCHROEPPEL: No. Faster than that. But MD5 is faster than that, too. In terms of where

we stack up against the other Hashes, we're comparable to SHA-1. We're a little bit slower

on a single processor, and as soon as you put more than one processor on, we get ahead.

People have been talking a lot about the importance of speed in a Hash. It turns out that for

almost all applications, you just plain don't care about the Hash speed as long as

it's not awful because you're always doing something else. You know, if you're sending

an email, you know, if you're signing it, then there's this public key calculation you

do. If you're doing a look-up, somewhere there's going to be a disc arm moving, and that disc

arm timing is in milliseconds and not nanoseconds. About the only thing where you need a super

fast Hash is if you're doing--you know, you're hashing a movie--something huge. If you have

a server that's doing a zillion transactions a second, so the individual little bits add up; and if you're trying to break a Hash by hashing random values over and over again. And, you know, for two out of three of those applications we figure you want it to work fast. Okay. I'm

going to move on to the next part now. I've got my ear open, sort of.

[PAUSE] >> SCHROEPPEL: Okay. This is more fun stuff.

Okay. This is taken from a talk I gave a few years ago and...

[PAUSE] >> SCHROEPPEL: Another windows win. We will

pause a few seconds here while the computer thinks. This was an experimental math workshop. And

I put together about 10 little things. And a lot of that was joint work with people and

here are some of the people I routinely do joint work with. Okay. These are the little

subtalks. I'm going to do the one on Counting PolyHypercubes and the Post Tag Problem.

Okay. Counting PolyHypercubes. Everybody here has probably played Tetris. The objects that

are falling in Tetris have an area of four. They're called tetrominoes by analogy with dominoes.
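For the two-dimensional case, counting fixed polyominoes (distinct up to translation, with rotations and reflections counted separately--one common convention, which may differ from the talk's own tables) can be sketched like this:

```python
def fixed_polyominoes(n):
    # Count fixed polyominoes of area n by growing every (n-1)-cell shape
    # one neighboring cell at a time.
    def canon(cells):
        # Normalize away translation by shifting to the origin.
        mx = min(x for x, y in cells)
        my = min(y for x, y in cells)
        return frozenset((x - mx, y - my) for x, y in cells)

    shapes = {frozenset([(0, 0)])}
    for _ in range(n - 1):
        grown = set()
        for s in shapes:
            for (x, y) in s:
                for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if nb not in s:
                        grown.add(canon(s | {nb}))
        shapes = grown
    return len(shapes)

# Successive ratios hover around the "about four" growth the talk mentions.
print([fixed_polyominoes(n) for n in range(1, 6)])  # [1, 2, 6, 19, 63]
```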

[INDISTINCT] invented a whole bunch of games that you can play with pentominoes, an area

of five. And there's been a lot of study of how many of these things there are in packing

problems and so on with various different areas. And you can do the obvious thing of

going up to higher dimensions and do cubes and tesseracts and so on. So we have polyominoes

and polycubes and polytesseracts and so on. You can ask all kinds of questions. For gaming,

people like to actually build the physical puzzles and play with the packings and so

on. And for theoretical math, we just like to know how many of them there are or even

an asymptotic formula. The actual [INDISTINCT] results are surprisingly sparse. The limit--it's

known that the polyominoes have a limiting ratio. If you add one to the area, it multiplies

the number of polyominoes by about four, but that about is incredibly wide. It's between

three and five or something. The numerical evidence allows us to say it's four point

seven. And there are people who think what the next [INDISTINCT] asymptotic sequence

is, but I'm skeptical. The situation with the extra dimensions is, you know, not even

that developed. Anyway, we've set ourselves on just a very limited thing of getting some

data. The two parameters are the dimension and the volume. And then you get to choose

whether or not you try to remove symmetries. And removing symmetries makes a whole lot

of difference. The number of dimensions goes up. The kinds of symmetries you can have are

orientation so that's like when you throw a [INDISTINCT] in the air you get a bunch

of different orientations where it can land. If it's sitting on the table, you can rotate

it four ways. You could also put a mirror next to it and get a [INDISTINCT] that's mathematically

equivalent but looks different. You can reflect individual dimensions. And then finally there's

a symmetry in starting position. You can say, "I know I'm going to put one of my squares at zero comma zero and add to it." And in that case, any given polyomino can be built

up in several different ways where you'll have a different square marked as the

first position. Our approach in this problem has been to count everything so we don't remove

any symmetries. The reason for that is mathematically the numbers seem more likely to make useful

sets. Got it. Okay. All right. This is the numerical data and I know it's pretty small to read

so I won't actually read these numbers to you. Dan Hoey wrote the program here and it's

been a little bit checked by other people but mostly he gets all the credit. After we

look at some of this data, we realize there were some useful relationships. The most interesting

one probably is that if you fix the volume and increase the dimension--so that means a column in this matrix; you're looking at the data along one column and asking, "What if I had another dimension? What happens?"--it turns out that the entries in that column are a polynomial in the dimension. Now there's a certain class of combinatorial person

who would recognize that just right off the bat. For the rest of us, it actually requires

some proof. It's not intuitively obvious. It's exactly the difference between the person

who says, "Oh, chromatic polynomials make sense." And another person who says, "Huh?"

A chromatic polynomial is where you've got a graph or a map you're going to paint, and the variable is the number of colors of paint available, and the value is "How many ways can I paint the map with these colors?" and it turns out that's a polynomial. And if you're sufficiently experienced [INDISTINCT], you'll say, "Of course." And if you're not, this will come as a complete

surprise. Same principle here. Anyway, we spotted this after getting the data and worked

out some of the polynomials with different columns and we worked out the rules for some

of the coefficients and the polynomials. And that actually allowed us to fill in a couple

more values in the table. These are the ratios. You fix the dimension and ask, "Suppose I increase the volume by

one, how much do I multiply the total count by?" And the middle column here is the observed

ratio, as near as we can tell. The final column is something called the Tree Bound,

and that's an easy-to-prove bound we used--the way you actually do it is you just have a tree with this many dimensions and the appropriate connectivity. But you ignore the fact that, you know, if you put down five squares going around a point, the fifth one is going to be on top of the first, and that doesn't count as a polyomino but it counts in the Tree Bound.

So the Tree Bound is an upper bound. I also filled in the data for dimension minus one.

Now, that doesn't mean anything. But because we have this feature that the results are

a polynomial in the dimension, I can say, "What if the dimension were minus one?" and just calculate the values. And there's no obvious pattern so we let it go. And this is the side

thing where we can take out all symmetries. So these are related to the previous set of

numbers and smaller, but you can't say exactly how much smaller except [INDISTINCT]. If you

add rows one and two, you get the traditional counts for polyominoes.

[PAUSE] >> SCHROEPPEL: The ratio here seems to be

growing. It's probably unbounded so the ratio for adding the--increasing the volume and

the dimension by one seems to be at least 10. It probably keeps growing. That's one mini-talk. Okay. This is the motivation for the next problem. Everyone knows after they've

taken a computer science course that there are problems that cannot be solved. Usually,

you point to the halting problem if you're a computer science kind of

person or a programmer. Programmers especially are willing to believe that there are problems

that can't be solved. And if you're a mathematician, you kind of come at it from a different direction

with [INDISTINCT] although it comes out the same thing. So then the question is, well,

what's a natural example of an unsolvable problem because all the examples you see in

your courses are things that are just problems you'd never try to solve anyway. They don't

make sense. They require thinking long and hard even to understand what's being solved.

You can twist your brain in a knot trying to figure out all the quantifiers and so on.

It would be nice to come up with a simple problem you could point to and say, "We think

this might be unsolvable." The first candidate that comes to mind is something called the

3N+1 problem. And I rejected that because even though we can't prove the answer, we know the answer.

You know, from empirical testing, we're pretty sure what the answer is. It's the same way

with the Goldbach conjecture. Even though you can't prove it, you look at the numbers

and it's pretty clear it's true. So here's something where it's really hard to say one way or the other what the answer is. This is the Post Tag Problem. And I worked on this

in cooperation with Allan Wechsler. Post worked on this from the 20s through the 30s and the 40s, maybe into the 50s, and he almost discovered Turing machines and unsolvability. He

danced all around it, didn't quite see it, but he was 10, 20 years ahead of everybody

else. He set himself a number of puzzles starting I think when he was in college or grad school

and said, "If I'm a mathematician, I should be able to solve these puzzles in a mathematical

way." And for a surprising number of them, he couldn't do anything mathematical with them at all. You know, you'd work on it. It was obviously translatable into a math

problem, but he couldn't get an answer. And this is probably his simplest one. It's called

the Tag Problem. What it is it's an operation on bit strings. You have a string of zeros

and ones, and you look at the front of the string. And depending on what it is, you either

append a double zero or 1101 to it and then you strike off the first three bits.

So on average, the length is unchanged for a random string. But, of course, it's not

random after you've done this a while. Here's a simple example of processing a string. And

it winds up in a very simple loop alternating between length five and six. Now because it's

much easier to talk in terms of these blocks after you process through the string once, everything is built up of these A blocks and B blocks. So I think in terms of As and Bs rather than zeros and ones; A is the double zero and B is the 1101 pattern.
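[Editor's note: the rule just described simulates directly: append 00 if the string starts with 0, append 1101 if it starts with 1, then strike off the first three bits, halting once fewer than three bits remain. A minimal sketch, using the AB particle 001101 as the test string (A = 00, B = 1101), which reproduces the loop alternating between lengths six and five:]

```python
def tag_step(s):
    """One step of the tag operation: append by the first bit, drop three bits."""
    s += "00" if s[0] == "0" else "1101"
    return s[3:]

def run(s, max_steps=10_000):
    """Iterate until the string dies (length < 3) or revisits a state (a loop)."""
    seen = {}
    for step in range(max_steps):
        if len(s) < 3:
            return ("halts", step)
        if s in seen:
            return ("loops", step - seen[s])   # length of the cycle
        seen[s] = step
        s = tag_step(s)
    return ("unknown", max_steps)

# The AB particle: 001101 -> 10100 -> 001101, alternating lengths 6 and 5.
print(run("001101"))   # ('loops', 2)
print(run("000"))      # ('halts', 1): this one shrinks away
```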

So the first question that comes to mind is what happens to a bit string? Now for simpler

examples, Post was able to solve this problem. This is the smallest example where

we couldn't say what happens. One of the questions was, does everything either shrink to zero

or fall into a loop? Is there anything that grows to infinity? And it's actually not too

hard to show that if you have a more complicated tag system, you can simulate a Turing

machine, so you know that you hit unsolvable problems. But this seems to be on the borderline.

It's hard to tell. Here's some examples of loops. It turns out if you take the particle

AB, which is six bits long, and the particle B squared, A squared which is double that,

you can combine those together any way you want, with multiple repetitions, and you get a looping pattern because the individual particles loop. Those are the only ones we know of that

work exactly that way. Other loops reproduce themselves by having some sort of overlap

involved and this bottom line is a very simple example of that. I wrote a graph search program

that found some more loops. They're longer and don't have any obvious interesting features

other than they look like a mess. The periods make no sense particularly. They're long.

What they represent is just loops in a graph going from one bit string to the next. I've

set my computer a couple of years ago to trying all possible combinations of As and Bs of

a given length and running all of them and seeing what happens. And the answer is they

run for longer and longer periods of time. And eventually, I had to give up on the long ones because one was a million bits long and just sitting there churning along. Not particularly

growing or shrinking, but it looked like it might take a very long time to get to a period

or a loop. Now if there's a Turing machine simulation lurking in here somewhere, we would expect to find some simple example where a pattern grows linearly in length with time.

Every time you pass through it or cycle around it, it adds something somewhere and gets a little bit longer. I didn't find any of those. Maybe I didn't look long enough.

Okay. That's the end of that one. Now we go back to PowerPoint.
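[Editor's note: the exhaustive experiment described above can be sketched like this, under the talk's definitions A = 00 and B = 1101. The step and length caps here are toy values chosen for illustration; the real runs used vastly larger bounds.]

```python
from itertools import product

A, B = "00", "1101"   # the two particles from the talk

def tag_step(s):
    # Append by the first bit, then strike off the first three bits.
    return (s + ("00" if s[0] == "0" else "1101"))[3:]

def fate(s, max_steps=2000, max_len=4000):
    """Classify a start string: dies out, falls into a loop, or is still running."""
    seen = set()
    for _ in range(max_steps):
        if len(s) < 3:
            return "halts"
        if len(s) > max_len:
            return "unknown"     # still churning; a real run would keep going
        if s in seen:
            return "loops"
        seen.add(s)
        s = tag_step(s)
    return "unknown"

# Classify every string built from k particles drawn from {A, B}.
for k in range(1, 7):
    tally = {}
    for combo in product((A, B), repeat=k):
        result = fate("".join(combo))
        tally[result] = tally.get(result, 0) + 1
    print(k, tally)
```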

>> [INDISTINCT]. >> SCHROEPPEL: Probably. He went to MIT. And,

you know, it was in [INDISTINCT] book, so. Yes.

>> [INDISTINCT]. >> SCHROEPPEL: Right. All right. Next is Asteroids.

[PAUSE] >> SCHROEPPEL: Well, I just flipped through

one--this one--this way. It's all words. Actually, to emphasize, these last two are not official

Sandia talks. Okay. We'll start with something simpler: Moving Planets. Somebody noticed a long time

ago the Sun was getting hotter. And on a billion year timescale, it's going to get really hot

here like maybe the oceans will boil. And that could make it an inconvenient place to

live. So it would be nice if maybe the Earth could move a little bit further away from

the sun on some sort of gradual basis. If you work out how much energy is needed to

do that, it's quite a bit. I learned the notion of a Tame Asteroid on the space digest mailing

list a long time ago from John McCarthy. I believe he got it from some astronomers but

I don't know the full provenance. The idea is that you have an asteroid of some noticeable mass that swings by; the encounter moves the asteroid a whole lot out of its original orbit and changes your orbit a little bit because you outweigh it. And because of the way chaotic orbits work, you can arrange things so that if you move the asteroid one centimeter, a year ahead of time, it comes out in a completely different place after it's done that flyby. So the idea is you move your asteroid by little

tiny amounts and arrange it. You plan ahead for your next 17 encounters with the Earth.

And then you have to have some other planets where you dump the angular momentum or add

angular momentum, and you dump the orbital energy or add orbital energy. So you need

a couple of source and sink planets. Probably one on either side of the Earth, but not necessarily.
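[Editor's note: to get a feel for "quite a bit" of energy: a circular orbit of radius r has total energy E = -GMm/2r, so lifting the Earth from r1 to r2 costs GMm/2 times (1/r1 - 1/r2). A rough sketch with standard values; the 10% target radius is an illustrative choice, not a figure from the talk.]

```python
GM_sun  = 1.327e20      # standard gravitational parameter of the Sun, m^3/s^2
m_earth = 5.97e24       # mass of the Earth, kg
r1 = 1.496e11           # current orbital radius, m (1 AU)
r2 = 1.1 * r1           # hypothetical target: 10% farther out

# Circular-orbit total energy is -G*M*m/(2r); the lift costs the difference.
dE = 0.5 * GM_sun * m_earth * (1.0 / r1 - 1.0 / r2)
print(f"{dE:.2e} J")    # on the order of 1e32 joules
```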

There's an interesting analogy here with Feynman diagrams. If you've seen Feynman diagrams,

you have things like two electrons repelling each other because they exchange a virtual photon.

Here we have an interaction between the Earth and Mars or Venus where you have an actual

particle. In this case a Tame Asteroid. It's literally moving angular momentum from one

planet to another. And depending on which side of the planet you go by, you add or subtract

and so on. There's an actual asteroid that's in a kind of synchronous orbit with the Earth and Venus.

It switches back and forth. Its orbit is not between the Earth and Venus. It actually reaches

out to about where Mars is. And moreover, it's not actually in the same plane as the

Earth and Venus. It's got a noticeable tilt and it's noticeably elliptical, but it still has this periodicity that's interesting. And the fact that it switches back and forth, controlled by the Earth and by Venus, is sort of neat. Of course it's way too small to have any effect

on our orbit, and it never even comes close. The control idea, which reduces the amount of energy you need to do this, is called the Butterfly Effect. The idea is that you make a very small change far enough ahead in time, and then you have a very good prediction algorithm. The problem with doing this to predict the weather, of course, is that there are a million influences on the weather. But generally speaking, there aren't a lot of influences on planetary orbits except ones you know about and can account for. Okay.
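[Editor's note: the butterfly-effect leverage can be illustrated with any chaotic system. This toy sketch uses the logistic map, which has nothing to do with real orbital dynamics: a perturbation of 1e-10 stays invisible for a while, then grows to order one, which is why a tiny nudge applied early enough can relocate an asteroid by a large amount later.]

```python
def logistic(x):
    # A standard chaotic map on [0, 1]; stands in for any sensitive dynamics.
    return 4.0 * x * (1.0 - x)

x, y = 0.3, 0.3 + 1e-10   # two trajectories, initially 1e-10 apart
max_gap, gap_at_10 = 0.0, None
for step in range(1, 101):
    x, y = logistic(x), logistic(y)
    gap = abs(x - y)
    if step == 10:
        gap_at_10 = gap       # still tiny: the nudge is invisible early on
    max_gap = max(max_gap, gap)

print(gap_at_10)  # still far below 0.01
print(max_gap)    # grows to order one
```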

Here's some of the problems you run into. Big tides. Mars is probably on the way if

we start moving the Earth out. You've got to use at least two other planets. You can use

Jupiter and Saturn. By doing your flybys on the correct side, you can subtract instead

of add and so on. On the other hand, Jupiter and Saturn are in some kind of resonance that

has sort of cleared out portions of the asteroid belt. And adjusting their orbits might

actually not be a good idea. And incidentally, our test particle is nowhere near the Earth.

That's probably a good thing right now, but sort of a mess for the plan. Okay. So here's what we do to get [INDISTINCT]

where we need it or something else. This is called the gravity machine. What it is, is

a bunch of particles of different sizes from meter size up to kilometer size and in between.

Then--and it's a swarm so they're all mutually gravitating with each other. And the idea

is you push on the small ones and you have a really good computer and you know the masses

very accurately. It allows you to control the behavior of the whole thing. Moreover,

you can absorb things: the swarm swoops down on them, and you've carefully arranged all the gravity so that the thing you swoop down on becomes a member of the swarm and everything settles down to the average orbit, because you can't change what happens to the mechanics of the center of mass of your swarm and the particle you're seizing.

One actually possible useful thing to do with this is to put another small moon in Earth

orbit. We'd like something that has a more favorable composition than the moon we have

with useful things like water. We'd like something that's a little easier to reach than the moon

we have. We don't necessarily want it to be super large because of the tidal effects. But it's something we could actually do in somebody's lifetime, if you live really

long. This is going to change the appearance of the night sky. Okay. The propulsion in the gravity machine.

The internal control is by manipulating the smallest pieces. There are a lot of other

influences besides gravity that you need to think about if you're trying to do 16 decimal

predictions. There's thermal expansion, and that's going to change angular momentum and

so on. There's shadowing; a lot of problems. Moreover, you can't change the barycenter of the object, of your swarm; the center of mass stays the same. So for that part of it, you've got to manipulate it and use the flybys of major large objects as your control--well,

not your control mechanism but as your amplifier. The other thing is your control has to exceed

your unknown influences. You need to be the biggest butterfly on the block. Your advantage

is you can look ahead, but you have a very weak push. And so surprise pushes are a problem.
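[Editor's note: the "can't change the barycenter" constraint mentioned above is just momentum conservation. Internal pushes between swarm members come in equal and opposite pairs, so the center of mass keeps coasting on its original path. A two-particle toy sketch with arbitrary illustrative numbers:]

```python
# Two swarm members exchange an internal impulse (equal and opposite,
# per Newton's third law). All numbers are arbitrary illustrative values.
m1, m2 = 2.0, 5.0           # masses, kg
v1, v2 = 0.3, -0.1          # initial velocities, m/s
impulse = 1.7               # N*s, applied by particle 2 on particle 1

v_com_before = (m1 * v1 + m2 * v2) / (m1 + m2)

v1 += impulse / m1          # particle 1 gets the push...
v2 -= impulse / m2          # ...particle 2 gets the exact opposite push

v_com_after = (m1 * v1 + m2 * v2) / (m1 + m2)
print(v_com_before, v_com_after)   # identical: internal forces can't move the barycenter
```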

Some of the things you need to think about: real asteroids are not spheres, which means

they have interesting gravity fields. More seriously they're not solid in many cases.

And whatever shape they've adopted is according to their particular gravity environment.

And if you change that much, things might shift in your asteroid in ways you're not

prepared to deal with. And internal heat diffusion is not too hard to compute for a solid object

of known composition. But, of course, if you don't have a solid object and you aren't pretty sure about the composition (in particular, the hidden parts inside could be other kinds of rock, and you don't know what you're touching, and so on), the heat becomes a puzzle. One

of the big things is a lot of the stuff you'd like to bring home to Earth has volatiles

in it like water and, you know, ammonia, methane, carbon things. And you can't use comets because

comets outgas and produce noticeable thrust as a result. There's also a lot of stuff we

don't know about wandering around the solar system. Normally you don't care. But if we're

trying to do 16 digit calculations, then small stuff begins to matter. And if you've been

watching the news the last couple of days, you see that Jupiter just got hit by something

that nobody knew was coming. What we see is a big black spot in Jupiter's atmosphere and,

you know, people are guessing it was a comet but, you know, not a known comet. Other things;

the solar wind varies, and potentially magnetic fields could be a problem if you've got an

iron asteroid. There's some safety issues. And the other thing is that if you bring something

home, you probably need a bag. You want to hold on to your water, of course, but also

you can't have little bits of it leaking off into other nearby orbits. We've already got

a space junk problem from stuff we've put up there. These are some of the things we

might do to push this idea. You know, it's perfectly accessible to actual small scale

tests on, you know, 10 meter-sized boulders and so on. Okay. I'm ready for the next run. How much

time? Two minutes. All right. Very quick on origin of life.

[PAUSE] >> SCHROEPPEL: I call this the Fish Pond Theory.

Okay. If you look back and try to figure out what got life started, the biggest hurdle is how a bunch of chemicals came together by random chance and started replicating. Here's

an idea, inspired by Native Hawaiian fish ponds. The Hawaiians built these rock walls out in the ocean and arranged things so that there weren't any big holes

in the walls. So little things could swim in and grow and then couldn't swim out so

they were trapped inside the fish pond. Eventually, the fisherman would come along and select

a large fish for dinner. [PAUSE]

>> SCHROEPPEL: Now the puzzle on life. You need a boundary to concentrate your chemicals

or else they'll diffuse out into the ocean. You expect to see some sort of growth. You

need reproduction, and ideally you have what's called heritable change, which allows evolution. And then you need some sort of energy or thermodynamic gradient to drive the process forward to make

it happen. And then there's a line that says "Evolution does the rest". It covers a whole

bunch of problems behind what I'm trying to address here. Okay. The boundary is a solved

problem for some people: you assume primordial soup in an ocean full of chemicals, and you assume the oily chemicals float to the top of that ocean and those become cell walls. You get a wave, the wave breaks, and the oil becomes oil and vinegar, basically; it's the bubbles.

[INDISTINCT] is any small magic molecule. My magic molecule is smaller than most magic

molecules; it's probably bigger than an amino acid but not too much bigger. I call it M.

One of the things atoms like to do and molecules like to do is polymerize. Sulfur in particular,

the room temperature stuff you see, the yellow mass, is made up of molecules of eight sulfur

atoms arranged in an octagon, in sort of a--and I don't know the name of the configuration,

it's not a flat plane but it is an octagon; they call it a crown. So I'm going to assume

my magic molecule forms rings. So that means it's a mixed polymer that has at least two active ends, like amino acids do, or like a lot of the stuff that becomes plastic of one kind or another. And the joining of the two ends is energetically favored. Now for amino acids

that's actually not exactly true, it's roughly neutral and whether the amino acids join or

not, depends on detailed conditions. And when we actually join together amino acids to make

proteins, there are enzymes and energy driving it and so on to make sure it goes forward. And

M likes to polymerize but doesn't usually because there isn't a whole lot of it around.

The M monomer can't cross an oily boundary. And I'll note that amino acids actually meet

most of these criteria, more or less. M will polymerize if you get enough of it together.

And the rings hide part of the functionality on the M molecule. What's left on the outside

likes to dissolve in oil. But there's a hole in the middle, if it's an octagon or whatever, and that middle becomes a hole in the wall, which in fact cells use a lot. They

put proteins in the walls with holes. And we're going to assume the hole is big enough

to let a single M through but not a double M or triple M, because they're bigger molecules

they don't fit through the hole. So inside our cell, if you get two M's coming in, they're

trapped. After you build the real ring--oh, okay, I need to rush through this. Okay, the

ring travels to the wall and becomes a new hole. You get growth simply by rings adding

to the wall and more M wanders in because the cell in effect is a sink for M because

it's removing M from the solution. To reproduce, you need a wave that breaks up the cell, the

leftover pieces of wall form a new cell. Heritable change is iffy, but I'm assuming you have varieties

of the M. The energy gradient is just the polymerization. The theory is lab-testable, in contrast to most origin-of-life theories; we can actually try to make molecules and

see what happens. And this is another different idea and that's the end. So, thank you very

much.

HOLT: Welcome everybody. My name is Jason Holt, and I'm very pleased to be able to introduce

you to Rich Schroeppel. His--he and his wife Hilarie are long time friends of mine from

when I was back in grad school. They actually introduced me to people at Google that helped

me get my job here. Rich's experience in computers go way, way back. Look at HAKMEM sometime

in Wikipedia if you want to see some of his early work. But he's here to talk us about

their entry for their SHA-3 competition called SANDstorm and also some recreational mathematics

and interesting space proposal. So here's Rich Schroeppel.

>> SCHROEPPEL: Hi, I'm Rich Schroeppel. I'm presently working for Sandia Labs in Albuquerque.

And the first part of the talk is about the SANDstorm Hash Function. I found out this

morning that this is now a theory talk because the list of [INDISTINCT] cut down candidates

has just come out and we were not lucky. But I'll present the Hash Function anyway. A bunch

of people helped with this. This is what a Hash Function is. The basic notion is digital fingerprint. You put in

your digital object of whatever size and you get out some fixed size called the Hash Value.

And you want that to be unique and it can't really be unique because the input space is

bigger than the output space so you fall back and you say, "Well, I want it to be extremely

hard to come up with two objects that have the same fingerprint. I don't want it to happen

by accident and I don't want somebody to be able to cook it up." So we're in the case

of having engineering uniqueness. Okay. There are some specific ancient requirements. One of them is Collision Resistance which

has to say a bad guy should not be able to come up with two things that come with the

same Hash Value. Moreover, he should not be able to--if you give him something, you shouldn't

be able to come up with a forgery with the same Hash Value. And finally if you just give

him the Hash Value, you shouldn't be able to come up with anything that has the Hash

Value. And I emphasize those are for people of finite computational means because mathematically

you can always solve these problems if you have a suitably big computer. The Hash Function that people used for a long

time is called the MD5, and they actually still use it a whole lot, which is maybe not

such a good idea anymore. Collisions for MD5 have been known for about 5 years. What do

we have [INDISTINCT] there. All right. Boy, I love windows. I should be talking here.

Okay, Collisions for MD5 have been known for a few years. Theoretical Collisions for SHA-1

have been known. We're actually expecting SHA-1 collision to appear real soon now because

people keep improving the attacks and they're down to the point where somebody with a whole

lot of computer resources could actually do it. A couple of years ago, and this came with

SHA-2 sort of anticipation of problems like this. And SHA-2 has actually four Hash Functions

with four different output lengths--of lengths 224 up to 512 bits so the fingerprints keep

getting bigger. They're all based on the same design. They're all derived from actually

MD4 originally and they keep--kept beefing it up. And MD5 with an improved version of

MD4, SHA-1 sort of took some steroids and then they added some growth hormone and got

to SHA-2. The feeling was it may be a good idea to have something that was not a derivative

of the same design. So they started the contest. They were originally 64 submissions by last

Halloween. Fifty-one made the first cut, which was that the code complies and the spec was

sort of readable. About a third of the originals have no significant problems or I guess I

should say not even paints grown at them. Some of them have actually been successfully

attacked. They don't need one of the criteria. It's just people are, "Ehh," you know, "Whatever."

But, you know, 22 have come through completely clean so far. And in late August, which apparently

is right now, NIST is cutting to 15 candidates. They apparently just announced their cut list

last night. People have been very imaginative in names. This is the SHA-3 Zoo website that

the Europeans are running. Red means that there's an actual attack that produced some

violation of the conditions; usually a collision. Orange is there's reason to believe that a

collision can be computed in less work than should be. Yellow is there's some sort of

preimage and theory although that doesn't mean they've actually come up with one. And

green means that some redo'd version of the Hash Function has been attacked that doesn't

necessarily threaten the big one at all. >> Can you back up a second?

>> SCHROEPPEL: Sure. >> We have an unfortunate warning of alphabetical.

[INDISTINCT]. >> SCHROEPPEL: [INDISTINCT] looking forward.

>> [INDISTINCT] repeat the question. >> SCHROEPPEL: Kevin is asking why we didn't

order by speed, and the answer is this is basically taken directly off the Zoo website,

which is alphabetical. The other thing about speed though is, of course, on what machine.

Okay. SANDstorm is slightly unusual in a couple of respects. One of them is that we design

for parallelism from the beginning. So our Parallel SANDstorm is the same as the Serial

SANDstorm. And I believe that's different from all the other Hash Functions. On the

other Hash Functions, if you want to benefit from parallelism, you have to tweak something

and the Hash values are going to be different. In our case, they'll be the same. We've done

that with what's called the Truncated Tree. We'll see a picture of that later. We have

a very simple padding scheme. We stick to one bit on the end of whatever the message

is. We have a finishing step which is necessary to prevent something called the length extension

attack that everything through SHA-1 has been susceptible to. These are some of the ways we differ from

the main standard Hash Function submission. Our state variable--our internal state is

four times the size of the output that's called Y-pipe. We have a serious commitment of parallelism.

There are about twenty different places you can choose to put your parallelism all the

way down from the gate level up to high level control on the message. So whatever is easy

for you, you can do. And you if you can do all, then that's wonderful. You'll get 10,000

[INDISTINCT] or something. If you can do one or two of them, you can get up to 1,000 [INDISTINCT].

One of the other decisions we made is that if you have a super-size message like you're

Hashing a movie for some reason and you want to do it because you're selling personalized

copies, you want personalized Hashes, you want to put a wrapper around that movie--something

at the front, maybe in the middle, something at the end--we can redo the Hash without redoing

most of the work. We just redo the change parts and where they meet up next to the movie.

This is our tree mode. The idea in the tree mode is that various parts of the message

get patched and they forward something to the next level in the tree and then you Hash

that and so on. The straight tree mode has a couple of problems. One of them is that

if you have a collision somewhere that allows you to patch, you can usually fit it in to

any leaf in the tree, which gives an attacker flexibility. I prefer not to. Our overall

philosophy is try to be defense in depth, so we know that even that Part A is preventing

a problem which don't like to have Part B also prevent the problem where we can. So

our tree mode is different in a couple of ways. We have a first block and that block

influences every other block in the message. So that first block Hash has to be known--the

value from that has to be known by every other computer that's cooperating the Hash's message.

You either give it another copy of block one or forward the Hash Values. See if my mouse

works here. Doesn't seem to. Do we have a pointer? No? Okay. All right. Well, I'll wave

my hands. Okay. Level zero is that first block. In level one, subsequent blocks of the message

are grouped together in 10 block chunks and those are Hash as you would Hash in or during

Hash in block after block after block. Then the results of that Hash are forwarded to

level two and there they're grouped in units of 100 blocks; same deal. Those get forwarded

to level three, which is unbounded. Something has to be unbounded in order to process super

long messages. We made it level three. And then finally level four takes the level three

result and does a finishing step on it and, again, that prevents various kinds of attacks.

Now we haven't shown it here but the block numbers get involved in a lot of places so

that the--an opponent can't place swap a block on you. The other thing is if a level isn't

needed, it drops out. So in the very simple case of Hashing a message up to one full block

long, you will only use level zero and the finishing step. So you're just going to compress

twice. And then as you get up another 10 blocks coming in, you'll use level one and then after

that you'll use level two a little bit. You won't actually use level three unless you

have 1,000 block message. Okay. This is--one thing is we have about a 10% overhead for

the tree so it's, you know, not a big deal but it is the present. One of the advantages

of having the levels drop out is that if you're only hashing small messages like a SMART card

or something, you don't need a lot of memory to remember those intermediate levels so that's

less temporary memory to use. An ordinary tree mode just passes Hash Values up to the

next level. That's not really as big as you'd like. We pass double-sized Hash Values to

the next level. So we pass more of the internal state up. Again, that's the safety thing.

This is the internal compression function. The M1s represent date derived from the first

block of the message. The M2s are from the next block of the message. We run the--the

message schedule produces different M1s, M2s and so on. This is what happens to the internal

state [INDISTINCT]. It starts out with this initial C zero value, which comes from various

places depending on your level and it gets pass through five rounds. The result of each

round is forwarded back to the previous round in the next block so that eventually the result

of round four if you move ahead four blocks will be influencing the input into round one.

And that happens to be a very nice way if you're doing hardware pipelining to arrange

things. If you have a whole lot of gates available, you can get them all working so you'd actually

have five blocks being processed at the same time. We've re-used the constant tables from

SHA-2. The importance of the constant tables is they prevent something called the slide

attack where you pretend that round three in one case is doing what round four is in

another case. To do that, you have some constants you bring in. You know, they're supposed to

be random numbers [INDISTINCT] of primes. You lower the bits of course. Anyway, we re-use

those same constants because we figured it's actually quite likely that SHA-2 is going

to be in the same software package as whatever the backup Hash turns out to be. This is one

step in the round function so you can see the details on how the data gets forwarded

to the next--the next block process and so on. Our individual mixing makes good use of

the multiply instruction, which is an excellent mixer. We use the AES sbox a little bit. And

we mix up arithmetic and logical bit operations because you don't want to just have all of

one or all of the other. NIST asked for a Tunable Security Parameter so that you could

do things like run more rounds. So we have that. Our overall position is you want the

least possible amount of flexibility in your final standard. You know, you're not producing

a multi-tool Swiss army knife for people to use. You're producing something where people

can say, "I'm using SHA-3," and the other guy knows just from that what you're doing.

Now they've already said, "We can't really do that. You got to have four sizes." And

they also want this Tunable Security Parameter so that the next time around if there's a

break, they can just say, "Everybody set your Tunable Security Parameter to ten now and

we'll move on." These are the arithmetic details. There's not a lot to mention here except the

small-sized Hashes use 32X32 multiply inside and the big size Hashes use 64X64. We want

the entire result, which means it's a little hard to work and see unfortunately because

C won't give you the whole result of the multiply without a bunch of special hacking. And that

does mean that our portable version, which is an [INDISTINCT] standard C runs slow. We

have fast versions which have different amounts of assembly language put in to cope with the

multiply problem. This is an unfortunate slide. It really should be a simple one. There are four words of internal

state and the last thing the round function does as part of the mixing is that the bits

are moved between the four words in various ways and this is the logic function that does

that. This is more of our round function. We have four words of internal state; 464 bit words

for the small Hash. And each of those words is recomputed based on the other three words.

And we do that on each of the four words. The result of that is that after you get down

to--after the bit mix, everything depends on the every bit of the input so you get what

we call a full mix. This is the message schedule. The message will come in for the small Hash

as eight words of 64 bits. And what we do is we have a function that computes a 9th

word from the previous age. And this is a schematic of that. And we run that I think

25 times. And the first eight words are [INDISTINCT] together and that becomes the first M1 value

used in round zero. And then after that, all the other values are made by picking words--blocks

of words out of the message schedule. But we have some gaps in there so even if you

know what went into one block of four words, you don't know the next block automatically.
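The expansion pattern just described can be sketched in a few lines. This is only an illustrative stand-in for the general shape of such a schedule: the particular rotate/add/XOR mixing and even the count of 25 extra words here are assumptions for illustration, not SANDstorm's real schedule.

```python
# An illustrative sketch of the message schedule pattern described: eight
# 64-bit input words are expanded into a longer sequence, each new word
# computed from the previous eight. The particular mixing (rotate/add/XOR)
# is a stand-in; SANDstorm's real schedule differs in its details.

MASK64 = (1 << 64) - 1

def rotl64(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK64

def expand_schedule(block, extra=25):
    """block: eight 64-bit words; returns them plus `extra` derived words."""
    assert len(block) == 8
    w = list(block)
    for i in range(extra):
        v = 0
        for j, p in enumerate(w[-8:]):      # feed back the previous eight words
            v = (v + rotl64(p, (7 * j + i) % 63 + 1)) & MASK64
        w.append(v ^ w[-1])
    return w

sched = expand_schedule(list(range(1, 9)))
print(len(sched))   # 33 words: 8 input + 25 derived
```

Rounds would then consume non-adjacent blocks of words out of this sequence, which is the "gaps" point made above.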

This is a little bit about the NIST standard machine and you can see why they picked that

and at the same time the--this machine is--it's derived from [INDISTINCT] basically and it's

a mess. It's got 64-bit words in most places except sometimes it's 128-bit words. They

have different sets of registers. They have different instructions applicable to them

and so on and so on. The C compiler, of course, hides all this. But if you're going to do

[INDISTINCT] version, you need to know it. And in fact we're able to take advantage of

this because one of our parallelism things is the way the arithmetic is organized so

we're able to use most of, well, you know, a lot of the special features of this 64-bit

x86 descendant. These are our most recent performance results. Our assembly language

version on a single core of this machine is getting 15 clocks per byte. If we do it on

a dual core, which in this standard machine has two cores, and we run it in C, then we

get 10 clocks per byte. The 512-bit output version is a little more than twice as slow

in some respects. We've checked out the parallelism. Most of the submissions haven't talked about

parallelism much. MD6 actually has. We've run on several different machines, though. On a dual

quad core machine, which has eight processors, we get 2.1 clocks per byte. And a machine

made by Sun called the Niagara, which is designed as a sort of server chip, you can get up to

128 threads, although running on only 16 processors. Going up to 16 processors gets a linear speedup from

one. And then as you get extra threads coming in without the extra cores, the speedup is

only about one and a half for each doubling of the thread number. I emphasize that this produced

the same Hash Value as the serial machine. I have a slide on how much memory we need.

This isn't very interesting except if you make smart cards. But Niels Ferguson gave

us some markdown for using too much internal memory. So I [INDISTINCT] him last

week when I was visiting Microsoft and he said, "Oh, well, please send me your correction."

The basic deal is that we use a couple of hundred bytes and we're assuming usually you know

what sort of message you're going to be sending from a smart card because most applications

have a limited vocabulary. Okay. This is our features list. I actually covered most of

these already. What do I need to add? Oh. One feature I didn't mention, most of the

Hash functions we've seen so far, MD5 down to SHA-2, do what I call dribbling in the

message. The Hash function runs a little bit of a round--maybe a full round and then it

brings in a word of message and then the round happens another time and it brings in another

word of message. We think this is a mistake; you should actually bring in as much

of the message as you can whenever you bring it in, and then you can cogitate for quite

a while before you bring in your next blob of message. So we've taken that approach.
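The "absorb a big blob, then cogitate" structure can be sketched as a toy compression loop. Everything here, the state size, round count, and ARX mixing, is invented for illustration; this is not SANDstorm's actual round function.

```python
# A toy sketch of "absorb a whole block, then cogitate", as opposed to
# dribbling in one word per round: XOR in the entire block, then run many
# mixing rounds before the next block arrives. All constants and the mixing
# itself are made up for illustration; this is not SANDstorm.

MASK = (1 << 64) - 1

def mix_round(state):
    a, b, c, d = state
    a = (a + b) & MASK
    d = ((d ^ a) << 17 | (d ^ a) >> 47) & MASK   # rotate-left by 17
    c = (c + d) & MASK
    b = ((b ^ c) << 29 | (b ^ c) >> 35) & MASK   # rotate-left by 29
    return [a, b, c, d]

def absorb_block(state, block, rounds=8):
    state = [s ^ w for s, w in zip(state, block)]   # whole block at once
    for _ in range(rounds):                         # then a lot of mixing
        state = mix_round(state)
    return state

state = [0x0123456789ABCDEF] * 4
# True: one changed message word changes the state
print(absorb_block(state, [1, 2, 3, 4]) != absorb_block(state, [1, 2, 3, 5]))
```

Because all the mixing happens between blobs, an attacker who controls one blob has no adjacent injection point to compensate with, which is the point made next.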

What that means is when the attacker is trying to break your Hash Function, he can do all

he wants on that one blob of message. But because the connection between that and the

next part of the message that comes in has a whole lot of mixing in between, he's kind

of stuck. If he wants to do something like compensate for one big change, he can cause

it to happen at any given place, but he can't cause some compensating change later on to

make up for it. And if you look at the attacks people have actually made against MD5, SHA-1

and so on, they all use this feature. Okay. This is a summary of the whole parallelism stuff.
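One way to see how a parallel mode can produce the same value as a serial one is a tree mode whose shape is fixed by definition. This is a generic sketch, not SANDstorm's actual mode, with SHA-256 standing in for the real compression function.

```python
# A generic illustration (not SANDstorm's actual mode) of why a *fixed* tree
# mode produces the same digest whether the leaves are computed serially or
# in parallel: the tree shape is part of the definition, so only the
# scheduling of the leaf work changes.

import hashlib

CHUNK = 1024

def leaf(data):
    return hashlib.sha256(b"leaf" + data).digest()

def node(left, right):
    return hashlib.sha256(b"node" + left + right).digest()

def tree_hash(msg, leaf_hashes=None):
    chunks = [msg[i:i + CHUNK] for i in range(0, len(msg), CHUNK)] or [b""]
    # leaf_hashes may have been computed by independent parallel workers
    level = leaf_hashes if leaf_hashes is not None else [leaf(c) for c in chunks]
    while len(level) > 1:
        nxt = [node(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:            # odd node is promoted unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]

msg = bytes(range(256)) * 20          # 5120 bytes -> five chunks
chunks = [msg[i:i + CHUNK] for i in range(0, len(msg), CHUNK)]
parallel_leaves = list(map(leaf, chunks))   # stand-in for a worker pool
print(tree_hash(msg, parallel_leaves) == tree_hash(msg))  # True
```

A submission that merely says "you can use a tree" without fixing the shape leaves the digest dependent on how the implementer parallelized, which is the interoperability problem discussed below.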

>> [INDISTINCT]. >> SCHROEPPEL: Yes. I believe every other

entry has that feature. >> [INDISTINCT].

>> SCHROEPPEL: Oh, okay. So every entry except SANDstorm will produce a different Hash in

parallel mode. In most cases, people haven't actually even defined a parallel mode. They

just say you can use a tree and leave it at that. MD6 has carefully defined a parallel

mode because it's designed to run in parallel. But even there, they've left some of the tree

parameters open. One of the annoying things is that a lot of the submissions have a whole

list of recommended parameters but you can tweak them if you want. Now that's actually

a good thing if what you need is a Swiss Army Knife for one reason or another. You know,

you have all these options like if you're--you know, the [INDISTINCT] function has all kinds

of things you want to parameterize. But for a standard Hash Function, it's probably not

a good idea. It's an invitation to security problems because the other guy can say, "Well,

I can't do that. [INDISTINCT] these other parameters," you know, or, "Gosh, I can only

use DES instead of AES," and you get what's called a downgrade attack.

[PAUSE] >> SCHROEPPEL: Okay. One of the things we've

been questioned about is actually using multiplication. Multiplication has pluses and minuses. The

real justification for this is it's an excellent mixer in terms of bit dependencies. And as

a result of a multiply, the output bit depends on pretty much all the bits beneath it in

the two inputs which means that the top half of the multiply is depending on everything.

The bottom half's dependence is so-so, but it's still a lot better than either XOR or [INDISTINCT].

The drawback to multiply is your parallel version--no, your portable versions which

are written in C, have to go through special tricks in order to get the result of the whole

product. And that slows you down. There are a few machines where the speed of the multiply

actually varies depending on the inputs. These are no longer common [INDISTINCT] because

it's--the designers are not going to [INDISTINCT] with their timing just to make multiply run

a tiny bit faster sometimes. There are all sorts of other timing constraints that they

don't want to play with. The argument for multiply is it does a lot of mixing. It's

a little slower than add. But if you look at the total work going on, most of what the

machine is doing is getting your inputs in there. You know, you've got half a billion

gates working--struggling to provide roughly 1,000 bits of arithmetic per cycle. So you're

spending a whole lot of work assembling and gathering the data, doing a little bit of

actual data processing and then another whole lot of work sending it out towards [INDISTINCT].

So we're saying the right balance is to do more arithmetic where you can so that's why

the multiply is in there. We feel parallelism has not been sufficiently emphasized in this

competition. It looks like the way things are headed is parallel, period. If you bought

a computer recently, you may notice the advertised clock speed is not any faster than it was

last year or two years ago. What they've done instead is they're giving you more cores

and those guys have to work in parallel. Now on a Microsoft system, you have about 50 processes

running all the time doing God knows what, so those extra cores can be independent.

But if you actually want to hash a movie fast, then you'd like these things all to be working

on your task. Another thing that we're discovering is that simplicity in a Hash Function is pretty

and it means maybe you can remember and understand the whole thing in one sitting, just looking

at it for a few minutes. But it's also risky. Everything that's been attacked has been beautiful

and simple. And we have a couple of minutes for questions then I'll move on to the next part of the

talk. >> [INDISTINCT].

>> SCHROEPPEL: Yes. >> [INDISTINCT].

>> SCHROEPPEL: Yes. I sent them a note a couple of months ago.

>> Okay. [INDISTINCT]. >> SCHROEPPEL: Oh. Yes. I was asked, "Is NIST

aware of the speed and memory requirements?" and I said, "Yes, I sent them an email a couple

of months ago." We haven't made public announcements on those because we're waiting to get the

codes certified and that's a slow process where I come from. They actually want to make

sure you're not issuing something that's going to cause embarrassment or legal obligation

and so on. >> You kind of sold me on your Hash and I

looked at the website, and they didn't say why they had picked the Hashes that they

did for round two. What's your guess as to what their biases were in what they selected?

>> SCHROEPPEL: I don't know. I haven't seen the list. What they did the last time during

the [INDISTINCT] competition 10 years ago, they picked the five fastest on, you know,

whatever 386, 486 was--maybe was even a Pentium at the time. And, you know, they [INDISTINCT].

They said they weren't going to do that this time. So they may not have done it. I don't know.

>> Sorry if this is a naïve question. You talked about how multiplication is discrete.

What if one of your factors is zero, aren't you creating collisions?

>> SCHROEPPEL: Absolutely. In the case that one of the factors in multiplication is zero,

then the output loses all information from the other factor. So in that case, you lose

32 bits of information. But that same piece of data is used in a whole bunch of other

places and we have a special thing in there to prevent a series of multiplies by zero.

Of course I've left out a lot of our internal details because it's a short talk, but we

definitely are aware of that. Another version of that is if one of the factors has a few

low order zero bits and then the result loses a few bits of low order information. And we

have special stuff in there to prevent that from happening in a chain.

>> So here is the other thing that seemed a little weird when I briefly looked at the

NIST site is they said something about how they're going to allow the submissions to tweak their

algorithms after round one which--yes. >> SCHROEPPEL: Yes.

>> [INDISTINCT]. >> SCHROEPPEL: Oh. Sure. Go ahead

>> [INDISTINCT]. Did you try implementing it on an FPGA to see how fast you can get in hardware?

>> SCHROEPPEL: On what sort of machine? >> FPGA like a hardware [INDISTINCT].

>> SCHROEPPEL: No FPGAs. No. >> Because...

>> SCHROEPPEL: We've implemented it on basically just processors in your computers.

>> Because eventually you can expect a card that does it for everybody.

>> SCHROEPPEL: Yes. Certainly. >> Tell us about bytes per second [INDISTINCT].

>> SCHROEPPEL: Bytes per second, hundreds of millions but--I guess I'd work it from

clocks per bit or clocks per byte. >> [INDISTINCT] 30 megabytes per second [INDISTINCT].

>> SCHROEPPEL: No. Faster than that. But MD5 is faster than that, too. In terms of where

we stack up against the other Hashes, we're comparable to SHA-1. We're a little bit slower

on a single processor, but as soon as you put more than one processor on it, we get ahead.

People have been talking a lot about the importance of speed in a Hash. It turns out that for

almost all applications, you just plain don't care about the Hash speed as long as

it's not awful because you're always doing something else. You know, if you're sending

an email, you know, if you're signing it, then there's this public key calculation you

do. If you're doing a look up, somewhere there's going to be disc arm moving, and that disc

arm timing is in milliseconds and not nanoseconds. About the only thing where you need a super

fast Hash is if you're doing--you know, you're hashing a movie--something huge. If you have

a server that's doing a gillion transactions a second, so the individual little bits add

up. Or if you're trying to break a Hash by hashing random values over and over again.

And, you know, for two out of three of those applications we figure you want it to work fast. Okay. I'm

going to move on to the next part now. I've got my ear open, sort of.

[PAUSE] >> SCHROEPPEL: Okay. This is more fun stuff.

Okay. This is taken from a talk I gave a few years ago and...

[PAUSE] >> SCHROEPPEL: Another windows win. We will

pause a few seconds here while the computer thinks. This was an experimental math workshop. And

I put together about 10 little things. And a lot of that was joint work with people and

here are some of the people I routinely do joint work with. Okay. These are the little

subtalks. I'm going to do the one on Counting PolyHypercubes and the Post Tag Problem.

Okay. Counting PolyHypercubes. Everybody here has probably played Tetris. The objects that

are falling in Tetris have an area of four. They're called tetrominoes by analogy with dominoes.

[INDISTINCT] invented a whole bunch of games that you can play with pentominoes, an area

of five. And there's been a lot of study of how many of these things there are in packing

problems and so on with various different areas. And you can do the obvious thing of

going up to higher dimensions and do cubes and tesseracts and so on. So we have polyominoes

and polycubes and polytesseracts and so on. You can ask all kinds of questions. For gaming,

people like to actually build the physical puzzles and play with the packings and so

on. And for theoretical math, we just like to know how many of them there are or even

an asymptotic formula. The actual [INDISTINCT] results are surprisingly sparse. The limit--it's

known that the polyominoes have a limiting ratio. If you add one to the area, it multiplies

the number of polyominoes by about four, but that "about" is incredibly wide. It's between

three and five or something. The numerical evidence allows us to say it's four point

seven. And there are people who think they know what the next [INDISTINCT] in the asymptotic sequence

is, but I'm skeptical. The situation with the extra dimensions is, you know, not even

that developed. Anyway, we've set ourselves just a very limited goal of getting some

data. The two parameters are the dimension and the volume. And then you get to choose

whether or not you try to remove symmetries. And removing symmetries makes a whole lot

of difference as the number of dimensions goes up. The kinds of symmetries you can have are

orientation so that's like when you throw a [INDISTINCT] in the air you get a bunch

of different orientations where it can land. If it's sitting on the table, you can rotate

it four ways. You could also put a mirror next to it and get a [INDISTINCT] that's mathematically

equivalent but looks different. You can reflect individual dimensions. And then finally there's

a symmetry in starting position. You can say, "I know it's going to put one of my squares

at zero comma zero and add to it." And in that case, any given polyomino can be built

up in several different ways, where you'll have a different starting square marked as the

first position. Our approach in this problem has been to count everything so we don't remove

any symmetries. The reason for that is mathematically the numbers seem more likely to make useful

sets. Got it. Okay. All right. This is the numerical data and I know it's pretty small to read

so I won't actually read these numbers to you. Dan Hoey wrote the program here and it's

been a little bit checked by other people but mostly he gets all the credit. After we

look at some of this data, we realize there were some useful relationships. The most interesting

one probably is that if you fix the volume and increase the dimension, so that means

a column in this matrix, you're looking at the data along one column and asking, "What

if I had another dimension? What happens?" It turns out that the entries in that column

are a polynomial in the dimension. Now there's a certain class of combinatorial person

who would recognize that just right off the bat. For the rest of us, it actually requires

some proof. It's not intuitively obvious. It's exactly the difference between the person

who says, "Oh, chromatic polynomials make sense." And another person who says, "Huh?"

A chromatic polynomial is where you got a graph or a map. You're going to paint and

the variable is the number of colors of paint available, and the value is "How many ways

can I paint the map with these colors?" and it turns out that's a polynomial. And if you're a sufficiently

experienced [INDISTINCT], you'll say, "Of course." And if you're not, this will come as a complete

surprise. Same principle here. Anyway, we spotted this after getting the data and worked

out some of the polynomials with different columns and we worked out the rules for some

of the coefficients and the polynomials. And that actually allowed us to fill in a couple

more values in the table. These are the ratios. You fix the dimension and ask, "Suppose I increase the volume by

one, how much do I multiply the total count by?" And the middle column here is the observed

ratio, as near as we can tell. The final column is something called the Tree Bound,

and that's an easy-to-prove bound. The way you're actually doing it is you just have

a tree with this many dimensions and the appropriate connectivity. But you ignore the fact that,

you know, if you put down five squares going around a point, the fifth one is going to

be on top of the first, and that doesn't count as a polyomino but it counts in the Tree Bound.

So the Tree Bound is an upper bound. I also filled in the data for dimension minus one.

Now, that doesn't mean anything. But because we have this feature that the results are

a polynomial in the dimension, I can say, "What if the dimension were minus one?" and just

calculate the values. And there's no obvious pattern so we let it go. And this is the side

thing where we can take out all symmetries. So these are related to the previous set of

numbers and smaller, but you can't say exactly how much smaller except [INDISTINCT]. If you

add rows one and two, you get the traditional counts for polyominoes.
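One concrete reading of the count-everything convention, in two dimensions, is what's usually called fixed polyominoes: translations identified, but rotations and reflections counted as distinct. A brute-force enumerator is short enough to sketch, and it reproduces the standard fixed-polyomino counts.

```python
# Enumerate fixed polyominoes: grow shapes one cell at a time and normalize
# each shape to a canonical translated position, so translated copies are
# identified while rotated and reflected copies count separately.

def normalize(cells):
    mx = min(x for x, y in cells)
    my = min(y for x, y in cells)
    return frozenset((x - mx, y - my) for x, y in cells)

def fixed_polyominoes(n):
    shapes = {normalize([(0, 0)])}
    for _ in range(n - 1):
        grown = set()
        for s in shapes:
            for (x, y) in s:
                for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if nb not in s:
                        grown.add(normalize(s | {nb}))
        shapes = grown
    return len(shapes)

print([fixed_polyominoes(n) for n in range(1, 6)])  # [1, 2, 6, 19, 63]
```

The same grow-and-normalize idea extends to higher dimensions by using longer coordinate tuples and more neighbor directions, though the counts blow up quickly.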

[PAUSE] >> SCHROEPPEL: The ratio here seems to be

growing. It's probably unbounded so the ratio for adding the--increasing the volume and

the dimension by one seems to be at least 10. It probably keeps growing. That's one

mini-talk. Okay. This is the motivation for the next problem. Everyone knows after they've

taken a computer science course that there are problems that cannot be solved. Usually,

you point to the [INDISTINCT] problem if you're a computer science kind of

person or a programmer. Programmers especially are willing to believe that there are problems

that can't be solved. And if you're a mathematician, you kind of come at it from a different direction

with [INDISTINCT] although it comes out to the same thing. So then the question is, well,

what's a natural example of an unsolvable problem because all the examples you see in

your courses are things that are just problems you'd never try to solve anyway. They don't

make sense. They require thinking long and hard even to understand what's being asked.

You can twist your brain in a knot trying to figure out all the quantifiers and so on.

It would be nice to come up with a simple problem you could point to and say, "We think

this might be unsolvable." The first candidate that comes to mind is something called the

3N+1 problem. And I rejected that because even though we can't prove the answer, we know the answer.

You know, from empirical testing, we're pretty sure what the answer is. It's the same way

with the Goldbach conjecture. Even though you can't prove it, you look at the numbers

and it's pretty clear it's true. So here's something that where it's really hard to say

one way or the other what the answer is. This is the Post Tag Problem. And I worked on this

in cooperation with Allan Wechsler. [INDISTINCT] from the 20s through the 30s and the 40s,

maybe in the 50s, and he almost discovered [INDISTINCT] machines and unsolvability. He

danced all around it, didn't quite see it, but he was 10, 20 years ahead of everybody

else. He set himself a number of puzzles starting I think when he was in college or grad school

and said, "If I'm a mathematician, I should be able to solve these puzzles in a mathematical

way." And he found that for a surprising number of them, he couldn't do anything mathematical

with them at all. You know, you'd work on one. It was obviously translatable into a math

problem, but he couldn't get an answer. And this is probably his simplest one. It's called

the Tag Problem. What it is it's an operation on bit strings. You have a string of zeros

and ones, and you look at the front of the string. And depending on what it is, you either

[INDISTINCT] a double zero or 1101 to it and then you can strike off the first three bits.

So on the average, the length is unchanged for a random string. But, of course, it's not

random after you've done this a while. Here's a simple example of processing a string. And

it winds up in a very simple loop alternating between length five and six. Now because it's

much easier to talk in terms of these blocks after you process through the string once:

everything is built up of these A blocks and B blocks. So I think in terms of As and Bs

rather than zeros and ones, where A is the double zero and B is the 1101 pattern.
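The rule just described is easy to simulate directly: look at the first bit, append 00 for a zero or 1101 for a one, then strike off the first three bits. Starting from a small string, this falls straight into the simple loop alternating between lengths five and six mentioned above.

```python
# A direct simulation of the tag system described on the slide.

def tag_step(s):
    if len(s) < 3:
        return None                       # the string dies out
    s += "00" if s[0] == "0" else "1101"  # append based on the first bit
    return s[3:]                          # strike off the first three bits

def run(s, limit=1000):
    seen = {}
    while s is not None and s not in seen and limit > 0:
        seen[s] = len(seen)
        s = tag_step(s)
        limit -= 1
    return s, seen                        # s is a string already visited

s, seen = run("10100")
print(s, {len(k) for k in seen})          # 10100 {5, 6}
```

In A/B terms, "10100" is just the particle whose loop the slide shows; the open question is whether some start string escapes every loop and grows forever.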

So the first question that comes to mind is what happens to a bit string? Now for simpler

examples, [INDISTINCT] was able to solve this problem. This is the smallest example where

we couldn't say what happens. One of the questions was, does everything either shrink to zero

or fall into a loop? Is there anything that grows to infinity? And it's actually not too

hard to show that if you have a more complicated tag system, you can simulate a [INDISTINCT]

machine, so you know that you hit unsolvable problems. But this seems to be on the borderline.

It's hard to tell. Here's some examples of loops. It turns out if you take the particle

AB, which is six bits long, and the particle B squared, A squared which is double that,

you can combine those together any way you want, with multiple repetitions, and you get a

looping pattern because the individual particles loop. Those are the only ones we know of that

work exactly that way. Other loops reproduce themselves by having some sort of overlap

involved, and this bottom line is a very simple example of that. I wrote a graph search program

that found some more loops. They're longer and don't have any obvious interesting features

other than they look like a mess. The periods make no sense particularly. They're long.

What they represent is just loops in a graph going from one bit string to the next. I've

set my computer a couple of years ago to trying all possible combinations of As and Bs of

a given length and running all of them and seeing what happens. And the answer is they

run for longer and longer periods of time. And eventually, I had to give up on the long ones

because they were a million bits long and just sat there churning along. Not particularly

growing or shrinking, but it looked like it might take a very long time to get to a period

or a loop. Now if there's a Turing machine simulation lurking in here somewhere, we would

expect to find some simple example where a pattern grows linearly in length with time.

Every time you pass through it or cycle around it, it adds something somewhere

and gets a little bit longer. I didn't find any of those. Maybe I didn't look long enough.

Okay. That's the end of that one. Now we go back to PowerPoint.

>> [INDISTINCT]. >> SCHROEPPEL: Probably. He went to MIT. And,

you know, it was in [INDISTINCT] book, so. Yes.

>> [INDISTINCT]. >> SCHROEPPEL: Right. All right. Next is Asteroids.

[PAUSE] >> SCHROEPPEL: Well, I just flipped through

one--this one--this way. It's all words. Actually, to emphasize, these last two are not official

Sandia talks. Okay. We'll start with something simpler: Moving Planets. Somebody noticed a long time

ago the Sun was getting hotter. And on a billion year timescale, it's going to get really hot

here like maybe the oceans will boil. And that could make it an inconvenient place to

live. So it would be nice if maybe the Earth could move a little bit further away from

the sun on some sort of gradual basis. If you work out how much energy is needed to

do that, it's quite a bit. I learned the notion of a Tame Asteroid on the Space Digest mailing

list a long time ago from John McCarthy. I believe he got it from some astronomers but

I don't know the full provenance. The idea is that you have an asteroid of some noticeable

mass that swings by, and the flyby moves the asteroid a whole lot out of its original orbit and

changes your orbit a little bit, because you outweigh it. And because of the way

chaotic orbits work, you can arrange things so that if you move the asteroid one centimeter

a year ahead of time that it comes out in a completely different place after it's done

that swing by--the flyby. So the idea is you move your asteroid by little

tiny amounts and arrange it. You plan ahead for your next 17 encounters with the Earth.

And then you have to have some other planets where you dump the angular momentum or add

angular momentum, and you dump the orbital energy or add orbital energy. So you need

a couple of source and sink planets. Probably one on either side of the Earth, but not necessarily.

There's an interesting analogy here with Feynman diagrams. If you've seen Feynman diagrams,

you have things like two electrons repelling each other because they exchange a virtual photon.

Here we have an interaction between the Earth and Mars or Venus where you have an actual

particle. In this case a Tame Asteroid. It's literally moving angular momentum from one

planet to another. And depending on which side of the planet you go by, you add or subtract

and so on. There's an actual asteroid that's in a kind of synchronous orbit with the Earth and Venus.

It switches back and forth. Its orbit is not between the Earth and Venus. It actually reaches

out to about where Mars is. And moreover, it's not actually in the same plane as the

Earth and Venus. It's got a noticeable tilt and it's noticeably elliptical, but it still

has this periodicity that's interesting. And the way it switches back and forth--controlled

by the Earth and by Venus is sort of neat. Of course it's way too small to have any effect

on our orbit, and it never even comes close. The control idea, which reduces the amount

of energy you need to do this, is called the Butterfly Effect. And the idea is that you

make a very small change far enough ahead in time, and then you have a very good prediction

algorithm. The problem with doing this to predict the weather, of course, is that

there are a million influences on the weather. But generally speaking, there aren't a lot of

influences on planetary orbits except ones you know about and can account for. Okay.
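The Butterfly Effect amplification above can be illustrated with a toy chaotic system: a change far below measurement precision grows exponentially until it is a macroscopic difference. The logistic map here is just a stand-in; this does not attempt to model actual orbital dynamics.

```python
# Two histories of a chaotic map, differing by a picoscale nudge: the gap
# roughly doubles each step until it becomes macroscopic.

def logistic(x):
    return 4.0 * x * (1.0 - x)

x, y = 0.3, 0.3 + 1e-12       # the second history gets the tiny nudge
max_gap = 0.0
for _ in range(200):
    x, y = logistic(x), logistic(y)
    max_gap = max(max_gap, abs(x - y))
print(max_gap)                # the nudge has become a macroscopic difference
```

The flip side, used in the proposal above, is that if you can predict well and your push is the biggest unknown, the same amplification works for you instead of against you.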

Here are some of the problems you run into. Big tides. Mars is probably in the way if

we start moving the Earth out. You've got to use at least two other planets. You can use

Jupiter and Saturn. By doing your flybys on the correct side, you can subtract instead

of add and so on. On the other hand, Jupiter and Saturn are in some kind of resonance that

has sort of cleared out portions of the asteroid belt. And adjusting their orbits might

actually not be a good idea. And incidentally, our test particle is nowhere near the Earth,

which is probably a good thing right now, but sort of a mess in the plan. Okay. So here's what we do to get [INDISTINCT]

where we need it or something else. This is called the gravity machine. What it is, is

a bunch of particles of different sizes from meter size up to kilometer size and in between.

Then--and it's a swarm so they're all mutually gravitating with each other. And the idea

is you push on the small ones and you have a really good computer and you know the masses

very accurately. It allows you to control the behavior of the whole thing. Moreover,

you can absorb things: the swarm swoops down on them, and you've carefully arranged

all the gravity so that the thing you swoop down on becomes a member of the swarm and

everything slows down to the average orbit, because you can't change what happens

to the combined center of mass of your swarm and the particle you're seizing.

One actually possible useful thing to do with this is to put another small moon in Earth

orbit. We'd like something that has a more favorable composition than the moon we have

with useful things like water. We'd like something that's a little easier to reach than the moon

we have. We don't necessarily want it to be super large because of the tidal effects.

But it's something we could actually do in somebody's lifetime, if you live really

long. This is going to change the appearance of the night sky. Okay. The propulsion in the gravity machine.

The internal control is by manipulating the smallest pieces. There are a lot of other

influences besides gravity that you need to think about if you're trying to do 16 decimal

predictions. There's thermal expansion, and that's going to change angular momentum and

so on. There are shadowing problems, among a lot of others. Moreover, you can't change the barycenter

of your swarm; the center of mass stays the same. So for that part of it,

you've got to manipulate it and use the flybys of major large objects as your control--well,

not your control mechanism but as your amplifier. The other thing is your control has to exceed

your unknown influences. You need to be the biggest butterfly on the block. Your advantage

is you can look ahead, but you have a very weak push. And so surprise pushes are a problem.
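The barycenter constraint mentioned above is worth a quick numerical sanity check: purely internal, equal-and-opposite pushes between members of a swarm cannot move the swarm's center of mass. The masses, forces, and step sizes below are arbitrary.

```python
# Two masses exchange equal-and-opposite internal forces; after many steps
# the barycenter sits where momentum conservation says it must.

m1, m2 = 3.0, 1.0
x1, x2 = 0.0, 10.0
v1, v2 = 0.0, 0.0
dt = 0.01

def com(x1, x2):
    return (m1 * x1 + m2 * x2) / (m1 + m2)

start = com(x1, x2)
for step in range(10_000):
    f = 0.5 if (step // 1000) % 2 == 0 else -0.5   # internal push, sign flips
    v1 += (f / m1) * dt       # Newton's third law: equal and opposite forces
    v2 += (-f / m2) * dt
    x1 += v1 * dt
    x2 += v2 * dt
print(abs(com(x1, x2) - start))   # essentially zero: only float roundoff
```

The individual bodies move a great deal, but the barycenter doesn't, which is why external flybys have to serve as the amplifier.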

Some of the things you need to think about; real asteroids are not spheres, which means

they have interesting gravity fields. More seriously they're not solid in many cases.

And whatever shape they've adopted is according to their particular gravity environment.

And if you change that much, things might shift in your asteroid in ways you're not

prepared to deal with. And internal heat diffusion is not too hard to compute for a solid object

of known composition. But, of course, if you don't have a solid object and if you aren't

pretty sure about the composition--in particular, the hidden parts inside could be other kinds

of rock, and you don't know what you're touching and so on--the heat becomes a puzzle. One

of the big things is a lot of the stuff you'd like to bring home to Earth has volatiles

in it like water and, you know, ammonia, methane, carbon things. And you can't use comets because

comets outgas and produce noticeable thrust as a result. There's also a lot of stuff we

don't know about wandering around the solar system. Normally you don't care. But if we're

trying to do 16 digit calculations, then small stuff begins to matter. And if you've been

watching the news the last couple of days, you see that Jupiter just got hit by something

that nobody knew was there. What we see is a big black spot in Jupiter's atmosphere and,

you know, people are guessing it was a comet but, you know, not a known comet. Other things:

the solar wind varies, and potentially magnetic fields could be a problem if you've got an

iron asteroid. There are some safety issues. And the other thing is that if you bring something

home, you probably need a bag. You want to hold on to your water, of course, but also

you can't have little bits of it leaking off into other nearby orbits. We've already got

a space junk problem from stuff we've put up there. These are some of the things we

might do to push this idea. You know, it's perfectly accessible to actual small scale

tests on, you know, 10 meter-sized boulders and so on. Okay. I'm ready for the next run. How much

time? Two minutes. All right. Very quick on origin of life.

[PAUSE] >> SCHROEPPEL: I call this the Fish Pond Theory.

Okay. If you look back and try to figure out what got life started, the biggest hurdle

is that a bunch of chemicals came together by random chance and started replicating. Here's

an idea, inspired by native Hawaiian fish ponds. The Hawaiians built these rock walls

out in the ocean, and they arranged things so that there weren't any big holes

in the walls. So little things could swim in and grow and then couldn't swim out so

they were trapped inside the fish pond. Eventually, the fisherman would come along and select

a large fish for dinner. [PAUSE]

>> SCHROEPPEL: Now the puzzle on life. You need a boundary to concentrate your chemicals

or else they'll diffuse out into the ocean. You expect to see some sort of growth. You

need reproduction, and ideally you have what's called heritable change, which allows evolution.

And then you need some sort of energy or thermodynamic gradient to drive the process forward to make

it happen. And then there's a line that says "Evolution does the rest". It covers a whole

bunch of problems beyond what I'm trying to address here. Okay. The boundary is a solved

problem for some people: you'd assume a primordial soup in an ocean full of chemicals, and you

assume the oily chemicals float to the top of that ocean and those become cell walls.

You get a wave, the wave breaks, and the oil becomes oil and vinegar, basically; it's the bubbles.

[INDISTINCT] is any small magic molecule. My magic molecule is smaller than most magic

molecules; it's probably bigger than an amino acid, but not too much bigger. I call it M.

One of the things atoms like to do and molecules like to do is polymerize. Sulfur in particular,

the room temperature stuff you see, the yellow mass, is made up of molecules of eight sulfur

atoms arranged in an octagon, in sort of a--and I don't know the name of the configuration,

it's not a flat plane, but it is an octagon; they call it a crown. So I'm going to assume

my magic molecule forms rings. So that means it's a mixed polymer that has at least two active

ends, like amino acids do, or a lot of the stuff that becomes plastics of one kind

or another. And the joining of the two ends is energetically favored. Now for amino acids

that's actually not exactly true, it's roughly neutral and whether the amino acids join or

not, depends on detailed conditions. And when we actually join together amino acids to make

proteins, there are enzymes and energy inputs and so on to make sure it goes forward. And

M likes to polymerize but doesn't usually because there isn't a whole lot of it around.

The M monomer can't cross an oily boundary. And I'll note that amino acids actually meet

most of these criteria, more or less. M will polymerize if you get enough of it together.

And the rings hide part of the functionality on the M molecule. What's left on the outside

likes to dissolve in oil. But there's a hole in the middle, if it's an octagon or whatever,

and that middle becomes a hole in the wall, which in fact cells use a lot. They

put proteins in the walls with holes. And we're going to assume the hole is big enough

to let a single M through but not a double M or triple M; because they're bigger molecules,

they don't fit through the hole. So inside our cell, if you get two M's coming in, they're

trapped. After you build the real ring--oh, okay, I need to rush through this. Okay, the

ring travels to the wall and becomes a new hole. You get growth simply by rings adding

to the wall, and more M wanders in because the cell in effect is a sink for M, because

it's removing M from the solution. To reproduce, you need a wave that breaks up the cell, the

leftover pieces of wall form a new cell. Heritable change is iffy, but I'm assuming you have varieties

of the M. The energy gradient is just the polymerization. The theory is lab testable

in contrast to most origin-of-life theories; we can actually try to make the molecules and

see what happens. And this is another different idea and that's the end. So, thank you very

much.
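[Editor's note: the one-way trapping dynamic in the fish-pond model above — single M diffuses in and out through pores in the oily wall, and once enough free monomers are trapped inside they polymerize into a ring that joins the wall as a new pore — can be sketched as a toy Monte Carlo simulation. None of this is from the talk itself; the ring size, pass probability, and step counts are invented purely for illustration.]

```python
import random

def simulate(steps=10000, ring_size=8, p_pass=0.05, seed=1):
    """Toy fish-pond model (hypothetical parameters throughout).

    Each pore passes a single M in either direction with probability
    p_pass per step.  Polymerization is energetically favored, so once
    ring_size free monomers accumulate inside, they form a ring that
    migrates to the wall and becomes a new pore.  The cell acts as a
    sink for M: rings keep the free-monomer count low, so M drifts in.
    """
    random.seed(seed)
    holes = 1        # start with a single pore in the oily wall
    free_m = 0       # free monomers trapped inside the cell
    for _ in range(steps):
        for _ in range(holes):
            if random.random() < p_pass:             # one M diffuses in
                free_m += 1
            if free_m and random.random() < p_pass:  # one M diffuses out
                free_m -= 1
        if free_m >= ring_size:  # ring forms, joins wall, adds a pore
            free_m -= ring_size
            holes += 1
    return holes

print(simulate())  # the wall (pore count) grows over time
```

Under these made-up parameters the walk in free monomers is roughly symmetric while any M is inside, but it reflects at zero, so rings keep forming and the wall grows; growth accelerates as each new ring adds another pore.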