Monday, December 31, 2007

Ignorance of Blisks

(On the Requirement of Manual Production for Aerospace Engineering Tools)

In June, I spent some hours with my rocket-scientist little brother, and we talked about my own near-Singularitarianism. I may well have over-emphasized my liking for the idea of 3D printing, which I found esthetically appealing in the late 80s--still do, and now I do expect it to get to be important. Then in July in the mail I found a clipping from "Aviation Week and Space Technology" about the difficulties of manufacturing the machines that manufacture "blisks". There was a note attached; the note said that

this article reminds me of why I'm skeptical of "printable machines" in the next 10-20 years.
Umm..well, okay, Roger. I'm not even sure I disagree with you, in the sense that I'm not sure we'll see printed blisk-making machines within 20 years. (We might, or we might instead see "printed" blisks, or neither, and Hofstadter's Law applies as to the production of this note.) But when I'm not sure, I get even more verbose than when I am sure, and that's what I propose to do now -- and in any case, I am pretty sure that the article is wrong, whether or not your conclusion from it is correct. I was totally ignorant of blisks. My conclusions, introductions, and everything in between are therefore based on bliskful ignorance. Read at your own risk.

Oh, yeah? The magazine sets up the problem in paragraph 5: blisk-making needs

a machine tool whose cutting head can follow the digitally encoded design to within about 12 microns.
Okay, 12 microns is pretty small in today's world of practical machinery. However, I think the crucial statement in your magazine is in paragraph 6: this tool "must be built by hand" because, the author claims,
No machine can make another as precise as itself.
If the author (Bradley Perrett) says that's true of the relevant machines at this moment, I have no reason to doubt him. If he says it as a statement of principle, or even as something that will reliably hold for any particular machines over the next 10-20 years (your chosen timeframe), I don't believe it at all. In general, I would say that we have machines of high precision because they have been built, with much difficulty, by machines of lesser precision -- under direction which has been human but which could be robotic. As a special case, I believe that it's perfectly possible for a machine to be built which builds a copy of itself, to exactly the same tolerances. Indeed, my sense of where the future is going, as well as my sense of the past few billion years of evolution, is based on that belief. So...

Replication and Precision: principle. Some (especially analog) methods of replication will always diminish precision (which can be recovered in some cases by tactics such as grinding and polishing), whereas other (digital, including DNA) methods may or may not diminish precision (which can be recovered in some cases by error-correcting codes or simply by selection, as long as we can generate enough variants that some of them are okay, and the rest can be rejected.) In essence, I think Perrett is confusing two versions of the standard-meter problem:

  • If you define the meter as a metal stick and try to make analog copies (even with a digitally-controlled tool) then you will lose some precision with each generation of copies. This has been going on for a while; a friend who's an archaeologist at Colgate (Rebecca Ammerman) has written about ancient terracotta goddess statues made from molds made from statues made from molds...(and about tracking the sequence via imperfections and size change.) I believe that's what Perrett is getting at with his "No machine can..." claim.
  • If, however, you define the meter as a multiple of some reproducible wavelength, say a carbon-dioxide laser's 9.6 microns which I pick because it's close to the stated 12 microns, or a TEA laser's 337.1 nm, then the problem of measurement-reproducibility becomes solvable...not at all easy, but solvable.
Lasers are not crucial in principle, but they're conveniently well-defined, and the meter definition is already implemented via helium-neon lasers:
...actual laboratory realisations of the metre are still delineated by counting the required number of wavelengths of light along the distance.
And as you probably know, commercial laser interferometrics can provide pretty fair accuracy in a fairly large space:
With regard to the lineal control of the laser interferometer, the resolution is 0.16 microns; the repeatability is 1 micron +1 micron/meter; and the accuracy for radial distance is 10 microns +0.8 microns/meter.
I don't claim that this magically makes it easy to make blisks or blisk-makers, but the point of interferometry is that you're using precision that comes from the light-waves themselves -- and the effective precision can and does improve without any magical manual touch. I think of the improvements as a Moore's Law instance, but it may be that Moore's Law thinking is irrelevant here; after all, in 1991 we already had, in principle, a laser interferometric system such that
This method achieved (i) sub-nanometer resolution (0.6 nm/LSB), (ii) high stability (2.5 nm/day), (iii) high linearity (less than 1 LSB), and (iv) high following speed (more than 1000 mm s^-1).
Well, maybe. I expect that some future CNC systems, including but not limited to 3D printers, will use a well-defined 3D grid which will, if that's desirable, be based on counting off wavelengths one way or another. And later, maybe we'll be counting off specific (crystallized?) molecules. And later, maybe we'll be counting off spaces in a graphene grid, six carbon atoms around each hexagonal tile. And then maybe we'll have to stop, but 12 microns will not seem small. Twelve microns will seem, will be, huge.
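To put the 12-micron figure in terms of counted wavelengths, here is a rough Python sketch. The 9.6-micron and 337.1 nm wavelengths are the ones mentioned above; the 632.8 nm helium-neon line is my assumption, added only for illustration. The error-budget function just applies the "10 microns + 0.8 microns/meter" form quoted above, and its name and parameters are hypothetical, not anything from a vendor's software.

```python
# Rough arithmetic only: how many wavelengths fit inside the 12-micron tolerance?
TOLERANCE_M = 12e-6  # the 12 microns quoted from the article

# Wavelengths in meters. The 9.6 um and 337.1 nm values come from the text;
# the 632.8 nm helium-neon line is an assumption added for illustration.
WAVELENGTHS_M = {
    "CO2 laser (9.6 um)": 9.6e-6,
    "helium-neon (632.8 nm, assumed)": 632.8e-9,
    "TEA laser (337.1 nm)": 337.1e-9,
}


def wavelengths_in(distance_m, wavelength_m):
    """Number of wavelengths (possibly fractional) spanning a distance."""
    return distance_m / wavelength_m


def radial_accuracy_um(distance_m, fixed_um=10.0, per_meter_um=0.8):
    """Error budget of the form 'fixed + per-meter', in microns."""
    return fixed_um + per_meter_um * distance_m


for name, wavelength in WAVELENGTHS_M.items():
    count = wavelengths_in(TOLERANCE_M, wavelength)
    print(f"{name}: {count:.2f} wavelengths in 12 microns")

for distance in (0.5, 1.0, 2.5):
    print(f"radial accuracy at {distance} m: {radial_accuracy_um(distance):.1f} microns")
```

On these numbers, 12 microns is only 1.25 wavelengths of the 9.6-micron line but a few dozen wavelengths of the 337.1 nm line, and the quoted radial accuracy reaches 12 microns at a distance of 2.5 m.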

And so? Somehow, Roger, I doubt that you'll consider that to be an adequate response. Maybe it isn't, but anyway you won't think so. Hmm...

Outline: I want to bloviate on our respective professional-geek biases (why you won't take this seriously), and then on my most fundamental caveats (why you shouldn't take this seriously), and then on what I am "predicting", to the extent that I'm predicting anything, and why. You can then decide whether you actually disagree, and whether any comment you might make could possibly lighten my hopeless ignorance.

Biases: In general, we judge and misjudge by supposing that the future will look like X, X being something we think we've learned from in the past, yes? Think back long before our respective PhuDs to the time of the moon landing, with us both in Colegio Nueva Granada -- high school for me, first grade (second?) for you. (Okay, maybe you don't remember it that clearly, but a high-school classmate named Al Borrero got in touch recently, I guess we'd both gotten in touch with "legendary guitarist" and anthro prof Hector Qirko who'd just been interviewed by instapundit, and Al remembers you and Magi as "very very bright." You were there, you were aware.) At that time, Moore's Law was four years old: two doublings since its proposal. It has gone on pretty well since, with some wobbles as to what was doubling. If your tools had improved the way mine have improved, I suppose we'd all be commuting to the moon, and we certainly wouldn't have to wait ten years to see if your itty-bitty rockets work when they get to Pluto. Your tools have not improved that much; in some respects they haven't improved fundamentally at all. It's natural for my model of improvement to be more dramatic than yours.

My well-grounded lack of confidence: That doesn't mean that Moore's Law is (or is not) now an appropriate model for your subject or mine. Back when I ended my PhuD work in 1980, I remember I believed predictions that we were nearing physical limits that would stop it cold. After that, we would depend on massive parallelism (remember the Connection Machine?) and so I was one of the throng doing proofs about parallel programs, functional programming, algebras of parallel-reducible expressions and automatic parallel scheduling of the evaluation of recursively-defined arrays...but Moore's Law kept on going, sequential machines were good enough to support the "desktop revolution" (say 1975 to 1990, though it continues) and then the "connectivity revolution" (say 1990 to 2005, though it continues). Anyway, Moore's Law kept on going: I wuz wrong. I still expect the original circuit-size Moore's Law (and the directly-associated variations, e.g. disk drive capacity per square inch) to hit a limit, and I still expect growth of effective computer capacity to continue for a while via massive parallelism, and I still even expect functional languages (e.g., within Erlang?) to play a larger role, but I don't have much of a track record for any predictions in any direction.

My basic "prediction": What I think I'm starting to see now, something I haven't seen before, is a robotics revolution to follow the connectivity revolution of WWW and cell phones. A Moore's Law progression beginning in robotics, 3D printing, and associated (CNC, mostly) technologies. Back in the 80s, I thought 3D printers might get real someday; now I mostly think that today's toys of that general category will be twice as good in a couple of years, and will similarly go on doubling their overall goodness (which I have no intention of trying to define) at roughly Moore's Law rates for the next decade or three.
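For a sense of what "doubling every couple of years for a decade or three" compounds to, here is a minimal sketch. The two-year doubling period is my reading of "a couple of years", and "goodness" stays as undefined here as it is above.

```python
def improvement_factor(years, doubling_period_years=2.0):
    """Compound improvement after `years` if capability doubles every period."""
    return 2.0 ** (years / doubling_period_years)


# "The next decade or three", with an assumed two-year doubling period.
for years in (10, 20, 30):
    print(f"{years} years: about {improvement_factor(years):,.0f}x")
```

That works out to about 32x after one decade, about 1,000x after two, and about 33,000x after three.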

Am I sure? Nah. What I see now might not continue. On the other hand, we might see growth (or size/cost shrinkage) a whole lot faster than Moore's Law, because lots of technologies for working at very small scales have already been developed.

Why am I talking this way? Basically, I look at reports of progress, of robotics in medical and military and pure-research and just for fun, of 3D printers and "fab labs" mostly in the last two categories, and the progress of productivity in random places: a robotics summary last year claimed that

Prices of industrial robots, expressed in constant 1990 US dollars, have fallen from an index 100 to 54 in the period 1990-2005, without taking into account that robots installed in 2005 had a much higher performance than those installed in 1990. When taking into account quality changes, it was estimated that the index would have fallen to 22.
In the same period (1990-2005), the index of labour compensation in the American business sector increased from 100 to 179. This implies that the relative prices of robots fell from 100 in 1990 to 30 in 2005 without quality adjustment, and to 12 when taking into account quality improvements in robots.
I find that plausible; I expect it to continue; if it does continue, I expect to see the production and manipulation of objects revolutionized in much the same way that we've seen in the production and manipulation of data. I don't see why blisks should be an exception.
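The relative-price figures are just the quoted robot indices divided by the quoted labour index, so they are easy to check. The sketch below does that division and then, as my own addition, converts the quality-adjusted result into an implied halving time, assuming a steady exponential decline over the fifteen years (the summary itself makes no such claim).

```python
import math

# Index values quoted in the robotics summary (1990 = 100).
ROBOT_INDEX_2005 = 54.0
ROBOT_INDEX_2005_QUALITY_ADJUSTED = 22.0
LABOUR_INDEX_2005 = 179.0
YEARS = 2005 - 1990


def relative_index(price_index, labour_index):
    """Price index re-expressed relative to labour compensation (1990 = 100)."""
    return 100.0 * price_index / labour_index


def halving_time_years(end_index, years, start_index=100.0):
    """Halving time implied by a steady exponential decline (an assumption)."""
    decay_rate = -math.log(end_index / start_index) / years
    return math.log(2) / decay_rate


plain = relative_index(ROBOT_INDEX_2005, LABOUR_INDEX_2005)
adjusted = relative_index(ROBOT_INDEX_2005_QUALITY_ADJUSTED, LABOUR_INDEX_2005)
print(f"relative price, no quality adjustment: {plain:.0f}")
print(f"relative price, quality adjusted: {adjusted:.0f}")
print(f"implied halving time: {halving_time_years(adjusted, YEARS):.1f} years")
```

The division gives about 30 and about 12, and the quality-adjusted figure corresponds to the relative price halving roughly every five years: slower than circuit-density Moore's Law, but the same kind of curve.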

Of course, if robotic production of blisks (and of other things) does well enough, there will be correspondingly less incentive to speed the improvement of 3D printing. I expect 3D printing to improve at roughly Moore's Law rates also, and I expect 3D printing to dominate robotic production for a large class of consumer goods within your chosen time-frame, but high-precision stuff that's tough enough for aerospace will be hard. It is possible that it won't work well enough until we can do atomic-level assembly (not just putting an atom in the right place, but putting it in the right place with the right bonds); that's hard. No, not impossible. Just hard--much harder than the industrial-robot approach for almost any individual problem (e.g., blisks). Consider Robot Sales Up 33% in North America in First Nine Months of 2007 - Robotics Online:

Among the best performing non-automotive markets this year are life sciences/pharmaceutical/ biomedical/medical devices (up 20%), food & consumer goods (up 15%) and plastics and rubber (up eight percent).
Of course those are sales, which as they admit are driven by cycles as well as by fundamental change; I expect many of those numbers to drop over the coming year, even if the actual-recession trade price is still under 50%. In the longer run, though, increasing required precision will force less manual processing, not more: Perrett is quite fundamentally wrong. I think. Your blisks will be produced automatically -- probably within your rocketry career. (You don't personally get involved with blisks, do you?) Never mind, the point is increasingly automatic production, especially of the means of (increasingly automatic) production. Even though I did give my youngest a soldering kit for Christmas, and I'm hoping to get her to use it.

On the other hand, I think you're surer of what you're sure of than I am sure of what I'm sure of, and you may be right.

Or then again (and again and again), maybe not. Happy New Year!


Monday, December 03, 2007

Maximal Meaningful DNA: 25 Megabytes?

At Overcoming Bias, Eliezer Yudkowsky asserts that:

There's an upper bound, a speed limit to evolution: If Nature kills off a grand total of half the children, then the gene pool of the next generation can acquire a grand total of 1 bit of information.
and that's very cool. In a sense it's obvious; selection is pushing you down a tree of choices, rather like the tree of choices involved in sorting where we tediously show students how comparison-based sorting can't be better than O(N*log(N)). We think of evolution as answering a series of yes/no questions, going from a breeding population of a zillion with no answer for question Q, to a population of two zillion young'uns of whom half try out "yes", half "no", and then to a surviving next-generation breeding population of a zillion who have survived by choosing the right answer. I like it. Yudkowsky continues:
I am informed that this speed limit holds even with semi-isolated breeding subpopulations, sexual reproduction, chromosomal linkages, and other complications.
Yeah, I think I can believe that. I think. It's very plausible, and I don't see a way to attack it -- if somebody challenged me with an attack I would not say it's a priori ridiculous to try, especially if there's a way to isolate subsystems of questions which are separately answered by subpopulations, but I would expect them to fail -- I don't think you can know which subsystems to isolate until after you have the answer. He then goes on with:
Let's repeat that. It's worth repeating. A mammalian gene pool can acquire at most 1 bit of information per generation.
and this is clearly dependent on the assumption (slightly discussed) that the selection of DNA sequences starts with a pool of roughly twice the surviving size, i.e. about four offspring per pair. For mammals, that sounds right, yes? And if so, we can go on with
Among mammals, the rate of DNA copying errors is roughly 10^-8 per base per generation.
and if we build up to 100,000,000 base pairs, then we can add one and lose one per generation, so we've hit the maximum. At two bits per base pair (four base pairs per byte), that gives 25 megabytes for the maximum meaningful mammalian DNA.
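The arithmetic behind the 25 megabytes can be sketched in a few lines of Python. This follows the simplification used above (one selected bit per generation pays for one lost base per generation), which is this post's loose reading of the argument, not a careful population-genetics model; the function names are mine.

```python
import math


def bits_per_generation(offspring_per_pair, survivors_per_pair=2):
    """Upper bound on bits gained when selection keeps `survivors` of `offspring`."""
    return math.log2(offspring_per_pair / survivors_per_pair)


def max_bases(bits_gained, error_rate_per_base):
    """Genome size at which expected copying errors per generation
    (error_rate * bases) equal what selection can restore."""
    return bits_gained / error_rate_per_base


def megabytes(bases, bits_per_base=2):
    """Storage for `bases` at two bits per base, in decimal megabytes."""
    return bases * bits_per_base / 8 / 1e6


gained = bits_per_generation(4)    # four offspring per pair, two survive
bases = max_bases(gained, 1e-8)    # quoted mammalian error rate
print(f"bits gained per generation: {gained}")
print(f"maximum maintained bases: {bases:.3g}")
print(f"as storage: {megabytes(bases):.1f} MB")
```

Killing half the offspring yields one bit, one bit per generation balances 10^8 bases at the quoted error rate, and 10^8 bases at two bits each is 25 MB.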

This strikes me as extremely cool, but actually my current opinion is that it's wrong for a very simple reason: yields over 62,000 hits, while yields over 500. In other words, some of the DNA selection occurs before we see the offspring. How much? Well, as Simon Levay put it:

as anyone who has watched the Discovery Channel knows, a maverick sperm takes a flood of its buddies along for the ride — between one hundred million and seven hundred million tail-snapping semen-surfing spermatozoa in each ejaculation.
Of course that number can be a lot less and still have reproductive success, but clearly there is selection of sperm (and ova, to some extent) going on.

As a programmer, I'm thinking of sperm-selection and ovum-selection as module testing; the miscarriages that then take out at least some pregnancies serve as initial system-integration testing; and then we get the approximately one bit added from post-birth selection.

One major caveat: the external environment is not necessarily involved (it may be involved, since some environmental stimuli do clearly get through). So pre-birth selection is not equivalent to post-birth selection; in particular, it may have an extremely limited ability to select bits relating to the external environment. However, a whole lot of the environment, for any given gene's expressed proteins, consists of other genes' expressed proteins and their consequences.

So, how much meaningful DNA can be supported? Each doubling in offspring corresponds to an extra bit to be selected; a hundred-million-fold increase is more than 26 doublings. In fact using Scott Aaronson's summary

we’ll never find any organism in evolutionary equilibrium, with mutation rate ε and K offspring per mated pair, with more than (log2(K)-1)/(8388608ε) MB of functional DNA.
we're talking about a possibly 26-fold increase in log2(K); a few hundred megabytes, instead of just 25.
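Plugging numbers into the quoted formula shows where "a few hundred megabytes" comes from. The sketch treats a hundred-million-fold sperm excess as simply multiplying K by 1e8, which is this post's assumption and not something the formula's author endorsed.

```python
import math

BITS_PER_MB = 8388608  # 2**23, the constant in the quoted formula


def max_functional_dna_mb(k_offspring_per_pair, epsilon):
    """The quoted bound: (log2(K) - 1) / (8388608 * epsilon) megabytes."""
    return (math.log2(k_offspring_per_pair) - 1) / (BITS_PER_MB * epsilon)


EPSILON = 1e-8  # quoted mammalian copying-error rate per base per generation

baseline = max_functional_dna_mb(4, EPSILON)        # four offspring per pair
with_sperm = max_functional_dna_mb(4 * 1e8, EPSILON)  # assumed: sperm multiply K

print(f"K = 4: {baseline:.1f} MB")
print(f"K = 4e8: {with_sperm:.0f} MB")
print(f"ratio: {with_sperm / baseline:.1f}")
```

As quoted, the formula gives about 12 MB rather than 25 MB for K = 4; the two figures differ by roughly a factor of two, which is what you get from counting two bits per base and decimal rather than binary megabytes. Either way, the hundred-million-fold increase in K multiplies the bound by about 27.6, landing in the low hundreds of megabytes.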

And ova? Well, it seems to me that if the ovum's genetic expression is largely independent (doing different things, expressing and testing different genes than sperm) then whatever expansion there is for ova should be a multiplier; if we form an embryo by choosing from 1E8 sperm and, say, 100 ova, then actually we're selecting from 1E10 potential embryos -- that would give us a basis for maintenance of all our DNA as non-junk. In this kind of consideration, the redundancy of the genes from parents is obviously relevant, and I'm not at all sure how to handle it; but we are able to use the zillions of sperm to get right answers to roughly log2(1E8) questions. Whether the actual reproductive process does so, and whether there really is a more than 25MB (or thereabouts) package of data, is an experimental issue, but I'm not sold on Yudkowsky's belief that this line of reasoning predicts the junkiness of junk DNA.

The principle, though, is clearly convincing.

A random thought, while updating: the error rate has to be non-negligible in order to accumulate information, but perhaps it could be a variable if there's a way of detecting "we're near a local optimum" (with better-than-random success) and stepping up error correction if so. In particular, consider the fact of variation at equilibrium; it's a little hard to think about this in the current context, where I've been supposing that each DNA locus has a single "right" answer, but a species in or near equilibrium, a "successful" species, doesn't generally consist of clones ... for a variety of reasons. I hereby conjecture that if you're a member of a species under stress, one far from equilibrium because it's "losing", then it's relatively more likely that your parents will both have had the same value for gene G, for any given G. (For example, a habitat changes temperature and only the least or most heat-sensitive survive.) If so, your error-correction algorithm should look at the genes it is copying and say "hmm...too many of these are identical. Better not try so hard." The effective mutation rate will therefore rise. I have no idea whether or not any real systems work this way, but they might.

Or then again, maybe not.
