Archive for the ‘Computer Science’ Category

Using Unambiguous Notation

October 19, 2007

I’ve mentioned Gerald Jay Sussman and Jack Wisdom’s book “Structure and Interpretation of Classical Mechanics” a couple of times already; today I’d like to recommend that you read a wonderful short paper of theirs called “The Role of Programming in the Formulation of Ideas,” which helps explain why understanding physics and the other mathematical sciences can sometimes be so difficult. The basic point is that our notation is often an absolute mess, caused by the fact that we use equations the way we use natural language, in a highly ambiguous way:

“It is necessary to present the science in the language of mathematics. Unfortunately, when we teach science we use the language of mathematics in the same way that we use our natural language. We depend upon a vast amount of shared knowledge and culture, and we only sketch an idea using mathematical idioms.”

The solution proposed is to develop notation that can be understood by computers, which do not tolerate ambiguity:

“One way to become aware of the precision required to unambiguously communicate a mathematical idea is to program it for a computer. Rather than using canned programs purely as an aid to visualization or numerical computation, we use computer programming in a functional style to encourage clear thinking. Programming forces one to be precise and formal, without being excessively rigorous. The computer does not tolerate vague descriptions or incomplete constructions. Thus the act of programming makes one keenly aware of one’s errors of reasoning or unsupported conclusions.”

Sussman and Wisdom then focus on one highly illuminating example, the Lagrange equations. These equations can be derived from the fundamental principle of least action. This principle tells you that if you have a classical system that begins in a configuration C1 at time t1 and arrives at a configuration C2 at time t2, the path it traces out between t1 and t2 will be the one that is consistent with the initial and final configurations and minimizes the integral over time of the Lagrangian for the system, where the Lagrangian is given by the kinetic energy minus the potential energy.
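In symbols, the standard statement is that the action is the time integral of the Lagrangian along a candidate path, and the realized path is the one that makes the action stationary (a minimum, for sufficiently short paths):

$$ S[q] \;=\; \int_{t_1}^{t_2} L\big(q(t), \dot q(t), t\big)\, dt, \qquad L = T - V. $$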

Physics textbooks tell us that if we apply the calculus of variations to the integral of the Lagrangian (called the “action”) we can derive that the true path satisfies the Lagrange equations, which are traditionally written as:

$$ \frac{d}{dt}\,\frac{\partial L}{\partial \dot q_i} \;-\; \frac{\partial L}{\partial q_i} \;=\; 0 $$

Here $L$ is the Lagrangian, $t$ is the time, and the $q_i$ are the coordinates of the system.

These equations (and many others like them) have confused and bewildered generations of physics students. What is the problem? Well, there are all sorts of fundamental problems in interpreting these equations, detailed in Sussman and Wisdom’s paper. As they point out, basic assumptions like whether a coordinate and its derivative are independent variables are not consistent within the same equation. And shouldn’t this equation refer to the path somewhere, since the Lagrange equations are only correct for the true path? I’ll let you read Sussman and Wisdom’s full laundry list of problems yourself. But let’s turn to the psychological effects of using such equations:

“Though such statements (and derivations that depend upon them) seem very strange to students, they are told that if they think about them harder they will understand. So the student must either come to the conclusion that he/she is dumb and just accepts it, or that the derivation is correct, with some appropriate internal rationalization. Students often learn to carry out these manipulations without really understanding what they are doing.”

Is this true? I believe it certainly is (my wife agrees: she gave up on mathematics, even though she always received excellent grades, because she never felt she truly understood). The students who learn to successfully rationalize such ambiguous equations, and forget about the equations that they can’t understand at all, are the ones who might go on to be successful physicists. Here’s an example, from the review of Sussman and Wisdom’s book by Piet Hut, a very well-regarded physicist who is now a professor at the Institute for Advanced Study:

“… I went through the library in search of books on the variational principle in classical mechanics. I found several heavy tomes, borrowed them all, and started on the one that looked most attractive. Alas, it didn’t take long for me to realize that there was quite a bit of hand-waving involved. There was no clear definition of the procedure used for computing path integrals, let alone for the operations of differentiating them in various ways, by using partial derivatives and/or using an ordinary derivative along a particular path. And when and why the end points of the various paths had to be considered fixed or open to variation also was unclear, contributing to the overall confusion.

Working through the canned exercises was not very difficult, and from an instrumental point of view, my book was quite clear, as long as the reader would stick to simple examples. But the ambiguity of the presentation frustrated me, and I started scanning through other, even more detailed books. Alas, nowhere did I find the clarity that I desired, and after a few months I simply gave up. Like generations of students before me, I reluctantly accepted the dictum that ‘you should not try to understand quantum mechanics, since that will lead you astray for doing physics’, and going even further, I also gave up trying to really understand classical mechanics! Psychological defense mechanisms turned my bitter sense of disappointment into a dull sense of disenchantment.”

Sussman and Wisdom do show how the ambiguous conventional notation can be replaced with unambiguous notation that can even be used to program a computer. Because it’s new, it will feel alien at first; the Lagrange equations look like this:

$$ D\,(\partial_2 L \circ \Gamma[q]) \;-\; \partial_1 L \circ \Gamma[q] \;=\; 0 $$

Here $\Gamma[q]$ is the function that maps a time $t$ to the local tuple $(t, q(t), Dq(t))$; $\partial_1 L$ and $\partial_2 L$ are the partial derivatives of the Lagrangian with respect to its coordinate and velocity slots; and $D$ is differentiation with respect to time.

It’s worth learning Sussman and Wisdom’s notation for the clarity it ultimately provides. It’s even more important to learn to always strive for clear understanding.

One final point: although mathematicians do often use notation that is superior to physicists’, they shouldn’t feel too smug; Sussman and Wisdom had similar things to say about differential geometry in this paper.

SICM on Mac OS X

October 17, 2007

“Structure and Interpretation of Classical Mechanics,” (SICM) by Gerald Jay Sussman and Jack Wisdom, with Meinhard Mayer, is a fascinating book, revisiting classical mechanics from the point of view that everything must be computationally explicit. I already mentioned the book in a previous post.

The book is available online, and all the software is freely available as well. The software is written in Scheme, and a very extensive library called “scmutils” was developed to support computations in classical mechanics, including implementations of many symbolic and numerical algorithms.

I think that many scientists and programmers could find the “scmutils” library generally useful, even if they are not particularly interested in classical mechanics. If you are using the GNU/Linux operating system, there’s no problem getting the library working. However, if you want to use it on Mac OS X (or Windows), the instructions leave the impression that it isn’t possible; Googling turned up some useful information, but no complete instructions, and also some people who seemed to be at a loss about how to do it.

Well, it is possible to get MIT-Scheme with the scmutils library running on Mac OS X (and you can probably modify my instructions to make it work on Windows too):

Click here for my instructions for running scmutils on Mac OS X.

Multicellular Logic Circuits, Part III: A Model

September 26, 2007

In Part I and Part II of this series, I discussed genetic algorithms and why we might want to create artificial machines that begin life as a single cell, and develop into networks of identical communicating cells. In this post, I want to begin describing a model that works along these lines.

The model is a highly stylized and simplified cartoon of biological multicellular organisms. It is my attempt to make the simplest model possible that captures the essence of what is happening in biology. So understand that biology is more complicated than this model; the goal is a model stripped down to those essential elements that cannot be taken away if one wants something that looks like life. The model is thus proposed in the spirit of the Ising model of magnets in statistical physics: the simplest model that captures the general behavior we are looking for.

The first question is what we want our machine (or “circuit” or “network” or “organism”; I will use these terms interchangeably) to do. As is quite conventional in hardware design, I will presume the organism receives some input signals from the world, and it is supposed to produce some desired output signal, which depends on the inputs it has received at the current and previous times. Thus, the circuit should in general be capable of forming memories that let it store something about previous inputs.

[Figure: a pictorial representation of the model, showing an organism developing from a single cell into a network of communicating cells that computes an input-output function.]

The organism begins its life as a single cell, and then has two phases in its life, a dynamic “embryonic” phase and a static “adult” phase. During the embryonic phase, the cells in the organism can undergo developmental events, primarily cell duplication, but also perhaps cell death or cell relocation, to sculpt out the final network of communicating cells. After the embryonic phase is complete (say, after a fixed amount of time has passed or some signal is generated by the circuit), the adult phase is entered. The network is static in structure during the adult phase. It is during the adult phase that the network can be tested to see whether it properly computes the desired input-output function. The figure above is a pictorial representation of the model that hopefully makes clear what I have in mind.

Each of the cells in the network will have an internal structure, defined primarily by “logic units” that send signals to each other. The computations performed by the organism will simply be the computations performed by the logic units inside of its cells. The details of what the logic units do, and how they are connected to each other, are specified by a “genome” or “program” for the organism.

Look at the figure below for a peek inside an individual cell in the model. Each cell will have an identical set of logic units, with identical connections between the logic units.

[Figure: the internal structure of a single cell, showing its logic units and the types of signals they send.]

The logic units compute an output according to some fixed function of their inputs. They transmit that output after some delay, which is also part of their fixed function. The output of one logic unit will be the input of another; they send “signals” to each other.

These signals are of various types (see the above figure). The first type of signal, called a “factor signal,” will always go from a logic unit to another logic unit in the same cell. The second type of signal, called an “inter-cellular signal,” will always go from a logic unit to a logic unit in a different cell. The third type of signal, called a “developmental output signal,” will not actually go to another logic unit, but will be a signal to the cell development apparatus to perform some important development event, such as duplication or programmed cell death. Finally, the fourth type of signal, called a “developmental input signal,” will be used by the cell development apparatus to signal that some type of cell development event has occurred, and will serve as an input to logic units.
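To make this a bit more concrete, here is a rough sketch in Python of one way a logic unit and its delayed outputs could be represented. This is only to fix ideas; the names and data structures are invented for this post, and the real definition of the model will come in the next post.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Rough sketch only: one possible representation of a logic unit that computes
# a fixed function of its inputs and transmits the result after a fixed delay.

@dataclass
class LogicUnit:
    function: Callable[[List[int]], int]              # the fixed function it computes
    delay: int                                        # delay, in time steps, of its output
    inputs: List["LogicUnit"] = field(default_factory=list)
    output: int = 0
    pending: List[int] = field(default_factory=list)

def advance(units: List[LogicUnit]) -> None:
    """One time step. Two-phase update so every unit sees the previous outputs."""
    staged = [u.function([v.output for v in u.inputs]) for u in units]
    for unit, value in zip(units, staged):
        unit.pending.append(value)
        if len(unit.pending) >= unit.delay:
            unit.output = unit.pending.pop(0)

# A toy network: two constant units feeding an AND unit with delay 2.
one = LogicUnit(function=lambda _: 1, delay=1)
zero = LogicUnit(function=lambda _: 0, delay=1)
and_unit = LogicUnit(function=lambda xs: int(all(xs)), delay=2, inputs=[one, zero])

for _ in range(5):
    advance([one, zero, and_unit])
print(and_unit.output)   # -> 0, since 1 AND 0 is 0
```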

Remember that the initial cell (the “fertilized egg”) will need to have a set of logic units that enable it to automatically create the adult network, so it must effectively contain the instructions for development as well as for the adult circuit. It might seem hard to imagine that this can work, but it can. In the next post in this series, I will discuss in more detail the process of development in this model, and then we will be in position to look at some interesting multicellular circuits that I have designed.

If you don’t want to wait, you can visit this page to find a PDF and a PowerPoint version of a talk I gave on the subject at a conference in Santa Fe in May 2007, although unfortunately, it might be hard to decipher without my explanation…

Talking about Probabilistic Robotics

September 23, 2007

Sebastian Thrun is a professor of computer science and electrical engineering at Stanford, and director of the Stanford Artificial Intelligence Laboratory. He led Stanford’s team that won the $2 million first prize in the 2005 DARPA Grand Challenge, a race of driverless robotic cars across the desert, and he also leads Stanford’s entry in the 2007 DARPA Urban Challenge.

One of the ingredients in the Stanford team’s win was their use of “probabilistic robotics,” which is an approach based on the recognition that all sensor readings and models of the world are inherently subject to uncertainty and noise. Thrun, together with Wolfram Burgard and Dieter Fox, has written the definitive text on probabilistic robotics, which will be a standard for years to come. If you are seriously interested in robotics, you should read this book. (The introductory first chapter, which clearly explains the basic ideas of probabilistic robotics, is available as a download here.)
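The mathematical workhorse behind that approach is the Bayes filter: the robot maintains a probability distribution (a “belief”) over possible states and repeatedly updates it with motion and sensor information. Here is a toy one-dimensional version in Python, just to convey the idea; it is my own illustrative sketch, not code from the book.

```python
# Toy 1-D histogram (Bayes) filter: a robot localizing itself in a corridor.

world = [1, 0, 0, 1, 0]          # 1 = door, 0 = wall, at each of 5 positions
belief = [0.2] * 5               # start out completely uncertain

def sense(belief, measurement, p_hit=0.8, p_miss=0.2):
    """Measurement update: reweight each position by how well it explains the reading."""
    new = [b * (p_hit if world[i] == measurement else p_miss)
           for i, b in enumerate(belief)]
    total = sum(new)
    return [b / total for b in new]

def move(belief, step, p_exact=0.9, p_slip=0.1):
    """Motion update: shift the belief, allowing for noisy motion."""
    n = len(belief)
    new = [0.0] * n
    for i in range(n):
        new[(i + step) % n] += p_exact * belief[i]
        new[(i + step + 1) % n] += p_slip * belief[i]
    return new

belief = sense(belief, 1)   # the robot sees a door
belief = move(belief, 1)    # the robot moves one cell to the right
belief = sense(belief, 0)   # the robot now sees a wall
print([round(b, 3) for b in belief])
```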

The Laboratory of Intelligent Systems at the Swiss École Polytechnique Fédérale de Lausanne (EPFL) hosts the superb “Talking Robots” web-site, which consists of a series of podcast interviews with leading robotics researchers. I noticed that the latest interview is with Thrun, and liked it quite a bit; it is well worth downloading to your iPod or computer.

You can watch Thrun speaking about the DARPA Grand Challenge at this Google TechTalk.

Artificial Intelligence: A Modern Approach

September 20, 2007


“Artificial Intelligence: A Modern Approach,” by Stuart Russell (professor of computer science at UC Berkeley) and Peter Norvig (head of research at Google) is the best-known and most-used textbook about artificial intelligence, and for good reason; it’s a great book! The first edition of this book was my guide to the field when I was switching over from physics research to computer science.

I feel almost embarrassed to recommend it, because I suspect nearly everybody interested in AI already knows about it. So I’m going to tell you about a couple of related resources that may not be as well known.

First, there is the online code repository for the algorithms in the book, in Java, Python, and Lisp. Many of the algorithms are useful beyond AI, so you may find, for example, that the search or optimization algorithm that you are interested in has already been written for you. I personally have used the Python code, and it’s really model code from which you can learn good programming style.

Second, if you haven’t ever visited Peter Norvig’s web-site, you really should. I particularly recommend his essays “Teach Yourself Programming in Ten Years,” “Solving Every Sudoku Puzzle,” and “The Gettysburg Powerpoint Presentation.”

Multicellular Logic Circuits, Part II: Cells

September 18, 2007

In my post “Multicellular Logic Circuits, Part I: Evolution,” I discussed evolution and genetic algorithms; I want to continue that discussion here.

There are two salient facts of biology that are completely inescapable. The first is that all organisms are shaped by the process of evolution. The second is that all organisms are constructed from cells.

Furthermore, all complex multicellular organisms begin life as a single cell, and undergo a process of development through cell division to mature into an adult. And no matter how different any two organisms may be on the gross macroscopic level that we are used to, inside their cells the chemical processes of life are fundamentally very similar.


Thus it is no accident that the titles of the two leading textbooks in molecular biology are The Molecular Biology of the Gene by Watson et al. and The Molecular Biology of the Cell by Alberts et al. [These are both great books. This link to the first chapter of MBOC is an excellent entry point into modern biology. And if you are serious about learning biology, I also strongly recommend the companion Molecular Biology of the Cell: A Problems Approach, by Wilson and Hunt, which will force you to think more actively about the material.]

It therefore seems reasonable that if we want to construct artificial systems that achieve the performance of natural ones, we should consider artificially evolving a system constructed from cells.

Although there are typically many different cell types in a mature multi-cellular organism, all the different cells of the organism, with the exception of sperm and egg cells, share an identical genetic specification in their DNA. The different behavior of cells with identical genetic specifications is the result of the cells having different histories and being subjected to different environments.

More specifically, the behavior of a biological cell is controlled by complex genetic regulatory mechanisms that determine which genes are transcribed into messenger RNA and then translated into proteins. One very important regulatory mechanism is provided by the proteins called “transcription factors” that bind to DNA regulatory regions upstream of the protein coding regions of genes, and participate in the promotion or inhibition of the transcription of DNA into RNA. The different histories of two cells might lead to one having a large concentration of a particular transcription factor, and the other having a low concentration, and thus the two cells would express different genes, even though they had identical DNA.
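A toy Boolean caricature makes the point vividly. In the Python sketch below (my own cartoon, not a model of any real regulatory network), two cells run exactly the same regulatory rules, the shared “genome,” but different initial transcription-factor levels send them down different paths, so they end up expressing different genes.

```python
# Two cells with the same "genome" (the same update rules) but different
# histories end up expressing different genes. A deliberately tiny cartoon.

def regulatory_rules(state):
    """The shared genome: which genes are 'on' at the next step,
    as a fixed function of the current transcription-factor levels."""
    return {
        "tf_A":   state["tf_A"],                          # A maintains its own expression
        "tf_B":   not state["tf_A"],                      # A represses B
        "gene_X": state["tf_A"] and not state["tf_B"],    # X needs A and not B
        "gene_Y": state["tf_B"],                          # Y is activated by B
    }

def develop(initial_state, steps=5):
    state = dict(initial_state)
    for _ in range(steps):
        state = regulatory_rules(state)
    return state

# Identical rules, different initial concentrations of transcription factor A:
cell_1 = develop({"tf_A": True,  "tf_B": False, "gene_X": False, "gene_Y": False})
cell_2 = develop({"tf_A": False, "tf_B": False, "gene_X": False, "gene_Y": False})

print(cell_1)  # expresses gene_X
print(cell_2)  # expresses gene_Y
```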

Another important mechanism that controls the differential development of different types of cells in a multi-cellular organism is the biochemical signaling sent between cells. Signals such as hormones have the effect of directing a cell down a particular developmental pathway.

In general, the transcription factors, hormones, and multitude of other control mechanisms used in biological cells are organized into a network that can be represented as a “circuit,” where the state of the system is characterized by the concentrations of the different biochemical ingredients. In fact, biologists are now using wiring diagrams to help summarize biological circuits; see, for example, the “BioTapestry editor” developed by Eric Davidson’s lab at Caltech.


[I strongly recommend Davidson’s recent book The Regulatory Genome: Gene Regulatory Networks in Development and Evolution for an exciting introduction to the burgeoning “evo-devo” field; if you don’t have any background in biology, you may prefer The Coiled Spring, by Ethan Bier for a somewhat more popular account.]

Turning to the problem of designing artificial systems, a natural question is what theoretical advantages there are, from the point of view of design by evolution, in using an identical genetic specification for all the cells in a multi-cellular organism.

One potential advantage is that relatively small changes to the genetic specification of the organism can concurrently alter the behavior of many different kinds of cells at many different times during the development of the organism. Therefore, if there is the possibility of an advantageous change to the circuitry controlling a cell, then it can be found once and used many times instead of needing to find the same advantageous mutation repeatedly for each of the cells in the organism.

Another related potential advantage is that a highly complicated organism can be specified in a relatively compact way. If each of the trillions of cells in a complex organism like a human had to be separately specified, then the overall amount of information required to describe a human would be multiplied more than a trillion-fold. Clearly, it is much more efficient to re-use the identical circuitry in many different types of cells.

In other words, biology uses the strategy of specifying a complex multi-cellular organism by specifying just a single cell; all the other cells in the mature organism grow out of the developmental process. This seems like a strategy worth imitating.

On the other hand, the constraint that each cell in an organism should share an identical genetic specification clearly causes complications from the point of view of design. For example, it is important that genes designed to function in one type of cell at one point in development not cause problems for a different type of cell at a different point in development. Clearly, good design of the control logic that turns genes on and off is essential to the proper functioning of a multi-cellular organism.

In the next post in this series, I will turn to the construction of a concrete model for multi-cellular circuits that tries to capture, as simply as possible, the essence of what is happening in biology. 

Two Draft Books

September 13, 2007

If you’re interested in learning about statistical mechanics, graphical models, information theory, error-correcting codes, belief propagation, constraint satisfaction problems, or the connections between all those subjects, you should know about a couple of books that should be out soon, but for which you can already download extensive draft versions.

The first is Information, Physics, and Computation by Marc Mézard and Andrea Montanari.

The second is Modern Coding Theory by Tom Richardson and Ruediger Urbanke.

I also recommend the tutorial on Modern Coding Theory: the Statistical Mechanics and Computer Science Points of View, by Montanari and Urbanke, from the lectures they gave at the 2006 Les Houches summer school.

Using Illusions to Understand Vision

September 6, 2007


[Figure: Adelson’s checker-shadow illusion, a checkerboard with a cylinder casting a shadow, and two squares labeled A and B.]

MIT professor Edward Adelson uses remarkable visual illusions to help explain the workings of the human visual system. One such illusion is shown above. Believe it or not, the square marked A is the same shade of gray on your computer screen as the square marked B.

[Figure: Adelson’s “proof” image, with two strips of constant gray laid over the checkerboard, connecting squares A and B.]

Here’s a “proof,” provided by Adelson. Two strips of constant grayness are aligned on top of the picture. You can see that the A square is the same shade as the strips near it and the B square is the same shade as the strips near it. Perhaps you still don’t believe that the strips are of constant grayness. In that case, put some paper up next to your computer screen to block off everything except for the strips; you’ll see it’s true.

Adelson explains the illusion here. The point is that our visual system is not meant to be used as a light meter; instead it is trying to solve the much more important problem (for our survival) of determining the true shade (that is, the color of the attached “paint”) of the objects it is looking at.

You can find more interesting illusions and demos from Adelson and other members of the perceptual science group at MIT, but don’t fail to also take a look at the illusions collected by the lab of Dale Purves at Duke. I particularly recommend the cube color contrast demo, where you can see that gray can be made to look yellow or blue.

Purves, together with R. Beau Lotto, wrote the book “Why We See What We Do: An Empirical Theory of Vision,” which collects these remarkable illusions and also expounds on a theory explaining them. The theory, to summarize it very briefly, says that what humans actually see is a “reflexive manifestation of the past rather than a logical analysis of the present.” I found myself quite uncomfortable with the theory for much the same reasons as given in Alan Gilchrist’s review.

I also would prefer a more mathematical theory than Purves and Lotto give. It seems to me that we should in general try to explain illusions in terms of a Bayesian analysis of the most probable scene given the evidence provided by the light. My collaborators Bill Freeman and Yair Weiss (both former students of Adelson’s) have long worked along these lines; see for example Yair’s excellent Ph.D. thesis from 1998, explaining motion illusions.
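To illustrate what I mean, here is a toy Bayesian calculation in Python; it is my own cartoon, not anyone’s published model, and the numbers are made up. The same measured luminance is best explained by “dark paint in full light” in one context and by “light paint in shadow” in another, which is essentially what happens with squares A and B above.

```python
import math

# The eye measures luminance, which is roughly reflectance * illumination; the
# visual system reports the reflectance ("paint") of the most probable scene,
# given the measurement and the context.

def scene_posterior(luminance, prior_shadow, noise=0.05):
    """Posterior over two scene hypotheses given a measured luminance."""
    scenes = {
        "dark paint, full light": (0.30, 1.0, 1.0 - prior_shadow),
        "light paint, in shadow": (0.75, 0.4, prior_shadow),
    }
    scores = {}
    for name, (reflectance, illumination, prior) in scenes.items():
        predicted = reflectance * illumination
        likelihood = math.exp(-((predicted - luminance) ** 2) / (2 * noise ** 2))
        scores[name] = prior * likelihood
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}

# Squares A and B send the eye the same luminance, but the visible cast shadow
# makes "in shadow" much more probable a priori for square B than for square A.
print(scene_posterior(0.30, prior_shadow=0.1))   # square A: "dark paint" wins
print(scene_posterior(0.30, prior_shadow=0.9))   # square B: "light paint" wins
```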

In fact, I would like to go beyond a mathematical explanation of illusions to an algorithmic one. I would argue that a good computer vision system should “suffer” from the same illusions as a human, even though it has neither the same evolutionary history nor the same life history. To take an example of what I have in mind, the famous Necker cube illusion presumably arises naturally from the fact that the two interpretations are both local optima, with respect to probability, so a good artificial system should use an algorithm that settles into one interpretation, but then still be able to spontaneously switch to the other.
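Here is a toy sketch in Python of the kind of algorithmic behavior I have in mind, again my own cartoon rather than a published model: a sampler over interpretations whose distribution has two strong modes will settle into one interpretation, and occasionally flip spontaneously to the other.

```python
import random

# The two coherent interpretations of the Necker cube correspond to the two
# high-probability states 0 and 4; the states in between are unlikely mixtures.
p = [0.45, 0.02, 0.06, 0.02, 0.45]

def metropolis_step(state):
    """Propose a neighboring state and accept it with the usual Metropolis rule."""
    proposal = max(0, min(len(p) - 1, state + random.choice([-1, 1])))
    if random.random() < p[proposal] / p[state]:
        return proposal
    return state

random.seed(0)
state, flips, current_mode = 0, 0, 0
for _ in range(100_000):
    state = metropolis_step(state)
    if state in (0, 4) and state != current_mode:
        flips += 1            # the sampler has switched interpretations
        current_mode = state

print("spontaneous flips between the two interpretations:", flips)
```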

Programming Erlang

August 23, 2007

CPUs stopped getting faster about five years ago. Since then, Intel and AMD have been introducing multi-core processors, and the trend for the foreseeable future seems to be more and more processors per computer. That will be great, so long as the software industry is able to take advantage of the increased number of processors to deliver bigger and faster applications. However, that means parallel programming is in our future, and parallel programming is hard.

What’s more, the current dominant paradigm for taking advantage of multiprocessor systems is threads, and threads appear to have very serious problems. Berkeley professor Edward Lee argues in no uncertain terms that the inherent non-determinism of multithreaded programming dooms any software-testing approach to failure, and will lead to buggy, unreliable programs:

“A folk definition of insanity is to do the same thing over and over again and to expect the results to be different. By this definition, we in fact require that programmers of multithreaded systems be insane. Were they sane, they could not understand their programs.”

“These same computer vendors are advocating more multi-threaded programming, so that there is concurrency that can exploit the parallelism they would like to sell us. Intel, for example, has embarked on an active campaign to get leading computer science academic programs to put more emphasis on multi-threaded programming. If they are successful, and the next generation of programmers makes more intensive use of multithreading, then the next generation of computers will become nearly unusable.”


Having read Lee’s paper, I was very interested when I learned of Joe Armstrong’s new book “Programming Erlang: Software for a Concurrent World.” Armstrong actually identifies a problem with threads that is related to, but slightly different from, non-determinism: the fact that different threads can access the same memory. Why is shared memory such a big problem? Briefly, because a thread or process that needs to access shared memory must lock it, and if it crashes while holding the lock, you’re in trouble. For more, see this post.

So what is Erlang, and what is it like? Well, it’s open-source, and has been developed and used in telecom companies for a long time, so there already exist extensive libraries. It is a general-purpose language designed for concurrent, distributed, and fault-tolerant applications. It adheres strongly to the functional programming paradigm. It is a dynamic language, comes with a shell, and uses pattern-matching extensively. In Erlang, it is easy to spawn very large numbers of very-lightweight processes. Processes communicate using messages, and do not share any memory. All in all, it’s a very funky language.
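For readers who don’t know Erlang, here is a rough Python analogue of the message-passing, no-shared-memory style, using operating-system processes and queues. It is only an illustration of the discipline: Erlang processes are enormously lighter-weight than OS processes, and Erlang adds fault tolerance on top, but the basic idea of communicating only by messages is the same.

```python
from multiprocessing import Process, Queue

def echo_server(inbox: Queue, outbox: Queue) -> None:
    """Receive messages and reply; holds no state that any other process can touch."""
    while True:
        message = inbox.get()
        if message == "stop":
            break
        outbox.put(f"echo: {message}")

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    worker = Process(target=echo_server, args=(inbox, outbox))
    worker.start()

    inbox.put("hello")       # communicate only by sending messages
    print(outbox.get())      # -> "echo: hello"

    inbox.put("stop")        # even if the worker had crashed instead,
    worker.join()            # no lock would be left dangling in our memory
```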

So far, I have only read a small portion of Armstrong’s book (through chapter 8, where concurrent programs are introduced), but it is already clear to me that this is a significant piece of work, well worth the time spent with it. Starting from scratch, Armstrong works his way up to explaining complex distributed and fault-tolerant applications, such as a streaming media server, that require surprisingly little code. I plan to say more in future blog posts, as I progress through the book. In the meantime, here’s a list of beginner Erlang links.

(By the way, in his paper quoted above, Edward Lee discusses Erlang briefly and says that he believes its unfamiliar syntax will continue to block its widespread adoption. I’m not so sure; perhaps Erlang has simply not broken through until now because there just wasn’t much need for a language suited to multi-processor programming.)