Archive for the ‘Education’ Category

Using Unambiguous Notation

October 19, 2007

I’ve mentioned their book the “Structure and Interpretation of Classical Mechanics” a couple times already; today I’d like to recommend that you read a wonderful short paper by Gerald Jay Sussman and Jack Wisdom, called “The Role of Programming in the Formulation of Ideas,” which helps explain why understanding physics and the other mathematical sciences can sometimes be so difficult. The basic point is that our notation is often an absolute mess, caused by the fact that we use equations like we use natural language, in a highly ambiguous way:

“It is necessary to present the science in the language of mathematics. Unfortunately, when we teach science we use the language of mathematics in the same way that we use our natural language. We depend upon a vast amount of shared knowledge and culture, and we only sketch an idea using mathematical idioms.”

The solution proposed is to develop notation that can be understood by computers, which do not tolerate ambiguity:

“One way to become aware of the precision required to unambiguously communicate a mathematical idea is to program it for a computer. Rather than using canned programs purely as an aid to visualization or numerical computation, we use computer programming in a functional style to encourage clear thinking. Programming forces one to be precise and formal, without being excessively rigorous. The computer does not tolerate vague descriptions or incomplete constructions. Thus the act of programming makes one keenly aware of one’s errors of reasoning or unsupported conclusions.”

Sussman and Wisdom then focus on one highly illuminating example, the Lagrange equations. These equations can be derived from the fundamental principle of least action. This principle tells you that if you have a classical system that begins in a configuration C1 at time t1 and arrives at a configuration C2 at time t2, the path it traces out between t1 and t2 will be the one that is consistent with the initial and final configurations and minimizes the integral over time of the Lagrangian for the system, where the Lagrangian is given by the kinetic energy minus the potential energy.

Physics textbooks tell us that if we apply the calculus of variations to the integral of the Lagrangian (called the “action”) we can derive that the true path satisfies the Lagrange equations, which are traditionally written as:


Here L is the Lagrangian, t is the time, annd qi are the coordinates of the system.

These equations (and many others like them) have confused and bewildered generations of physics students. What is the problem? Well, there are all sorts of fundamental problems in interpreting these equations, detailed in Sussman and Wisdom’s paper. As they point out, basic assumptions like whether a coordinate and its derivative are independent variables are not consistent within the same equation. And shouldn’t this equation refer to the path somewhere, since the Lagrange equations are only correct for the true path? I’ll let you read Sussman and Wisdom’s full laundry list of problems yourself. But let’s turn to the psychological effects of using such equations:

“Though such statements (and derivations that depend upon them) seem very strange to students, they are told that if they think about them harder they will understand. So the student must either come to the conclusion that he/she is dumb and just accepts it, or that the derivation is correct, with some appropriate internal rationalization. Students often learn to carry out these manipulations without really understanding what they are doing.”

Is this true? I believe it certainly is (my wife agrees: she gave up on mathematics, even though she always received excellent grades, because she never felt she truly understood). The students who learn to successfully rationalize such ambiguous equations, and forget about the equations that they can’t understand at all, are the ones who might go on to be successful physicists. Here’s an example, from the review of Sussman and Wisdom’s book by Piet Hut, a very well-regarded physicist who is now a professor at the Institute for Advanced Studies:

“… I went through the library in search of books on the variational principle in classical mechanics. I found several heavy tomes, borrowed them all, and started on the one that looked most attractive. Alas, it didn’t take long for me to realize that there was quite a bit of hand-waving involved. There was no clear definition of the procedure used for computing path integrals, let alone for the operations of differentiating them in various ways, by using partial derivatives and/or using an ordinary derivative along a particular path. And when and why the end points of the various paths had to be considered fixed or open to variation also was unclear, contributing to the overall confusion.

Working through the canned exercises was not very difficult, and from an instrumental point of view, my book was quite clear, as long as the reader would stick to simple examples. But the ambiguity of the presentation frustrated me, and I started scanning through other, even more detailed books. Alas, nowhere did I find the clarity that I desired, and after a few months I simply gave up. Like generations of students before me, I reluctantly accepted the dictum that ‘you should not try to understand quantum mechanics, since that will lead you astray for doing physics’, and going even further, I also gave up trying to really understand classical mechanics! Psychological defense mechanisms turned my bitter sense of disappointment into a dull sense of disenchantment.”

Sussman and Wisdom do show how the ambiguous conventional notation can be replaced with unambiguous notation that can even be used to program a computer. Because it’s new, it will feel alien at first; the Lagrange equations look like this:


It’s worth learning Sussman and Wisdom’s notation for the clarity it ultimately provides. It’s even more important to learn to always strive for clear understanding.

One final point: although mathematicians do often use notation that is superior to physicists’, they shouldn’t feel too smug; Sussman and Wisdom had similar things to say about differential geometry in this paper.

iTunes U

October 11, 2007


I bought one of the new iPod Nanos last week. One of the main reasons that I wanted one was to be able to listen to and watch the video lectures available at iTunes U while walking or traveling. There’s quite a bit of interesting material available for free download. For example, I’ve been watching lectures from the weekly colloquium of the Stanford Computer Systems Laboratory.

There are also MIT courses about graduate-level Digital Communications taught by Edison and Shannon medalist David Forney, or introductory Biology by Eric Lander and Robert Weinberg, or Mathematical Methods for Engineers (look under Mathematics) by Gilbert Strang, all in video. The podcast section of iTunes Store also has many videos of entertaining 20-minute talks from the TED conferences held over the last few years.

Of course there’s all sorts of other material available; I’m just pointing out some more academic videos.

It’s somewhat annoying how difficult it is to find specifically video material; there’s much more material available only as audio, but the video material is not really specifically singled out in any way. Also, you should know that you can also watch all the material directly on your computer, but if you want to do that, you should also visit this MIT OpenCourseWare page or this Berkeley webcast page. Both of these pages have a lot of additional video courses available in formats other than the MP4 format used by the iPod.

The iPod nano is very small and light. It’s my first iPod, so I was pleasantly surprised to learn that the earbuds are actually pretty comfortable. The screen is really sharp, but it’s also really tiny, so while it’s OK to look at occasionally to see what’s going on, but I wouldn’t want to watch a long movie on it. The iPod Touch will definitely be a better option for that.

The Teaching Company

August 24, 2007

My wife and I are both big fans of the college courses produced by the Teaching Company. The courses cover a wide variety of subjects, and come in a range of video and/or audio formats.

I personally find that the audio format usually works somewhat better. The lecturers are very good, but watching a professor lecture on TV is inevitably somewhat dull. On the other hand, when I’m driving or riding in a car or train, I find that an audio lecture fills an ideal amount of mental bandwidth. (Every so often, the lecture gets complicated at the same time as the driving, but one can always rewind).

My favorite course so far was Robert Greenberg’s course on How to Listen to and Understand Great Music. Music courses are naturally a great fit for an audio course!

Otherwise, there are unfortunately not that many science and mathematics courses that go beyond the beginning undergraduate level, although I did enjoy Stephen Nowicki’s course on Biology. If you are interested in history or philosophy, or other subjects in the humanities (like my wife, who is a historian) there are many more interesting options.

The Teaching Company has an unusual pricing policy. The courses are very expensive, except when they go on sale, when they cost roughly one quarter the normal price and are very reasonable. All the courses go on sale on a regular rotation, so unless you are in a tremendous hurry, you should definitely wait until the course you are interested in goes on sale. A lot of the courses are available at libraries too, so you can borrow one to see if you like it first.

Here’s another blog post endorsing the Teaching Company, with some reader comments on their favorite courses.


August 22, 2007

As I mentioned in my previous post about Lenny Guerente’s book “Ageless Quest,” aging research has, within the last 15 years, gone from being a scientific backwater to a mainstream field of scientific research, with new discoveries now regularly featured on the cover of Nature or Science (as in the Nature issue from June 2007 below.)


Although we now are capable of manipulating the aging process, including significantly extending the lifespan of many laboratory animals, it is still a frustrating fact that there is no consensus about the ultimate cause or causes of aging.

One viewpoint, which is probably only held by a significant minority of scientists in the field, is that the aging process is strongly connected to mitochondria, which are the power plants or batteries of our cell, converting nutrients into useful packets of energy in the form of ATP. We’re used to the idea that electronic equipment fails when the batteries go dead, so it’s not such a stretch to take a close look at the mitochondria.

What’s more, mitochondria produce much of the “pollution” in the cell in the form of the free radicals that are a by-product of the oxidative phosphorylation process (the process that turns nutrients into energy). Those free-radicals can damage proteins or DNA, particularly the mitochondrial DNA (this is special DNA, inherited from the mother, that resides in the mitochondria rather than the nucleus) that codes for a few essential mitochondrial proteins.

So one theory says that there is a kind of vicious circle, whereby old mitochondria start emitting more free radicals, which further damages the mitochondria, until the mitochondria are so damaged that they don’t produce sufficient energy and start damaging the rest of the cell. Right now, the consensus view on whether the experimental facts really fit that theory is “Maybe.”


If you want to learn more about mitochondria, I highly recommend “Power, Sex, Suicide: Mitochondria and the Meaning of Life (you’ve just got to love that title), by Nick Lane. Lane’s book is popular science, but it’s a very deep book, and actually proposes theories, including theories of aging, that you won’t see elsewhere in the literature. It’s not an easy book to read, but it’s very worthwhile.

Alternatively, you might enjoy this video of Douglas Wallace lecturing on the role of mitochondria in diseases and aging. Wallace, a professor from UC Irvine, delivers highly entertaining and persuasive lectures.

Structure and Interpretation of Computer Programs: Videos and Textbook

August 17, 2007


This set of videos, of Gerald Jay Sussman and Hal Abelson teaching their course on the “Structure and Interpretation of Computer Programs” in July 1986 for Hewlett-Packard employees, is one of the treasures of the internet. The clothes are out of style, but the material presented is still completely relevant.

The SICP web-site has lots of useful additional information, including the complete text of the 2nd edition of their classic textbook.

I am not going to try to write a better review of the Abelon’s and Sussman’s textbook than the Amazon review written by Peter Norvig. Norvig is the head of Research at Google, and a co-author of Artificial Intelligence: a Modern Approach, the leading AI textbook.

But I will add one comment: what I love most about these lectures is the point (see the picture above of Sussman in his wizard hat) that a computer programmer is like a wizard–he creates something real out of ideas. Computers let us be wizards; and I believe we have only scratched the surface of the possible “spells” that we can learn to cast.

If you want a nice implementation of Scheme, the beautiful dialect of Lisp that was invented by Sussman with his student Guy Steele, and used in this textbook, I highly recommend DrScheme.

Algorithms for Physics

August 14, 2007


Much of my own work is at the intersection of statistical mechanics and algorithms, in particular understanding and developing new algorithms using ideas originating in statistical mechanics. Werner Krauth also works at the intersection of the two fields, but coming from a very different angle: he is a leading expert on the development and application of algorithms to compute and understand the properties of physical systems.

In his recently published book, “Statistical Mechanics: Algorithms and Computations,” targeted at advanced undergraduates or graduate students, he covers a very wide range of interesting algorithms. To give you an idea of the coverage, I’ll just list the chapters: “Monte Carlo methods,” “Hard disks and spheres,” “Density matrices and path integrals,” “Bosons,” “Order and disorder in spin systems, “Entropic forces,”and “Dynamic Monte Carlo methods.”

Krauth’s presentation is leavened by his humor, and he often uses the results obtained using his algorithms to make surprising points about physics that would otherwise be hard to convey.

I am often asked by computer science or electrical engineering scientists and researchers for good introductions to physics, and particularly statistical mechanics, and I’m now happy to be able to recommend this book.


Specifying physics explicitly in terms of algorithms, as Krauth does, gives a very concrete basis for understanding concepts that can otherwise seem terribly abstract. Gerald Sussman and Jack Wisdom make this point in the preface of their already classic book (which is available online) “Structure and Interpretation of Classical Mechanics”:

“Computational algorithms are used to communicate precisely some of the methods used in the analysis of dynamical phenomena. Expressing the methods of variational mechanics in a computer language forces them to be unambiguous and computationally effective. Computation requires us to be precise about the representation of mechanical and geometric notions as computational objects and permits us to represent explicitly the algorithms for manipulating these objects. Also, once formalized as a procedure, a mathematical idea becomes a tool that can be used directly to compute results.”

But while Sussman and Wisdom’s book focuses in great detail on classical mechanics, Krauth’s book covers more broadly subjects in classical mechanics, statistical mechanics, quantum mechanics, and even quantum statistical mechanics. Another difference is that Sussman and Wisdom specify their algorithms in executable Scheme code, while Krauth uses pseudo-code. Of course, both choices have their advantages, just as both of these books are worth your time.

Gateway High School, 1981

August 13, 2007


Paul Graham has a great web-site full of thought-provoking essays. Some of them were collected into his fascinating book “Hackers and Painters.”

One of those essays, “Gateway High School, 1981″ talks about our high school, which I in fact graduated from in 1981. The picture above is from that essay; it’s the Chess Club picture from the 1981 Gateway High School yearbook. I’m the one at the bottom left seated at the chess board; Paul is standing at the top left.

If you read Paul’s description, or read his famous essay about “Why Nerds are Unpopular,” my high school sounds very grim, but I really didn’t experience it the same way. I honestly had a pretty good time in high school; maybe that’s why I’m the one who’s smiling in the picture above. I even loved rooting for the football teams (both the Steelers and the Gators), and trace my football fanaticism to those years, when the Pittsburgh Steelers were a dynasty.

When I arrived at Harvard in 1981, I was at first a little intimidated by some of the other students who went to better high schools, but I soon realized that I was easily well-enough prepared. And I was hungry to learn.

Nowadays, I’m most concerned that high school students are so driven to perform in order to be accepted into the elite universities, and that many high school teachers drive the students so hard with homework, that by the time they get to college, the students can have had the natural desire to learn drummed out of them.

Lectures on Disordered Systems

August 12, 2007

Many physicists study “disordered systems,” such as materials like glasses where the molecules making up the material are arranged randomly in space, in contrast to crystals, where all the particles are arranged in beautiful repeating patterns.

The symmetries of crystals make them much easier to analyze than glasses, and new theoretical methods had to be invented before physicists could make any headway in computing the properties of disordered systems. Those methods have turned out to be closely connected to approaches, such as the “belief propagation” algorithm, that are widely used in computer science, artificial intelligence, and communications theory, with the result that physicists and computer scientists today regularly exchange new ideas and results across their disciplines.

Returning to the physics of disordered systems, physicists began working on the problem in the 1970′s by considering the problem of disordered magnets (also called “spin glasses”). My Ph.D. thesis advisor, Philip W. Anderson summarized the history as follows:

“In 1975 S.F. (now Sir Sam) Edwards and I wrote down the “replica” theory of the phenomenon I had earlier named “spin glass”, followed up in ’77 by a paper of D.J. Thouless, my student Richard Palmer, and myself. A brilliant further breakthrough by G. Toulouse and G. Parisi led to a full solution of the problem, which turned out to entail a new form of statistical mechanics of wide applicability in fields as far apart as computer science, protein folding, neural networks, and evolutionary modelling, to all of which directions my students and/or I contributed.”


In 1992, I presented five lectures on “Quenched Disorder: Understanding Glasses Using a Variational Principle and the Replica Method” at a Santa Fe Institute summer school on complex systems. The lectures were published in a book edited by Lynn Nadel and Daniel Stein, but that book is very hard to find, and I think that these lectures are still relevant, so I’m posting them here. As I say in the introduction, “I will discuss technical subjects, but I will try my best to introduce all the technical material in as gentle and comprehensible a way as possible, assuming no previous exposure to the subject of these lectures at all.”

The first lecture is an introduction to the basics of statistical mechanics. It introduces magnetic systems and particle systems, and describes how to exactly solve non-interacting magnetic systems and particle systems where the particles are connected by springs.

The second lecture introduces the idea of variational approaches. Roughly speaking, the idea of a variational approach is to construct an approximate but exactly soluble system that is as close as possible to the system you are interested in. The grandly titled “Gaussian variational method” is the variational method that tries to find the set of particles and springs that best approximates an interacting particle system. I describe in this second lecture how the Gaussian variational method can be applied to heteropolymers like proteins.

The next three lectures cover the replica method, and combine it with the variational approach. The replica method is highly intricate mathematically. I learned it at the feet of the masters during my two years at the Ecole Normale Superieure (ENS) in Paris. In particular, I was lucky to work with Jean-Philippe Bouchaud, Antoine Georges, and Marc Mezard, who taught me what I knew. I thought it unfortunate that there wasn’t a written tutorial on the replica method, so the result were these lectures. Marc told me that for years afterwards they were given to new students of the replica method at the ENS.

Nowadays, the replica method is a little less popular than it used to be, mostly because it is all about computing averages of quantities over many samples of systems that are disordered according to some probability distribution. While those averages are very useful in physics, they are somewhat less important in computer science, where you usually just want an algorithm to deal with the one disordered system in front of you, rather than an average over all the possible disordered systems.

Santa Fe Institute Lectures

Generalized Belief Propagation

August 5, 2007

In 2002, I gave a lecture at the Mathematical Sciences Research Institute on the work I did, together with Bill Freeman and Yair Weiss on Generalized Belief Propagation, and the correspondence between free energy approximations and message passing algorithms. The lecture is available as a streaming video, together with a pdf for the slides, here.

It’s worth mentioning that there are many other interesting research lectures available in MSRI’s video archive, and that the more recent ones are of higher production quality.

Here is our most recent and comprehensive paper on this subject, published in the July 2005 issue of IEEE Transactions on Information Theory, which gives many additional details compared to the lecture: MERL TR2004-040.

If that paper is too difficult, you should probably start with this earlier paper, which was more tutorial in nature: MERL TR2001-22.

If you’re looking for generalized belief propagation software, your best bet is this package written by Yair’s student Talya Meltzer.

P.S.: I realized I haven’t told those of you who don’t know anything about it what generalized belief propagation is. Well, one answer is to that is look at the above material! But here’s a little background text that I’ve copied from my research statement to explain why you might be interested:

Most of my current research involves the application of statistical methods to “inference” problems. Some important fields which are dominated by the issue of inference are computer vision, speech recognition, natural language processing, error-control coding and digital communications. Essentially, any time you are receiving a noisy signal, and need to infer what is really out there, you are dealing with an inference problem.

A productive way to deal with an inference problem is to formalize it as a problem of computing probabilities in a “graphical model.” Graphical models, which are referred to in various guises as “Markov random fields,” “Bayesian networks,” or “factor graphs,” provide a statistical framework to encapsulate our knowledge of a system and to infer from incomplete information.

Physicists who use the techniques of statistical mechanics to study the behavior of disordered magnetic spin systems are actually studying a mathematically equivalent problem to the inference problem studied by computer scientists or electrical engineers, but with different terminology, goals, and perspectives. My own research has focused on the surprising relationships between methods that are used in these communities, and on powerful new techniques and algorithms, such as Generalized Belief Propagation, that can be understood using those relationships.

I’ll tell you more in future posts; I promise.


Get every new post delivered to your Inbox.