Archive for the ‘Algorithms’ Category

Programming Erlang

August 23, 2007

CPUs stopped getting faster about five years ago. Since then, Intel and AMD have been introducing multi-core processors, and the trend for the foreseeable future seems to be more and more processors per computer. That will be great, so long as the software industry can take advantage of the increased number of processors to deliver bigger and faster applications. However, that means parallel programming is in our future, and parallel programming is hard.

What’s more, the current dominant paradigm for taking advantage of multiprocessor systems is threads, and threads appear to have very serious problems. Berkeley professor Edward Lee argues in no uncertain terms that the inherent non-determinism of threaded programming dooms any software testing approach to failure, and will lead to buggy, unreliable programs:

“A folk definition of insanity is to do the same thing over and over again and to expect the results to be different. By this definition, we in fact require that programmers of multithreaded systems be insane. Were they sane, they could not understand their programs.”

“These same computer vendors are advocating more multi-threaded programming, so that there is concurrency that can exploit the parallelism they would like to sell us. Intel, for example, has embarked on an active campaign to get leading computer science academic programs to put more emphasis on multi-threaded programming. If they are successful, and the next generation of programmers makes more intensive use of multithreading, then the next generation of computers will become nearly unusable.”


Having read Lee’s paper, I was very interested when I learned of Joe Armstrong’s new book “Programming Erlang: Software for a Concurrent World.” Armstrong actually identifies a problem with threads that is related to, but slightly different from, non-determinism: the fact that different threads can access the same memory. Why is shared memory such a big problem? Briefly, because a thread or process that needs to access shared memory must lock it, and if it crashes while the memory is locked, you’re in trouble. For more, see this post.

So what is Erlang, and what is it like? Well, it’s open-source, and has been developed and used in telecom companies for a long time, so there already exist extensive libraries. It is a general-purpose language designed for concurrent, distributed, and fault-tolerant applications. It adheres strongly to the functional programming paradigm. It is a dynamic language, comes with a shell, and uses pattern-matching extensively. In Erlang, it is easy to spawn very large numbers of very-lightweight processes. Processes communicate using messages, and do not share any memory. All in all, it’s a very funky language.
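Here’s a rough Python analogue of that shared-nothing, message-passing style, just to make the idea concrete (this is not Erlang, and Python’s operating-system processes are vastly heavier than Erlang’s lightweight processes): the two sides communicate only by sending messages through queues, never by touching shared memory.

```python
# A rough Python analogue of shared-nothing message passing:
# two OS processes that communicate only through queues.
# (Erlang processes are far lighter-weight; this just illustrates the style.)
from multiprocessing import Process, Queue

def echo_server(inbox: Queue, outbox: Queue) -> None:
    while True:
        msg = inbox.get()          # block until a message arrives
        if msg == "stop":
            break
        outbox.put(("echo", msg))  # reply by sending a message back

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    worker = Process(target=echo_server, args=(inbox, outbox))
    worker.start()
    inbox.put("hello")
    print(outbox.get())            # ('echo', 'hello')
    inbox.put("stop")
    worker.join()
```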

So far, I have only read a small portion of Armstrong’s book (through chapter 8, where concurrent programs are introduced), but it is already clear to me that this is a significant piece of work, well worth the time spent with it. Starting from scratch, Armstrong works his way up to explaining complex distributed and fault-tolerant applications, such as a streaming media server, that require surprisingly little code. I plan to say more in future blog posts, as I progress through the book. In the meantime, here’s a list of beginner Erlang links.

(By the way, in his paper quoted above, Edward Lee discusses Erlang briefly, and says that he believes its unfamiliar syntax will continue to block its widespread adoption. I’m not so sure–perhaps Erlang has simply not broken through until now because there just wasn’t much need for a language suited to multi-processor programming.)

“On Intelligence” and Numenta

August 21, 2007



“On Intelligence,” written by Jeff Hawkins with Sandra Blakeslee, is a great read, full of provocative ideas about how our brains work.

Hawkins founded Palm Computing and Handspring, but he says that his true lifelong passion has been trying to understand our brains. In 2002, he founded the Redwood Neuroscience Institute (now the Redwood Center for Theoretical Neuroscience at Berkeley).

At Redwood, he developed a theory of the brain, which he expounds in this book. The book is a popular science book; you will not find any equations or explicit algorithms. It reads very smoothly, undoubtedly because Hawkins was helped in writing it by Sandra Blakeslee, a science writer for the New York Times.

Hawkins argues that the cortex consists of modules that are all performing the same algorithm. The purpose of that algorithm is to learn to complete and predict the spatio-temporal patterns coming into a module from “lower” modules in the brain’s hierarchy, using feedback from “higher” modules.

I find this hypothesis for how the brain works extremely attractive, and “On Intelligence” argues for the hypothesis almost too well, in that the difficulties of converting the hypothesis into a concrete and useful algorithm will slip by the reader, as if by sleight of hand. The problem, of course, is that it is easy to use words to talk about how a brain might work; the hard part is making a machine do the same thing.

Peter Dayan, a leading computational neuroscientist, has written a much more detailed review. And for a 2006 lecture by Hawkins about his theory, you can watch this video.

After reading the book, I actually spent a few weeks trying to make the hypothesis into a concrete algorithm, without much success. But Hawkins is certainly putting his money where his mouth is: he has founded a company, called Numenta, and Numenta has released software (the “Numenta Platform for Intelligent Computing” or “NuPIC”) that implements part of his theory. NuPIC appears to be based on software originally written by Numenta co-founder and Stanford graduate student Dileep George, but transformed by a team of software developers into a professional-quality product.

However, it is very disappointing that the NuPIC software does not include feedback in its hierarchies, nor does it let you learn and infer temporal patterns (only spatial ones). To be honest, I am not so surprised, because it was these elements (which are obviously central to Hawkins’ theory) that I found so difficult to integrate into a real algorithm.

Numenta promises that future versions of NuPIC will fill these gaps. I sincerely wish them the best of luck! I should say that I think it is wonderful that an impatient guy like Hawkins pushes the field, using a different approach than the standard academic one.

So in summary, you should definitely read Hawkins’ book, and if you’re intrigued, you should by all means check out the free Numenta software. Whether this line of research will lead to anything significant, though, I’m not really sure…

Structure and Interpretation of Computer Programs: Videos and Textbook

August 17, 2007


[Photo: Gerald Jay Sussman in his wizard hat]

This set of videos, of Gerald Jay Sussman and Hal Abelson teaching their course on the “Structure and Interpretation of Computer Programs” in July 1986 for Hewlett-Packard employees, is one of the treasures of the internet. The clothes are out of style, but the material presented is still completely relevant.

The SICP website has lots of useful additional information, including the complete text of the second edition of their classic textbook.

I am not going to try to write a better review of Abelson and Sussman’s textbook than the Amazon review written by Peter Norvig. Norvig is the head of research at Google, and a co-author of “Artificial Intelligence: A Modern Approach,” the leading AI textbook.

But I will add one comment: what I love most about these lectures is the point (see the picture above of Sussman in his wizard hat) that a computer programmer is like a wizard–he creates something real out of ideas. Computers let us be wizards; and I believe we have only scratched the surface of the possible “spells” that we can learn to cast.

If you want a nice implementation of Scheme, the beautiful dialect of Lisp that was invented by Sussman with his student Guy Steele, and used in this textbook, I highly recommend DrScheme.

Algorithms for Physics

August 14, 2007


smac_cover.jpg

Much of my own work is at the intersection of statistical mechanics and algorithms, in particular understanding and developing new algorithms using ideas originating in statistical mechanics. Werner Krauth also works at the intersection of the two fields, but coming from a very different angle: he is a leading expert on the development and application of algorithms to compute and understand the properties of physical systems.

In his recently published book, “Statistical Mechanics: Algorithms and Computations,” targeted at advanced undergraduates and graduate students, he covers a very wide range of interesting algorithms. To give you an idea of the coverage, I’ll just list the chapters: “Monte Carlo methods,” “Hard disks and spheres,” “Density matrices and path integrals,” “Bosons,” “Order and disorder in spin systems,” “Entropic forces,” and “Dynamic Monte Carlo methods.”
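To give a flavor of the material, here is the classic direct-sampling estimate of pi, in the spirit of the book’s opening Monte Carlo examples (my sketch here is in Python; Krauth himself uses pseudo-code):

```python
import random

def direct_pi(n_trials: int) -> float:
    """Estimate pi by direct sampling: the fraction of random points in the
    unit square that land inside the quarter circle of radius 1 is pi/4."""
    hits = 0
    for _ in range(n_trials):
        x, y = random.random(), random.random()
        if x * x + y * y < 1.0:
            hits += 1
    return 4.0 * hits / n_trials

print(direct_pi(1_000_000))  # roughly 3.14
```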

Krauth’s presentation is leavened by his humor, and he often uses the results obtained using his algorithms to make surprising points about physics that would otherwise be hard to convey.

I am often asked by computer science and electrical engineering researchers for good introductions to physics, and particularly to statistical mechanics, and I’m now happy to be able to recommend this book.


Specifying physics explicitly in terms of algorithms, as Krauth does, gives a very concrete basis for understanding concepts that can otherwise seem terribly abstract. Gerald Sussman and Jack Wisdom make this point in the preface of their already classic book (which is available online) “Structure and Interpretation of Classical Mechanics”:

“Computational algorithms are used to communicate precisely some of the methods used in the analysis of dynamical phenomena. Expressing the methods of variational mechanics in a computer language forces them to be unambiguous and computationally effective. Computation requires us to be precise about the representation of mechanical and geometric notions as computational objects and permits us to represent explicitly the algorithms for manipulating these objects. Also, once formalized as a procedure, a mathematical idea becomes a tool that can be used directly to compute results.”
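As a tiny illustration of that last sentence (my own example, not one from either book): once a numerical integration rule for Newton’s equations is written down as a procedure, it immediately becomes a tool you can run.

```python
def leapfrog(x, v, accel, dt, n_steps):
    """Integrate Newton's equations with the leapfrog (velocity Verlet) rule.
    accel(x) returns the acceleration; the update is explicit and symplectic."""
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * accel(x)
        x = x + dt * v_half
        v = v_half + 0.5 * dt * accel(x)
    return x, v

# Harmonic oscillator, a(x) = -x: after one period (about 2*pi) we return
# very close to the starting point (1.0, 0.0).
x, v = leapfrog(1.0, 0.0, lambda x: -x, dt=0.01, n_steps=628)
print(x, v)
```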

But while Sussman and Wisdom’s book focuses in great detail on classical mechanics, Krauth’s book ranges more broadly over classical mechanics, statistical mechanics, quantum mechanics, and even quantum statistical mechanics. Another difference is that Sussman and Wisdom specify their algorithms in executable Scheme code, while Krauth uses pseudo-code. Of course, both choices have their advantages, and both books are worth your time.

Lectures on Disordered Systems

August 12, 2007

Many physicists study “disordered systems,” such as glasses, where the molecules making up the material are arranged randomly in space, in contrast to crystals, where all the particles are arranged in beautiful repeating patterns.

The symmetries of crystals make them much easier to analyze than glasses, and new theoretical methods had to be invented before physicists could make any headway in computing the properties of disordered systems. Those methods have turned out to be closely connected to approaches, such as the “belief propagation” algorithm, that are widely used in computer science, artificial intelligence, and communications theory, with the result that physicists and computer scientists today regularly exchange new ideas and results across their disciplines.

Returning to the physics of disordered systems: physicists began working on the problem in the 1970s by considering disordered magnets (also called “spin glasses”). My Ph.D. thesis advisor, Philip W. Anderson, summarized the history as follows:

“In 1975 S.F. (now Sir Sam) Edwards and I wrote down the “replica” theory of the phenomenon I had earlier named “spin glass”, followed up in ’77 by a paper of D.J. Thouless, my student Richard Palmer, and myself. A brilliant further breakthrough by G. Toulouse and G. Parisi led to a full solution of the problem, which turned out to entail a new form of statistical mechanics of wide applicability in fields as far apart as computer science, protein folding, neural networks, and evolutionary modelling, to all of which directions my students and/or I contributed.”


In 1992, I presented five lectures on “Quenched Disorder: Understanding Glasses Using a Variational Principle and the Replica Method” at a Santa Fe Institute summer school on complex systems. The lectures were published in a book edited by Lynn Nadel and Daniel Stein, but that book is very hard to find, and I think that these lectures are still relevant, so I’m posting them here. As I say in the introduction, “I will discuss technical subjects, but I will try my best to introduce all the technical material in as gentle and comprehensible a way as possible, assuming no previous exposure to the subject of these lectures at all.”

The first lecture is an introduction to the basics of statistical mechanics. It introduces magnetic systems and particle systems, and describes how to exactly solve non-interacting magnetic systems and particle systems where the particles are connected by springs.
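(A standard example of what “exactly solve” means here: for N non-interacting spins s_i = ±1 in a magnetic field h at temperature T, in units where Boltzmann’s constant is 1, the partition function factorizes into single-spin sums, Z = [2 cosh(h/T)]^N, so the free energy is simply F = -N T ln 2 cosh(h/T).)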

The second lecture introduces the idea of variational approaches. Roughly speaking, the idea of a variational approach is to construct an approximate but exactly soluble system that is as close as possible to the system you are interested in. The grandly titled “Gaussian variational method” is the variational method that tries to find the set of particles and springs that best approximates an interacting particle system. I describe in this second lecture how the Gaussian variational method can be applied to heteropolymers like proteins.
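As a toy illustration of the variational idea (my own example, far simpler than the heteropolymer application in the lecture): approximate a single classical particle in a quartic potential H = x^4 by a particle on a spring, H_0 = k x^2 / 2, and choose the spring constant k that minimizes the Gibbs-Bogoliubov-Feynman bound F_0 + <H - H_0>_0 on the true free energy. Analytically the best spring constant is k = sqrt(12 T), and the bound sits a little above the exact free energy, as it must.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.integrate import quad

T = 1.0  # temperature (k_B = 1)

def f_variational(k):
    """Gibbs-Bogoliubov-Feynman bound for H = x^4 approximated by H0 = k x^2 / 2.
    For the Gaussian trial system: F0 = -T ln sqrt(2 pi T / k),
    <x^4>_0 = 3 (T/k)^2 and <H0>_0 = T/2."""
    f0 = -T * np.log(np.sqrt(2.0 * np.pi * T / k))
    return f0 + 3.0 * (T / k) ** 2 - T / 2.0

best = minimize_scalar(f_variational, bounds=(0.1, 20.0), method="bounded")
k_star = best.x                      # analytically, k* = sqrt(12 T) ~ 3.46

# Exact free energy for comparison: F = -T ln Z with Z = integral exp(-x^4 / T) dx.
z_exact, _ = quad(lambda x: np.exp(-x ** 4 / T), -np.inf, np.inf)
print(k_star, best.fun, -T * np.log(z_exact))  # the bound lies above the exact F
```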

The next three lectures cover the replica method and combine it with the variational approach. The replica method is highly intricate mathematically. I learned it at the feet of the masters during my two years at the Ecole Normale Superieure (ENS) in Paris. In particular, I was lucky to work with Jean-Philippe Bouchaud, Antoine Georges, and Marc Mezard, who taught me what I knew. I thought it unfortunate that there wasn’t a written tutorial on the replica method, so the result was these lectures. Marc told me that for years afterwards they were given to new students of the replica method at the ENS.

Nowadays, the replica method is a little less popular than it used to be, mostly because it is all about computing averages of quantities over many samples of systems that are disordered according to some probability distribution. While those averages are very useful in physics, they are somewhat less important in computer science, where you usually just want an algorithm to deal with the one disordered system in front of you, rather than an average over all the possible disordered systems.

Santa Fe Institute Lectures

Multicellular Logic Circuits, Part I: Evolution

August 7, 2007

If we want to construct artificial machines that rival the capabilities of biological organisms, we should try to understand the principles by which complex natural “machines” such as plants and animals are created.

It is generally agreed, at least by scientists, that all natural organisms have been “designed” by the completely blind and random process of evolution. Through evolution, a population of organisms tends to become progressively better adapted to its environment via the mutation of genomes of individuals in the population, and the selection and more rapid reproduction of the fittest organisms in that population.


Harvard professor Martin Nowak has written a lovely and elegant book describing the mathematics of evolutionary dynamics, using the ideas of evolutionary game theory; here is a video of Nowak describing evolutionary game theory at Harvard in 2004.

My own interest is not so much in analyzing evolution, but in exploiting it. If we understand evolution so well, shouldn’t we be able to use it to design useful machines?


Of course, humans have already for many centuries exploited evolution, using artificial selection to breed domesticated animals or cultivate useful plants.

But I am looking for something else: the design of artificial machines through artificial selection. Although it has never been a mainstream idea, computer scientists have pursued such dreams since the 1950s. When I was in graduate school in the 1980s, I loved reading John Holland’s seminal 1975 book “Adaptation in Natural and Artificial Systems.”


Holland and his students were deeply influential in popularizing the whole field of genetic algorithms.

Another important figure in the field is John Koza, who has advocated for many years one of the most important variants of genetic algorithms, which he calls “genetic programming.” In genetic programming, computer programs, typically written in Lisp, are evolved through a process that involves mutating the programs by altering or swapping branches of the computation tree representing the program.
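To make the basic mutate-select-reproduce loop concrete, here is a minimal genetic-algorithm sketch of my own in Python (bitstring genomes with tournament selection on the toy “count the ones” objective, much simpler than Koza-style tree mutation):

```python
import random

GENOME_LEN, POP_SIZE, GENERATIONS, MUTATION_RATE = 40, 60, 100, 0.02

def fitness(genome):
    """Toy objective ("OneMax"): the more 1-bits, the fitter the genome."""
    return sum(genome)

def mutate(genome):
    """Flip each bit independently with probability MUTATION_RATE."""
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit for bit in genome]

def tournament(population):
    """Pick two random individuals and keep the fitter one."""
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]
for generation in range(GENERATIONS):
    population = [mutate(tournament(population)) for _ in range(POP_SIZE)]

print(max(fitness(g) for g in population))  # typically at or near GENOME_LEN
```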

Genetic programming, and genetic algorithms more generally, have had considerable success creating interesting and useful systems and programs. Nevertheless, I think it is fair to say that these ideas are still considered “fringe” ideas in the scientific and engineering community, and they have not widely replaced more conventional software and hardware design strategies.

So what might be missing? I will begin discussing that in Part II.


Generalized Belief Propagation

August 5, 2007

In 2002, I gave a lecture at the Mathematical Sciences Research Institute on the work I did, together with Bill Freeman and Yair Weiss, on Generalized Belief Propagation and the correspondence between free energy approximations and message-passing algorithms. The lecture is available as a streaming video, together with a pdf of the slides, here.

It’s worth mentioning that there are many other interesting research lectures available in MSRI’s video archive, and that the more recent ones are of higher production quality.

Here is our most recent and comprehensive paper on this subject, published in the July 2005 issue of IEEE Transactions on Information Theory, which gives many additional details compared to the lecture: MERL TR2004-040.

If that paper is too difficult, you should probably start with this earlier paper, which was more tutorial in nature: MERL TR2001-22.

If you’re looking for generalized belief propagation software, your best bet is this package written by Yair’s student Talya Meltzer.

P.S.: I realized I haven’t told those of you who don’t know anything about it what generalized belief propagation is. Well, one answer to that is to look at the above material! But here’s a little background text that I’ve copied from my research statement to explain why you might be interested:

Most of my current research involves the application of statistical methods to “inference” problems. Some important fields which are dominated by the issue of inference are computer vision, speech recognition, natural language processing, error-control coding and digital communications. Essentially, any time you are receiving a noisy signal, and need to infer what is really out there, you are dealing with an inference problem.

A productive way to deal with an inference problem is to formalize it as a problem of computing probabilities in a “graphical model.” Graphical models, which are referred to in various guises as “Markov random fields,” “Bayesian networks,” or “factor graphs,” provide a statistical framework for encapsulating our knowledge of a system and for drawing inferences from incomplete information.

Physicists who use the techniques of statistical mechanics to study the behavior of disordered magnetic spin systems are actually studying a problem mathematically equivalent to the inference problems studied by computer scientists and electrical engineers, but with different terminology, goals, and perspectives. My own research has focused on the surprising relationships between the methods used in these communities, and on powerful new techniques and algorithms, such as Generalized Belief Propagation, that can be understood using those relationships.
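To give a taste of what “message passing” means, here is a minimal sketch of ordinary belief propagation (not the generalized version) on a small chain-structured model; this is my own toy example, not anything from the paper. Each variable sends its neighbor a message summarizing the evidence on its side of the chain, and the marginal at each variable is proportional to the product of its local evidence and its incoming messages. On a chain (or any tree) the answer is exact, as the brute-force check confirms.

```python
import itertools
import numpy as np

# A chain of 4 binary variables with pairwise potentials psi(x_i, x_{i+1})
# and local evidence phi_i(x_i).  Belief propagation is exact on a chain.
n = 4
rng = np.random.default_rng(0)
phi = rng.uniform(0.5, 2.0, size=(n, 2))         # local evidence
psi = rng.uniform(0.5, 2.0, size=(n - 1, 2, 2))  # pairwise couplings

# m_f[i] = message passed forward from x_{i-1} into x_i;
# m_b[i] = message passed backward from x_{i+1} into x_i.
m_f = np.ones((n, 2))
m_b = np.ones((n, 2))
for i in range(1, n):
    m_f[i] = (phi[i - 1] * m_f[i - 1]) @ psi[i - 1]
    m_f[i] /= m_f[i].sum()                        # normalize for stability
for i in range(n - 2, -1, -1):
    m_b[i] = psi[i] @ (phi[i + 1] * m_b[i + 1])
    m_b[i] /= m_b[i].sum()

beliefs = phi * m_f * m_b
beliefs /= beliefs.sum(axis=1, keepdims=True)

# Brute-force check: sum the unnormalized joint over all 2^n configurations.
marg = np.zeros((n, 2))
for x in itertools.product([0, 1], repeat=n):
    p = np.prod([phi[i, x[i]] for i in range(n)])
    p *= np.prod([psi[i, x[i], x[i + 1]] for i in range(n - 1)])
    for i in range(n):
        marg[i, x[i]] += p
marg /= marg.sum(axis=1, keepdims=True)

print(np.allclose(beliefs, marg))  # True: BP is exact on a tree or chain
```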

I’ll tell you more in future posts; I promise.

SimPy and Discrete-Event Simulation

August 3, 2007

I’ve been using the SimPy discrete-event simulation package lately, and I really like it.

As the SimPy home page says, “SimPy (= Simulation in Python) is an object-oriented, process-based discrete-event simulation language based on standard Python.” What does that mean? Well, first of all, it is a Python module, and you import and then use it like any other Python module.

(If you haven’t used Python, get yourself over to www.python.org and download it now! Or if you use Mac OS X or Linux, you already have it. Python is a powerful, high-level general-purpose language, and comes with excellent documentation.)

“Discrete-event simulations” are something a good nerd will often need to write. They are simulations where things happen at discrete times. There are three levels of sophistication in handling such simulations. Level zero is to just step forward in time by small increments, checking every time step for each possible event. Not too bright, because you waste a lot of computation on time steps when nothing happens. Level one is the “event-oriented” approach, where you keep events in some “agenda,” and you process events in the agenda one at a time, with each event in the agenda possibly creating new events in the future that are then added to the agenda.
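For the event-oriented approach, the agenda is typically a priority queue keyed on event time. Here is a bare-bones Python sketch of my own (not SimPy) in which handling one event can schedule future events:

```python
import heapq
import itertools

def run(initial_events, until):
    """Event-oriented simulation: pop the earliest event from the agenda,
    handle it, and push whatever future events the handler schedules."""
    counter = itertools.count()               # tie-breaker for equal times
    agenda = [(t, next(counter), h) for t, h in initial_events]
    heapq.heapify(agenda)
    while agenda and agenda[0][0] <= until:
        time, _, handler = heapq.heappop(agenda)
        for new_time, new_handler in handler(time):
            heapq.heappush(agenda, (new_time, next(counter), new_handler))

def arrival(time):
    """A customer arrives; schedule the next arrival 2.0 time units later."""
    print(f"{time:5.1f}: customer arrives")
    return [(time + 2.0, arrival)]

run([(0.0, arrival)], until=10.0)
```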

Level one is as far as most people get, but it’s not where you should stop, because writing an event-oriented simulation is painful and error-prone. The right thing to do is to go to the next level of sophistication, the “process-oriented” approach.

In the process-oriented approach, you create special objects, called processes, which are like “living” objects. Each process has a special event method that functions as an event loop. To program the events in your simulation, you write one or more of these process event methods, describing how each process object reacts to the possible events in the simulation. It turns out that these process event methods are very natural and easy to write, because they correspond closely to how we think about what is happening in the simulation.

So how does it work? Well, SimPy sets up and handles the event agenda underlying the system, so you don’t have to do it yourself. When you call the SimPy “simulate()” method, it begins stepping through the events in the event agenda, calling the appropriate process event methods defined in your processes in the correct order. The Python feature that makes the whole thing work is the “yield” statement, which is like a return, except that when the function (more precisely, the generator it creates) is resumed, it picks up right after the yield rather than at the beginning of the function. All the process event methods you define when using SimPy use yields to give control back to the SimPy run-time system.
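To see why yield is the key ingredient, here is a toy process-oriented scheduler of my own, written with plain Python generators rather than SimPy’s actual API: each process yields the delay until it next wants to act, and the scheduler resumes whichever process is due next.

```python
import heapq
import itertools

def scheduler(processes, until):
    """Resume each generator-based process at its requested times.
    A process yields the delay until it next wants to act."""
    counter = itertools.count()                  # tie-breaker for equal times
    agenda = []
    for proc in processes:
        delay = next(proc)                       # run until the first yield
        heapq.heappush(agenda, (delay, next(counter), proc))
    while agenda and agenda[0][0] <= until:
        now, _, proc = heapq.heappop(agenda)
        try:
            delay = proc.send(now)               # resume just after its yield
            heapq.heappush(agenda, (now + delay, next(counter), proc))
        except StopIteration:
            pass                                 # the process has finished

def customer(name, interval):
    """A process: act, then wait `interval` time units, forever."""
    now = 0.0
    while True:
        print(f"{now:5.1f}: {name} acts")
        now = yield interval                     # hand control back to the scheduler

scheduler([customer("A", 3.0), customer("B", 5.0)], until=15.0)
```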

Anyway, Professor Norm Matloff of UC Davis has written some excellent tutorials, or you can use the documentation that comes with SimPy.