Posts Tagged ‘belief propagation’

LDPC Decoders and PyCodes

October 15, 2007

I already wrote about Gallager’s LDPC error-correcting codes, but I didn’t explain very much about how they work, aside from pointing you to some good references. I want to use this post to say a little about their decoders, which use the belief propagation algorithm, and also to make you aware of some freely available LDPC software, in case you want to study or simulate these codes.

The decoders typically work by message-passing (although decoders based on linear programming have also been studied). One represents the code using a “Tanner graph” like the one shown below, which is in fact a Tanner graph for the famous Hamming code.

[Figure: Tanner graph for the Hamming code]

The circles in the Tanner graph represent the bits that are transmitted. For this Hamming code, only 7 bits are transmitted in a block, but more practical codes will have hundreds or thousands of bits in a block.

The squares with a “+” inside of them represent the parity check constraints. Each parity check constraint enforces that the bits it is connected to sum to 0 modulo 2; equivalently, their sum must be even. For example, in the code above, there are three parity check constraints, and the first one forces the first, second, third, and fifth bits to sum to an even number. Even in codes with a large number of bits, each check is only connected to a small number of bits; that’s what makes the codes “low density.”
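
If it helps to see the constraints in code, here is a small sketch in Python with NumPy (just an illustration, nothing to do with PyCodes). The first row of the parity check matrix matches the check described above; the other two rows are one standard choice for a 7-bit Hamming code and may not match the figure exactly.

    import numpy as np

    # Parity check matrix for a 7-bit Hamming code.  Row 0 connects
    # bits 1, 2, 3 and 5, as in the first check described above; the
    # other two rows are one standard choice.
    H = np.array([[1, 1, 1, 0, 1, 0, 0],
                  [1, 1, 0, 1, 0, 1, 0],
                  [1, 0, 1, 1, 0, 0, 1]])

    def is_codeword(bits):
        """A block of bits is a codeword iff every check sums to 0 mod 2."""
        return bool(np.all(H.dot(bits) % 2 == 0))

    print(is_codeword(np.array([0, 0, 0, 0, 0, 0, 0])))  # True: all-zeros always satisfies the checks
    print(is_codeword(np.array([1, 0, 0, 0, 0, 0, 0])))  # False: one flipped bit breaks the checks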

The plain squares represent the information from the channel about each bit. For example, if a binary symmetric channel with flip probability f were used, and the first bit were received as a 0, the first square would be a function saying that the first bit has a probability of 1-f of being a 0, and a probability of f of being a 1.
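
In code, that channel evidence is just a pair of probabilities per bit; here is a tiny sketch (plain Python, not the PyCodes API):

    def bsc_evidence(received_bit, f):
        """Evidence from a binary symmetric channel with flip probability f,
        returned as (P(bit is 0), P(bit is 1))."""
        return (1.0 - f, f) if received_bit == 0 else (f, 1.0 - f)

    print(bsc_evidence(0, 0.1))  # (0.9, 0.1): a received 0 is probably a genuine 0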

The belief propagation decoders for LDPC codes (there are actually several variants) work by passing messages back and forth between the bit nodes and the parity check nodes in the Tanner graph. The bit nodes start by sending their beliefs about what values they have to their neighboring check nodes. For example, a message might say something like “bit 1 believes it has a 90% chance of being a 0, and a 10% chance of being a 1.”

The check nodes look at their incoming messages and send out appropriate responses. For example, if a check node is connected to four bits, and the first three bits think that they are a 0, a 0, and a 1, respectively, the fourth bit will get a message telling it to be a 1 (so that the sum will be even), with a probability that depends on how strongly the other three bits believe that they have those values.

When the bits get messages back from the check nodes, they update their beliefs appropriately and iterate. Eventually, if we’re lucky, the bits have beliefs which are consistent (when they are thresholded to their most likely value) with all the parity checks, and the decoder can output a codeword. Again, you should check out the references in my previous post about LDPC codes for more mathematical details about the algorithms.
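
To make the message-passing story concrete, here is a minimal sum-product decoder written in log-likelihood-ratio form, for the binary symmetric channel. It is only a sketch for small examples (it loops over the edges naively, and it is not how PyCodes implements things), but it is the same algorithm described above: bits send their beliefs to checks, checks reply using the “tanh rule,” and the decoder stops as soon as the thresholded beliefs satisfy every parity check.

    import numpy as np

    def decode_bp(H, received, f, max_iters=50):
        """Sum-product decoding of a received 0/1 block over a binary
        symmetric channel with flip probability f, using the parity
        check matrix H.  Returns the hard-decision bits."""
        m, n = H.shape
        # Channel log-likelihood ratios: positive means "probably a 0."
        channel = (1 - 2 * received) * np.log((1 - f) / f)
        # msg_vc[a, i]: message from bit i to check a (and vice versa for msg_cv).
        msg_vc = H * channel           # bits start by forwarding their channel beliefs
        msg_cv = np.zeros((m, n))
        for _ in range(max_iters):
            # Check-to-bit messages (the "tanh rule"): each check tells each
            # neighboring bit what the other bits attached to it imply.
            t = np.tanh(msg_vc / 2.0)
            for a in range(m):
                bits = np.flatnonzero(H[a])
                for i in bits:
                    others = [j for j in bits if j != i]
                    msg_cv[a, i] = 2.0 * np.arctanh(np.prod(t[a, others]))
            # Bit-to-check messages: channel evidence plus all *other* incoming checks.
            total = channel + msg_cv.sum(axis=0)
            msg_vc = H * (total - msg_cv)
            # Threshold the beliefs; stop early if every parity check is satisfied.
            decoded = (total < 0).astype(int)
            if np.all(H.dot(decoded) % 2 == 0):
                break
        return decoded

For example, with the Hamming code matrix H from the sketch above, decode_bp(H, np.array([0, 0, 0, 0, 1, 0, 0]), f=0.1) corrects the single flipped bit and returns the all-zeros codeword after a couple of iterations.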

If you want to implement LDPC codes, you might want to use the PyCodes package developed by Dr. Emin Martinian while he was at Mitsubishi Electric Research Labs (MERL). PyCodes is written in C, and linked into Python, so you can call it within Python as an ordinary module, but it still runs very fast.

Emin began writing PyCodes when he was my intern, and continued when he became a full-time employee at MERL. It’s very well-written code that I use a lot; Emin was a professional software developer before he was a graduate student, and the software is professional-quality. PyCodes is free for non-commercial use; see the license for more details.

For other software for error-correcting codes, see “the Error Correcting Codes Page.”


Gallager’s LDPC error-correcting codes

September 28, 2007

Error-correcting codes are a technology that most people don’t think much about, if they even know they exist, but these codes work quietly in the background to enable such things as cell phones and other wireless technology, cable and satellite TV, and also the internet, including DSL, fiber-optic communications, and good old-fashioned dial-up modems. The modern communications revolution could not have begun without these codes.

So what’s the idea behind these codes? There’s a lot to say, and many textbooks have been written on the subject, so I’ll only give the briefest of introductions. [Some excellent textbooks I recommend include MacKay’s textbook which I’ve already reviewed, McEliece’s “Theory of Information and Coding”, and Lin and Costello’s “Error Control Coding.” See also this post for two forthcoming books available online.] EDIT: I’ve added some more information about LDPC decoders, with pointers to available software, in this post.

The basic idea is that we want to transmit some bits which represent some piece of text or picture or something. Unfortunately, when we transmit those bits, they need to travel through some channel (say a wire or through the air) and when they are received, the receiver only gets a noisy version of each bit. For example, each bit might be flipped independently from a 0 to a 1 or vice versa with some small probability (this is called the binary symmetric channel; many other channel models exist too).
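
If you want to simulate that kind of channel yourself, it only takes a couple of lines; here is a sketch in Python with NumPy (the function name is just for illustration):

    import numpy as np

    def bsc(bits, f, seed=None):
        """Pass a 0/1 array through a binary symmetric channel:
        each bit is flipped independently with probability f."""
        rng = np.random.default_rng(seed)
        flips = rng.random(len(bits)) < f
        return (bits + flips) % 2

    sent = np.zeros(1000, dtype=int)
    received = bsc(sent, f=0.05)
    print(received.sum(), "of 1000 bits were flipped")  # about 50, on average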

To combat this noise, we send extra bits; so instead of sending say the 1000 bits that represent our message, we might send 2000, where the extra 1000 “check” bits have some known relationship to the original 1000. Both the transmitter and receiver agree on that relationship ahead of time; that is the “code.” Of course, all 2000 bits are now subject to the noise, so some of those extra bits could be flipped. Nevertheless, if the noise is small enough, the receiver can try to “decode” the original 1000 bits by finding the configuration of the 2000 bits which obeys all the constraints and is most probable.

In 1948, Claude Shannon proved a theorem that essentially said that if the noise in the channel was small enough, and if the number of extra bits you were willing to send per original bit was large enough, then one could design very long codes that, if optimally decoded, would reliably remove all the noise and recover the transmitted message.

(By the way, it is this amazing property that codes can remove 100% of the noise that means that we can watch crystal-clear high-definition TV coming in over the airwaves, something I very much appreciate when I watch a football game these days. When advertisers talk about “digital this” or “digital that,” they really mean “error-corrected digital”.)

As an example of Shannon’s theorem, if you are willing to use one extra bit for every original bit, and the flip probability of your binary symmetric channel is less than the Shannon limit of about 11%, his theorem tells you that codes exist that will reliably remove all the noise. However, Shannon’s proof was non-constructive; he didn’t tell us what these wonderful codes were. Shannon also proved a converse theorem: if the noise is higher than the “Shannon limit,” no codes exist that can reliably correct it.
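
Where does that 11% figure come from? The capacity of the binary symmetric channel is 1 - H2(f) bits per channel use, where H2 is the binary entropy function, and a rate-1/2 code (one extra bit per original bit) can be made reliable exactly when that capacity exceeds 1/2. A quick check in Python (NumPy assumed):

    import numpy as np

    def bsc_capacity(f):
        """Capacity of the binary symmetric channel: 1 - H2(f) bits per use."""
        return 1.0 + f * np.log2(f) + (1 - f) * np.log2(1 - f)

    print(bsc_capacity(0.11))  # roughly 0.50 -- the Shannon limit for a rate-1/2 code
    print(bsc_capacity(0.05))  # roughly 0.71 -- comfortably above 1/2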

So error-correcting coding theory deals with the problems of designing codes, and efficient encoders and decoders for those codes, that work as close to the Shannon limit as possible. Many theorists invented many interesting and useful codes and encoders and decoders over the years, but until the 1990’s, it still seemed a distant dream to most coding theorists that we would be able to find practical codes that performed near the Shannon limit.

What is very strange is that the best codes and decoders discovered in the 1990’s were actually a rediscovery of codes and decoders invented by Robert Gallager in the early 1960’s for his Ph.D. thesis. Gallager’s thesis introduced “low density parity check” (LDPC) codes and their decoding algorithm, the famous “belief propagation” decoding algorithm. His thesis also introduced many other important ideas in coding theory, including “density evolution,” simplified “bit-flipping decoders,” and analysis methods for optimal LDPC decoders. It is a truly remarkable work that every aspiring coding theorist should read. Fortunately, it’s available online.


How is it possible that this work was forgotten? Well, in 1968, Robert Gallager wrote a magnificent textbook on information theory, called “Information Theory and Reliable Communication,” in which he explained most of the information and coding theory known at the time, but he neglected to talk about LDPC codes! I’m not sure why. Possibly he thought that he’d already covered the material in the 1963 monograph made from his thesis, or perhaps he didn’t think LDPC codes were practical at the time. In fact, the decoder was too complex for 1960’s technology, but Moore’s Law eventually took care of that; even so, it is only now that LDPC codes are widely replacing previously developed codes in communications systems.

So the moral of the story is: if you write a ground-breaking Ph.D. thesis about a remarkable technology that is 30 years ahead of its time, please don’t forget to mention it in your classic textbook a few years later. Not a moral too many of us have to worry about, but still…

Two Draft Books

September 13, 2007

If you’re interested in learning about statistical mechanics, graphical models, information theory, error-correcting codes, belief propagation, constraint satisfaction problems, or the connections between all those subjects, you should know about a couple of books that should be out soon, but for which you can already download extensive draft versions.

The first is Information, Physics, and Computation by Marc Mézard and Andrea Montanari.

The second is Modern Coding Theory by Tom Richardson and Ruediger Urbanke.

I also recommend the tutorial on Modern Coding Theory: the Statistical Mechanics and Computer Science Points of View, by Montanari and Urbanke, from the lectures they gave at the 2006 Les Houches summer school.

Generalized Belief Propagation

August 5, 2007

In 2002, I gave a lecture at the Mathematical Sciences Research Institute on the work I did, together with Bill Freeman and Yair Weiss, on Generalized Belief Propagation and the correspondence between free energy approximations and message passing algorithms. The lecture is available as a streaming video, together with a pdf of the slides, here.

It’s worth mentioning that there are many other interesting research lectures available in MSRI’s video archive, and that the more recent ones are of higher production quality.

Here is our most recent and comprehensive paper on this subject, published in the July 2005 issue of IEEE Transactions on Information Theory, which gives many additional details compared to the lecture: MERL TR2004-040.

If that paper is too difficult, you should probably start with this earlier paper, which was more tutorial in nature: MERL TR2001-22.

If you’re looking for generalized belief propagation software, your best bet is this package written by Yair’s student Talya Meltzer.

P.S.: I realized I haven’t told those of you who don’t know anything about it what generalized belief propagation is. Well, one answer to that is to look at the above material! But here’s a little background text that I’ve copied from my research statement to explain why you might be interested:

Most of my current research involves the application of statistical methods to “inference” problems. Some important fields which are dominated by the issue of inference are computer vision, speech recognition, natural language processing, error-control coding and digital communications. Essentially, any time you are receiving a noisy signal, and need to infer what is really out there, you are dealing with an inference problem.

A productive way to deal with an inference problem is to formalize it as a problem of computing probabilities in a “graphical model.” Graphical models, which are referred to in various guises as “Markov random fields,” “Bayesian networks,” or “factor graphs,” provide a statistical framework to encapsulate our knowledge of a system and to make inferences from incomplete information.

Physicists who use the techniques of statistical mechanics to study the behavior of disordered magnetic spin systems are actually studying a problem that is mathematically equivalent to the inference problems studied by computer scientists and electrical engineers, but with different terminology, goals, and perspectives. My own research has focused on the surprising relationships between the methods used in these communities, and on powerful new techniques and algorithms, such as Generalized Belief Propagation, that can be understood using those relationships.

I’ll tell you more in future posts; I promise.