Researchers make improvements on 45-year-old coding theory problem
By Katie Carr, Coordinated Science Laboratory
December 10, 2013
- ISE Assistant Professor Negar Kiyavash and ECE graduate student Daniel Cullina have made improvements on a 45-year-old coding theory problem.
- Their work focuses on the relationship between code size and error correcting capability, a fundamental problem in coding theory.
- Applications for this research are diverse, including DNA sequencing and network forensics.
In today’s fast moving, tech savvy world, new advances in research and technology are common. However, it’s not every day that someone makes advances on a problem that’s been around for over 45 years.
Recently, Negar Kiyavash
, an assistant professor of industrial and enterprise systems engineering, and ECE graduate student Daniel Cullina have done just that by advancing coding theory established by Russian scientist Vladimir Levenshtein in the 1960s.
"A lot of the classic problems in coding theory are comparably old and are now extremely well understood. People know exactly what to do in practice,” Cullina said. “However, progress on this problem has been relatively slow since Levenshtein’s time.”
In the early 1960s, Levenshtein found an optimal code for correcting a single deletion, as well as upper and lower bounds on the size of the best codes capable of correcting multiple deletions. His work still guides modern thinking about errors in communication channels, many of which distort the data messages transmitted by devices – such as computers and wireless phones – by erasing, substituting, inserting or deleting message symbols.
Insertions and deletions pose a significantly bigger challenge than substitutions and erasures, according to Cullina. Consider a message of ABCDE. When a symbol is erased, the recipient of a message sees the empty space instead of the original symbol (ABC_E), but the deletion of a symbol leaves no trace (ABCE).
Another source of difficulty is that insertions and deletions change the alignment of symbols throughout the entire message. ABCDE can be transformed into DABCE with one deletion and one insertion, but those two strings of symbols differ in four out of five positions. Therefore, even a small number of insertions and deletions can create a large number of substitution errors, making it hard to apply substitution error correcting techniques.
The Illinois researchers have been working on the design of deletion correcting codes and discovered that they were able to tighten Levenshtein’s upper bound on code size. Their research has focused on how correctly and efficiently a person can decode a message with insertions and deletions and how many errors is too many before it’s impossible to decode a message.
Daniel Francis Cullina
An error correcting code is a set of messages that can always be distinguished from each other after undergoing some number of errors. A fundamental problem in coding theory is finding the relationship between code size and error correcting capability.
“There’s always some tradeoff between communication rate and error probability,” said Kiyavash, who is also an alumna (MS '03, PhD '06) and affiliate professor of ECE ILLINOIS. “People want a reliable communication system, but they also want to send lots of information.”
Any code that can correct a particular number of deletions can also correct that many insertions or a mixture of that many total insertions and deletions. Levenshein's upper bound on code size comes from an adversarial set of deletion error patterns. Even though the ability to correct insertions is equivalent to the ability to correct deletions, the corresponding bound using insertion errors is much weaker. Kiyavash and Cullina were able to show that the best version of the bound uses a mixture of mostly deletions and a few insertions.
The key component of their proof is a partial solution to a combinatorial problem: on average, how many different output messages can a deletion-insertion channel produce?
“There are a lot more open questions than resolved ones in this area,” Cullina said. “We only closed a fraction of the gap between the upper and lower bounds on code size, and only for channels that make relatively few errors.”
The problem of deletion correction is connected to diverse real world problems, including DNA sequencing and network forensics.
“Any breakthrough in terms of fundamental limits is just beautiful because it satisfies our mathematical curiosity,” Kiyavash said.
Editor's note: media inquiries should be directed to Brad Petersen, Director of
Communications, at email@example.com or (217) 244-6376.