Applications
by: Lilly Irani
The development of mechanical theorem provers has inspired several applications, some practical, some academic, and most falling under both categories.
Mathematical Proof Checking
The following use for mechanical theorem proving methods seems the most obvious. Mathematicians and scientists can check proofs conceived of in the traditional way -- mentally -- using programs such as Otter or the Stanford Validity Checker, described in the tools section.
Though the word "automated" may seem to imply that a theorem prover would eliminate the need for mathematicians. However, this is certainly not the case. ATPs can be used by mathematicians to check their proofs, much in the same way that programmers debug their code -- the user inputs logical steps in the ATPs syntax. Then, the ATP verifies the correctness of the step. If the statement cannot be verified, the mathematician reconsiders the logical steps he made to arrive at that "sub-conclusion."
Checking proofs using ATP tools requires more work than simply using a pencil and paper and trusting the conclusion. However, pencil and paper proofs are often simplified by mistaken definitions, hidden assumptions, and invalid arguments. These errors sacrifice generality and accuracy. The extra time put into ATP helps eliminate such inaccurate results.
According to Stanford Professor of Computer Science David Dill, "If the only thing that a prover does is check your work, then you make so many mistakes that it makes using it worth it." [cit]
The QED Project
"The development of mathematics toward a greater appreciation has led, as is well known, to the formalization of large tracts of it, so that one can prove any theorem using nothing but a few mechanical rules." -- K. Goedel
This quote introduces "The QED Manifesto" [cit], the standard paper describing an international project to build a computer system that effectively represents important mathematical knowledge and technique. Once QED is developed, it's focus will be on confirming much of known mathematics, not on discovering new math. That will be left to the creativity of mathematicians.
Though much of the current work on QED is centered at Argonne National Laboratory, the anonymous authors of "The QED Manifesto" hope that the project will inspire a cooperative effort by hundreds of mathematicians and computer scientists, "with broad support and leadership from research agencies." [cit]
One of QED's proposed benefits is to ensure correctness in mathematics. Also, QED will allow mathematicians to access a large, well-organized, and easy-to-use (it is hoped) database of mathematical definitions and theorems. Though some argue that hard proofs are too difficult to encode in ATP syntax, QED's defenders state that expert users of QED will be able to check a proof ten times faster than the time it would take for a mathematician to write the proof at the level of an advanced undergraduate textbook.
Some argue that QED can also help the field of mathematics by taking much of the focus off of combing through proofs, since QED generated proofs are checked mechanically. Instead, "referees can concentrate on the relevance of the paper" [QED Workshop].
Advocates of the project also call attention to the fifty to one hundred thousand mathematical papers published each year. With a mechanized method of checking such papers, the inaccurate or flawed can be picked out and improved much more quickly than happens today. Today, it may be years before a flaw is discovered in a theorem's proof. Any paper that has used the misproven theorem in that time will also have to reconsider its results. QED (and automated theorem provers, in general, if used appropriately) can quicken this filtration process, accelerating the progress of the mathematical field and freeing mathematicians to consider new problems creatively.
In order to ensure accuracy, the QED project will be founded on a few pages-worth of mathematics. Using such basic, well-established truths as the absolute basis of proofs and deriving more advanced conclusions mechanically will cut down on chances for programmers to introduce bugs into QED. Also, it will ensure that false theorems not yet exposed are not used as the basis of QED-"certified" proofs.
John McCarthy is a Stanford Computer Science professor involved in the QED effort. He contributed the paper "The Mutilated Checkerboard in Set Theory" to the QED Workshop II.
Chip Verification
Because of their extreme complexity, computer chips (such as microprocessors) have been a particular target of practical formal verification research. Circuits can be verified by comparing their implementation (called gate-level) to their formal specifications. Such equivalence checking is widely used.
This comparison problem solves the satisfiability problem so it is NP-complete. Essentially, it checks all possible input configurations, so as a new "gate" is added, allowing an additional bit to enter the chip's circuit, the number of possible input configurations increases by a magnitude of 2 (exponential time).
Because formal verification is computationally expensive, firms must make an economic decision on whether to invest time and money into ensuring 100% accuracy. Some companies opt for a "guess and test" strategy to debug components. Engineers can devise test input either manually or pseudo-randomly. However, these methods cannot encompass all possible inputs or scenarios and some bugs inevitably fall through the cracks. [cit]
Intel is one company that has developed verification methods based on automated theorem proving. The company began work on such methods in July 1997, after the discovery of a bug in the Pentium Pro's floating-point execution unit. AMD also applied ATP to verify its K7 chip. [cit]
Software Verification
Hardware verification successes have inspired efforts to verify software as well. However, several factors case hardware and software verification to differ economically. First of all, chips are much more difficult to debug since a physical circuit must be redesigned -- the debug loop takes months, not minutes. Also, chip fabrication is expensive so more companies may find it in their best interests to verify their specifications before building. Hardware is also easier to represent formally, since it's less dynamic. (Chips have neither heaps nor dynamically created threads.) However, by any standard (lines of code, programmer hours, and/or money), more design effort is invested in software. Also, with the burgeoning e-commerce industry, reliability is of critical importance so, despite the difficulties, formal software verification is a viable research area.
A technology currently being developed is specification-checking. This is currently done using state charts, which are tabular representations of a program's states similar to a truth table [cit]. Daimler-Benz successfully used such verification to test its airbag firing mechanisms. They were able to locate and repair problems before the cars went on the market, avoiding a potentially fatal surfacing of problems later on. State chart verification has also been used in the development of other safety-critical systems such as those used in air traffic collision avoidance.
In class, we considered the paradox of a software verification program being unable to verify itself, since it could not terminate itself in the case of an infinite loop so that it return FALSE. If a formal verification program cannot be formally verified by itself, how can one ensure that the verification program itself is correct? This philosophical issue threatened the field of software verification. When probed on this issue, however, Professor David Dill replied "This [paradox] used to bother me a lot; now it doesn't." He explains that:
"You can find bugs or increase your confidence that you have found most or all of the bugs without knowing that your checking program is correct. If you can reduce a problem to logic, you can't have complete confidence in the result, but you can be VERY confident."
Though a perfect and beautiful solution to the verification paradox does not exist, pragmatics still call for software verification, whether it is 100% accurate or 99%.