# P vs NP Problem

Every computer science student must have heard about the P vs. NP problem. One could say that it is the most famous unsolved problem in computer science. It is one of the 7 Millennium Prize Problems selected by the Clay Mathematics Institute to carry a 1 million dollar prize for the first correct solution and is still open. Proving or disproving the P=NP problem would have profound implications in the fields of computer science, mathematics, cryptography, AI, multimedia processing, economics and many other. This problem can be vaguely stated as

“Can every problem whose solution can be quickly verified by a computer also be quickly solved by a computer?”.

Although the existence of this issue was discussed by John Nash Jr. and Kurt Godel in the 1950s the problem was formally introduced by Stephen Cook in 1971 in his famous paper “The Complexity of Theorem Proving Procedures”. Before diving into formal statement and the explanation of the problem we will first look at some definitions related to the topic.

## Related terms and definitions :-

— the word computer casually used in the above vague statement refers to Deterministic Turing Machine (DTM). In simple words it is a machine which can select to proceed to only one next step. A non-branching machine [3]. This is how every existing computer operates.*Computer*— an expression consisting of variables raised to some power and their coefficients. For example a general second order polynomial is of the form ax²+bx+c.*Polynomial*— the amount of time taken by an algorithm to run as a function of the length of input. Commonly expressed using the big O notation. For example if we write an algorithm to print out all the elements in an array of size 2n one-by-one, its time complexity would be O(n).*Time complexity of an algorithm*— the time complexity of the algorithm is n^{O(1)}*Polynomial time complexity***P**= the set of problems that are solvable in polynomial time by a Deterministic Turing Machine.**NP**= the set of decision problems (answer is either yes or no) that are solvable in nondeterministic polynomial time i.e can be solved in polynomial time by a Nondeterministic Turing Machine[4].— a branching machine. If many options for the next step of computation exists this machine can go down all of them at the same time. NTMs are capable of guessing the correct option out of polynomially many options in O(1) time.*Nondeterministic Turing Machine(NTM)*

An alternative definition to NP would be the set of decision problems for which if a potential solution is provided, a DTM can verify its correctness in polynomial time. It is important to note that all P problems also belong to NP because if a problem is solvable by DTM in polynomial time, verifying a potential solution will be easier than solving. So a DTM would be able to verify also in polynomial time. *So trivially, P ⊆ NP i.e P is a subset of NP.*

Also it is important to know that all computers that exist today are DTMs and NTM is a purely theoretical computer used for thought experiments. As Prof. Erik Demain says in [1]

“So this (NTM) is a much more powerful model. Of course there’s no computers that work like this, sadly, or I guess more interestingly”

## More Terms and Definitions :-

— a problem X is NP-hard if every problem Y ∈ NP*NP-hard*to X i.e X is at least as hard to solve as every problem in NP (If P != NP, then X does not belong to P)*reduces*— a process of converting the inputs of a problem A into equivalent inputs of a problem B using a polynomial time algorithm. Equivalent means both the problem A and B must output the same answer (Yes or No) for the input and the converted input. The existence of a reduction algorithm from A to B will imply the following,*Reduction*

1. If B ∈ P, then A ∈ P (You can just reduce from A to B in polynomial time and solve B in polynomial time. Combining this gives polynomial time algorithm for A)

2. If B ∈ NP, then A ∈ NP

3. If A is NP-hard, then B is NP-hard. A can be reduced to B in polynomial time and if B is not NP-hard then B ∈ NP-NP-hard and it will mean that A ∈ NP-NP-hard which contradicts with the hypothesis (A is NP-hard).

— A problem X is NP-Complete if X ∈ NP and X is NP-hard.*NP-Completeness*

## Proving that A Problem is NP-Complete

Proving NP-completeness of a problem involves 2 steps. First we have to show that the problem belongs to NP and then we have to show it is NP-hard. The steps can be explained further as follows,

**Step 1 — Showing X ∈ NP. **find a nondeterministic algorithm for X. But a more practical way would be to give a polynomial time verification for X if a potential solution is provided.

**Step 2 — Showing X is NP-hard.** Reduce from a known NP-complete problem to X. So by the implication 3 we saw previously will mean that X is NP-hard.

## Stephen Cook

Stephen Arthur Cook (born December 14, 1939) is an American-Canadian computer scientist and mathematician. In his 1971 paper “The complexity of Theorem Proving Procedures” Cook formalized the notions of polynomial time reduction, and NP-completeness. In the same paper he proved the existence of NP-complete problem by showing that the 3-SAT boolean satisfiability problem is NP-complete.

He was **denied reappointment** at the University of California, Berkeley, mathematics department, in 1970, where he has been working as an assistant professor for 4 years. At that time he was working on his ground-breaking paper. He later joined the faculty of University of Toronto, Computer Science and Mathematics departments. He published the paper a year later, which laid foundations for the theory of NP-Completeness. Now, he is considered as one of the forefathers of computational complexity theory.

If you look at the definition of NP-hard you could understand how tedious proving a problem to be NP-complete for the first time in the world would be. It would mean that you basically have to show that you could reduce any problem in NP to 3SAT. Now that the hard work is done, our life is much easier. We can prove a problem to be NP-hard just by reducing from 3SAT to the required problem and using the 3rd implication we previously saw.

## The 3SAT problem

3SAT problem belongs to a class of problems known as Boolean satisfiability problem. Wikipedia defines Boolean satisfiability problem as,

the problem of determining if there exists an interpretation that satisfies a given Boolean formula. In other words, it asks whether the variables of a given Boolean formula can be consistently replaced by the values TRUE or FALSE in such a way that the formula evaluates to TRUE.

If we can find an interpretation that can make the entire formula true, then the boolean formula is called satisfiable, if not, unsatisfiable.

**Definition of 3SAT:** Given a boolean formula of the form:

is there an assignment of variables to True and False, such that the entire formula evaluates to True?

A clause in the formula is made up of the OR of 3 literals, and a formal is the AND of clauses.

*References:*

- Lecture 16: Complexity: P, NP, NP-completeness, Reductions — https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-design-and-analysis-of-algorithms-spring-2015/lecture-videos/lecture-16-complexity-p-np-np-completeness-reductions/
- P versus NP problem — https://en.wikipedia.org/wiki/P_versus_NP_problem
- Computational Complexity Theory: What’s the difference between deterministic and non-deterministic Turing machines? — https://www.quora.com/Computational-Complexity-Theory-Whats-the-difference-between-deterministic-and-non-deterministic-Turing-machines
- NP (complexity) — https://en.wikipedia.org/wiki/NP_(complexity)
- Stephen Cook — https://en.wikipedia.org/wiki/Stephen_Cook