The theoretical intractability of the basic package management problems (low-cost installation, individual package installability, and other properties to be developed) whenever we have conflicts and disjunctive dependencies poses serious problems, even if ad hoc (APT, smart) or SAT solvers like Davis-Putnam or GRASP seem more or less to work reasonably well for now.
The number of packages grows exponentially. Also software dependencies become more complex. These are side effects of two good things :
- The increasing availability of free software,
- The increasing use of free software components in other free software.
We have observed that the average behaviour of SAT solvers is quadratic as a function of the dependency complexity. Dependency complexity for a package is defined as the subcircuit of the dependency circuit that is reachable from the node representing that package. The growth rate of the average dependency complexity is unknown (but this could be observed by looking at the older pools).
However, we might conjecture that the depenedency complexity is linear in the size of the total source code needed to build a given package. Therefore, it is dependent on the quantity of code reuse.
If we assume that dependency complexity grows linearly with time, then the amount of computation required to check package installability will grow quadratically. This may be offset by the increase in computing power. However, as integration density and computing power increases, Linux is increasingly used in embedded and deeply embedded systems, whose complexity follows that of personal computers with, say, a lag of five or ten years. These systems will soon require unified Linux distributions with package management and remote update. As computing power has a higher energy cost than memory (because computing, being, today, sequential, its power is, to the first order, proportional to the operating frequency, and power consumption increases with operating frequency), we will have embedded systems with large flash memory and few computing power. Therefore, complex software may be installed on these MIPS-starved storageful systems. (The question of determining whether there is complex software which does not require much CPU power is left open.)
But computing the installability of such complex software on those systems may
well exceed their computing power.
Also, the quadratic behaviour is totally empirical, and the actual times, although reasonable on a fast system (on the order of seconds), become non
negligible on a slightly old PC (on the order of tens of seconds). The algorithms used, even if correct (which they are not in the case of APT), have no provable and reasonable upper bound on the time required to compute a solution.
What we want is an algorithm that is provably correct and that has a provably
acceptable bound on required computational resources. As our chances of solving P = NP are slim, we must either :
- Find a statistical property of the package pool which allows us to give an adequate bound on the complexity of the presently used solvers. Changes to the pool must more or less preserve that statistical property.
- Find a new dependency management scheme (which may or may not be very different from what we have now) that has an efficient algorithm, then convince everyone to use that scheme, or create a distribution that uses it, together with automatic or semi-automatic conversion tools from existing distributions.
For the first approach, as it has been noted that the Debian package graph is small world and the behaviour of the solvers is quadratic, we emit the following conjecture:
Conjecture (Small world solvability)
Any boolean circuit whose graph is small-world can be solved in polynomial time.
Comments: 0