Packages Overview

This page provides an overview of packages - uses, popular formats, problems and available tools.

What are packages for?

_[Software packages] are often a single file containing many more files to be installed, along with rules describing what other software needs to be installed for the package to function properly._ (quote from http://en.wikipedia.org/wiki/Software_package).

Packages are a convenient way for users to get new (or updated) software installed on their computer. A package is more than just a collection of files to be installed. It usually contains additional information required for proper installation and/or uninstallation: other packages that this package depends on, directories for files to be installed, menu items for the desktop environments, scripts to be executed before/after installing/uninstalling the package, and more. Packages are usually not installed manually by the user, but through a package manager. The package manager's duty is to automate (as much as possible) the process of installing, upgrading, configuring, and removing software packages from the user's computer.

Packages may be either binary packages or source packages. Binary packages contain the files needed for installation and proper functioning of the software, but not the source files. The files in a binary package are precompiled and so are usually expected to work only on a limited set of machine architectures.

Source packages contain the files needed for compilation of the software on the user's computer. The source package contains a makefile which automates the compilation and (afterwards) installation processes. It is considered good practice to also enable an uninstallation procedure in the makefile.

Source packages are more flexible, because the user may choose to tweak the source and because the compilation will usually be optimized for the user's architecture at compile time. However, most users are not expected to be able to cope with software compilation on their machines and to fix problems with the compilation. Also, local compilation can slow down the machine quite considerably.

Following are descriptions of the most popular package formats and the most popular package managers.

Package formats

Most of the new software today is packaged using either the RPM package format (Red Hat, Mandriva and more) or the DEB package format (mostly Debian).

Initially, the purpose of packages was to automate and ease the management of the software included in the different GNU/Linux distributions - as the number of packages grew to several thousand, the distribution vendors needed a standard way to package a piece of software along with its metadata. Today packages are made by anyone willing to do so, sometimes with no connection to a distribution vendor.

The formats are not entirely different from each other and support a very similar set of functions. A comparison of features of 5 different packaging formats has been prepared by Alien developer Joey Hess and can be read here. No single future "standard" to which all distributions will adhere seems possible today. By now the major distributions each have a loyal userbase which sticks to its own packaging format (and the corresponding package managers). Moreover, converting entire package repositories from one format to another is a hard task which no one is currently willing to take on.

Any solution to the packaging problems, presented later in this document, will have to be aware of the different packaging schemes available today.

Package managers

Note: dpkg, rpm and other low-level package managers are not listed here since they are quite basic and handle the package file itself. The tools that use dpkg and rpm but also provide dependency handling (such as apt and urpmi) are the ones that are listed. Front-ends for these tools, however, are not listed because they provide GUI only and no additional functionality (for example: Synaptic and RPMDrake).

apt

Advanced Packaging Tool (or apt) is the tool originally used by Debian to manage package retrieval and dependency handling. It originally worked with DEB packages, but it has since been ported to other package formats as well, most notably RPM (see apt4rpm).

APT reads dependencies from an internal index which is updated from index files in repositories. The dependency index is built from the packages' metadata. APT then uses repositories from a configurable list to retrieve the missing packages. This is done recursively so that all the dependencies are fulfilled. It is also possible with APT to update the whole system or parts of it with newer versions of software.
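The recursive step described above can be illustrated with a minimal sketch. It is not APT's actual algorithm: the index contents, the package names and the `resolve` function are all hypothetical, and a real index also carries versions, alternatives and conflicts.

```python
from typing import Dict, List, Set

# Hypothetical index: package name -> names of the packages it depends on.
INDEX: Dict[str, List[str]] = {
    "editor": ["libgui", "libc"],
    "libgui": ["libc", "libpng"],
    "libpng": ["libc"],
    "libc": [],
}


def resolve(package: str, index: Dict[str, List[str]], installed: Set[str]) -> List[str]:
    """Return the missing packages, ordered so that dependencies come first."""
    order: List[str] = []
    visiting: Set[str] = set()

    def visit(name: str) -> None:
        if name in installed or name in order:
            return  # already present or already scheduled
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name}")
        if name not in index:
            raise KeyError(f"{name} not found in any repository index")
        visiting.add(name)
        for dependency in index[name]:
            visit(dependency)  # recurse until every dependency is covered
        visiting.discard(name)
        order.append(name)

    visit(package)
    return order


plan = resolve("editor", INDEX, installed={"libc"})
print(plan)  # ['libpng', 'libgui', 'editor']
```

Ordering the result with dependencies first means each package can be installed as soon as it is reached in the list.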

urpmi

urpmi was developed by Mandriva (formerly known as Mandrakesoft which merged with Conectiva) to provide functionality similar to apt's. This includes dependency handling, package retrieval from repositories and system update with newer versions. urpmi is very similar to apt and is used by Mandriva's distribution of GNU/Linux.

yum

yum is another tool for RPM-based distributions. Instead of relying on a repository index and custom code to determine what should be installed, yum gets from the server only the headers of the RPM files. Then yum can use the RPM native tools to read the headers and determine the dependencies. This results in a smaller code base and a simpler and faster program.

portage

Portage is the Gentoo package manager. Contrary to the other package managers, Portage deals only rarely with binary packages. Instead, it relies on source code packages with local compilation and installation. Each build and install is done using a "recipe" called an ebuild. Portage synchronizes its "Tree" (containing all the ebuild files) with the "Official Tree" of Gentoo approved packages. It is also possible to diverge from the official tree and install other packages - either source or binary. The ebuild scripts handle the dependencies.

up2date/RHN

up2date and RHN (Red Hat Network) are the package managers of Fedora Core and Red Hat Enterprise, respectively. Both rely on a central, vendor-managed, repository and both update packages which came along with the distribution, i.e. they can't be used to install new third-party software (at least not easily). up2date can work both on top of yum and on top of apt.

YaST2

YaST2 is the configuration and package manager of the SuSE GNU/Linux distribution. It is comparable to apt-get and urpmi, with dependency resolution and package retrieval. It is said, however, that the SuSE distribution is lacking in third-party repositories supporting YaST2, though the manager itself is considered a good tool.

Problems with packages

Packagers

For a package to be available for users to install, someone has to prepare it. This is not a trivial task, since the package (as described above) is more than just a file archive.

It might take a few months for a distribution vendor to release a package for new or updated software. In some cases, a certain project might not be packaged at all by a distribution vendor. This is due to the internal processes required by the distribution vendor before they can officially "release" a new/updated package: the distribution editor needs to be informed about the new/updated software, assess its necessity to the distribution, test it - both its functionality and its interoperability with other packages, make changes in its code to fit the distribution (if needed) and finally package it properly. Additional lag is introduced by the dependency problem (see below).

Because of that lag, users are inclined to download packages made by third-party packagers, thus introducing possible packaging bugs and possible incompatibilities - the precompiled package might not run correctly on their specific architecture and distribution. Another possibility for users is to download the source code and compile it themselves.

Dependency

Usually, a package is dependent on other packages/files to be installed on the user's computer in order to operate correctly. For example, most packages are written in C and so need the C library present. Of course, a package will not just need _a_ C library, but a _specific_ (or later) version of it. In some cases, the dependency might become complicated with recursive dependencies. Other types of dependency might be conflicts between packages, a replacement of one by the other or one complementing the other.
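The "specific (or later) version" requirement can be sketched as a comparison of dotted version numbers. This is a simplified illustration that assumes purely numeric versions. Real package formats define richer rules (epochs, release suffixes and so on) that are not modelled here, and the function names are hypothetical.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a purely numeric dotted version such as '2.3.2' into (2, 3, 2)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed: str, minimum: str) -> bool:
    """True if the installed version is the required one or a later one."""
    return parse_version(installed) >= parse_version(minimum)


# Comparing the raw strings would get this wrong: "2.10" sorts before "2.9".
naive = "2.10" >= "2.9"
numeric = satisfies("2.10", "2.9")
print(naive, numeric)  # False True
```

Comparing component by component as integers avoids the pitfall of plain string comparison, which treats version 2.10 as older than 2.9.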

Most of these types of dependencies are listed in the metadata section of the package. Most package managers know how to read the dependencies from the package file and download the appropriate packages from known package repositories.

There are some problems with the current methods of dealing with dependencies. The first is that a package might contain a dependency to another package which is not distributed by the same distribution vendor (either not distributed at all or not distributed with the same name or configuration). For example, mplayer is not distributed with Debian due to legal issues, but a user might want to download a package which needs the mplayer package. This leads to possible incompatibility problems.

The second problem related to dependencies is that the package manager learns the dependencies from the package's metadata declaratively, i.e. the packager is the one who tells the system which packages it should fetch due to dependencies. Packagers, who are often neither the developers nor the distribution editors, are bound to make mistakes and omit necessary package dependencies from the metadata.

The third problem related to dependencies is that the package manager has (today) only one way of checking whether a certain dependency is fulfilled, and that is looking in its own database of installed packages. If a user should choose (for any reason) to install a package while _bypassing_ the package manager (for example, because it wasn't distributed at the time or the user needed a newer version), then the package manager will forever continue to assume that the package is not installed. A better option for the package manager might be to check the system directly for the necessary files and their versions (much like AutoPackage suggests).
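The difference between the two checks can be sketched as follows. This is only an illustration of the idea, not the behaviour of any existing package manager: the database is modelled as a plain dictionary, and the function names and the file `libfoo.so.1` are hypothetical.

```python
import os
import tempfile
from typing import Dict, Iterable


def in_database(name: str, database: Dict[str, str]) -> bool:
    """The classic check: is the package recorded as installed?"""
    return name in database


def on_filesystem(candidates: Iterable[str]) -> bool:
    """The direct check: does any of the expected files exist?"""
    return any(os.path.exists(path) for path in candidates)


def dependency_met(name: str, database: Dict[str, str], candidates: Iterable[str]) -> bool:
    """Accept the dependency if either check succeeds."""
    return in_database(name, database) or on_filesystem(candidates)


# Simulate a library installed by hand, bypassing the package manager.
workdir = tempfile.mkdtemp()
library = os.path.join(workdir, "libfoo.so.1")
open(library, "w").close()

database: Dict[str, str] = {}  # the manager never heard of libfoo
known_to_manager = in_database("libfoo", database)
actually_usable = dependency_met("libfoo", database, [library])
print(known_to_manager, actually_usable)  # False True
```

A real implementation would also need to verify the version of the file it finds, which this sketch leaves out.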

Lack of compatibility

As the previous problems imply, there is a certain lack of compatibility between the various packaging methods, part of it stemming from the lack of compatibility between the various flavors of GNU/Linux.

An installation script in a package might put the files in another directory than a script in another package of the same software would. This is due to the different filesystem hierarchies employed by the distribution vendors. This situation may be temporary, due to initiatives such as the Filesystem Hierarchy Standard (although naturally the FHS does not cover every existing or possible application).

Moreover, the different distribution vendors might have a different menu structure and/or system. A different menu file might need to be added for each distribution. Different packagers who are not related to distribution vendors might put the menu item in different places. As with the filesystem hierarchy, there is also an attempt here to make a standard for menu hierarchies.

The different vendors might employ different naming conventions for their packaged material. They might also have a different policy on splitting-up big packages. This leads to package name ambiguity and uncertainty regarding installed packages on the system. For example, a user might install software on his Mandriva system which was packaged for Red Hat. The package manager might be led to think that a necessary library is not present in the system just because Red Hat uses a different naming convention than Mandriva on that specific library.

Another compatibility problem is of course between the package formats themselves. DEB is different from RPM. Even RPM has various flavors, not all of them compatible. There are ways to convert from one format to another (such as the Alien program) but naturally some information may be lost.

Related tools

Alien

Alien is a program that is able to convert one package format to another. It is used by users who want to install software which has only been packaged for other distributions, enabling them to install the software without having to build it on their machines. However, this conversion doesn't solve the compatibility problems (naming conventions, binary compatibility, menus etc.) and - of course - a person still needs to create the first package, which might take time.

Autopackage

Autopackage tries to break the distinction between developers and packagers. According to Autopackage's philosophy, the developers are the best choice for packaging their own application, instead of third-party packagers or distribution vendors. Because the developer can't create a package for every existing distribution, Autopackage suggests a new multi-platform format (.package). The dependency checking is done not through a package manager's database, but by checking the system directly for the files and versions needed. The developer has to supply "skeleton files" for each dependency, stating where to search the system for the files and where to download missing files from.

Autopackage's approach makes some sense, but some developers have reacted quite strongly to the idea of their spending time packaging their own application and resolving installation issues (such as ensuring binary compatibilities across platforms), so Autopackage's success with developers is not yet certain.

The other "classic" package formats (rpm, deb) are still relevant in Autopackage's philosophy, but only for their original use: distribution on CDs from the vendor. A possible problem here is that a new software project might be first packaged with the Autopackage format, and then later - when a distribution vendor finds it fit - packaged as, say, an rpm, so that consequent updates from the vendor will be in .rpm format while updates from the original developer will be in .package format, causing much possible confusion.

Autopackage doesn't directly address security issues - since packages get to the users without a third party revising them (either a distribution editor or a forum such as freshmeat), it is possible to package trojan horses, viruses and the like. Of course, such a problem exists with any scheme that bypasses revision. However, when downloading source code from the developer, the user has more control than when downloading binaries. This is a difficult problem and is recognized as such by the Autopackage author. Some tentative methods for dealing with it are proposed but none is yet implemented.

EPM

EPM is another format developed by Easy Software intended to be cross-platform and package-manager-independent (the package is installed by a script). EPM is supposed to support a union of the other formats' functionalities (though it doesn't support triggers a-la-rpm, for example). It is also possible to generate any of the other package formats from EPM, thus enabling the packager - in theory - to package once and then easily distribute to other formats. A problem with EPM is that dependency is resolved only for packages that are of the same format (for example an RPM package made with EPM will only resolve dependencies with other RPM packages). This doesn't solve the dependency problem when installing packages without any package manager, and limits the use of the EPM format itself.

CheckInstall

CheckInstall is a utility which uses InstallWatch to monitor all files created/updated during a makefile installation and then uses that information to create a package (deb, rpm and tgz are supported). This package can then be used either to redistribute the software or to uninstall it (useful when the makefile doesn't have an uninstall option). A package created using CheckInstall will lack some of the metadata that is usually provided manually, namely dependencies (depends, recommends, replaces, etc.), menu files and so on. This makes CheckInstall a nice option for creating packages mainly for personal use but not for a serious distribution.
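The idea of deriving a package's file list from what an installation actually touched can be sketched with a before/after comparison of a directory tree. This is a simpler, swapped-in technique and not how InstallWatch itself works (it is understood to watch file operations as they happen). The function names and the installed files are hypothetical.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def snapshot(root: str) -> Dict[str, Tuple[int, int]]:
    """Map every file under root to its (size, modification time)."""
    state = {}
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(directory, name)
            info = os.stat(path)
            state[os.path.relpath(path, root)] = (info.st_size, info.st_mtime_ns)
    return state


def changed_files(before: Dict[str, Tuple[int, int]], after: Dict[str, Tuple[int, int]]) -> List[str]:
    """Files that are new or whose size/mtime differ between the snapshots."""
    return sorted(path for path, meta in after.items() if before.get(path) != meta)


def write_file(root: str, relative: str, text: str) -> None:
    path = os.path.join(root, relative)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write(text)


prefix = tempfile.mkdtemp()
write_file(prefix, os.path.join("etc", "keep.conf"), "existing\n")

before = snapshot(prefix)
# Stand-in for "make install": two files appear under the prefix.
write_file(prefix, os.path.join("bin", "tool"), "#!/bin/sh\n")
write_file(prefix, os.path.join("share", "doc", "README"), "docs\n")
after = snapshot(prefix)

created = changed_files(before, after)
print(created)
```

The resulting list is the raw material for a package's file manifest. As the text notes, it says nothing about dependencies or menu entries.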

ldd

According to the ldd man page: ldd _"prints the shared libraries required by each program or shared library specified on the command line"_. This tool can be useful in addition to declarative dependencies, but not _instead_, since some dependencies are not explicit and can't be found just by means of static analysis (such as recommends, replaces and such).
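As a sketch of how ldd output could feed a dependency check, the following parses text in the layout ldd typically prints. The sample text is made up for illustration (the library `libfoo.so.1`, the paths and the addresses are hypothetical) and the exact output format may vary between systems.

```python
from typing import Dict, Optional

# Illustrative sample in the usual "name => path (address)" layout.
SAMPLE = """\
\tlinux-vdso.so.1 (0x00007ffd0a5f2000)
\tlibfoo.so.1 => not found
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c2a400000)
\t/lib64/ld-linux-x86-64.so.2 (0x00007f1c2a7c1000)
"""


def parse_ldd(output: str) -> Dict[str, Optional[str]]:
    """Map each shared library name to its resolved path, or None if missing."""
    libraries: Dict[str, Optional[str]] = {}
    for line in output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # entries without "=>" carry no name-to-path mapping
        name, _, target = line.partition("=>")
        target = target.strip()
        if target.startswith("not found"):
            libraries[name.strip()] = None
        else:
            libraries[name.strip()] = target.split(" (")[0]
    return libraries


libraries = parse_ldd(SAMPLE)
missing = [name for name, path in libraries.items() if path is None]
print(missing)  # ['libfoo.so.1']
```

To analyse a real binary, the text would come from running ldd on it. Such a check still only covers shared-library dependencies, not relations like recommends or replaces.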

autoconf

While Autoconf is not directly related to packaging, it is related to cross-platform issues. The source code of any software project might not compile correctly (or at all) between different flavors of UNIX (and even between different GNU/Linux distributions). This is due to different compilers, different environment variables, different scripting tools etc. Autoconf creates a configuration script which is intended to be run on the user's machine before compilation. This script checks the user's system directly (like autopackage) and updates the makefile accordingly. Autoconf is pretty much the de-facto standard among both developers and users who compile code. What we would like is something like "Autoconf" for binary packages…

Todo: Apt-Torrent

Other alternatives

ZeroInstall

The ZeroInstall project challenges the whole idea of packages and local installation. It lists inherent problems in current package management, such as the need for root privileges to install software, the size of packages containing files which are currently unnecessary, and other problems which were listed in this overview. The proposed alternative is to not install the applications at all. Instead, the users run their applications directly from the Internet from the software author's pages, using local caching for improved speed and in cases of downtime. The software is never really _installed_ on the user's machine, and so never needs to be uninstalled. This "zero deployment" scheme completely bypasses the distribution vendors and sends the application straight to the users' machines. Binary compatibility issues still need to be resolved (like in Autopackage) and also security issues (it is easy to lure users into running malicious software). Other issues with ZeroInstall are the quantity of "legacy" packages which still need to be supported, lack of QA/backwards compatibility (e.g. when the developer decides to provide a new version and stop supporting the old one), and the dependency on the developer site's availability to an unknown multitude of users (mirrors could be set up, but no methodology is present). Finally, there still has to be support for old-fashioned packages since not all applications are fit to be run this way (a possible example may be a DBMS).
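The caching behaviour described above can be sketched as a fetch-once lookup. This is not ZeroInstall's implementation: the `fetch_application` function, the injected `download` callable and the example URL are all hypothetical, and the sketch ignores updates and integrity checking.

```python
import hashlib
import os
import tempfile
from typing import Callable, List


def fetch_application(url: str, cache_dir: str, download: Callable[[str], bytes]) -> bytes:
    """Return the application's bytes, downloading only on a cache miss."""
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key)
    if not os.path.exists(path):
        data = download(url)  # first use: go to the author's site
        with open(path, "wb") as handle:
            handle.write(data)
    with open(path, "rb") as handle:
        return handle.read()  # later uses: served locally


calls: List[str] = []


def fake_download(url: str) -> bytes:
    """Stand-in for a real network fetch."""
    calls.append(url)
    return b"print('hello')\n"


def offline(url: str) -> bytes:
    """Simulates the author's site being unreachable."""
    raise OSError("network unreachable")


cache = tempfile.mkdtemp()
first = fetch_application("http://example.org/app.py", cache, fake_download)
second = fetch_application("http://example.org/app.py", cache, offline)
print(first == second, len(calls))  # True 1
```

The second call succeeds even though the download function fails, which is the "in cases of downtime" benefit. The same mechanism is what makes repeated runs fast.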

OpenPKG

OpenPKG is a project to provide uniform and easy installation of multiple servers running various flavours of Unix (incl. GNU/Linux). OpenPKG supplies packages in a new format that is based on RPM. The same package installs on any (supported) flavour of Unix. It is not recommended to try and install a package from a different vendor with a different format on an OpenPKG-installed system due to possible incompatibility clash.

GoboLinux

According to the GoboLinux home page: GoboLinux is "a Linux distribution that breaks with the historical Unix directory hierarchy" which is "geared towards people who prefer to install applications from the original source packages". It is targeted at "seasoned" users who don't want packaging systems to place files in their file system hierarchies without their knowledge/agreement. Each application has its files stored in its own subtree with symbolic links to where they really reside. The file-system itself is acting as a package database.
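The "file-system as package database" idea can be sketched in a temporary directory. The directory names `Programs` and `Links`, the application names and the helper functions are assumptions made for illustration and are not taken from GoboLinux documentation. The sketch requires a platform that permits creating symbolic links.

```python
import os
import tempfile
from typing import Dict, List


def install(root: str, app: str, version: str, executables: List[str]) -> None:
    """Place an application in its own subtree and link its executables."""
    app_bin = os.path.join(root, "Programs", app, version, "bin")
    links_bin = os.path.join(root, "Links", "bin")
    os.makedirs(app_bin)
    os.makedirs(links_bin, exist_ok=True)
    for name in executables:
        target = os.path.join(app_bin, name)
        with open(target, "w") as handle:
            handle.write("#!/bin/sh\n")
        # The shared directory only holds links back into the subtree.
        os.symlink(target, os.path.join(links_bin, name))


def installed_programs(root: str) -> Dict[str, List[str]]:
    """Query the 'database' simply by listing the directory tree."""
    programs = os.path.join(root, "Programs")
    return {
        app: sorted(os.listdir(os.path.join(programs, app)))
        for app in sorted(os.listdir(programs))
    }


root = tempfile.mkdtemp()
install(root, "Foo", "1.0", ["foo"])
install(root, "Bar", "2.1", ["bar", "barctl"])

database = installed_programs(root)
foo_link = os.path.join(root, "Links", "bin", "foo")
print(database)  # {'Bar': ['2.1'], 'Foo': ['1.0']}
```

Because every file of an application lives under one directory, removing the application amounts to deleting that directory and the links that point into it.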

A-A-P

A-A-P is intended as a cross-platform replacement for Makefiles. It uses Python instead of shell scripts to ensure compatibility with any system capable of running Python. Those Python scripts are called "recipes" and are much more flexible than a Makefile: the recipe can include instructions for accessing the Internet, using CVS, and so on. This can be used to automate many tasks when installing/uninstalling software.

More? (TBD)

Loads of GNU/Linux Links Package Management

Links to package management related sites

    • Main.AssafSagi - 31 Jan 2005

Version 1.6 last modified by MarcLijour on 15/12/2005 at 18:23


Creator: AssafSagi on 2005/05/12 21:50
Copyright EDOS Consortium