This document describes the design of easyRPM. It may be useful for anyone who would like to contribute to this project or who just wants to know how it works.
This document is not finished yet. Comments, criticism, suggestions and language corrections are appreciated.
The main design principle is "just do it". This means that architecture is not an aim in itself. I believe that code should simply do what it is supposed to do, and its beauty will come by itself :)
easyRPM is a GUI application. The GUI is what you see, so let's start from the top.
GUI code is clearly separated from the core logic. All GUI code can be found in the packages net.sf.easyrpm.gui.swing and net.sf.easyrpm.gui.swing.dialogs. Take a look at it; it is really simple. Although it is 65 KB of code, it is mostly declarative.
Swing does not look native, and it does not work well on the free Java stack, so users need to download the JRE from Sun. This additional dependency is bad for a tool that tries to make dependency handling easier. easyRPM needs a GUI that looks native and does not require Sun's JRE. There are several alternatives, but that is another story (see the TODO file ;)
The GUI is mostly concerned with presenting a list of packages to the user. net.sf.easyrpm.packagelist.PackageListModel handles all the logic of forming the list of installed and available packages. It is responsible for updating, filtering and performing other operations on the list. The GUI level only has to sort, group and show this list to the user.
Each item of the list is a ShortPackage object. ShortPackage holds only brief information (name, version, size) about the package it represents. It also holds a reference to the full package information: a PackageLocator object (see the next section for details).
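A minimal sketch of such a list item, assuming a locator that loads the full information only on demand. `FullPackage` (a stand-in for RPMPackage), the `load()` method and the field types are hypothetical, not easyRPM's actual API.

```java
import java.util.Objects;

/** Hypothetical stand-in for the full package information (RPMPackage). */
interface FullPackage {
    String getName();
}

/** Hypothetical locator: fetches the full information only when asked. */
interface PackageLocator {
    FullPackage load();
}

/** Brief list item: what the package list needs, plus a way back to the full data. */
final class ShortPackage {
    private final String name;
    private final String version;
    private final long size;               // assumed to be in bytes
    private final PackageLocator locator;  // reference to the full package information

    ShortPackage(String name, String version, long size, PackageLocator locator) {
        this.name = Objects.requireNonNull(name);
        this.version = Objects.requireNonNull(version);
        this.size = size;
        this.locator = Objects.requireNonNull(locator);
    }

    String getName() { return name; }
    String getVersion() { return version; }
    long getSize() { return size; }

    /** Loads the full information on demand; nothing heavy is kept in the item itself. */
    FullPackage getFullPackage() { return locator.load(); }
}
```

Keeping only three small fields and a locator per item is what lets the list hold thousands of packages cheaply.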
The package list also handles building a transaction and processing it.
net.sf.easyrpm.RPMPackage provides an interface to package information. This information is quite bulky. easyRPM has to deal with a lot of packages (thousands), and it is impossible to load all of them into memory.
Therefore, the list of packages is accessed through PackageIterator. It allows package information to be processed sequentially, keeping in memory only the information that is needed. For instance, ShortPackage, which is used in the package list, stores only the name, version, size and PackageLocator of the original RPMPackage.
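The iteration idea can be sketched with java.util.Iterator standing in for PackageIterator. `HeavyPackage`, `Brief` and `PackageListBuilder` are hypothetical names; the point is only that each bulky record is summarized and then dropped, so memory grows with the brief entries rather than with the headers.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical full package record; the header is the bulky part. */
final class HeavyPackage {
    final int key;          // where the package came from, e.g. its rpmdb key
    final String name;
    final String version;
    final long size;
    final byte[] header;    // stands in for the full, large header data

    HeavyPackage(int key, String name, String version, long size, byte[] header) {
        this.key = key;
        this.name = name;
        this.version = version;
        this.size = size;
        this.header = header;
    }
}

/** The brief information that stays in memory (compare ShortPackage). */
final class Brief {
    final int key;
    final String name;
    final String version;
    final long size;

    Brief(int key, String name, String version, long size) {
        this.key = key;
        this.name = name;
        this.version = version;
        this.size = size;
    }
}

final class PackageListBuilder {
    /**
     * Consumes packages one at a time. Only the brief fields are copied, so each
     * HeavyPackage (and its header) becomes garbage as soon as the loop moves on,
     * provided the iterator itself does not keep references to earlier elements.
     */
    static List<Brief> summarize(Iterator<HeavyPackage> packages) {
        List<Brief> result = new ArrayList<>();
        while (packages.hasNext()) {
            HeavyPackage p = packages.next();
            result.add(new Brief(p.key, p.name, p.version, p.size));
        }
        return result;
    }
}
```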
PackageLocator allows the full information to be retrieved at any time. It is a unified way to refer to package information. The locator of an installed package just holds its key in the database of installed packages (rpmdb). An RPMPackage read from a local rpm file provides a locator that stores the path to that file.
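A sketch of the two kinds of locators described above. Returning a String instead of an RPMPackage, and passing a hypothetical `HeaderSource` into `load()`, are simplifications assumed for illustration; the real PackageLocator may get at its data source differently.

```java
import java.nio.file.Path;

/** Hypothetical readers for the two origins; a String stands in for RPMPackage. */
interface HeaderSource {
    String readFromRpmdb(int key);       // e.g. fetch the record from the Packages database
    String readFromFile(Path rpmFile);   // e.g. parse the header of a local .rpm file
}

/** Unified way to refer to package information, whatever its origin. */
interface PackageLocator {
    String load(HeaderSource source);
}

/** Locator of an installed package: just its key in rpmdb. */
final class RpmdbLocator implements PackageLocator {
    private final int key;

    RpmdbLocator(int key) { this.key = key; }

    @Override
    public String load(HeaderSource source) { return source.readFromRpmdb(key); }
}

/** Locator of a package read from a local rpm file: just the file path. */
final class FileLocator implements PackageLocator {
    private final Path path;

    FileLocator(Path path) { this.path = path; }

    @Override
    public String load(HeaderSource source) { return source.readFromFile(path); }
}
```

Because a locator is only a key or a path, code that holds a PackageLocator does not need to know where the package came from.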
easyRPM uses jRPM to read rpm header data. Several patches have to be applied to this library for it to work properly. What has been changed:
CHAR.java - read 1 byte as a char instead of two
Header.java - skip holes between header entries
RPMHeader.java, RPMSignature.java - "enum" renamed to "enumeration", as "enum" is a keyword in Java 1.5
RPMLead.java - the "unknown" type should be treated as if it were "noarch"
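The first two patches can be illustrated in plain java.io terms. `DataInputStream.readChar()` consumes two bytes, while the RPM CHAR type is one byte. For Header.java, this sketch assumes the "holes" are the alignment padding the RPM format puts before INT16, INT32 and INT64 data (aligned to 2, 4 and 8 bytes). The class and method names are hypothetical, not jRPM's.

```java
import java.io.DataInputStream;
import java.io.IOException;

final class RpmHeaderReading {
    /**
     * The RPM CHAR type is one byte wide, whereas DataInputStream.readChar()
     * reads two bytes (a big-endian UTF-16 code unit). Read a single byte instead.
     */
    static char readRpmChar(DataInputStream in) throws IOException {
        return (char) in.readUnsignedByte();
    }

    /**
     * Number of padding bytes to skip before a value that must start at a multiple
     * of {@code alignment} (assumed: 2 for INT16, 4 for INT32, 8 for INT64, 1 otherwise).
     */
    static int paddingBefore(int offset, int alignment) {
        int remainder = offset % alignment;
        return remainder == 0 ? 0 : alignment - remainder;
    }
}
```

A reader would then call `in.skipBytes(paddingBefore(offset, 4))` before reading an INT32 value whose natural position within the data store is `offset`.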
net.sf.easyrpm.rpmdb.RPMDB provides access to the database of installed packages. There was no Java API for these databases, and parsing the output of the rpm program seemed inefficient. Instead, RPMDB works directly with the database files that rpm itself uses. These files are Berkeley DB databases, and the Berkeley DB library with its Java bindings is used to read them. Berkeley DB is built with the --enable-compat185 (support for the "1.85" database format) and --enable-java configuration options.
I could not find a description of the database structure, so I determined it experimentally. Here is what I found:
Packages
  key : 4 bytes
  data: rpm file header
"Packages" is where information about installed packages is stored. The other databases are indexes that help to find packages by various parameters.
Name
  key : string, package name
  data: package key (4 bytes)
Group
  key : string, group name
  data: a set of records, each consisting of a package key (4 bytes) and 4 null bytes
Basenames
  key : string, file name
  data: a set of records, each consisting of a package key (4 bytes) and the number of the file in this package (4 bytes)
Dirnames
  key : string, directory name
  data: a set of records, each consisting of a package key (4 bytes) and the number of a file in this package that is located in this directory (4 bytes)
Providename
  key : string, feature name
  data: a set of records, each consisting of a package key (4 bytes) and the number of the feature in this package (4 bytes)
Provideversion
  key : string, feature name
  data: a set of records, each consisting of a package key (4 bytes) and the number of the feature with this version in this package (4 bytes)
Requirename and Requireversion have the same structure as Providename and Provideversion, but concern required features.
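A sketch of splitting an index value into its 8-byte records. `IndexRecord` and `IndexRecords` are hypothetical names, and the byte order is left as a parameter because it is not stated above (`ByteOrder.nativeOrder()` on the machine that owns the database is a plausible choice, but that is an assumption). Group entries fit the same shape, with 0 as the second field.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/** One record of an index value: a package key and a number inside that package. */
final class IndexRecord {
    final int packageKey;
    final int itemNumber; // file or feature number; 0 (the null bytes) for Group

    IndexRecord(int packageKey, int itemNumber) {
        this.packageKey = packageKey;
        this.itemNumber = itemNumber;
    }
}

final class IndexRecords {
    private static final int RECORD_SIZE = 8; // 4-byte package key + 4-byte number

    /** Splits the data of a Group, Basenames, Dirnames, Providename, ... entry into records. */
    static List<IndexRecord> decode(byte[] data, ByteOrder order) {
        if (data.length % RECORD_SIZE != 0) {
            throw new IllegalArgumentException("length is not a multiple of 8: " + data.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(order);
        List<IndexRecord> records = new ArrayList<>();
        while (buf.hasRemaining()) {
            // Java evaluates arguments left to right: key first, then number.
            records.add(new IndexRecord(buf.getInt(), buf.getInt()));
        }
        return records;
    }
}
```

The package keys obtained this way are exactly what the Packages database is keyed by, so a lookup by file or feature name becomes two cheap database reads.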
RPMDB caches brief package information in the cache.db file. It contains the name, group, version and size information needed to form the package list. This cache speeds up application startup (from 30 to 4 seconds), because the small cache.db (~120 KB) is read instead of the large Packages database (~40 MB).
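The cache only has to store four small fields per package. The encoding below is a hypothetical layout (length-prefixed strings plus a long, via DataOutputStream), not the actual cache.db format; in the real cache each record would presumably be a Berkeley DB value keyed by the rpmdb package key.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Brief package information as it could be cached (hypothetical layout). */
final class CacheEntry {
    final String name;
    final String group;
    final String version;
    final long size;

    CacheEntry(String name, String group, String version, long size) {
        this.name = name;
        this.group = group;
        this.version = version;
        this.size = size;
    }

    /** Three length-prefixed strings followed by an 8-byte size. */
    byte[] encode() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(name);
            out.writeUTF(group);
            out.writeUTF(version);
            out.writeLong(size);
        }
        return bytes.toByteArray();
    }

    static CacheEntry decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            // Arguments are evaluated left to right, matching the write order.
            return new CacheEntry(in.readUTF(), in.readUTF(), in.readUTF(), in.readLong());
        }
    }
}
```

Records like these are a few dozen bytes each, much smaller than a full header, which is where the startup gain comes from.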
Repository stores information about available packages: packages located on a hard disk, on removable media or on the network that are available for installation. net.sf.easyrpm.repository.Repository defines the repository interface. There are several implementations of it: HibernateRepository, PureSQLRepository and BDRepository. HibernateRepository was used at an early development stage (even earlier than the current one :), when the main objects changed frequently. It is based on an SQL database and the Hibernate ORM library. Hibernate is too heavy a library (3 MB with dependencies) for a client application, so PureSQLRepository was implemented next. It is based on an SQL database only. But why should easyRPM need an SQL database if it already uses Berkeley DB?! So now BDRepository, which is based on Berkeley DB, is used instead of the previous two. Although HibernateRepository and PureSQLRepository are no longer used and I am not sure they still work correctly, they will probably be useful if there is ever a server-side application based on easyRPM (I have such an idea).
Dependency resolution is performed by net.sf.easyrpm.installsystem.TransactionManager. It keeps the transaction in a consistent state. If you install a new package and some of its required features are missing, TransactionManager finds packages in the repository that provide them and adds them to the transaction. If you remove a package that is required by other installed packages, TransactionManager adds those packages to the transaction for removal too. Although TransactionManager works fine for me, it has some shortcomings: it does not check whether a package's architecture is compatible with the system, and it does not resolve conflicting dependencies.
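The two rules above amount to computing a closure with a work queue. This sketch is a simplified model under stated assumptions: features are plain strings (no versions), the first provider found in the repository is taken, and, like TransactionManager, it checks neither architecture nor conflicts. `Pkg` and `DependencyClosure` are hypothetical names.

```java
import java.util.*;

/** Minimal package model: features it provides and features it requires. */
final class Pkg {
    final String name;
    final Set<String> provides;
    final Set<String> requires;

    Pkg(String name, Collection<String> provides, Collection<String> requires) {
        this.name = name;
        this.provides = new HashSet<>(provides);
        this.provides.add(name); // an RPM package implicitly provides its own name
        this.requires = new HashSet<>(requires);
    }
}

final class DependencyClosure {
    /** Install: pull in repository packages until every requirement is satisfied. */
    static Set<Pkg> closeInstall(Collection<Pkg> toInstall, Collection<Pkg> installed,
                                 Collection<Pkg> repository) {
        Set<Pkg> result = new LinkedHashSet<>(toInstall);
        Deque<Pkg> queue = new ArrayDeque<>(toInstall);
        while (!queue.isEmpty()) {
            Pkg p = queue.poll();
            for (String feature : p.requires) {
                if (findProvider(feature, installed) != null || findProvider(feature, result) != null) {
                    continue; // already satisfied
                }
                Pkg provider = findProvider(feature, repository);
                if (provider == null) {
                    throw new IllegalStateException("nothing provides " + feature);
                }
                result.add(provider);
                queue.add(provider); // its own requirements must be checked too
            }
        }
        return result;
    }

    /** Remove: also remove installed packages that would lose a required feature. */
    static Set<Pkg> closeRemove(Collection<Pkg> toRemove, Collection<Pkg> installed) {
        Set<Pkg> result = new LinkedHashSet<>(toRemove);
        Deque<Pkg> queue = new ArrayDeque<>(toRemove);
        while (!queue.isEmpty()) {
            Pkg removed = queue.poll();
            for (Pkg other : installed) {
                if (result.contains(other)) continue;
                for (String feature : other.requires) {
                    if (removed.provides.contains(feature) && !stillProvided(feature, installed, result)) {
                        result.add(other);
                        queue.add(other);
                        break;
                    }
                }
            }
        }
        return result;
    }

    private static Pkg findProvider(String feature, Collection<Pkg> pkgs) {
        for (Pkg p : pkgs) {
            if (p.provides.contains(feature)) return p;
        }
        return null;
    }

    private static boolean stillProvided(String feature, Collection<Pkg> installed, Set<Pkg> removing) {
        for (Pkg p : installed) {
            if (!removing.contains(p) && p.provides.contains(feature)) return true;
        }
        return false;
    }
}
```

Using a queue means that a package pulled in (or pushed out) because of another one gets its own dependencies checked as well, which is what keeps the transaction consistent.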
The search mechanism is based on filters. net.sf.easyrpm.rpm.filter.PackageFilter represents the well-known filter abstraction. There are obvious filters such as NameFilter, GroupFilter, SizeFilter, FileFilter and FeatureFilter. There are also composite filters: AndFilter, NotFilter and OrFilter.
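A sketch of the filter abstraction as a composite: leaf filters test one attribute, and And/Or/Not combine other filters. The `accept` method, the `PackageInfo` type and the case-insensitive substring matching of NameFilter and GroupFilter are assumptions, not necessarily how easyRPM's filters behave.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Minimal package view for the sketch. */
final class PackageInfo {
    final String name;
    final String group;
    final long size;

    PackageInfo(String name, String group, long size) {
        this.name = name;
        this.group = group;
        this.size = size;
    }
}

interface PackageFilter {
    boolean accept(PackageInfo p);
}

/** Leaf: name contains the given text, ignoring case (assumed matching rule). */
final class NameFilter implements PackageFilter {
    private final String text;
    NameFilter(String text) { this.text = text.toLowerCase(Locale.ROOT); }
    @Override public boolean accept(PackageInfo p) { return p.name.toLowerCase(Locale.ROOT).contains(text); }
}

/** Leaf: group contains the given text, ignoring case (assumed matching rule). */
final class GroupFilter implements PackageFilter {
    private final String text;
    GroupFilter(String text) { this.text = text.toLowerCase(Locale.ROOT); }
    @Override public boolean accept(PackageInfo p) { return p.group.toLowerCase(Locale.ROOT).contains(text); }
}

/** Leaf: size within [min, max]. */
final class SizeFilter implements PackageFilter {
    private final long min;
    private final long max;
    SizeFilter(long min, long max) { this.min = min; this.max = max; }
    @Override public boolean accept(PackageInfo p) { return p.size >= min && p.size <= max; }
}

/** Composite: every part must accept. */
final class AndFilter implements PackageFilter {
    private final List<PackageFilter> parts;
    AndFilter(PackageFilter... parts) { this.parts = Arrays.asList(parts); }
    @Override public boolean accept(PackageInfo p) {
        for (PackageFilter f : parts) if (!f.accept(p)) return false;
        return true;
    }
}

/** Composite: at least one part must accept. */
final class OrFilter implements PackageFilter {
    private final List<PackageFilter> parts;
    OrFilter(PackageFilter... parts) { this.parts = Arrays.asList(parts); }
    @Override public boolean accept(PackageInfo p) {
        for (PackageFilter f : parts) if (f.accept(p)) return true;
        return false;
    }
}

/** Composite: inverts another filter. */
final class NotFilter implements PackageFilter {
    private final PackageFilter inner;
    NotFilter(PackageFilter inner) { this.inner = inner; }
    @Override public boolean accept(PackageInfo p) { return !inner.accept(p); }
}
```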
When you search using a filter, both RPMDB and Repository try to use their indexes to speed up the search. If they cannot use an index, they apply the filter to every package they store.
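A sketch of the "index if possible, scan otherwise" strategy. The `indexedName()` hint, `ExactNameFilter` and the in-memory map standing in for the Name database are all hypothetical; the `examined` counter only makes the difference in work visible.

```java
import java.util.*;
import java.util.function.Predicate;

/** Minimal stored package for the sketch. */
final class Entry {
    final String name;
    final String version;
    Entry(String name, String version) { this.name = name; this.version = version; }
}

/** Hypothetical filter that may name an index key the store can use. */
interface IndexAwareFilter extends Predicate<Entry> {
    /** Exact package name to look up in the name index, or null when no index applies. */
    default String indexedName() { return null; }
}

final class ExactNameFilter implements IndexAwareFilter {
    private final String name;
    ExactNameFilter(String name) { this.name = name; }
    @Override public boolean test(Entry e) { return e.name.equals(name); }
    @Override public String indexedName() { return name; }
}

final class PackageStore {
    private final List<Entry> all = new ArrayList<>();
    private final Map<String, List<Entry>> byName = new HashMap<>(); // plays the role of the Name index
    int examined; // how many entries the last search had to test

    void add(Entry e) {
        all.add(e);
        byName.computeIfAbsent(e.name, k -> new ArrayList<>()).add(e);
    }

    List<Entry> search(IndexAwareFilter filter) {
        String key = filter.indexedName();
        // Use the index when the filter offers a key; otherwise fall back to a full scan.
        List<Entry> candidates = key != null
                ? byName.getOrDefault(key, Collections.<Entry>emptyList())
                : all;
        examined = 0;
        List<Entry> hits = new ArrayList<>();
        for (Entry e : candidates) {
            examined++;
            if (filter.test(e)) hits.add(e); // the filter is still applied to index candidates
        }
        return hits;
    }
}
```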
The composite filter abstraction lets the user "compose" very complex filter expressions and perform complicated searches. The user may write filter expressions in XML. net.sf.easyrpm.rpm.filter.XMLFilterReader converts XML into a filter and vice versa.
For instance, the XML expression:
<not>
  <name>lib</name>
  <name>devel</name>
  <group>library</group>
  <group>Development</group>
</not>
would be translated into a filter that hides development and library packages;
<or>
  <name>game</name>
  <group>game</group>
</or>
would find games;
<file>/etc/my.cnf</file>
would find the package that owns this file.
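A sketch of turning such XML into a filter with the JDK's DOM parser, using java.util.function.Predicate in place of PackageFilter. Case-insensitive substring matching for name and group, and reading a not element with several children as "none of them matches", are assumptions inferred from the examples above; `XmlFilters` and `PackageView` are hypothetical names, and only the XML-to-filter direction is shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Minimal package view: name, group and the files it owns. */
final class PackageView {
    final String name;
    final String group;
    final Set<String> files;

    PackageView(String name, String group, String... files) {
        this.name = name;
        this.group = group;
        this.files = new HashSet<>(Arrays.asList(files));
    }
}

final class XmlFilters {
    /** Parses an expression such as "<or><name>game</name><group>game</group></or>". */
    static Predicate<PackageView> parse(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return toFilter(root);
    }

    private static Predicate<PackageView> toFilter(Element e) {
        String text = e.getTextContent().trim();
        switch (e.getTagName()) {
            case "name":  return p -> containsIgnoreCase(p.name, text);
            case "group": return p -> containsIgnoreCase(p.group, text);
            case "file":  return p -> p.files.contains(text);
            case "and":   return children(e).stream().reduce(p -> true, Predicate::and);
            case "or":    return children(e).stream().reduce(p -> false, Predicate::or);
            // Assumed meaning: <not> with several children matches when none of them does.
            case "not":   return children(e).stream().reduce(p -> false, Predicate::or).negate();
            default: throw new IllegalArgumentException("unknown element <" + e.getTagName() + ">");
        }
    }

    private static List<Predicate<PackageView>> children(Element e) {
        List<Predicate<PackageView>> result = new ArrayList<>();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add(toFilter((Element) n)); // whitespace text nodes are skipped
            }
        }
        return result;
    }

    private static boolean containsIgnoreCase(String haystack, String needle) {
        return haystack.toLowerCase(Locale.ROOT).contains(needle.toLowerCase(Locale.ROOT));
    }
}
```

Each XML element maps to exactly one filter node, so nesting in the XML directly becomes nesting of composite filters.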