API Overview
The EGE API is a Java-based framework that provides the interfaces and a basic implementation of the
mechanisms mentioned in the introduction to EGE. The EGE API is written in Java 1.5 (5.0). It
directly uses three external libraries:
EGE API contains the following Java packages:
- pl.psnc.dl.ege - main package with the implementation of the EGE logic.
- pl.psnc.dl.ege.component - contains the interfaces describing the three main EGE
components.
- pl.psnc.dl.ege.exception - contains the EGE exceptions.
- pl.psnc.dl.ege.types - contains the EGE data types used by the components and conversion
graph.
EGE API specific data types (pl.psnc.dl.ege.types)
Format and MIME type - both are represented as plain String values. A format
is the name of a format, such as ENRICH TEI P5 or EAD. A MIME type can be, for example,
text/xml, application/pdf or application/msword. The values of format and MIME type are
not validated in any way. This is intentional, because the internal EGE mechanisms are
designed to be very general. It should therefore also be possible to use the EGE in a context
different from the original one connected with the ENRICH project.
The pl.psnc.dl.ege.types package includes the following classes:
- DataType - contains both format and MIME type information; instances of this
class are used to describe both input and output data types in EGE.
- ConversionAction - every instance of this class describes one particular
conversion operation, which can be performed by a specified converter component on
concrete input/output data types. For example, a particular converter A can convert
data from DataType X to DataType Y. The information about the X and Y data types is stored in
a ConversionActionArguments instance. ConversionAction is also the base object class
for a node in the conversion graph. Each conversion action can be dynamically configured
through properties provided with ConversionActionArguments instances.
- ConversionActionArguments - describes the input and output data types for a
conversion action. Input and output data types are instances of the DataType class. Each
instance also provides arguments for dynamic parameterization of the conversion.
- ConversionsPath - each instance of this class contains a list (a sequence) of
ConversionAction objects; this list represents a path of chained conversions where the input
data type of the first element of the sequence is the source data type and the output data type
of the last element of the sequence is the result data type. Adjacent elements of such a path
have compatible input/output data types; this is assured during the process of conversion
graph construction.
- ValidationResult - instances of this class are returned by validation methods.
Each instance contains the status of the validation result (ERROR, SUCCESS or FATAL) and
messages.
The UML diagram below shows the relationships between the classes described above.
Each instance of ConversionsPath contains a sequence of ConversionAction instances.
A ConversionAction instance references an instance of a class that implements the Converter
interface (loaded by JPF at EGE start-up) and also specifies the data types of the conversion
action (through an instance of the ConversionActionArguments class).
ConversionActionArguments contains information about the input and output data types, so each
instance of this class references two instances of the DataType class. DataType is an
elementary class that contains information about the format and the MIME type, both kept as
String.
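The relationships described above can be sketched in a few lines of Java. This is a simplified illustration, not the actual EGE source: only the class names and their roles come from the text, while the field names, constructors, the source() and result() accessors, the converter names and the "PDF" format name are assumptions. The compatibility check in the constructor is also illustrative, since EGE assures compatibility while building the conversion graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical stand-ins for the classes in pl.psnc.dl.ege.types.
class TypesSketch {

    /** Format and MIME type, both kept as plain, unvalidated strings. */
    static final class DataType {
        final String format;
        final String mimeType;

        DataType(String format, String mimeType) {
            this.format = format;
            this.mimeType = mimeType;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof DataType)) {
                return false;
            }
            DataType that = (DataType) other;
            return format.equals(that.format) && mimeType.equals(that.mimeType);
        }

        @Override
        public int hashCode() {
            return 31 * format.hashCode() + mimeType.hashCode();
        }
    }

    /** Input and output data types of one conversion (properties omitted here). */
    static final class ConversionActionArguments {
        final DataType input;
        final DataType output;

        ConversionActionArguments(DataType input, DataType output) {
            this.input = input;
            this.output = output;
        }
    }

    /** One conversion step; the real class references a Converter, not a name. */
    static final class ConversionAction {
        final String converterName;
        final ConversionActionArguments arguments;

        ConversionAction(String converterName, ConversionActionArguments arguments) {
            this.converterName = converterName;
            this.arguments = arguments;
        }
    }

    /** A non-empty chain of actions whose adjacent data types match. */
    static final class ConversionsPath {
        final List<ConversionAction> actions;

        ConversionsPath(List<ConversionAction> actions) {
            if (actions.isEmpty()) {
                throw new IllegalArgumentException("A path needs at least one action");
            }
            // Illustrative check only: each step must consume what the previous one produced.
            for (int i = 1; i < actions.size(); i++) {
                DataType produced = actions.get(i - 1).arguments.output;
                DataType expected = actions.get(i).arguments.input;
                if (!produced.equals(expected)) {
                    throw new IllegalArgumentException("Incompatible step at index " + i);
                }
            }
            this.actions = Collections.unmodifiableList(new ArrayList<>(actions));
        }

        DataType source() {
            return actions.get(0).arguments.input;
        }

        DataType result() {
            return actions.get(actions.size() - 1).arguments.output;
        }
    }

    /** Builds the example path EAD -> ENRICH TEI P5 -> PDF. */
    static ConversionsPath examplePath() {
        DataType ead = new DataType("EAD", "text/xml");
        DataType tei = new DataType("ENRICH TEI P5", "text/xml");
        DataType pdf = new DataType("PDF", "application/pdf"); // format name assumed
        return new ConversionsPath(Arrays.asList(
                new ConversionAction("ead-to-tei", new ConversionActionArguments(ead, tei)),
                new ConversionAction("tei-to-pdf", new ConversionActionArguments(tei, pdf))));
    }
}
```

In this sketch the source data type of the example path is EAD (text/xml) and its result data type has the MIME type application/pdf.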
The conversion process can be dynamically parameterised with properties described within
ConversionActionArguments instances. ConversionActionArguments contains property
definitions written as a String. Each property definition should contain at least the
unique id of the property and its data type. The syntax of the property definitions should be
properly described in the converter documentation. With a documented syntax, a client application
can interpret the available properties in order to provide, e.g., a user interface for conversion
configuration. Properties configured through the default client application interface can be
passed to the converter using the map argument of a ConversionActionArguments instance, where
the key in the map is the unique "id" of a property and the value is the value assigned to it.
Validating the properties and applying default settings is the converter's task.
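Since validation and defaults are left to the converter, a converter might resolve the received property map roughly as follows. This is only a sketch: the property ids "encoding" and "indent", their default values and the use of IllegalArgumentException are hypothetical and not part of the EGE API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper a converter could use for the properties map it receives.
class PropertiesSketch {

    // Assumed property ids; a real converter documents its own.
    static final String ENCODING = "encoding";
    static final String INDENT = "indent";

    /** Returns the defaults overridden by the validated values sent by the client. */
    static Map<String, String> resolve(Map<String, String> clientValues) {
        Map<String, String> resolved = new HashMap<>();
        resolved.put(ENCODING, "UTF-8");
        resolved.put(INDENT, "false");

        for (Map.Entry<String, String> entry : clientValues.entrySet()) {
            String id = entry.getKey();
            String value = entry.getValue();
            if (!resolved.containsKey(id)) {
                throw new IllegalArgumentException("Unknown property: " + id);
            }
            if (INDENT.equals(id) && !"true".equals(value) && !"false".equals(value)) {
                throw new IllegalArgumentException("Property 'indent' must be true or false");
            }
            resolved.put(id, value);
        }
        return resolved;
    }
}
```

A client that sends only the "indent" property still gets a complete configuration, because the missing "encoding" value falls back to its default.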
Component interfaces
Each component in the EGE (validator, converter or recognizer) can have multiple
implementations provided through the extension mechanism of JPF. To provide this
extensibility in the EGE API, the following interfaces were defined in the
pl.psnc.dl.ege.component package:
- Recognizer - declares major functions of the EGE recognizer, has to be implemented
by every external recognizer.
- Validator - declares major functions of the EGE validator, has to be implemented by
every external validator.
- Converter - declares major functions of the EGE converter, has to be implemented by
every external converter.
- ConfigurableConverter - extends the standard Converter interface with one
additional method, configure().
The methods declared in these interfaces are described in the following
sections.
Recognizer
Recognizers are responsible for recognizing the MIME type of data. MIME types are media
type identifiers registered by IANA (Internet Assigned Numbers Authority); the EGE, however,
intentionally does not check values against this registry. MIME types are therefore
just instances of the String type.
Declared methods:
- getRecognizeableMimeTypes(): List<String> - should return a list
of MIME types recognized by this recognizer.
- recognize(InputStream inputData) : String - for provided input data a
particular recognizer implementation tries to recognize the input MIME type and returns
it or throws an exception if the input MIME type was not recognized.
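As an illustration of the two methods above, the following sketch recognizes PDF data by its leading "%PDF-" signature. The method names follow the text, but the interface is re-declared locally so the example is self-contained; the exception class name, the declared exception types and the recognizeOrNull() helper are assumptions, not the actual EGE API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.List;

class RecognizerSketch {

    /** Stand-in for the exception thrown when recognition fails (name assumed). */
    static class RecognitionFailedException extends Exception {
        RecognitionFailedException(String message) {
            super(message);
        }
    }

    /** Local re-declaration of the interface described in the text. */
    interface Recognizer {
        List<String> getRecognizeableMimeTypes();

        String recognize(InputStream inputData) throws RecognitionFailedException, IOException;
    }

    /** Recognizes PDF data by the "%PDF-" signature at the start of the stream. */
    static class PdfRecognizer implements Recognizer {
        private static final byte[] SIGNATURE = {'%', 'P', 'D', 'F', '-'};

        public List<String> getRecognizeableMimeTypes() {
            return Collections.singletonList("application/pdf");
        }

        public String recognize(InputStream inputData)
                throws RecognitionFailedException, IOException {
            for (byte expected : SIGNATURE) {
                // read() returns -1 at end of stream, which never equals a signature byte.
                if (inputData.read() != expected) {
                    throw new RecognitionFailedException("Input is not PDF data");
                }
            }
            return "application/pdf";
        }
    }

    /** Convenience wrapper for the example: returns null instead of throwing. */
    static String recognizeOrNull(byte[] data) {
        try {
            return new PdfRecognizer().recognize(new ByteArrayInputStream(data));
        } catch (RecognitionFailedException | IOException e) {
            return null;
        }
    }
}
```

Because the MIME type is just a String, nothing in this sketch checks the returned value against the IANA registry, which matches the intentionally general design described above.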
Validator
Validators are responsible for validating the input data with respect to its format, e.g.
checking whether the submitted data is in the ENRICH TEI P5 format.
Declared methods:
- getSuportedValidationTypes() : List<DataType> - returns the
data types that the validator implementation is able to validate.
- validate(InputStream inputData, DataType inputDataType) : ValidationResult -
returns an instance of the ValidationResult type. Every result contains a status
(whether the validation ended with success, error or fatal error) and validation
messages (about errors or warnings). If inputDataType is not supported by the
Validator implementation, the method should throw a ValidatorException.
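The contract above can be illustrated with a deliberately simple validator that only checks XML well-formedness using the JDK's SAX parser. The types are re-declared locally in simplified form; the "XML" format name, the constructors, the mapping of parse failures to the FATAL status and the check() helper are assumptions made for this sketch. A real validator for a format such as ENRICH TEI P5 would check much more than well-formedness.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ValidatorSketch {

    enum Status { SUCCESS, ERROR, FATAL }

    /** Simplified stand-in for the EGE DataType class. */
    static final class DataType {
        final String format;
        final String mimeType;

        DataType(String format, String mimeType) {
            this.format = format;
            this.mimeType = mimeType;
        }
    }

    /** Simplified stand-in for the EGE ValidationResult class. */
    static final class ValidationResult {
        final Status status;
        final List<String> messages;

        ValidationResult(Status status, List<String> messages) {
            this.status = status;
            this.messages = messages;
        }
    }

    /** Stand-in for the EGE ValidatorException. */
    static class ValidatorException extends Exception {
        ValidatorException(String message) {
            super(message);
        }
    }

    /** Accepts any well-formed XML document of the assumed data type "XML" / text/xml. */
    static class WellFormedXmlValidator {
        private static final DataType SUPPORTED = new DataType("XML", "text/xml");

        public List<DataType> getSuportedValidationTypes() {
            return Collections.singletonList(SUPPORTED);
        }

        public ValidationResult validate(InputStream inputData, DataType inputDataType)
                throws ValidatorException {
            if (!SUPPORTED.format.equals(inputDataType.format)
                    || !SUPPORTED.mimeType.equals(inputDataType.mimeType)) {
                throw new ValidatorException("Unsupported data type: " + inputDataType.format);
            }
            SAXParser parser;
            try {
                parser = SAXParserFactory.newInstance().newSAXParser();
            } catch (ParserConfigurationException | SAXException e) {
                throw new ValidatorException("Cannot create XML parser: " + e.getMessage());
            }
            try {
                parser.parse(inputData, new DefaultHandler());
                return new ValidationResult(Status.SUCCESS, Collections.<String>emptyList());
            } catch (SAXException e) {
                // A document that is not well-formed is reported as a result, not an exception.
                return new ValidationResult(Status.FATAL,
                        Collections.singletonList(String.valueOf(e.getMessage())));
            } catch (IOException e) {
                throw new ValidatorException("Cannot read input: " + e.getMessage());
            }
        }
    }

    /** Convenience wrapper for the example: validates a string as XML / text/xml. */
    static Status check(String xml) {
        try {
            InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            return new WellFormedXmlValidator().validate(in, new DataType("XML", "text/xml")).status;
        } catch (ValidatorException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note the distinction the sketch tries to preserve: invalid content yields a ValidationResult with a non-success status, whereas an unsupported data type is signalled with an exception.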
Converter
Converters convert the data read from a given input stream and write the result of the
conversion to the given output stream.
Declared methods:
- getPossibleConversions() : List<ConversionActionArguments> -
should return a list of arguments with the pairs of input/output data types supported by
the converter implementation and the property definitions associated with those
pairs.
- convert(InputStream inputData, OutputStream outputData, ConversionActionArguments
conversionArguments) : void - converts the input data contained within the
given input stream and writes the converted data to the given output stream. Both the expected
input data type and the output data type are contained within the conversionArguments parameter,
and they should be among the conversions supported by the particular converter. In addition to
the basic input/output arguments, the method can receive conversion parameters filled in
according to the property definition syntax (also contained within the
ConversionActionArguments instance).
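A minimal converter following this contract might look like the sketch below, which re-encodes ISO-8859-1 text as UTF-8. The method names follow the text; the simplified DataType and ConversionActionArguments classes, the ConverterException name, the two format names and the reencode() helper are assumptions. For brevity the sketch works on raw text streams and omits both the conversion properties and the ZIP packaging that the EGE implementation expects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

class ConverterSketch {

    /** Simplified stand-in for the EGE DataType class. */
    static final class DataType {
        final String format;
        final String mimeType;

        DataType(String format, String mimeType) {
            this.format = format;
            this.mimeType = mimeType;
        }

        boolean sameAs(DataType other) {
            return format.equals(other.format) && mimeType.equals(other.mimeType);
        }
    }

    /** Simplified stand-in: input/output data types only, no properties. */
    static final class ConversionActionArguments {
        final DataType input;
        final DataType output;

        ConversionActionArguments(DataType input, DataType output) {
            this.input = input;
            this.output = output;
        }
    }

    /** Assumed name for the exception signalling an unsupported conversion. */
    static class ConverterException extends Exception {
        ConverterException(String message) {
            super(message);
        }
    }

    /** Re-encodes ISO-8859-1 text as UTF-8; the format names are hypothetical. */
    static class Latin1ToUtf8Converter {
        static final ConversionActionArguments SUPPORTED = new ConversionActionArguments(
                new DataType("Text ISO-8859-1", "text/plain"),
                new DataType("Text UTF-8", "text/plain"));

        public List<ConversionActionArguments> getPossibleConversions() {
            return Collections.singletonList(SUPPORTED);
        }

        public void convert(InputStream inputData, OutputStream outputData,
                ConversionActionArguments conversionArguments)
                throws ConverterException, IOException {
            if (!SUPPORTED.input.sameAs(conversionArguments.input)
                    || !SUPPORTED.output.sameAs(conversionArguments.output)) {
                throw new ConverterException("Unsupported conversion requested");
            }
            Reader reader = new InputStreamReader(inputData, StandardCharsets.ISO_8859_1);
            Writer writer = new OutputStreamWriter(outputData, StandardCharsets.UTF_8);
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, count);
            }
            // The caller owns both streams, so flush but do not close them.
            writer.flush();
        }
    }

    /** Convenience wrapper for the example: converts a byte array in memory. */
    static byte[] reencode(byte[] latin1) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            new Latin1ToUtf8Converter().convert(
                    new ByteArrayInputStream(latin1), out, Latin1ToUtf8Converter.SUPPORTED);
        } catch (ConverterException | IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }
}
```

For example, the single ISO-8859-1 byte 0xE9 (the letter é) comes out as the two UTF-8 bytes 0xC3 0xA9.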
ConfigurableConverter
This interface extends the standard Converter interface with an additional configure() method.
Declared methods:
- configure(Map<String,String> params) : void - converters that
implement this interface can receive additional parameters from the JPF plugin descriptor.
The method is called by the EGE configuration manager, which translates the parameters it reads
into the map argument. The converter is responsible for reading the map and setting up
its configuration. The converter can inform the EGE configuration manager about configuration
errors by throwing an EGEException; an improperly configured converter will be disconnected
from EGE.
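A configure() implementation could read and check its parameters along the following lines. The parameter names "stylesheet" and "timeoutSeconds", the default timeout, the converter class and the locally declared EGEException stand-in are all hypothetical; only the method signature and the idea of rejecting a bad configuration with an EGEException come from the text.

```java
import java.util.Map;

class ConfigureSketch {

    /** Local stand-in for the EGE exception class of the same name. */
    static class EGEException extends Exception {
        EGEException(String message) {
            super(message);
        }
    }

    /** Hypothetical converter that needs a stylesheet path and an optional timeout. */
    static class StylesheetConverter {
        String stylesheetPath;
        int timeoutSeconds = 30; // assumed default

        public void configure(Map<String, String> params) throws EGEException {
            String path = params.get("stylesheet");
            if (path == null || path.trim().isEmpty()) {
                throw new EGEException("Missing required parameter 'stylesheet'");
            }
            String timeout = params.get("timeoutSeconds");
            int parsedTimeout = timeoutSeconds;
            if (timeout != null) {
                try {
                    parsedTimeout = Integer.parseInt(timeout.trim());
                } catch (NumberFormatException e) {
                    throw new EGEException("Parameter 'timeoutSeconds' must be an integer");
                }
            }
            // Assign only after every check has passed, so a failure leaves no partial state.
            stylesheetPath = path.trim();
            timeoutSeconds = parsedTimeout;
        }
    }

    /** Convenience wrapper for the example: reports whether a configuration is accepted. */
    static boolean isAccepted(Map<String, String> params) {
        try {
            new StylesheetConverter().configure(params);
            return true;
        } catch (EGEException e) {
            return false;
        }
    }
}
```

Throwing before any field is assigned keeps a rejected converter from running with a half-applied configuration.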
EGE interface and implementation
The main EGE interface is pl.psnc.dl.ege.EGE. It describes functionality for
complex and multiple conversions of data. The intention of the EGE is to construct a graph of
conversions in which every possible conversion is described by a path in the graph. The graph
structure and its basic algorithms are provided by the external JUNG library and used by the
EGE framework class pl.psnc.dl.ege.EGEImpl, which is also the main implementation of
the EGE interface.
The main methods of pl.psnc.dl.ege.EGE interface are:
- findConversionPaths(DataType sourceDataType) :
List<ConversionsPath> - finds all possible conversions for
the given data type; all of them (conversion paths) are returned as a List of
ConversionsPath instances. ConversionsPath instances received from this method can be
used as the input parameter of the performConversion() method.
- findConversionPaths(DataType sourceDataType, DataType resultDataType) :
List<ConversionsPath> - finds all possible conversions from
the specified sourceDataType to resultDataType. Depending on the set of loaded converters,
there can be several parallel paths.
- performConversion(InputStream inputData, OutputStream outputStream, ConversionsPath
path) : void - performs the chain of conversions described by the ConversionsPath
parameter. The input data, provided through the given input stream, is converted and
written to the given output stream.
- performValidation(InputStream inputData, DataType inputDataType) : ValidationResult
- performs validation using all Validator interface implementations loaded through
the extension mechanism. The method returns a ValidationResult instance which contains
the status of the result and the validation messages. If no validator supports
inputDataType, an exception is thrown.
- performRecognition(InputStream inputData) : String - recognizes
the MIME type of the input data using all loaded EGE recognizers. If any
of the loaded EGE recognizers recognizes the MIME type of the input data, the method returns
the String value of the MIME type; otherwise the method throws an exception.
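To illustrate what performing a chain of conversions over streams involves, the sketch below feeds the output of each step into the next one. It is not the EGEImpl code: the Step interface, the text() helper and the in-memory buffering between steps are assumptions made for the example, and the actual implementation may connect the steps differently. The sketch uses lambdas and therefore assumes Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

class ChainSketch {

    /** Hypothetical stand-in for one conversion action of a path. */
    interface Step {
        void convert(InputStream in, OutputStream out) throws IOException;
    }

    /** Runs all steps in order; intermediate results are buffered in memory. */
    static void performConversion(InputStream inputData, OutputStream outputStream,
            List<Step> path) throws IOException {
        if (path.isEmpty()) {
            throw new IllegalArgumentException("The path must contain at least one step");
        }
        InputStream current = inputData;
        for (int i = 0; i < path.size(); i++) {
            boolean isLast = (i == path.size() - 1);
            if (isLast) {
                // The final step writes straight to the caller's output stream.
                path.get(i).convert(current, outputStream);
            } else {
                ByteArrayOutputStream intermediate = new ByteArrayOutputStream();
                path.get(i).convert(current, intermediate);
                current = new ByteArrayInputStream(intermediate.toByteArray());
            }
        }
    }

    /** Builds a toy step that applies a string transformation to UTF-8 text. */
    static Step text(UnaryOperator<String> transformation) {
        return (in, out) -> {
            ByteArrayOutputStream all = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int count;
            while ((count = in.read(buffer)) != -1) {
                all.write(buffer, 0, count);
            }
            String input = new String(all.toByteArray(), StandardCharsets.UTF_8);
            out.write(transformation.apply(input).getBytes(StandardCharsets.UTF_8));
        };
    }

    /** Example: a two-step path that trims whitespace and then upper-cases the text. */
    static String demo(String input) {
        List<Step> path = Arrays.asList(text(String::trim), text(String::toUpperCase));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            performConversion(
                    new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)), out, path);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Buffering every intermediate result in memory keeps the sketch short, but it would be a poor fit for large archives; temporary files or piped streams are possible alternatives.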
The EGE interface methods are implemented by the EGEImpl class of the EGE framework. Internally,
the conversion graph is initialized in the EGEImpl constructor from the loaded external plugins,
i.e. the implementations of the Converter interface. JPF extensions are managed through an
instance of the pl.psnc.dl.ege.EGEConfigurationManager class. The supported
ConversionActionArguments are read from each loaded converter and used to create the nodes of
the conversion graph. During graph construction, nodes are connected with directed edges
according to the following rule: an arc from node A to node B can only be added if at least one
of the output data types of node A is compatible with at least one of the input data types of
node B. For each compatible input/output pair an arc is added to the graph.
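The arc rule above, together with a path search, can be sketched with plain collections. This is an illustration under simplifying assumptions, not the EGEImpl code: it does not use JUNG, each node has exactly one input and one output data type, data types are reduced to plain labels, and the node names and the depth-first search are hypothetical choices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GraphSketch {

    /** A conversion node with one input and one output data type label. */
    static final class Node {
        final String name;
        final String input;
        final String output;

        Node(String name, String input, String output) {
            this.name = name;
            this.input = input;
            this.output = output;
        }
    }

    /** Adds an arc A -> B whenever the output of A matches the input of B. */
    static Map<Node, List<Node>> buildGraph(List<Node> nodes) {
        Map<Node, List<Node>> arcs = new LinkedHashMap<>();
        for (Node a : nodes) {
            List<Node> successors = new ArrayList<>();
            for (Node b : nodes) {
                if (a != b && a.output.equals(b.input)) {
                    successors.add(b);
                }
            }
            arcs.put(a, successors);
        }
        return arcs;
    }

    /** Returns every cycle-free path from the source type to the result type. */
    static List<List<String>> findPaths(List<Node> nodes, String source, String result) {
        Map<Node, List<Node>> arcs = buildGraph(nodes);
        List<List<String>> found = new ArrayList<>();
        for (Node start : nodes) {
            if (start.input.equals(source)) {
                walk(start, result, arcs, new ArrayList<Node>(), found);
            }
        }
        return found;
    }

    // Depth-first search that never visits the same node twice on one path.
    private static void walk(Node node, String result, Map<Node, List<Node>> arcs,
            List<Node> path, List<List<String>> found) {
        if (path.contains(node)) {
            return;
        }
        path.add(node);
        if (node.output.equals(result)) {
            List<String> names = new ArrayList<>();
            for (Node step : path) {
                names.add(step.name);
            }
            found.add(names);
        } else {
            for (Node next : arcs.get(node)) {
                walk(next, result, arcs, path, found);
            }
        }
        path.remove(path.size() - 1);
    }

    /** Example converters; the names and the HTML/PDF targets are made up. */
    static List<Node> exampleNodes() {
        return Arrays.asList(
                new Node("ead2tei", "EAD", "TEI"),
                new Node("tei2pdf", "TEI", "PDF"),
                new Node("tei2html", "TEI", "HTML"),
                new Node("ead2pdf", "EAD", "PDF"));
    }
}
```

With the example nodes, searching from "EAD" to "PDF" yields two parallel paths: the chained one through ead2tei and tei2pdf, and the direct one through ead2pdf.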
Important: The EGE API assumes that data is converted using streams: one input stream for the
input data and one output stream for the output data. To make it possible to provide input data
and output data consisting of multiple files/directories, the EGE implementation requires that
every EGE converter accepts and outputs data as a ZIP archive. This requirement is crucial
both for converting input data that consists of multiple files/directories and for producing
conversion results that consist of multiple files/directories. To keep the rules for creating
EGE converters simple, the EGE implementation requires every converter to obey this requirement.
Additionally, for developers' convenience, the EGE implementation provides functionality for
compressing multiple files into a ZIP archive and for decompressing a ZIP archive. These
functions are provided by the ZipIOResolver class. An instance of this class is returned by the
getStandardIOResolver() method of the EGEConfigurationManager instance.
Please note that this requirement is stated by this specific EGE implementation and not by the EGE API itself.
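The idea of carrying several files through a single stream can be shown with the standard java.util.zip classes. The sketch below does not use ZipIOResolver, whose methods are not described here; the class and method names are hypothetical, and files are held in memory as a map from entry name to content to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipSketch {

    /** Writes all files (entry name -> content) as one ZIP archive to the stream. */
    static void pack(Map<String, byte[]> files, OutputStream target) throws IOException {
        ZipOutputStream zip = new ZipOutputStream(target);
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            zip.putNextEntry(new ZipEntry(file.getKey()));
            zip.write(file.getValue());
            zip.closeEntry();
        }
        // finish() completes the archive without closing the caller's stream.
        zip.finish();
    }

    /** Reads a ZIP archive from the stream back into a map of entry name -> content. */
    static Map<String, byte[]> unpack(InputStream source) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        ZipInputStream zip = new ZipInputStream(source);
        byte[] buffer = new byte[4096];
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            int count;
            while ((count = zip.read(buffer)) != -1) {
                content.write(buffer, 0, count);
            }
            files.put(entry.getName(), content.toByteArray());
        }
        return files;
    }

    /** Example: packs text files into an archive and unpacks them again. */
    static Map<String, String> roundTrip(Map<String, String> textFiles) {
        try {
            Map<String, byte[]> raw = new LinkedHashMap<>();
            for (Map.Entry<String, String> file : textFiles.entrySet()) {
                raw.put(file.getKey(), file.getValue().getBytes(StandardCharsets.UTF_8));
            }
            ByteArrayOutputStream archive = new ByteArrayOutputStream();
            pack(raw, archive);

            Map<String, String> result = new LinkedHashMap<>();
            for (Map.Entry<String, byte[]> file
                    : unpack(new ByteArrayInputStream(archive.toByteArray())).entrySet()) {
                result.put(file.getKey(), new String(file.getValue(), StandardCharsets.UTF_8));
            }
            return result;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A converter written this way would unpack its input stream, convert the individual files, and pack the results into its output stream, so that multi-file data still passes through the one-stream-in, one-stream-out interface.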