API Overview

The EGE API is a Java-based framework that provides the interfaces and a basic implementation of the mechanisms mentioned in the introduction to EGE. The EGE API is written in Java 1.5 (5.0). It directly uses three external libraries:

The EGE API contains the following Java packages:

  • pl.psnc.dl.ege - main package with the implementation of the EGE logic.
  • pl.psnc.dl.ege.component - contains the interfaces describing the three main EGE components.
  • pl.psnc.dl.ege.exception - contains the EGE exceptions.
  • pl.psnc.dl.ege.types - contains the EGE data types used by the components and conversion graph.

EGE API specific data types (pl.psnc.dl.ege.types)

Format and MIME type - both are represented as standard String values. A format is the name of a format, such as ENRICH TEI P5 or EAD. A MIME type can be, for example, text/xml, application/pdf or application/msword. The values of format and MIME type are not validated in any way. This is intentional, because the internal EGE mechanisms are designed to be very general: it should also be possible to use the EGE outside its original context, the ENRICH project.

The pl.psnc.dl.ege.types package includes the following classes:

  • DataType - contains both format and MIME type information; instances of this class are used to describe both input and output data types in EGE.
  • ConversionAction - every instance of this class describes one particular conversion operation that a specified converter component can perform on concrete input/output data types. For example, a converter A can convert data of DataType X to DataType Y. The information about X and Y is stored in a ConversionActionArguments instance. ConversionAction is also the base class for a node in the conversion graph. Each conversion action can be configured dynamically through properties provided with its ConversionActionArguments instance.
  • ConversionActionArguments - describes the input and output data types of a conversion action. Both are instances of the DataType class. Each instance also provides arguments for the dynamic parameterization of the conversion.
  • ConversionsPath - each instance of this class contains a list (a sequence) of ConversionAction objects. The list represents a path of chained conversions: the input data type of the first element is the source data type, and the output data type of the last element is the result data type. Adjacent elements of a path have compatible input/output data types; this is ensured when the conversion graph is constructed.
  • ValidationResult - instances of this class are returned by validation methods. Each instance contains the validation status (ERROR, SUCCESS or FATAL) and messages.

The UML diagram below shows the relationships between the classes described above.

UML Diagram of EGE

Each instance of ConversionsPath contains a sequence of ConversionAction instances. A ConversionAction instance references an instance of a class that implements the Converter interface (loaded by the JPF at EGE start-up) and also specifies the data types of the conversion action (through an instance of the ConversionActionArguments class). ConversionActionArguments contains information about the input and output data types, so each instance of this class references two instances of the DataType class. DataType is an elementary class that contains information about the format and MIME type, both kept as String.
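The relationships described above can be sketched in plain Java. The classes below are simplified stand-ins, not the EGE sources: the constructors, the accessor names, the isChained() helper, and the use of a String in place of the Converter reference are all assumptions made for illustration. The HTML and PDF data types in the sample are likewise invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified stand-ins for classes in pl.psnc.dl.ege.types (shapes are assumed).
final class DataType {
    private final String format;   // e.g. "EAD"; never validated
    private final String mimeType; // e.g. "text/xml"; never validated

    DataType(String format, String mimeType) {
        this.format = format;
        this.mimeType = mimeType;
    }

    String getFormat() { return format; }
    String getMimeType() { return mimeType; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof DataType)) return false;
        DataType other = (DataType) o;
        return format.equals(other.format) && mimeType.equals(other.mimeType);
    }

    @Override
    public int hashCode() { return 31 * format.hashCode() + mimeType.hashCode(); }
}

final class ConversionActionArguments {
    private final DataType input;
    private final DataType output;

    ConversionActionArguments(DataType input, DataType output) {
        this.input = input;
        this.output = output;
    }

    DataType getInput() { return input; }
    DataType getOutput() { return output; }
}

final class ConversionAction {
    private final String converterName; // stands in for the Converter reference
    private final ConversionActionArguments arguments;

    ConversionAction(String converterName, ConversionActionArguments arguments) {
        this.converterName = converterName;
        this.arguments = arguments;
    }

    String getConverterName() { return converterName; }
    ConversionActionArguments getArguments() { return arguments; }
}

final class ConversionsPath {
    private final List<ConversionAction> actions;

    ConversionsPath(List<ConversionAction> actions) {
        if (actions.isEmpty()) throw new IllegalArgumentException("A path needs at least one action");
        this.actions = Collections.unmodifiableList(new ArrayList<ConversionAction>(actions));
    }

    DataType getSource() { return actions.get(0).getArguments().getInput(); }
    DataType getResult() { return actions.get(actions.size() - 1).getArguments().getOutput(); }

    // True when each action's output type equals the next action's input type.
    boolean isChained() {
        for (int i = 0; i + 1 < actions.size(); i++) {
            DataType out = actions.get(i).getArguments().getOutput();
            DataType in = actions.get(i + 1).getArguments().getInput();
            if (!out.equals(in)) return false;
        }
        return true;
    }
}

class TypesDemo {
    // Builds a two-step path: ENRICH TEI P5 -> HTML -> PDF (the last two types are hypothetical).
    static ConversionsPath samplePath() {
        DataType tei = new DataType("ENRICH TEI P5", "text/xml");
        DataType html = new DataType("HTML", "text/html");
        DataType pdf = new DataType("PDF", "application/pdf");
        return new ConversionsPath(Arrays.asList(
                new ConversionAction("teiToHtml", new ConversionActionArguments(tei, html)),
                new ConversionAction("htmlToPdf", new ConversionActionArguments(html, pdf))));
    }
}
```

In this sketch the source type of the sample path is the TEI type and the result type is the PDF type, which matches the first-input/last-output rule given for ConversionsPath.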

The conversion process can be parameterised dynamically with properties described within ConversionActionArguments instances. ConversionActionArguments contains property definitions written as a String. Each property definition should contain at least the unique id of the property and its data type. The syntax of the property definitions should be properly described in the converter documentation. With a documented syntax, a client application can interpret the available properties in order to provide, for example, a user interface for conversion configuration. Properties configured through the client application interface can be passed to the converter using the map argument of the ConversionActionArguments instance, where the key is the unique "id" of a property and the value is the value assigned to that property. Validating the properties and applying default settings is the converter's task.
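As a hedged illustration of the last point, the sketch below shows how a converter might merge client-supplied properties with its own defaults and reject invalid values. The property ids (encoding, indent), their defaults, and the class name are invented for this example; real ids and their syntax come from each converter's documentation.

```java
import java.util.HashMap;
import java.util.Map;

class PropertyDefaultsSketch {
    // Hypothetical property ids of an imaginary converter.
    static final String ENCODING = "encoding";
    static final String INDENT = "indent";

    // Returns the effective properties: defaults overridden by supplied values.
    static Map<String, String> resolve(Map<String, String> supplied) {
        Map<String, String> effective = new HashMap<String, String>();
        effective.put(ENCODING, "UTF-8");
        effective.put(INDENT, "2");

        if (supplied != null) {
            for (Map.Entry<String, String> entry : supplied.entrySet()) {
                if (!effective.containsKey(entry.getKey())) {
                    throw new IllegalArgumentException("Unknown property id: " + entry.getKey());
                }
                effective.put(entry.getKey(), entry.getValue());
            }
        }

        // Validation is the converter's job: indent must be a non-negative integer.
        int indent;
        try {
            indent = Integer.parseInt(effective.get(INDENT));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("indent must be an integer: " + effective.get(INDENT));
        }
        if (indent < 0) {
            throw new IllegalArgumentException("indent must not be negative: " + indent);
        }
        return effective;
    }
}
```

A client would fill a map keyed by property id, and the converter would call something like resolve() before starting the conversion.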

Component interfaces

Each component in the EGE (validator, converter or recognizer) can have multiple implementations provided through the extension mechanism of JPF. To provide this extensibility, the EGE API defines the following interfaces in the pl.psnc.dl.ege.component package:

  • Recognizer - declares major functions of the EGE recognizer, has to be implemented by every external recognizer.
  • Validator - declares major functions of the EGE validator, has to be implemented by every external validator.
  • Converter - declares major functions of the EGE converter, has to be implemented by every external converter.
  • ConfigurableConverter - extends the standard Converter interface with one additional method, configure().

The methods declared in these interfaces are described in the following sections.

Recognizer

Recognizers are responsible for recognizing the MIME type of data. MIME types are media type identifiers registered by IANA (Internet Assigned Numbers Authority). The EGE, however, intentionally does not check MIME types against this registry, so they are simply instances of the String type.

Declared methods:

  • getRecognizeableMimeTypes() : List<String> - should return a list of the MIME types recognized by this recognizer.
  • recognize(InputStream inputData) : String - a particular recognizer implementation tries to recognize the MIME type of the provided input data and returns it, or throws an exception if the MIME type was not recognized.
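A minimal recognizer could look like the sketch below, which inspects the first four bytes of the stream for the PDF ("%PDF") and ZIP ("PK\3\4") signatures. The Recognizer interface here is a local stand-in that mirrors the two methods listed above; the exception types (IOException and the hypothetical UnrecognizedDataException) are assumptions, since this section does not name the exception thrown by the real interface.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

// Stand-in mirroring the two methods listed above; exception types are assumed.
interface Recognizer {
    List<String> getRecognizeableMimeTypes();
    String recognize(InputStream inputData) throws IOException, UnrecognizedDataException;
}

// Hypothetical exception type used only by this sketch.
class UnrecognizedDataException extends Exception {
    UnrecognizedDataException(String message) { super(message); }
}

class MagicNumberRecognizer implements Recognizer {
    public List<String> getRecognizeableMimeTypes() {
        return Arrays.asList("application/pdf", "application/zip");
    }

    public String recognize(InputStream inputData) throws IOException, UnrecognizedDataException {
        // Read up to four bytes; a single read() call may return fewer.
        byte[] head = new byte[4];
        int filled = 0;
        while (filled < head.length) {
            int n = inputData.read(head, filled, head.length - filled);
            if (n < 0) break;
            filled += n;
        }
        if (filled == head.length) {
            if (head[0] == '%' && head[1] == 'P' && head[2] == 'D' && head[3] == 'F') {
                return "application/pdf";
            }
            if (head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4) {
                return "application/zip";
            }
        }
        throw new UnrecognizedDataException("No known signature in the first bytes");
    }
}

class RecognizerDemo {
    // Convenience wrapper: returns the MIME type, or "unrecognized" on any failure.
    static String tryRecognize(byte[] data) {
        try {
            return new MagicNumberRecognizer().recognize(new ByteArrayInputStream(data));
        } catch (Exception e) {
            return "unrecognized";
        }
    }
}
```

Signature sniffing is only one possible strategy; a recognizer for XML-based formats would more likely need to parse part of the document.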

Validator

Validators are responsible for validating the input data with respect to its format, e.g. checking whether the submitted data is in the ENRICH TEI P5 format.

Declared methods:

  • getSuportedValidationTypes() : List<DataType> - returns the data types that the validator implementation is able to validate.
  • validate(InputStream inputData, DataType inputDataType) : ValidationResult - returns an instance of the ValidationResult type. Every result contains a status (whether the validation ended with success, an error or a fatal error) and validation messages (about errors or warnings). If inputDataType is not supported by the Validator implementation, the method should throw a ValidatorException.
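The sketch below shows one possible validator: a well-formedness check for XML built on the JDK's SAX parser. DataType, ValidationResult, ValidatorException and Validator are local stand-ins whose shapes are assumed; the "XML" format name and the mapping of parse errors to ERROR and I/O or parser set-up failures to FATAL are choices made for this example, not rules taken from the EGE.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Local stand-ins for the EGE types; their shapes are assumptions.
final class DataType {
    final String format;
    final String mimeType;
    DataType(String format, String mimeType) { this.format = format; this.mimeType = mimeType; }
    @Override public boolean equals(Object o) {
        return o instanceof DataType && format.equals(((DataType) o).format)
                && mimeType.equals(((DataType) o).mimeType);
    }
    @Override public int hashCode() { return 31 * format.hashCode() + mimeType.hashCode(); }
}

final class ValidationResult {
    enum Status { SUCCESS, ERROR, FATAL }
    final Status status;
    final List<String> messages;
    ValidationResult(Status status, List<String> messages) { this.status = status; this.messages = messages; }
}

class ValidatorException extends Exception {
    ValidatorException(String message) { super(message); }
}

interface Validator {
    List<DataType> getSuportedValidationTypes(); // spelled as in the method list above
    ValidationResult validate(InputStream inputData, DataType inputDataType) throws ValidatorException;
}

class WellFormedXmlValidator implements Validator {
    static final DataType XML = new DataType("XML", "text/xml"); // hypothetical format name

    public List<DataType> getSuportedValidationTypes() {
        return Collections.singletonList(XML);
    }

    public ValidationResult validate(InputStream inputData, DataType inputDataType) throws ValidatorException {
        if (!XML.equals(inputDataType)) {
            throw new ValidatorException("Unsupported data type: " + inputDataType.format);
        }
        try {
            // DefaultHandler rethrows fatal parse errors as SAXException.
            SAXParserFactory.newInstance().newSAXParser().parse(inputData, new DefaultHandler());
            return new ValidationResult(ValidationResult.Status.SUCCESS, Collections.<String>emptyList());
        } catch (SAXException e) {
            return new ValidationResult(ValidationResult.Status.ERROR, Collections.singletonList(e.getMessage()));
        } catch (IOException e) {
            return new ValidationResult(ValidationResult.Status.FATAL, Collections.singletonList(e.getMessage()));
        } catch (ParserConfigurationException e) {
            return new ValidationResult(ValidationResult.Status.FATAL, Collections.singletonList(e.getMessage()));
        }
    }
}

class ValidatorDemo {
    // Returns the status name for an XML string, or "UNSUPPORTED" if the type is rejected.
    static String statusOf(String xml) {
        try {
            InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            return new WellFormedXmlValidator().validate(in, WellFormedXmlValidator.XML).status.name();
        } catch (ValidatorException e) {
            return "UNSUPPORTED";
        }
    }
}
```

A validator for a concrete format such as ENRICH TEI P5 would typically go further and check the document against a schema.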

Converter

Converters convert the data from a given input stream and write the result of the conversion to a given output stream.

Declared methods:

  • getPossibleConversions() : List<ConversionActionArguments> - should return a list of arguments with the pairs of input/output data types supported by the converter implementation and the property definitions associated with those pairs.
  • convert(InputStream inputData, OutputStream outputData, ConversionActionArguments conversionArguments) : void - converts the input data contained in the given input stream and writes the converted data to the given output stream. Both the expected input data type and the output data type are contained in the conversionArguments parameter, and they should match one of the conversions that the particular converter supports. Besides the basic input/output arguments, the method can receive conversion parameters filled in according to the parameter definition syntax (also contained in the ConversionActionArguments instance).
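To make the two methods concrete, here is a toy converter that upper-cases plain text. Everything except the two method names is an assumption: the stand-in types, the accessor names, the ConverterException and IOException in the throws clause, the "Plain text" and "Upper-case text" data types, and the "encoding" property id. It also ignores the ZIP packaging that the EGE implementation expects from real converters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Local stand-ins; the real EGE classes may be shaped differently.
final class DataType {
    final String format;
    final String mimeType;
    DataType(String format, String mimeType) { this.format = format; this.mimeType = mimeType; }
    @Override public boolean equals(Object o) {
        return o instanceof DataType && format.equals(((DataType) o).format)
                && mimeType.equals(((DataType) o).mimeType);
    }
    @Override public int hashCode() { return 31 * format.hashCode() + mimeType.hashCode(); }
}

final class ConversionActionArguments {
    final DataType input;
    final DataType output;
    final Map<String, String> properties; // property id -> assigned value
    ConversionActionArguments(DataType input, DataType output, Map<String, String> properties) {
        this.input = input;
        this.output = output;
        this.properties = properties;
    }
}

class ConverterException extends Exception {
    ConverterException(String message) { super(message); }
}

interface Converter {
    List<ConversionActionArguments> getPossibleConversions();
    void convert(InputStream inputData, OutputStream outputData,
                 ConversionActionArguments conversionArguments) throws ConverterException, IOException;
}

class UpperCaseConverter implements Converter {
    static final DataType PLAIN = new DataType("Plain text", "text/plain");      // hypothetical
    static final DataType UPPER = new DataType("Upper-case text", "text/plain"); // hypothetical

    public List<ConversionActionArguments> getPossibleConversions() {
        return Collections.singletonList(
                new ConversionActionArguments(PLAIN, UPPER, new HashMap<String, String>()));
    }

    public void convert(InputStream inputData, OutputStream outputData,
                        ConversionActionArguments args) throws ConverterException, IOException {
        if (!PLAIN.equals(args.input) || !UPPER.equals(args.output)) {
            throw new ConverterException("Unsupported conversion");
        }
        String encoding = args.properties.get("encoding"); // hypothetical property id
        if (encoding == null) encoding = "UTF-8";

        // Buffer the whole input so that case mapping sees complete characters.
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = inputData.read(buffer)) != -1) all.write(buffer, 0, n);

        String text = new String(all.toByteArray(), encoding);
        outputData.write(text.toUpperCase(Locale.ROOT).getBytes(encoding));
        outputData.flush();
    }
}

class ConverterDemo {
    // Runs the converter on a string; returns null if the conversion fails.
    static String upper(String text) {
        try {
            UpperCaseConverter converter = new UpperCaseConverter();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            converter.convert(new ByteArrayInputStream(text.getBytes("UTF-8")), out,
                    converter.getPossibleConversions().get(0));
            return out.toString("UTF-8");
        } catch (Exception e) {
            return null;
        }
    }
}
```

The demo passes the converter one of its own advertised ConversionActionArguments, which is how a caller can be sure the requested data types are supported.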

ConfigurableConverter

This interface extends the standard Converter interface with an additional configure() method.

Declared methods:

  • configure(Map<String,String> params) : void - converters that implement this interface can receive additional parameters from the JPF plugin descriptor. The method is called by the EGE configuration manager, which translates the parameters it reads into the map argument. The converter is responsible for reading the map and setting up its configuration. The converter can inform the EGE configuration manager about configuration errors by throwing an EGEException; an improperly configured converter will be disconnected from the EGE.
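A configure() implementation might look like the sketch below. The EGEException class and the ConfigurableConverter interface are local stand-ins (the Converter methods are left out for brevity), and the parameter names stylesheetPath and timeoutSeconds are invented; actual parameter names are whatever the plugin descriptor of a given converter declares.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the EGE exception type; assumed to be a checked exception.
class EGEException extends Exception {
    EGEException(String message) { super(message); }
}

// Only configure() is shown; the inherited Converter methods are omitted here.
interface ConfigurableConverter {
    void configure(Map<String, String> params) throws EGEException;
}

class StylesheetConverter implements ConfigurableConverter {
    private String stylesheetPath;   // hypothetical required setting
    private int timeoutSeconds = 30; // hypothetical optional setting with a default

    public void configure(Map<String, String> params) throws EGEException {
        String path = params.get("stylesheetPath");
        if (path == null || path.trim().isEmpty()) {
            throw new EGEException("Missing required parameter: stylesheetPath");
        }
        int timeout = timeoutSeconds;
        String rawTimeout = params.get("timeoutSeconds");
        if (rawTimeout != null) {
            try {
                timeout = Integer.parseInt(rawTimeout.trim());
            } catch (NumberFormatException e) {
                throw new EGEException("timeoutSeconds must be an integer: " + rawTimeout);
            }
        }
        // Assign only after every value has been checked.
        this.stylesheetPath = path.trim();
        this.timeoutSeconds = timeout;
    }

    String getStylesheetPath() { return stylesheetPath; }
    int getTimeoutSeconds() { return timeoutSeconds; }
}

class ConfigureDemo {
    // Returns true when the converter accepts the given parameters.
    static boolean accepts(Map<String, String> params) {
        try {
            new StylesheetConverter().configure(params);
            return true;
        } catch (EGEException e) {
            return false;
        }
    }

    static Map<String, String> params(String path, String timeout) {
        Map<String, String> map = new HashMap<String, String>();
        if (path != null) map.put("stylesheetPath", path);
        if (timeout != null) map.put("timeoutSeconds", timeout);
        return map;
    }
}
```

Throwing from configure() on the first invalid value gives the configuration manager a clear reason for disconnecting the converter.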

EGE interface and implementation

The main EGE interface is pl.psnc.dl.ege.EGE. It describes the functionality for complex and multiple conversions of data. The intention of the EGE is to construct a graph of conversions in which every possible conversion is described by a path in the graph. The graph structure and its basic algorithms are implemented with the external JUNG library by the EGE framework class pl.psnc.dl.ege.EGEImpl, which is also the main implementation of the EGE interface.

The main methods of pl.psnc.dl.ege.EGE interface are:

  • findConversionPaths(DataType sourceDataType) : List<ConversionsPath> - finds all possible ways of converting the given data type; these ways (conversion paths) are returned as a List of ConversionsPath instances. ConversionsPath instances received from this method can be used as the input parameter of the performConversion() method.
  • findConversionPaths(DataType sourceDataType, DataType resultDataType) : List<ConversionsPath> - finds all possible ways of converting from the specified sourceDataType to resultDataType. Depending on the set of loaded converters, there can be several parallel paths.
  • performConversion(InputStream inputData, OutputStream outputStream, ConversionsPath path) : void - performs the sequence of conversions described by the ConversionsPath parameter. The input data is read from the given input stream, and the converted data is written to the given output stream.
  • performValidation(InputStream inputData, DataType inputDataType) : ValidationResult - performs validation using all Validator implementations loaded through the extension mechanism. The method returns a ValidationResult instance that contains the status of the result and the validation messages. If no validator supports inputDataType, an exception is thrown.
  • performRecognition(InputStream inputData) : String - performs the recognition of the MIME type of the input data using all loaded EGE recognizers. If any of the loaded recognizers identifies the MIME type of the input data, the method returns it as a String; otherwise the method throws an exception.
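One way to picture performConversion() is as a loop that feeds the output of each step into the next one. The sketch below does this with in-memory buffers; it is not taken from EGEImpl, which may move data between steps differently. ConversionStep is a hypothetical stand-in for one ConversionAction, and the two toy steps exist only to show that the order of the path matters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical single conversion step; stands in for one ConversionAction.
interface ConversionStep {
    void convert(InputStream in, OutputStream out) throws IOException;
}

class ChainedConversionSketch {
    // Runs the steps in order; intermediate results are held in memory.
    static void performConversion(InputStream inputData, OutputStream outputStream,
                                  List<ConversionStep> path) throws IOException {
        if (path.isEmpty()) throw new IllegalArgumentException("Empty conversion path");
        InputStream current = inputData;
        for (int i = 0; i < path.size(); i++) {
            if (i == path.size() - 1) {
                path.get(i).convert(current, outputStream); // last step writes to the caller's stream
            } else {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                path.get(i).convert(current, buffer);
                current = new ByteArrayInputStream(buffer.toByteArray());
            }
        }
    }

    // Toy step: upper-cases ASCII letters and copies every other byte unchanged.
    static ConversionStep upperCaseAscii() {
        return new ConversionStep() {
            public void convert(InputStream in, OutputStream out) throws IOException {
                int b;
                while ((b = in.read()) != -1) {
                    out.write(b >= 'a' && b <= 'z' ? b - 32 : b);
                }
            }
        };
    }

    // Toy step: copies the input and appends the ASCII suffix ".out".
    static ConversionStep appendSuffix() {
        return new ConversionStep() {
            public void convert(InputStream in, OutputStream out) throws IOException {
                int b;
                while ((b = in.read()) != -1) out.write(b);
                out.write(".out".getBytes("US-ASCII"));
            }
        };
    }

    // Convenience wrapper for trying a path on an ASCII string.
    static String run(String text, ConversionStep... steps) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            performConversion(new ByteArrayInputStream(text.getBytes("US-ASCII")), out, Arrays.asList(steps));
            return out.toString("US-ASCII");
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Buffering whole intermediate results is the simplest approach but holds each result in memory; piping steps through threads would avoid that at the cost of more complexity.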

The EGE interface methods are implemented by the EGEImpl class of the EGE framework. Internally, the conversion graph is initialized in the EGEImpl constructor from the loaded external plugins, i.e. the implementations of the Converter interface. JPF extensions are managed through an instance of the pl.psnc.dl.ege.EGEConfigurationManager class. The supported ConversionActionArguments are read from each loaded converter and used to create the nodes of the conversion graph. During graph construction the nodes are connected with directed edges according to one rule: an arc from node A to node B can be added only if at least one of the output data types of node A is compatible with at least one of the input data types of node B. An arc is added to the graph for each compatible input/output pair.
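The arc rule and the path search can be sketched with plain collections. This is not the JUNG-based code of EGEImpl: the class, its methods, and the converter names are invented, each node is simplified to a single input and a single output type, data types are reduced to String labels, and "compatible" is taken to mean "equal".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ConversionGraphSketch {
    // One node per conversion a converter offers (one input type, one output type).
    static final class Node {
        final String converter;
        final String input;
        final String output;
        Node(String converter, String input, String output) {
            this.converter = converter;
            this.input = input;
            this.output = output;
        }
    }

    private final List<Node> nodes = new ArrayList<Node>();
    private final Map<Node, List<Node>> arcs = new LinkedHashMap<Node, List<Node>>();

    // Adds a node and every arc allowed by the rule "output of A equals input of B".
    void addNode(String converter, String input, String output) {
        Node added = new Node(converter, input, output);
        arcs.put(added, new ArrayList<Node>());
        for (Node other : nodes) {
            if (other.output.equals(added.input)) arcs.get(other).add(added);
            if (added.output.equals(other.input)) arcs.get(added).add(other);
        }
        nodes.add(added);
    }

    // Returns every cycle-free chain of converter names leading from source to target.
    List<List<String>> findPaths(String source, String target) {
        List<List<String>> found = new ArrayList<List<String>>();
        for (Node start : nodes) {
            if (start.input.equals(source)) walk(start, target, new ArrayList<Node>(), found);
        }
        return found;
    }

    // Depth-first search that never visits the same node twice on one path.
    private void walk(Node current, String target, List<Node> visited, List<List<String>> found) {
        visited.add(current);
        if (current.output.equals(target)) {
            List<String> names = new ArrayList<String>();
            for (Node node : visited) names.add(node.converter);
            found.add(names);
        }
        for (Node next : arcs.get(current)) {
            if (!visited.contains(next)) walk(next, target, visited, found);
        }
        visited.remove(visited.size() - 1);
    }

    // Sample graph with four hypothetical converters.
    static ConversionGraphSketch sample() {
        ConversionGraphSketch graph = new ConversionGraphSketch();
        graph.addNode("tei2html", "TEI", "HTML");
        graph.addNode("tei2docx", "TEI", "DOCX");
        graph.addNode("docx2html", "DOCX", "HTML");
        graph.addNode("html2pdf", "HTML", "PDF");
        return graph;
    }
}
```

For the sample graph, findPaths("TEI", "HTML") yields the direct path through tei2html and the two-step path through tei2docx and docx2html, which illustrates the "several parallel paths" case.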

Important: The EGE API assumes that data is converted using streams: one input stream for the input data and one output stream for the output data. To make it possible to provide input and output data consisting of multiple files/directories, the EGE implementation requires every EGE converter to accept and to output data as a ZIP archive. This requirement is crucial both for input data and for conversion results that consist of multiple files/directories. To keep the rules for creating EGE converters simple, the EGE implementation requires every converter to obey this requirement. Additionally, for developers' convenience the EGE implementation provides functionality for compressing multiple files into a ZIP archive and for decompressing a ZIP archive. These functions are provided by the ZipIOResolver class. An instance of this class is returned by the getStandardIOResolver() method of the EGEConfigurationManager instance. Please note that this requirement is imposed by this specific EGE implementation and not by the EGE API itself.
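The ZIP convention can be illustrated with the JDK's java.util.zip classes alone. The sketch below is not the ZipIOResolver API, whose methods are not described in this section; the class and method names are invented, and files are represented as an in-memory map from entry name to content.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipStreamSketch {
    // Writes the given files (entry name -> content) as one ZIP archive.
    static void pack(Map<String, byte[]> files, OutputStream out) throws IOException {
        ZipOutputStream zip = new ZipOutputStream(out);
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            zip.putNextEntry(new ZipEntry(file.getKey()));
            zip.write(file.getValue());
            zip.closeEntry();
        }
        zip.finish(); // completes the archive without closing the caller's stream
    }

    // Reads a ZIP archive from the stream and returns its file entries.
    static Map<String, byte[]> unpack(InputStream in) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<String, byte[]>();
        ZipInputStream zip = new ZipInputStream(in);
        byte[] buffer = new byte[4096];
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) continue;
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            int n;
            while ((n = zip.read(buffer)) != -1) content.write(buffer, 0, n);
            files.put(entry.getName(), content.toByteArray());
        }
        return files;
    }

    // Packs one text file, unpacks the archive again, and returns the restored text.
    static String roundTrip(String name, String text) {
        try {
            Map<String, byte[]> files = new LinkedHashMap<String, byte[]>();
            files.put(name, text.getBytes("UTF-8"));
            ByteArrayOutputStream archive = new ByteArrayOutputStream();
            pack(files, archive);
            Map<String, byte[]> restored = unpack(new ByteArrayInputStream(archive.toByteArray()));
            return new String(restored.get(name), "UTF-8");
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A converter following the convention would unpack its input stream, convert the contained files, and pack the results into its output stream.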