Market Data in Financial Environments Like Asset and Risk Management

Vendors, File Formats, perl Classes, ...

Jochen Hayek

Aleph Soft GmbH

  Augsburger Straße 33
  D-10789 Berlin

Release Info: $Id: book.xml 1.48 2007/07/17 11:46:00 johayek Exp $

Publication Date: 2007


Files from market data suppliers come in a variety of formats: some look much like CSV files, others look quite different.

We have a utility (or rather a family of utilities) capable of reformatting many files from a number of those suppliers into a standard-ish CSV file format.

This kind of utility is often called a parser or something similar, but that is not quite the right name for it, as it not only parses the data but also rewrites it for further use.

You would not be satisfied with a statement like “yes, this is quite a well-formed file according to that format”; you also want to process the data with yet another utility, right?

You may find this text a little terse, and we agree, but the software itself runs pretty stably, and more documentation is easily and very willingly written once the demand for it arises.

Contact us!

Impressum (aka German legalese)

Responsible for these web pages (“homepage”): Aleph Soft GmbH, Jochen Hayek, Augsburger Straße 33, D-10789 Berlin


In its ruling of 12 May 1998 - 312 O 85/98 - "Liability for links", the Hamburg Regional Court (Landgericht Hamburg) decided that by linking to another homepage one may share responsibility for its contents. According to the court, this can only be prevented by expressly distancing oneself from those contents. We hereby expressly distance ourselves from all contents of all third-party pages that we link to.

Table of Contents

1. vendors
1.1. Bloomberg
1.1.1. files with fixed column names
1.1.2. files with name-/value-pairs -- corporate actions
1.2. Citigroup
1.2.1. equity files with names like bmir*.IC*
1.2.2. FI files with names like SL*.csv
1.2.3. various others
1.3. iBoxx
1.4. J.P. Morgan -- various formats
1.5. Lehman Brothers
1.6. Merrill Lynch
1.7. Moody's
1.8. MSCI
1.9. S&R ratings
1.10. stoxx
1.11. Thomson Financial -- various formats
1.12. Wertpapier-Mitteilungen
2. classes of the OO-ified
2.1. re-usable general base classes
2.1.1. file_class
2.1.2. simple_csv_file_class
2.1.3. simple_csv_file_with_history_class
2.1.4. xml_class
2.1.5. rdbms_class
2.2. others
2.3. discussing design ideas
2.3.1. the embedded object approach -- why do you use this level of indirection?

Chapter 1. vendors

1.1. Bloomberg

... ...

1.1.1. files with fixed column names


1.1.2. files with name-/value-pairs -- corporate actions


1.2. Citigroup

... ...

1.2.1. equity files with names like bmir*.IC*


1.2.2. FI files with names like SL*.csv


1.2.3. various others


1.3. iBoxx

... ...

1.4. J.P. Morgan -- various formats

... ...

1.5. Lehman Brothers

... ...

1.6. Merrill Lynch

... ...

1.7. Moody's

... ...

1.8. MSCI

... ...

1.9. S&R ratings

... ...

1.10. stoxx

... ...

1.11. Thomson Financial -- various formats

... ...

1.12. Wertpapier-Mitteilungen

... ...

We make use of the table WM-Felder as a data dictionary in order to print the parsed fields, conveniently formatted, as records of a CSV file.

Documentation from that company comes as zipped Access database files: the zip files are named like WMDOK*-*.zip, the Access database files like WMDOK*-*.mde.

When you open such a database file, a database macro starts up and presents the database as a form-controlled application, but the database actually consists of:

tables (Tabellen)
queries (Abfragen)
forms (Formulare)
reports (Berichte)
macros (Makros)
modules (Module)

One table is called WM-Felder. The name says it all (“Felder” is German for “fields”): it describes each and every field within the files they provide you with (one row in that table per field in such a file), plus a few extras presented as pseudo fields. All the overall documentation is actually stuffed into pseudo fields, or rather pseudo field descriptions.

Humans read those descriptions best via the report of the same name, WM-Felder, but the table itself is the ideal basis for mechanical interpretation of the fields of their delivered files, and that is what we actually do: we dump the table as a CSV file and load it as a sort of data (type) dictionary. We do not dump all the columns, as we only need a couple of them, which keeps the data structure quite lean.

The first record is called (i.e. its column Langbezeichnung has a value of ...) VORLAUFSATZ/VF, the FeldidentVF equals -EINL10.

The last record is called (i.e. its column Langbezeichnung has a value of ...) DATENENDESATZ/VF, the FeldidentVF equals -EINL12.
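Loading such a dump as a data dictionary can be sketched as follows. This is only an illustration in Python, not the code of our perl utilities: the semicolon separator, the third column Bemerkung and the function name load_field_dictionary are assumptions made up for the sketch; only the column names Langbezeichnung and FeldidentVF and the two sample records are taken from the text above.

```python
import csv
import io

# Hypothetical excerpt of a WM-Felder dump. The ";" separator and the
# "Bemerkung" column are assumptions made for this sketch.
SAMPLE_DUMP = """FeldidentVF;Langbezeichnung;Bemerkung
-EINL10;VORLAUFSATZ/VF;not needed
-EINL12;DATENENDESATZ/VF;not needed
"""

# Only a couple of columns are kept, so the dictionary stays lean.
WANTED_COLUMNS = ("FeldidentVF", "Langbezeichnung")


def load_field_dictionary(stream, wanted=WANTED_COLUMNS, delimiter=";"):
    """Map each FeldidentVF to a dict holding only the wanted columns."""
    reader = csv.DictReader(stream, delimiter=delimiter)
    dictionary = {}
    for row in reader:
        dictionary[row["FeldidentVF"]] = {name: row[name] for name in wanted}
    return dictionary


field_dictionary = load_field_dictionary(io.StringIO(SAMPLE_DUMP))
print(field_dictionary["-EINL10"]["Langbezeichnung"])  # prints VORLAUFSATZ/VF
```

Keying the dictionary by FeldidentVF is a design choice of the sketch; any column that identifies a field uniquely would do.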

Chapter 2. classes of the OO-ified

2.1. re-usable general base classes

2.1.1. file_class

THE base class -- the mother of all

Provides such methods as test_existence, modification_time, _open

The methods test_existence and modification_time make the jobs of the same name generally available: if your class derives from file_class, you do not need to implement test_existence and modification_time yourself, as you have inherited them already.
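The inheritance described here might look roughly like this. The sketch is in Python for illustration only (the real classes are perl), and everything apart from the method names test_existence, modification_time and _open is an assumption: the constructor argument, the return types and the derived class VendorExampleFileClass are hypothetical.

```python
import os
import tempfile


class FileClass:
    """Sketch of a base class offering services common to all file classes."""

    def __init__(self, path):
        self.path = path

    def test_existence(self):
        # True if the path names an existing regular file.
        return os.path.isfile(self.path)

    def modification_time(self):
        # Seconds since the epoch, as reported by the file system.
        return os.path.getmtime(self.path)

    def _open(self):
        # Text mode; the encoding is an assumption of this sketch.
        return open(self.path, "r", encoding="utf-8")


class VendorExampleFileClass(FileClass):
    """Hypothetical vendor class: implements nothing, inherits everything."""


# Create a small temporary file to try the inherited methods on.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write("a,b\n")

vendor_file = VendorExampleFileClass(handle.name)
exists = vendor_file.test_existence()
mtime = vendor_file.modification_time()

os.remove(handle.name)
exists_after = vendor_file.test_existence()
```

The derived class has an empty body, yet both test_existence and modification_time are available on its objects.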

2.1.2. simple_csv_file_class

former proc__simple_csv_file

This class is for CSV files. You can specify what the dates look like that you want reformatted, and also on which line the body starts.

The following classes simply derive from this class:

vendor_djindexes_file_class, vendor_impax_file_class, vendor_telekurs_file_class, vendor_thetakeoverpanel_file_class

The following classes make use of this class for an embedded object:

vendor_bloomberg_file_class, vendor_citigroup_file_class, vendor_iboxx_file_class, vendor_indexco_file_class, vendor_jpmorgan_file_class
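The two settings mentioned for simple_csv_file_class (how the dates look, and where the body starts) can be sketched like this. It is a Python illustration under assumptions, not the perl class itself: the function name reformat_simple_csv, its parameters, the YYYY-MM-DD target format and the sample file are all made up for the example.

```python
import csv
import io
from datetime import datetime


def reformat_simple_csv(stream, date_columns, date_format,
                        body_start_line, delimiter=","):
    """Return the body rows with the given date columns rewritten as YYYY-MM-DD.

    body_start_line is 1-based; every record before it is skipped.
    Records are counted, which matches line numbers as long as no
    field contains a line break.
    """
    rows = []
    reader = csv.reader(stream, delimiter=delimiter)
    for line_number, row in enumerate(reader, start=1):
        if line_number < body_start_line:
            continue
        for index in date_columns:
            parsed = datetime.strptime(row[index], date_format)
            row[index] = parsed.strftime("%Y-%m-%d")
        rows.append(row)
    return rows


# Hypothetical vendor file: one banner line, one header line, then the body.
SAMPLE = "Some vendor banner\nDate,Index,Level\n17/07/2007,EXAMPLE,101.5\n"

rows = reformat_simple_csv(
    io.StringIO(SAMPLE),
    date_columns=[0],
    date_format="%d/%m/%Y",
    body_start_line=3,
)
print(rows)  # prints [['2007-07-17', 'EXAMPLE', '101.5']]
```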

2.1.3. simple_csv_file_with_history_class

former proc__simple_csv_file_with_history

This class is also for CSV files, so it looks pretty much like its sister class simple_csv_file_class, but it is dedicated especially to historical files.

The method job_business_date exists only for historical files, and so does the method business_date, which this class provides. (The latter implements the former.) They retrieve the (business) date of the last record in the file.

Currently there is only one class making use of this for an embedded object: vendor_jpmorgan_file_class
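Retrieving the business date of the last record could be sketched as below. Again this is Python for illustration rather than the perl method; that the date sits in the first column, the comma separator and the sample history are assumptions.

```python
import csv
import io


def business_date(stream, date_column=0, delimiter=","):
    """Return the date field of the last non-empty record, or None if there is none."""
    last = None
    for row in csv.reader(stream, delimiter=delimiter):
        if row:  # csv.reader yields an empty list for a blank line
            last = row[date_column]
    return last


# Hypothetical historical file: one record per business day.
HISTORY = "2007-07-13,100.0\n2007-07-16,100.4\n2007-07-17,101.5\n"

print(business_date(io.StringIO(HISTORY)))  # prints 2007-07-17
```

The sketch reads the whole file once; for very long histories one might prefer to read from the end, but that is beyond this example.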

2.1.4. xml_class

no corresponding subroutine in non-OO version

Covers XSLT transformations.

Currently there is only one class making use of this for an embedded object: vendor_wm_file_class

2.1.5. rdbms_class

no corresponding subroutine in non-OO version

Covers SQL queries to relational databases.

Currently there is only one class making use of this for an embedded object: vendor_ids_file_class

2.2. others

The following classes do not make use of smart base classes:

vendor_hsbc_file_class, vendor_ml_file_class, vendor_moodys_file_class, vendor_msci_file_class, vendor_spcompustat_file_class [1] , vendor_spr_file_class, vendor_stx_file_class, vendor_thomson_file_class, vendor_topix_file_class

2.3. discussing design ideas

2.3.1. the embedded object approach -- why do you use this level of indirection?

The indirection can, in theory, probably be avoided entirely, but you would have to pay a high price for it.

As long as you only defer a few standard cases into the embedded object, you can handle the remaining cases in conditional branches, with similar skeletons within the constructor and most of the methods.

If you prefer to have the skeleton of conditional branches only once (i.e. within the constructor), you can obviously create a lot of classes, one for each branch condition; within the constructor you then do not construct an embedded object at all and simply leave out the intermediate level.

Right, an example might help here ...
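One way to picture the embedded object approach is the following sketch. It is written in Python for illustration and is not taken from the perl sources: the class VendorExampleFile, the historical flag, the in-memory rows and the method row_count are all hypothetical, and forwarding through __getattr__ is merely one possible way to defer calls.

```python
class SimpleCsvFile:
    """Stand-in for simple_csv_file_class; holds already-parsed rows."""

    def __init__(self, rows):
        self.rows = rows

    def row_count(self):
        return len(self.rows)


class SimpleCsvFileWithHistory:
    """Stand-in for the sister class simple_csv_file_with_history_class."""

    def __init__(self, rows):
        self.rows = rows

    def row_count(self):
        return len(self.rows)

    def business_date(self):
        # Date of the last record; column 0 is an assumption of this sketch.
        return self.rows[-1][0]


class VendorExampleFile:
    """Hypothetical vendor class that picks its embedded object at construction time."""

    def __init__(self, rows, historical=False):
        helper = SimpleCsvFileWithHistory if historical else SimpleCsvFile
        self._embedded = helper(rows)

    def __getattr__(self, name):
        # Called only when normal lookup fails: defer to the embedded object.
        if name == "_embedded":
            raise AttributeError(name)  # guards against endless recursion
        return getattr(self._embedded, name)


ROWS = [["2007-07-16", "100.4"], ["2007-07-17", "101.5"]]
plain = VendorExampleFile(ROWS)
history = VendorExampleFile(ROWS, historical=True)

print(plain.row_count())                 # prints 2
print(history.business_date())           # prints 2007-07-17
print(hasattr(plain, "business_date"))   # prints False
```

The conditional branch appears only once, in the constructor; every other call is passed on to whichever helper was chosen there.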

[1] this is actually a very nice example to copy&paste from


CSV file

CSV stands for “comma separated values”, which assumes that the comma is chosen as the separator, but that is not always the case.

A flat file representing a table, in which each column value of a row (i.e. record) is separated from the next one by the separator. If a column value itself contains the separator, you enclose the column value in double quotes. If the column value contains a double quote itself, ...
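The quoting rules can be tried out with Python's standard csv module, as in this small sketch. Note the assumption: the module's default is to double an embedded double quote, which is a common convention, but we do not claim here that every vendor file follows it. The semicolon separator and the sample values are made up.

```python
import csv
import io

RECORD = ["EXAMPLE", "Smith; Jones", 'the "big" one']

# Write one record with ";" as the separator.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
writer.writerow(RECORD)
line = buffer.getvalue()
print(line, end="")  # prints EXAMPLE;"Smith; Jones";"the ""big"" one"

# Reading the line back restores the original column values.
restored = next(csv.reader(io.StringIO(line), delimiter=";"))
```

Only the values that contain the separator or a double quote get enclosed in double quotes; the first value is written as is.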

embedded object

A perl class (or rather an object of such a class) might want to behave sometimes like (an object of) class a and sometimes like class b, depending on how it was told to behave at construction time.

This sounds a little like multiple inheritance, but it is not quite the same.

For this OO technique you use an attribute of the object that is itself an object, created as an instance of class a, class b, ..., and to which the method calls get deferred.

historical file

A file with one record for each (business) day of the entire history of the records.