Release Info: $Id: book.xml 1.48 2007/07/17 11:46:00 johayek Exp $
Publication Date: 2007
Abstract
Files from market data suppliers come in many different formats: some look much like CSV files, others look quite different.
We have a utility (resp. a family of utilities) capable of reformatting many files from various of those suppliers into a standard-ish CSV file format.
This kind of utility often gets called a parser or similar, but that's not quite the right name for it, as it does not only parse the data, it also rewrites the data for further use.
You would not be satisfied with a statement like “yes, this is quite a well-formed file according to that format”; you also want to process the data with yet another utility, right?
You may find this text a little terse, and we agree, but the software itself runs pretty stably, and more documentation is easily and very willingly written once the demand for it rises.
Contact us!
Impressum (aka German legalese)
Responsible for these web pages (“homepage”): Aleph Soft GmbH, Jochen Hayek, Augsburger Straße 33, D-10789 Berlin
Disclaimer
In its ruling of 12 May 1998 (312 O 85/98, "Haftung für Links", i.e. liability for links), the Landgericht Hamburg (Hamburg Regional Court) decided that by linking to another homepage you may share responsibility for its content. According to the court, this can only be prevented by explicitly dissociating yourself from that content. We hereby explicitly dissociate ourselves from all content of all third-party pages that we link to.
Table of Contents
... www.bloomberg.com ...
... www.citigroup.com ...
... www.iBoxx.com ...
... www.JPMorgan.com ...
... www.Lehman.com ...
... www.ML.com ...
... www.Moodys.com ...
... www.MSCI.com ...
... www.StandardAndPoors.com ...
... www.STOXX.com ...
... www.Thomson.com ...
... www.Wertpapiermitteilung.com ...
We use the table WM-Felder as a data dictionary in order to print the parsed fields, conveniently formatted, as CSV file records.
Documentation from that company comes as zipped Access database files: the zip files are named like WMDOK*-*.zip, the Access database files like WMDOK*-*.mde.
When you open such a database file, a database macro starts up that presents the database as a form-controlled application, but the database actually consists of:
tables (Tabellen)
queries (Abfragen)
forms (Formulare)
reports (Berichte)
macros (Makros)
modules (Module)
One table is called WM-Felder. The name says it all: it describes each and every field within the files they provide you with (one row in that table per field in such a file), plus a few extras, presented as pseudo fields. All the overall documentation is actually stuffed into pseudo fields resp. pseudo field descriptions.
Humans read those descriptions best via the report of the same name, WM-Felder, but the table itself is the ideal basis for mechanical interpretation of the fields of their delivered files, and that is what we actually do: we dump the table as a CSV file and load it as a sort of data (type) dictionary. We don't dump all the columns, as we only need a couple of them, which keeps the data structure quite lean.
The first record is called (i.e. its column Langbezeichnung has a value of ...) VORLAUFSATZ/VF, the FeldidentVF equals -EINL10.
The last record is called (i.e. its column Langbezeichnung has a value of ...) DATENENDESATZ/VF, the FeldidentVF equals -EINL12.
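Loading the dumped table as a data dictionary could look roughly like the following sketch. Only the column names FeldidentVF and Langbezeichnung and the two sample records come from the table described above; the sub name load_field_dictionary, the semicolon separator, the header line and the absence of quoted fields are assumptions made for illustration.

```perl
use strict;
use warnings;

# Load a CSV dump of WM-Felder into a hash keyed by FeldidentVF.
# Assumptions: ';' as separator, a header line naming the columns,
# and no quoted fields.
sub load_field_dictionary {
    my ($fh) = @_;
    my $header = <$fh>;
    chomp $header;
    my @columns = split /;/, $header, -1;
    my %dictionary;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        my %record;
        @record{@columns} = split /;/, $line, -1;
        $dictionary{ $record{FeldidentVF} } = \%record;
    }
    return \%dictionary;
}

# Usage with an in-memory dump holding only the first and the last record.
my $dump = "FeldidentVF;Langbezeichnung\n"
         . "-EINL10;VORLAUFSATZ/VF\n"
         . "-EINL12;DATENENDESATZ/VF\n";
open my $fh, '<', \$dump or die "cannot open in-memory dump: $!";
my $dictionary = load_field_dictionary($fh);
close $fh;
print $dictionary->{'-EINL10'}{Langbezeichnung}, "\n";
```

Keeping only the needed columns in the dump keeps each record hash small, which matches the lean data structure mentioned above.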
Provides such methods as test_existence, modification_time, and _open.
The methods test_existence and modification_time make the jobs of the same name generally available, so if your class derives from file_class, you don't need to implement test_existence and modification_time yourself, as you have inherited them already.
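A minimal sketch of how such a base class and a derived class might fit together follows. The method names test_existence, modification_time and _open are taken from the text; the method bodies, the constructor argument file_name and the derived class vendor_example_file_class are hypothetical and not the actual implementation.

```perl
use strict;
use warnings;

package file_class;

sub new {
    my ($class, %args) = @_;
    my $self = { file_name => $args{file_name} };
    return bless $self, $class;
}

# Returns 1 if the file exists, 0 otherwise.
sub test_existence {
    my ($self) = @_;
    my $exists = -e $self->{file_name};
    return $exists ? 1 : 0;
}

# Modification time in seconds since the epoch, undef if the file is missing.
sub modification_time {
    my ($self) = @_;
    my @stat = stat $self->{file_name};
    return @stat ? $stat[9] : undef;
}

# Opens the file for reading and returns the handle.
sub _open {
    my ($self) = @_;
    open my $fh, '<', $self->{file_name}
        or die "cannot open $self->{file_name}: $!";
    return $fh;
}

package vendor_example_file_class;    # hypothetical derived class
our @ISA = ('file_class');            # inherits all three methods

package main;
use File::Temp qw(tempfile);

# Usage: a temporary file stands in for a delivered vendor file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
my $file = vendor_example_file_class->new(file_name => $tmp_name);
print "exists: ", $file->test_existence, "\n";
print "mtime:  ", $file->modification_time, "\n";
```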
This class is for CSV files. You can specify what the dates look like that you want to get reformatted, and also in which line the body starts.
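The two settings can be pictured with the following sketch. It is not the class itself: the sub reformat_body, its option names, the target date format YYYY-MM-DD and the sample lines are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Reformat the body of a CSV file: lines before body_start_line (counted
# from 1) are skipped, dates matching date_regex are rewritten as YYYY-MM-DD.
# date_regex must capture three groups in the order given by date_order.
sub reformat_body {
    my ($lines, %options) = @_;
    my $date_regex      = $options{date_regex};
    my @date_order      = @{ $options{date_order} };
    my $body_start_line = $options{body_start_line};
    my @result;
    my $line_number = 0;
    for my $line (@$lines) {
        $line_number++;
        next if $line_number < $body_start_line;
        (my $copy = $line) =~ s{$date_regex}{
            my %part;
            @part{@date_order} = ($1, $2, $3);
            sprintf '%04d-%02d-%02d', @part{qw(year month day)};
        }ge;
        push @result, $copy;
    }
    return @result;
}

# Usage with made-up sample lines: two header lines, then the body.
my @lines = (
    'hypothetical vendor header',
    'date;value',
    '17.07.2007;1.48',
);
my @body = reformat_body(
    \@lines,
    date_regex      => qr/\b(\d{2})\.(\d{2})\.(\d{4})\b/,
    date_order      => [qw(day month year)],
    body_start_line => 3,
);
print "$_\n" for @body;    # prints 2007-07-17;1.48
```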
The following classes simply derive from this class:
vendor_djindexes_file_class, vendor_impax_file_class, vendor_telekurs_file_class, vendor_thetakeoverpanel_file_class
The following classes make use of this class for an embedded object:
vendor_bloomberg_file_class, vendor_citigroup_file_class, vendor_iboxx_file_class, vendor_indexco_file_class, vendor_jpmorgan_file_class
This class is also for CSV files, so it looks pretty much like its sister class simple_csv_file_class, but it is dedicated especially to historical files.
Only for historical files does files.pl provide the method job_business_date, and only for historical files does this class provide the method business_date. (The latter implements the former.)
Both retrieve the (business) date of the last record in the file.
Currently there is only one class making use of this for an embedded object: vendor_jpmorgan_file_class
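Retrieving the business date of the last record can be sketched as below. The name business_date comes from the text, but this stand-alone sub is only an illustration: the semicolon separator, the date sitting in the first column and the sample records are assumptions.

```perl
use strict;
use warnings;

# Return the (business) date of the last non-empty record read from $fh,
# or undef if there is no record at all.
# Assumptions: ';' as separator and the date in the first column.
sub business_date {
    my ($fh) = @_;
    my $last_line;
    while (my $line = <$fh>) {
        chomp $line;
        $last_line = $line if $line ne '';
    }
    return undef unless defined $last_line;
    my ($date) = split /;/, $last_line;
    return $date;
}

# Usage with a made-up historical file, one record per business day.
my $history = "2007-07-13;100.0\n2007-07-16;100.5\n2007-07-17;101.2\n";
open my $fh, '<', \$history or die "cannot open in-memory file: $!";
print business_date($fh), "\n";    # prints 2007-07-17
close $fh;
```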
Covers XSLT transformations.
Currently there is only one class making use of this for an embedded object: vendor_wm_file_class
Covers SQL queries to relational databases.
Currently there is only one class making use of this for an embedded object: vendor_ids_file_class
The following classes do not make use of smart base classes:
vendor_hsbc_file_class, vendor_ml_file_class, vendor_moodys_file_class, vendor_msci_file_class, vendor_spcompustat_file_class [1], vendor_spr_file_class, vendor_stx_file_class, vendor_thomson_file_class, vendor_topix_file_class
The indirection can, in theory, probably be avoided entirely, but you would pay a high price for it.
As long as you only defer a few standard cases to the embedded object, you can handle the remaining cases in conditional branches, with similar skeletons within the constructor and most of the methods.
If you prefer to have the skeleton of conditional branches only once (i.e. within the constructor), you can of course create a lot of classes, one for each branch condition; the constructor then does not construct an embedded object at all, and the intermediate level is simply left out.
Right, an example might help here ...
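As one possible example of the embedded object approach, the following minimal sketch branches only once, in the constructor, and defers the method call to whatever object was embedded there. The class names class_a and class_b follow the glossary entry on delegation; vendor_example_file_class, the argument kind and the method describe are hypothetical.

```perl
use strict;
use warnings;

package class_a;
sub new      { my ($class) = @_; return bless {}, $class }
sub describe { return 'behaving like class a' }

package class_b;
sub new      { my ($class) = @_; return bless {}, $class }
sub describe { return 'behaving like class b' }

package vendor_example_file_class;    # hypothetical vendor class

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    # The only conditional branch: choose the embedded object once.
    if ($args{kind} eq 'a') {
        $self->{embedded} = class_a->new;
    }
    else {
        $self->{embedded} = class_b->new;
    }
    return $self;
}

# No branching here: the call is deferred to the embedded object.
sub describe {
    my ($self) = @_;
    return $self->{embedded}->describe;
}

package main;

my $object_a = vendor_example_file_class->new(kind => 'a');
my $object_b = vendor_example_file_class->new(kind => 'b');
print $object_a->describe, "\n";    # prints behaving like class a
print $object_b->describe, "\n";    # prints behaving like class b
```

Without the embedded object, every method like describe would have to repeat the if/else skeleton of the constructor.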
CSV stands for “comma separated values”. The name assumes that the comma character gets chosen as the separator, but this is not always the case.
A flat file representing a table, where each column value of a row resp. record is separated from the next one by the separator. If the column value itself contains the separator, you enclose the column value in double quotes. If the column value contains a double quote itself, ...
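The separator and quoting rule can be illustrated with a small splitter. This is a sketch, not the parser used by the utilities: the sub name split_csv_line is hypothetical, the separator is passed in as a parameter, and the case of a double quote inside a column value is deliberately not handled, as the rule for it is left open above.

```perl
use strict;
use warnings;

# Split one CSV line into column values.
# A separator inside double quotes does not end the column value;
# the enclosing double quotes themselves are dropped.
sub split_csv_line {
    my ($line, $separator) = @_;
    my @fields;
    my $field     = '';
    my $in_quotes = 0;
    for my $char (split //, $line) {
        if ($char eq '"') {
            $in_quotes = !$in_quotes;
        }
        elsif ($char eq $separator && !$in_quotes) {
            push @fields, $field;
            $field = '';
        }
        else {
            $field .= $char;
        }
    }
    push @fields, $field;
    return @fields;
}

# Usage: the second column value contains the separator itself.
my @fields = split_csv_line('a;"b;c";d', ';');
print join('|', @fields), "\n";    # prints a|b;c|d
```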
A perl class (resp. an object of such a class) might want to behave sometimes like (an object of) class a and sometimes like class b, depending on how it was told to behave at construction time.
This sounds a little like multiple inheritance, but not entirely.
For this OO technique you use an attribute of the class that is itself an object, created as of class a resp. ..., to which the calls to the class's methods get deferred.
A file with one record for each (business) day of the entire history of the records.