xlw 2.1 : a system for building xlls without pain

Mark S. Joshi


Date: May 11, 2008


Centre for Actuarial Studies, Department of Economics, University of Melbourne, Victoria 3010, Australia

mark@markjoshi.com


Key words and phrases: EXCEL plug-in, xlw, parser

1 Introduction

Much numerical coding in quantitative finance is carried out in C++. Much numerical work in finance is performed in EXCEL. It has therefore become common to work with EXCEL plug-ins, that is with programs that add functions to EXCEL, which allows easy input and output to spreadsheets.

There are a number of ways to do this. For example, it can be done using the COM framework. The most venerable way to do it is using EXCEL's C API (application protocol interface), whilst this is many ways archaic and Microsoft has not changed it since early versions of EXCEL, it has the virtue that its stability has led to familiarity and various packages have been develop to wrap it. One such package is xlw (xlw.sourceforge.net). The xlw package provides an easy framework for writing such plug-ins as dlls (dynamics link libraries) commonly known as xlls.

The xlw 2.1 package is an extension of xlw; its basic framework is the same but various changes have been added to improve usability. The largest of these is that rather than coding the registration and data conversion code in C++, a parser routine has been written that takes in a header file and output a C++ source file with all the registration and data conversion already written. This means that a user needs virtually no specialist knowledge in order to convert any C++ routine into an xll.

In addition, the original xlw was written to work with Visual Studio 6.0. Support for Visual Studio 6.0 has been discontinued; this version comes with project files for Visual Studio 7.1, 8.0, and 9.0, as well as for DevCpp 4.9.9.2. DevCpp is a free IDE for the MinGW g++ compiler, and so XLLs can be built using g++.

A number of other more minor changes have been made: throws of strings and similar objects are caught and passed back to EXCEL; failure of arguments to convert to the correct data type throw strings identifying the bad argument rather than ``#VALUE''; and new data types of MyMatrix and CellMatrix are included. CellMatrix in particular allows passing in a table of any cell-type that will arise, i.e., numbers, strings, bools and error codes. It is also been made easy to add customized data types provided they can be created from existing types.

2 Usage

Before using xlw 2.1, one first has to build the xlw 2.1 library and the interface generator. The interface generator project can be found in directory appropriate for your compiler:

This project should be built and will produce a console application called InterfaceGenerator.exe. Note one can use the version of this application built with any one compiler with any other compiler without trouble.

Second one needs to build the xlw 2.1 library to link against. The project files are in the same place as for the console application.

For each compiler, an example project called ExampleAutogenerated is given, defining functions to be exported to the xll. Each project contains a header file Test.h, a source file Test.cpp and an interface file xlwTest.cpp, these are contained in the folder examples/autogenerated. Some files for payoffs can also be found there, and example spreadsheets.

It is the interface file xlwTest.cpp that has been automatically generated. To re-generate it, simply ensure that InterfaceGenerator.exe is in the path or in the same directory as Test.h and then at a command prompt type ``InterfaceGenerator Test.h''.

The Visual Studio solutions contain an NMAKE project RunInterfaceGenerator which detects changes to the inputs of InterfaceGenerator and reruns it automatically as needed.

Simply building the xll project will then produce an xll, which can be opened in EXCEL and produces extra functions in a library called ``MyTestLibrary.''

To use xlw 2.1 for your own functions, one should first write a C++ function which compiles and builds except for the interfacing code. The functions to be exported to EXCEL should be contained in header files which contain nothing else. The InterfaceGenerator should then be applied to them. If the header file is called ``MyFile.h'', the new file will be called ``xlwMyFile.cpp''. The new file should then be added to the project. Note InterfaceGenerator will ignore any preprocessor commands, and will throw an error if the header file contains any classes or function definitions. It will also protest if any unknown data types are found.

The information for the function wizard in EXCEL is taken from comments and the names of argument variables in the header file. This means that arguments must be named. A comment should follow each argument name and this will appear in the function wizard when that argument is being entered. The general description of the function should be in a comment between the type of the function and its name.

Arguments can be passed by reference or by value, and can be const or non-const. (In fact, these have no effect on the coding of the interface file.)

Once the interface file has been added to the project, one simply builds the project and then the output xll file should be openable by EXCEL. Note that one can have any number of interface files in the same xll project.

If you wish to create a new xll project, this can be done. The things to do are

Note if you are working with Visual Studio 8.0 Express, in addition, you must do the following:

Note some users have added a custom build step that causes the InterfaceGenerator tool to run automatically.

3 Step by step guide for visual studio 8.0 express users

This was written by a user, and goes into more detail on how to make a new project using xlw.

4 Basic Data-types

The function to be exported to EXCEL can only use data-types supported by the interface generator. These are divisible into basic data types: double, short, NEMatrix, MyMatrix, MyArray, CellMatrix, string, std::string, and bool, and extended data types: int, unsigned long, ArgumentList, DoubleOrNothing, PayOff.

Note the XLOPER type has two numeric data types which are essentially short and double so other numeric types go via double.

The class MyMatrix is defined via a typedef in MyContainers.h to MJMatrix. You can use change this to your favourite matrix type. The matrix class must support the following: it should have .rows() and .columns() defined; a constructor that takes number of rows and columns; and elements should be accessible via a[i][j]. If your matrix class only supports element access via round brackets, you should define the macro USE_PARENTHESES.

The class NEMatrix is a typedef for MyMatrix but if you declare an argument to be of this type then the function will not be called unless the argument is a non-empty matrix of numbers. (Otherwise, you get #VALUE.) If you are working with very large matrices it should be more stable as the data type is much simpler, (it uses type K for passing instead of type P).

The class MyArray is also defined via a typedef in MyContainers.h. The default is to typedef to std::vector. It must have .size(), a constructor taking the size and operator[] defined.

CellMatrix is a new class declared in CellMatrix.h. It abstracts the idea of a table of cell values in EXCEL. Thus each entry in the matrix can have a value that is a string, number, bool, error value or empty. This facilitates the passing of tables of mixed numbers and strings. Conversion to a CellMatrix should virtually never fail since it allows error codes.

The types std::string and string are both allowed. These are the same class and the difference is simply in whether the namespace std has already been declared via using.

5 Extended data types

xlw 2.1 has been designed to make it easy to work with your own data types. The only constraint is that a function (or method) must exist that takes in a data-type that is already constructible from basic types, and creates the new type. For this purpose, a constructor is equivalent to a function. One simply adds a declaration in the file TypeRegistrations.cpp.

For example,

TypeRegistry::Helper arglistreg("ArgumentList",         // new type
                           "CellMatrix",                // old type
                           "ArgumentList",              // converter name
                           false,                       // is a method
                           true,                        // takes identifier 
                           "",                          // no key
                           "<xlw/ArgList.h>"            // force inclusion of this file
                           );


TypeRegistry::Helper payoffreg("Wrapper<PayOff>",       // new type
                           "ArgumentList",              // old type
                           "GetFromFactory<PayOff>",    // converter name
                           false,                       // is a method
                           false,                       // takes identifier 
                           "" ,                         // no key
                          "<xlw/ArgListFactory.h>"
                           );

The first argument is the identifier for the new type.

The second is the type which the new type is constructed from.

The third is the function or method used to construct the new type from the old one.

The first bool is to specify whether the conversion function is a method of the old class, or simply a function or constructor that takes in an object of the old class.

The second bool indicates whether the converter method or function takes in a second argument which is a string expressing an identifier in case of error - this is very handy when trying to work out which argument in your complicated function is dubious.

The key is to tell EXCEL the type, this is generally only used when converting from XlfOper, i.e. when defining a basic type. This is typically ``R'' or ``P''. Doubles are passed as type ``B'' and non-empty matrices as ``K.''

The last argument allows the forcing of extra #includes in your .cpp interface file. This allows you to ensure that the conversion function is available.

You define new types from other new types. The maximum depth is 26, at which point the parser concludes that you have accidentally created a loop.

The three main data types that have been added for illustration are the DoubleOrNothing, ArgumentList and Wrapper<PayOff>.

The DoubleOrNothing class allows one distinguish between a number passed in or an empty argument. One can therefore choose between a number passed in, and a default value if the argument is empty.

The argument list takes in a table of cells and turns it into a collection of arguments. This allows the passing of a variable number of arguments within one function argument. The top left of the table must be the name of the structure e.g. PayOff. Within the table, the types number, string, array, matrix, cells and list can be used. For each of these one gives a name and then the data below. For an array, the size of the array should be specified directly below the name. For a matrix, cells, and list the number of rows and columns should be given directly below. The type cells is simply a matrix of cell values. The type list is an ArgumentList. The use of ArgumentLists inside ArgumentLists is very useful for doing decoration and composite patterns. Note that everything passed in to an ArgumentList is put into lower case.

The ArgumentList is particularly powerful with factories. We have provided a templatized factory that takes in an ArgumentList, and can be used for arbitrary types. The only restrictions are that the inherited types have constructors that take in an ArgumentList, and that they have a common base class. We illustrate this with the PayOff class.

The factory returns a raw pointer to the base class, so this should be immediately converted to a smart pointer (if you don't you will get memory leaks) so we use a polymorphic copying pointer, Wrapper. However, one could use a boost::shared_ptr or a std::auto_ptr. Our new data type is therefore

Wrapper< PayOff >

, which takes ownership and ensures deletion at the appropriate time.

Examples of how to use this class are given in the spreadsheet examples/autogenerated/Example.xls.

6 xlw commands

The default name of the Library in Excel is the name of the header file. You can change this by

//<xlw:libraryname=MyTestLibrary

in the header file. Note that all functions in the header file will have the same library name which will be that specified by the last libraryname command.

If you want a volatile function then this can be done as follows:

double // system clock
//<xlw:volatile    
SystemTime(DoubleOrNothing ticksPerSecond // number to divide by
                   );

A volatile function is a function whose value can change even if the arguments have not - typically random numbers or time.

You can also time functions by inserting

//<xlw:time
in the same place. (Yes, volatile functions can be timed, the order of the two commands is not important.)

7 The interface file

It is not necessary to ever look at the interface file. However, it can be edited directly if so desired. It first has a DummyFunction declared in an unnamed namespace. This function references the functions xlAutoOpen and xlAutoClose and thus forces their inclusion in the xll. These functions carry out the registration of the other functions with EXCEL and are therefore essential.

There is a line

const char* LibraryName = "MyTestLibrary";
This specifies the name of the library in the EXCEL function wizard. This is enclosed in an anonymous namespace so you can have multiple interface files.

For each function, there are two parts. The first is the registration information. The second is the wrapper function called between EXCEL and the function chosen.

An example of the registration information is

namespace
{
XLRegistration::Arg
ConcatArgs[]=
{
{ "str1"," first string "},
{ "str2","second string "}
};
  
XLRegistration::XLFunctionRegistrationHelper
registerConcat("xlConcat",
"Concat",
" Concatenates two strings ",
LibraryName,
ConcatArgs,
"RR");
}

The code is placed in an unknown namespace to ensure it does not affect any linkage. The arguments are declared in the first part, with the name of each followed by its description.

In the second part, a global variable is declared. The creation of this global variable registers the function with a global singleton, which ensures that it is registered with EXCEL. This approach allows the registration to be split across many files. The information passed to the constructor is the name of the function in C++ in the interface file, the name of the function in the function in EXCEL, the function description, the name of the library in EXCEL, the arguments declared above and the types of the arguments. The types are expressed via a code: e.g.

In xlw 2.1, only the types P, B, K, and R are used. P is used for CellMatrix and MyMatrix. K is used for NEMatrix. B for double. R is used for all other basic types. Types such as bool are therefore first passed in as LPXLOPERs and then transformed into the right data type.

An example of the interface function definition is

extern "C"
{
LPXLOPER EXCEL_EXPORT
xlConcat(
LPXLOPER xlstr1_,
LPXLOPER xlstr2_)
{
EXCEL_BEGIN;

if (XlfExcel::Instance().IsCalledByFuncWiz())
     return XlfOper(true);

XlfOper xlstr1(xlstr1_);
std::string str1(xlstr1.AsString("str1"));

XlfOper xlstr2(xlstr2_);
std::string str2(xlstr2.AsString("str2"));

std::string result(
        Concat(
                str1,
                str2)
        );
return XlfOper(result);
EXCEL_END
}
}

The extern `C' command is necessary as we are using the C API and therefore must use C linkage.

The return type of the function is always LPXLOPER but since this is a polymorphic data type, this is not a hindrance. EXCEL_EXPORT is a macro:

#define EXCEL_EXPORT __declspec(dllexport)
This ensures that the function is a dll export, and so can be dynamically linked against.

Note that types are passed in as LPXLOPERs not XlfOpers or other types. XlfOpers are not used since they cause a crash with the MinGW compiler as they are not POD objects.

The macros EXCEL_BEGIN and EXCEL_END contain starting and finishing information common to all function. In particular, EXCEL_END contains catches for common data types to return information to EXCEL.

The routine then checks if it is being called from the function wizard. If it is, then it immediately returns to EXCEL. This avoids time consuming computations being called whilst data is being entered.

Each argument is then converted. First to an XlfOper, and then using the .As methods to the correct type. Note a string is passed into the .As method, this allows a throw to identify the offending argument. If you use extended types there will be a string of conversions here.

Once the arguments have been converted, the original function is called and the result stored. For return to EXCEL, it is converted into an XlfOper and this is returned as an LPXLOPER.

8 The interface generator

The interface generator is written as a simple C++ routine. It is a console application that takes in one argument. The output file name is an optional argument and it defaults to adding ``xlw'' on the front and ``.cpp'' at the end.

The routine first reads in the file and places it in a vector of chars for convenience. It is then turned into tokens. Each token will be an identifier, preprocessor directive, comment or delimiter.

Tokens corresponding to consts and ampersands are then removed, since they will not affect the coding of the interface routine. At this stage, unsigned identifiers are also dealt with.

Once this has been done, the file is turned into a list of functions, with each function having a list of arguments with names and types.

The next operation is to identify all the types and find the conversion routines.

Once this has been done, the output file is written into a vector, and then written to a file.

9 Troubleshooting

If you find that you can build the xll but that the functions do not register. Here are some common problems with xlls in general.

If absolutely nothing happens then check security settings for macros. The default setting is to ignore files containing macros. Note for a reasonable level of security setting, you will get asked whether to enable macros, if this does not happen then security levels are too high (or too low!)

If you get the unrecognizable format error, possibilities are

The dumpbin utility can be used to check whether the right functions are being exported. If you have followed the instructions here, this should not be an issue.


erik 2008-05-11