Initialization

Using and Porting GNU CC

16.16.5: How Initialization Functions Are Handled

The compiled code for certain languages includes constructors (also called initialization routines), which are functions to initialize data in the program when the program is started. These functions need to be called before the program is "started", that is, before main is called.

Compiling some languages generates destructors (also called termination routines) that should be called when the program terminates.
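As a minimal illustration of the user-visible effect, the sketch below uses GCC's `constructor` and `destructor` function attributes in C to register one initialization routine and one termination routine. The names demo_initialized, demo_init and demo_fini are hypothetical. Which of the mechanisms described in this section actually calls them depends on the target; newer ELF toolchains may use an `.init_array' section instead.

```c
#include <assert.h>
#include <stdio.h>

/* Set by the constructor; a value of 1 seen inside main shows that
   the constructor already ran before main was entered. */
int demo_initialized = 0;

/* Registered as an initialization routine for this object file. */
__attribute__((constructor))
static void demo_init(void)
{
    demo_initialized = 1;
}

/* Registered as a termination routine; runs after main returns or
   when exit is called.  By then the constructor must have run. */
__attribute__((destructor))
static void demo_fini(void)
{
    assert(demo_initialized == 1);
    fprintf(stderr, "demo_fini: running at program exit\n");
}
```

Any main linked with this file observes demo_initialized == 1 on entry, without calling demo_init itself.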

To make the initialization and termination functions work, the compiler must output something in the assembler code to cause those functions to be called at the appropriate time. When you port the compiler to a new system, you need to specify how to do this.

GCC currently supports two major ways of executing initialization and termination functions. Each way has two variants, and much of the structure is common to all four.

The linker must build two lists of these functions: a list of initialization functions, called __CTOR_LIST__, and a list of termination functions, called __DTOR_LIST__.

Each list always begins with an ignored function pointer (which may hold 0, -1, or a count of the function pointers after it, depending on the environment). This is followed by a series of zero or more function pointers to constructors (or destructors), followed by a function pointer containing zero.
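A sketch of this list layout in C, using hypothetical names (demo_func_ptr, demo_ctor_list, demo_list_length) rather than the real __CTOR_LIST__ symbol. Because the header word may hold 0, -1, or a count, the length is always found by scanning for the null terminator; a positive header is only checked against that result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*demo_func_ptr)(void);

/* Hypothetical routines; they only bump a counter. */
int demo_calls = 0;
static void demo_ctor_a(void) { demo_calls += 1; }
static void demo_ctor_b(void) { demo_calls += 10; }

/* Header word (-1: no count recorded), two entries, null terminator. */
demo_func_ptr demo_ctor_list[] = {
    (demo_func_ptr) -1,
    demo_ctor_a,
    demo_ctor_b,
    0
};

/* Return the number of function pointers after the header word. */
size_t demo_list_length(demo_func_ptr *list)
{
    size_t n = 0;
    while (list[n + 1] != 0)   /* skip index 0, stop at the terminator */
        n++;

    /* If the header holds a positive count, it must agree with the scan. */
    intptr_t header = (intptr_t) list[0];
    if (header > 0)
        assert((size_t) header == n);
    return n;
}
```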

Depending on the operating system and its executable file format, either `crtstuff.c' or `libgcc2.c' traverses these lists at startup time and exit time. Constructors are called in reverse order of the list; destructors in forward order.
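The traversal can be sketched as follows. The directions used here, constructors from the last entry back toward the header and destructors from the first entry forward, follow our reading of the loops in `libgcc2.c'; the routine names and the trace buffer are hypothetical and exist only to make the calling order observable.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demo_func_ptr)(void);

/* Trace of which hypothetical routine ran, in order. */
char demo_trace[8];
size_t demo_trace_len = 0;

static void demo_record(char c)
{
    assert(demo_trace_len + 1 < sizeof demo_trace);
    demo_trace[demo_trace_len++] = c;
    demo_trace[demo_trace_len] = '\0';
}

static void demo_ctor_a(void) { demo_record('A'); }
static void demo_ctor_b(void) { demo_record('B'); }
static void demo_dtor_x(void) { demo_record('x'); }
static void demo_dtor_y(void) { demo_record('y'); }

/* Each list: ignored header word, entries, null terminator. */
demo_func_ptr demo_ctor_list[] = { (demo_func_ptr) -1, demo_ctor_a, demo_ctor_b, 0 };
demo_func_ptr demo_dtor_list[] = { (demo_func_ptr) -1, demo_dtor_x, demo_dtor_y, 0 };

/* Constructors: count the entries, then call from the last back to the first. */
void demo_run_ctors(demo_func_ptr *list)
{
    size_t n = 0;
    while (list[n + 1] != 0)
        n++;
    for (size_t i = n; i >= 1; i--)
        list[i]();
}

/* Destructors: call from the first entry forward until the terminator. */
void demo_run_dtors(demo_func_ptr *list)
{
    for (demo_func_ptr *p = list + 1; *p != 0; p++)
        (*p)();
}
```

Running demo_run_ctors and then demo_run_dtors on these two lists records B, A, x, y in that order.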

The best way to handle static constructors works only for object file formats which provide arbitrarily-named sections. A section is set aside for a list of constructors, and another for a list of destructors. Traditionally these are called `.ctors' and `.dtors'. Each object file that defines an initialization function also puts a word in the constructor section to point to that function. The linker accumulates all these words into one contiguous `.ctors' section. Termination functions are handled similarly.
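The accumulation step can be imitated with GCC and GNU ld on an ELF system. This sketch does not touch the real `.ctors' section; instead it uses a hypothetical section named demo_ctors and relies on GNU ld defining __start_demo_ctors and __stop_demo_ctors for an output section whose name is a valid C identifier. Each variable placed in the section stands for the word an object file would contribute.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demo_func_ptr)(void);

int demo_init_mask = 0;

static void demo_init_one(void) { demo_init_mask |= 1; }
static void demo_init_two(void) { demo_init_mask |= 2; }

/* Each entry contributes one pointer-sized word to section demo_ctors.
   'used' keeps otherwise-unreferenced variables from being discarded. */
__attribute__((used, section("demo_ctors")))
static demo_func_ptr demo_entry_one = demo_init_one;

__attribute__((used, section("demo_ctors")))
static demo_func_ptr demo_entry_two = demo_init_two;

/* Defined by the linker; they bracket the contiguous, accumulated words. */
extern demo_func_ptr __start_demo_ctors[];
extern demo_func_ptr __stop_demo_ctors[];

/* Call every function pointer the linker gathered; return how many. */
size_t demo_run_section(void)
{
    size_t n = 0;
    for (demo_func_ptr *p = __start_demo_ctors; p < __stop_demo_ctors; p++, n++) {
        assert(*p != 0);
        (*p)();
    }
    return n;
}
```

In a real program the entries would come from separate object files. Their relative order within the section is left to the compiler and linker, so the loop makes no assumption about it.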

To use this method, you need appropriate definitions of the macros ASM_OUTPUT_CONSTRUCTOR and ASM_OUTPUT_DESTRUCTOR. Usually you can get them by including `svr4.h'.
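A hypothetical emitter shows the kind of assembler text an svr4.h-style ASM_OUTPUT_CONSTRUCTOR writes for each constructor: a switch to the constructor section, then one word holding the function's address. The real macro writes to the assembler output stream and adds any target label prefix; the function names, the `"aw"' section flags and the 32-bit `.long' directive below are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the directives for one list entry into BUF: enter SECTION, then
   emit a 32-bit word holding NAME's address.  Returns the length written,
   or -1 for an empty name or a buffer too small for the whole text. */
static int demo_output_xtor(char *buf, size_t size,
                            const char *section, const char *name)
{
    assert(buf != NULL && size > 0);
    if (name == NULL || strlen(name) == 0)
        return -1;
    int n = snprintf(buf, size, "\t.section\t%s,\"aw\"\n\t.long\t%s\n",
                     section, name);
    if (n < 0 || (size_t) n >= size)
        return -1;
    return n;
}

/* Hypothetical counterparts of ASM_OUTPUT_CONSTRUCTOR and ASM_OUTPUT_DESTRUCTOR. */
int demo_output_constructor(char *buf, size_t size, const char *name)
{
    return demo_output_xtor(buf, size, ".ctors", name);
}

int demo_output_destructor(char *buf, size_t size, const char *name)
{
    return demo_output_xtor(buf, size, ".dtors", name);
}
```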

When arbitrary sections are available, there are two variants, depending upon how the code in `crtstuff.c' is called. On systems that support an init section which is executed at program startup, parts of `crtstuff.c' are compiled into that section. The program is linked by the gcc driver like this:

ld -o output_file crtbegin.o ... crtend.o -lgcc

The head of a function (__do_global_ctors) appears in the init section of `crtbegin.o'; the remainder of the function appears in the init section of `crtend.o'. The linker will pull these two parts of the section together, making a whole function. If any of the user's object files linked into the middle of it contribute code, then that code will be executed as part of the body of __do_global_ctors.

To use this variant, you must define the INIT_SECTION_ASM_OP macro properly.

If no init section is available, do not define INIT_SECTION_ASM_OP. Then __do_global_ctors is built into the text section like all other functions and resides in `libgcc.a', compiled from `libgcc2.c'. When GCC compiles any function called main, it inserts a procedure call to __main as the first executable code after the function prologue. The __main function, also defined in `libgcc2.c', simply calls __do_global_ctors.
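The no-init-section variant can be modeled in plain C. Here demo_main_hook stands in for __main and demo_do_global_ctors for __do_global_ctors; both names are hypothetical. The run-once flag and the atexit registration of the destructor walker reflect our reading of `libgcc2.c' and should be treated as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*demo_func_ptr)(void);

int demo_ctor_runs = 0;
int demo_dtor_runs = 0;
static void demo_ctor(void) { demo_ctor_runs++; }
static void demo_dtor(void) { demo_dtor_runs++; }

demo_func_ptr demo_ctor_list[] = { (demo_func_ptr) -1, demo_ctor, 0 };
demo_func_ptr demo_dtor_list[] = { (demo_func_ptr) -1, demo_dtor, 0 };

/* Walk the destructor list from the front. */
static void demo_do_global_dtors(void)
{
    for (demo_func_ptr *p = demo_dtor_list + 1; *p != 0; p++)
        (*p)();
}

/* Walk the constructor list from the back, then arrange for the
   destructors to run when the program exits. */
static void demo_do_global_ctors(void)
{
    size_t n = 0;
    while (demo_ctor_list[n + 1] != 0)
        n++;
    for (size_t i = n; i >= 1; i--)
        demo_ctor_list[i]();

    int registered = atexit(demo_do_global_dtors);
    assert(registered == 0);   /* atexit returns nonzero on failure */
    (void) registered;         /* avoid an unused warning under NDEBUG */
}

/* Stand-in for the call the compiler inserts at the start of main.
   The flag makes repeated calls (e.g. a recursive main) harmless. */
void demo_main_hook(void)
{
    static int initialized = 0;
    if (!initialized) {
        initialized = 1;
        demo_do_global_ctors();
    }
}
```

Calling demo_main_hook twice, as a recursive main would, runs the constructor only once.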

In file formats that don't support arbitrary sections, there are again two variants. In the simplest variant, the GNU linker (GNU ld) and the `a.out' format must be used. In this case, ASM_OUTPUT_CONSTRUCTOR is defined to produce a .stabs entry of type `N_SETT', referencing the name __CTOR_LIST__, and with the address of the void function containing the initialization code as its value. The GNU linker recognizes this as a request to add the value to a "set"; the values are accumulated and are eventually placed in the executable as a vector in the format described above, with a leading (ignored) count and a trailing zero element. ASM_OUTPUT_DESTRUCTOR is handled similarly. Since no init section is available, the absence of INIT_SECTION_ASM_OP causes the compilation of main to call __main as above, starting the initialization process.
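The set-building that GNU ld performs for N_SETT entries can be simulated in C. The struct demo_stab and the function demo_build_set are hypothetical; DEMO_N_SETT uses the value 0x16 (22) that, to our knowledge, the a.out headers assign to N_SETT, and symbol names are shown without any leading underscore a target may add.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_N_SETT 0x16   /* assumed a.out stab type for a text set element */

/* Simplified view of one .stabs entry as the linker sees it. */
struct demo_stab {
    const char *name;      /* set name, e.g. "__CTOR_LIST__" */
    unsigned char type;    /* stab type, e.g. DEMO_N_SETT */
    unsigned long value;   /* address of the element */
};

/* Gather every N_SETT entry named SET into OUT as { count, v1, ..., vn, 0 }.
   Returns the number of words written, or 0 if OUT is too small. */
size_t demo_build_set(const struct demo_stab *stabs, size_t nstabs,
                      const char *set, unsigned long *out, size_t cap)
{
    size_t count = 0;
    for (size_t i = 0; i < nstabs; i++)
        if (stabs[i].type == DEMO_N_SETT && strcmp(stabs[i].name, set) == 0)
            count++;
    if (cap < count + 2)
        return 0;

    size_t k = 0;
    out[k++] = count;                          /* leading count word */
    for (size_t i = 0; i < nstabs; i++)
        if (stabs[i].type == DEMO_N_SETT && strcmp(stabs[i].name, set) == 0)
            out[k++] = stabs[i].value;         /* elements in input order */
    out[k++] = 0;                              /* trailing zero element */
    assert(k == count + 2);
    return k;
}
```

Given two __CTOR_LIST__ entries with values 100 and 300 and one __DTOR_LIST__ entry, building the __CTOR_LIST__ set yields { 2, 100, 300, 0 }.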

The last variant uses neither arbitrary sections nor the GNU linker. This is preferable when you want to do dynamic linking and when using file formats which the GNU linker does not support, such as `ECOFF'. In this case, ASM_OUTPUT_CONSTRUCTOR does not produce an N_SETT symbol; initialization and termination functions are recognized simply by their names. This requires an extra program in the linkage step, called collect2. This program pretends to be the linker, for use with GNU CC; it does its job by running the ordinary linker, but also arranges to include the vectors of initialization and termination functions. These functions are called via __main as described above.
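The name-based recognition that collect2 performs can be sketched as below. The prefixes _GLOBAL__I_ and _GLOBAL__D_ are an assumption (g++ may use `$' or `.' in place of the underscores, depending on which characters labels allow), and demo_classify and demo_write_ctor_list are hypothetical. The generated text roughly mimics the kind of C source collect2 compiles and links to supply the constructor vector.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum demo_kind { DEMO_OTHER, DEMO_CTOR, DEMO_DTOR };

/* Classify a symbol name by its (assumed) prefix. */
enum demo_kind demo_classify(const char *sym)
{
    if (strncmp(sym, "_GLOBAL__I_", 11) == 0)
        return DEMO_CTOR;
    if (strncmp(sym, "_GLOBAL__D_", 11) == 0)
        return DEMO_DTOR;
    return DEMO_OTHER;
}

/* Emit C source defining a constructor list in the layout described
   earlier: a count word, the entries, and a terminating 0.
   Returns the number of constructors written. */
int demo_write_ctor_list(FILE *out, const char *const *syms, size_t nsyms)
{
    assert(out != NULL && syms != NULL);
    int count = 0;
    for (size_t i = 0; i < nsyms; i++)
        if (demo_classify(syms[i]) == DEMO_CTOR) {
            fprintf(out, "extern void %s(void);\n", syms[i]);
            count++;
        }
    fprintf(out, "typedef void (*func_ptr)(void);\n");
    fprintf(out, "func_ptr __CTOR_LIST__[] = { (func_ptr) %d", count);
    for (size_t i = 0; i < nsyms; i++)
        if (demo_classify(syms[i]) == DEMO_CTOR)
            fprintf(out, ", %s", syms[i]);
    fprintf(out, ", 0 };\n");
    return count;
}
```

Fed the names main, _GLOBAL__I_foo, _GLOBAL__D_foo and _GLOBAL__I_bar, the function declares the two constructors and writes a list whose header word is 2.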

Choosing among these configuration options has been simplified by a set of operating-system-dependent files in the `config' subdirectory. These files define all of the relevant parameters, and it is usually sufficient to include one of them in your machine-dependent configuration file. These files are:

`aoutos.h': For operating systems using the `a.out' format.
`next.h': For operating systems using the `Mach-O' format.
`svr3.h': For System V Release 3 and similar systems using `COFF' format.
`svr4.h': For System V Release 4 and similar systems using `ELF' format.
`vms.h': For the VMS operating system.

The following section describes the specific macros that control and customize the handling of initialization and termination functions.