
Calling Conventions

Before we go any further

It is important to understand that this section isn’t a general purpose description of the present calling conventions. It merely explains the calling conventions for the parameter/return types supported by dyncall, not for aggregates (structures, unions and classes), SIMD data types (__m64, __m128, __m128i, __m128d), etc.
We strongly advise the reader not to use this document as a general purpose calling convention reference.

x86 Calling Conventions

Overview

There are numerous different calling conventions on the x86 processor architecture, like cdecl [8], MS fastcall [10], GNU fastcall [11], Borland fastcall [12], Watcom fastcall [13], Win32 stdcall [9], MS thiscall [14], GNU thiscall [15], the pascal calling convention [16] and a cdecl-like version for Plan9 [17] (dubbed plan9call by us), etc.
dyncall support

Currently cdecl, stdcall, fastcall (MS and GNU), thiscall (MS and GNU) and plan9call are supported.
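
As a rough illustration of how these conventions surface in C source, the sketch below declares the same function under three of the conventions named above, using GCC/Clang function attributes. The macro names (CC_CDECL, CC_STDCALL, CC_FASTCALL) and the function names are made up for this example and are not part of dyncall. The attributes only take effect on 32-bit x86, so the macros are assumed to expand to nothing elsewhere.

```c
#include <assert.h>

/* Hypothetical helper macros (not part of dyncall): expand to GCC/Clang
   attributes on 32-bit x86 only, and to nothing on other targets. */
#if defined(__GNUC__) && defined(__i386__)
#  define CC_CDECL    __attribute__((cdecl))
#  define CC_STDCALL  __attribute__((stdcall))
#  define CC_FASTCALL __attribute__((fastcall))
#else
#  define CC_CDECL
#  define CC_STDCALL
#  define CC_FASTCALL
#endif

/* The same function body under three conventions; only the way the
   arguments reach the callee differs. */
int CC_CDECL    add_cdecl(int a, int b)    { return a + b; }
int CC_STDCALL  add_stdcall(int a, int b)  { return a + b; }
int CC_FASTCALL add_fastcall(int a, int b) { return a + b; }

/* Returns 1 if all three variants compute the same result. */
int conventions_agree(int a, int b) {
    int r = add_cdecl(a, b);
    assert(r == a + b);  /* sanity check on the reference result */
    return r == add_stdcall(a, b) && r == add_fastcall(a, b);
}
```

Because the compiler sees both the declaration and the call, it picks the matching convention on its own; a dynamic call library has no such information and must be told which convention to use.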

cdecl

Registers and register usage
Name Brief description


eax scratch, return value
ebx permanent
ecx scratch
edx scratch, return value
esi permanent
edi permanent
ebp permanent
esp stack pointer
st0 scratch, floating point return value
st1-st7 scratch
Table 13: Register usage on x86 cdecl calling convention
Parameter passing Return values Stack layout

Stack directly after function prolog:

                 |------------------------|
                 | ...                    |
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        |
                 | ...                    | \                      |
  parameter area | ...                    |  | stack parameters    | caller’s frame
                 | ...                    | /                      |
                 |------------------------|                        |
                 | return address         |                       /
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        | current frame
  parameter area |                        |                        |
                 |------------------------|                       /
                 | ...                    |
                 |------------------------|
Figure 1: Stack layout on x86 cdecl calling convention
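
Table 13 lists both eax and edx as return value registers. A minimal sketch of what that means for a 64 bit integer result, assuming the commonly documented pairing where eax holds the low half and edx the high half; the type and function names are hypothetical and not part of dyncall:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the register pair, not a dyncall type. */
typedef struct { uint32_t eax; uint32_t edx; } RetPair;

/* Split a 64 bit result the way it is assumed to be placed in edx:eax. */
RetPair split_return(uint64_t value) {
    RetPair r;
    r.eax = (uint32_t)(value & 0xFFFFFFFFu);  /* low 32 bits  */
    r.edx = (uint32_t)(value >> 32);          /* high 32 bits */
    return r;
}

/* Reassemble the value as the caller would read it back. */
uint64_t join_return(RetPair r) {
    return ((uint64_t)r.edx << 32) | r.eax;
}

/* Round-trip check: splitting and joining must give the value back. */
void check_round_trip(uint64_t v) {
    assert(join_return(split_return(v)) == v);
}
```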

MS fastcall

Registers and register usage
Name Brief description


eax scratch, return value
ebx permanent
ecx scratch, parameter 0
edx scratch, parameter 1, return value
esi permanent
edi permanent
ebp permanent
esp stack pointer
st0 scratch, floating point return value
st1-st7 scratch
Table 14: Register usage on x86 fastcall (MS) calling convention
Parameter passing Return values Stack layout

Stack directly after function prolog:

                 |------------------------|
                 | ...                    |
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        |
                 | ...                    | \                      |
  parameter area | ...                    |  | stack parameters    | caller’s frame
                 | ...                    | /                      |
                 |------------------------|                        |
                 | return address         |                       /
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        | current frame
  parameter area |                        |                        |
                 |------------------------|                       /
                 | ...                    |
                 |------------------------|
Figure 2: Stack layout on x86 fastcall (MS) calling convention

GNU fastcall

Registers and register usage
Name Brief description


eax scratch, return value
ebx permanent
ecx scratch, parameter 0
edx scratch, parameter 1, return value
esi permanent
edi permanent
ebp permanent
esp stack pointer
st0 scratch, floating point return value
st1-st7 scratch
Table 15: Register usage on x86 fastcall (GNU) calling convention
Parameter passing Return values Stack layout

Stack directly after function prolog:

                 |------------------------|
                 | ...                    |
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        |
                 | ...                    | \                      |
  parameter area | ...                    |  | stack parameters    | caller’s frame
                 | ...                    | /                      |
                 |------------------------|                        |
                 | return address         |                       /
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        | current frame
  parameter area |                        |                        |
                 |------------------------|                       /
                 | ...                    |
                 |------------------------|
Figure 3: Stack layout on x86 fastcall (GNU) calling convention

Borland fastcall

Registers and register usage
Name Brief description


eax scratch, parameter 0, return value
ebx permanent
ecx scratch, parameter 2
edx scratch, parameter 1, return value
esi permanent
edi permanent
ebp permanent
esp stack pointer
st0 scratch, floating point return value
st1-st7 scratch
Table 16: Register usage on x86 fastcall (Borland) calling convention
Parameter passing Return values Stack layout

Stack directly after function prolog:

                 |------------------------|
                 | ...                    |
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        |
                 | ...                    | \                      |
  parameter area | ...                    |  | stack parameters    | caller’s frame
                 | ...                    | /                      |
                 |------------------------|                        |
                 | return address         |                       /
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        | current frame
  parameter area |                        |                        |
                 |------------------------|                       /
                 | ...                    |
                 |------------------------|
Figure 4: Stack layout on x86 fastcall (Borland) calling convention

Watcom fastcall

Registers and register usage
Name Brief description


eax scratch, parameter 0, return value@@@
ebx scratch when used for parameter, parameter 2
ecx scratch when used for parameter, parameter 3
edx scratch when used for parameter, parameter 1, return value@@@
esi scratch when used for return pointer @@@??
edi permanent
ebp permanent
esp stack pointer
st0 scratch, floating point return value
st1-st7 scratch
Table 17: Register usage on x86 fastcall (Watcom) calling convention
Parameter passing Return values Stack layout

Stack directly after function prolog:

                 |------------------------|
                 | ...                    |
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        |
                 | ...                    | \                      |
  parameter area | ...                    |  | stack parameters    | caller’s frame
                 | ...                    | /                      |
                 |------------------------|                        |
                 | return address         |                       /
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        | current frame
  parameter area |                        |                        |
                 |------------------------|                       /
                 | ...                    |
                 |------------------------|
Figure 5: Stack layout on x86 fastcall (Watcom) calling convention

win32 stdcall

Registers and register usage
Name Brief description


eax scratch, return value
ebx permanent
ecx scratch
edx scratch, return value
esi permanent
edi permanent
ebp permanent
esp stack pointer
st0 scratch, floating point return value
st1-st7 scratch
Table 18: Register usage on x86 stdcall calling convention
Parameter passing Return values Stack layout

Stack directly after function prolog:

                 |------------------------|
                 | ...                    |
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        |
                 | ...                    | \                      |
  parameter area | ...                    |  | stack parameters    | caller’s frame
                 | ...                    | /                      |
                 |------------------------|                        |
                 | return address         |                       /
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        | current frame
  parameter area |                        |                        |
                 |------------------------|                       /
                 | ...                    |
                 |------------------------|
Figure 6: Stack layout on x86 stdcall calling convention

MS thiscall

Registers and register usage
Name Brief description


eax scratch, return value
ebx permanent
ecx scratch, parameter 0
edx scratch, return value
esi permanent
edi permanent
ebp permanent
esp stack pointer
st0 scratch, floating point return value
st1-st7 scratch
Table 19: Register usage on x86 thiscall (MS) calling convention

Parameter passing

Return values Stack layout

Stack directly after function prolog:

                 |------------------------|
                 | ...                    |
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        |
                 | ...                    | \                      |
  parameter area | ...                    |  | stack parameters    | caller’s frame
                 | ...                    | /                      |
                 |------------------------|                        |
                 | return address         |                       /
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        | current frame
  parameter area |                        |                        |
                 |------------------------|                       /
                 | ...                    |
                 |------------------------|
Figure 7: Stack layout on x86 thiscall (MS) calling convention

GNU thiscall

Registers and register usage
Name Brief description


eax scratch, return value
ebx permanent
ecx scratch
edx scratch, return value
esi permanent
edi permanent
ebp permanent
esp stack pointer
st0 scratch, floating point return value
st1-st7 scratch
Table 20: Register usage on x86 thiscall (GNU) calling convention
Parameter passing Return values Stack layout

Stack directly after function prolog:

                 |------------------------|
                 | ...                    |
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        |
                 | ...                    | \                      |
  parameter area | ...                    |  | stack parameters    | caller’s frame
                 | ...                    | /                      |
                 |------------------------|                        |
                 | return address         |                       /
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        | current frame
  parameter area |                        |                        |
                 |------------------------|                       /
                 | ...                    |
                 |------------------------|
Figure 8: Stack layout on x86 thiscall (GNU) calling convention

pascal

The best known uses of the pascal calling convention are the 16 bit OS/2 APIs, Microsoft Windows 3.x and Borland Delphi 1.x.

Registers and register usage

Name Brief description


eax scratch, return value
ebx permanent
ecx scratch
edx scratch, return value
esi permanent
edi permanent
ebp permanent
esp stack pointer
st0 scratch, floating point return value
st1-st7 scratch
Table 21: Register usage on x86 pascal calling convention
Parameter passing Return values Stack layout

Stack directly after function prolog:

                 |------------------------|
                 | ...                    |
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        |
                 | ...                    | \                      |
  parameter area | ...                    |  | stack parameters    | caller’s frame
                 | ...                    | /                      |
                 |------------------------|                        |
                 | return address         |                       /
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        | current frame
  parameter area |                        |                        |
                 |------------------------|                       /
                 | ...                    |
                 |------------------------|
Figure 9: Stack layout on x86 pascal calling convention

plan9call

Registers and register usage
Name Brief description


eax scratch, return value
ebx scratch
ecx scratch
edx scratch
esi scratch
edi scratch
ebp scratch
esp stack pointer
st0 scratch, floating point return value
st1-st7 scratch
Table 22: Register usage on x86 plan9call calling convention
Parameter passing Return values Stack layout

Stack directly after function prolog:

                 |------------------------|
                 | ...                    |
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        |
                 | ...                    | \                      |
  parameter area | ...                    |  | stack parameters    | caller’s frame
                 | ...                    | /                      |
                 |------------------------|                        |
                 | return address         |                       /
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        | current frame
  parameter area |                        |                        |
                 |------------------------|                       /
                 | ...                    |
                 |------------------------|



Figure 10: Stack layout on x86 plan9call calling convention

x64 Calling Convention

Overview

The x64 (64bit) architecture designed by AMD is based on Intel’s x86 (32bit) architecture, supporting it natively. It is sometimes referred to as x86-64, AMD64, or, cloned by Intel, EM64T or Intel64.
On this processor, a word is defined to be 16 bits in size, a dword 32 bits and a qword 64 bits. Note that this is due to historical reasons (terminology didn’t change with the introduction of 32 and 64 bit processors).
The x64 calling convention for MS Windows [24] differs from the SystemV x64 calling convention [25] used by Linux/*BSD/... Note that this is not the only difference between these operating systems. The 64-bit programming model used by 64-bit Windows is LLP64, meaning that the C types int and long remain 32 bits in size, whereas long long and pointers are 64 bits. Linux/*BSD/... use LP64, where long is 64 bits as well.
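
The difference between the LLP64 and LP64 models can be checked from the type sizes alone. The following is a small sketch; the enum and function names are invented for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification, for illustration only. */
typedef enum { MODEL_OTHER, MODEL_LLP64, MODEL_LP64 } DataModel;

/* Classify a 64 bit data model from type sizes given in bytes. */
DataModel data_model(size_t int_sz, size_t long_sz,
                     size_t long_long_sz, size_t ptr_sz) {
    assert(int_sz > 0 && long_sz > 0 && long_long_sz > 0 && ptr_sz > 0);
    if (ptr_sz == 8 && long_long_sz == 8 && int_sz == 4) {
        if (long_sz == 4) return MODEL_LLP64;  /* long stays 32 bits */
        if (long_sz == 8) return MODEL_LP64;   /* long grows to 64 bits */
    }
    return MODEL_OTHER;  /* e.g. any 32 bit target */
}

/* Data model of the platform this file is compiled for. */
DataModel host_data_model(void) {
    return data_model(sizeof(int), sizeof(long),
                      sizeof(long long), sizeof(void *));
}
```

On 64-bit Windows host_data_model() would be expected to yield MODEL_LLP64, and on 64-bit Linux or *BSD MODEL_LP64.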

Compared to the x86 architecture, the 64 bit versions of the registers are called rax, rbx, etc. Furthermore, there are eight new general purpose registers r8-r15.
dyncall support

dyncall supports the MS Windows and System V calling conventions.

MS Windows

Registers and register usage
Name Brief description


rax scratch, return value
rbx permanent
rcx scratch, parameter 0 if integer or pointer
rdx scratch, parameter 1 if integer or pointer
rdi permanent
rsi permanent
rbp permanent, may be used as frame pointer
rsp stack pointer
r8-r9 scratch, parameter 2 and 3 if integer or pointer
r10-r11 scratch, permanent if required by caller (used for syscall/sysret)
r12-r15 permanent
xmm0 scratch, floating point parameter 0, floating point return value
xmm1-xmm3 scratch, floating point parameters 1-3
xmm4-xmm5 scratch, permanent if required by caller
xmm6-xmm15 permanent
Table 23: Register usage on x64 MS Windows platform
Parameter passing Return values Stack layout

Stack frame is always 16-byte aligned. Stack directly after function prolog:

                 |------------------------|
                 | ...                    |
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        |
                 | ...                    | \                      |
                 | ...                    |  | stack parameters    |
                 | ...                    | /                      |
  parameter area | r9 or xmm3             | \                      | caller’s frame
                 | r8 or xmm2             |  | spill area          |
                 | rdx or xmm1            |  |                     |
                 | rcx or xmm0            | /                      |
                 |------------------------|                        |
                 | return address         |                       /
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        | current frame
  parameter area |                        |                        |
                 |------------------------|                       /
                 | ...                    |
                 |------------------------|
Figure 11: Stack layout on x64 Microsoft platform
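
Reading Table 23 together with the "rcx or xmm0" style slots of the spill area, the register used for one of the first four arguments depends on the argument's position, while its kind only selects the register file. A simplified sketch of that rule, restricted to the integer, pointer and floating point types covered here; the function name and the 'i'/'f' notation are assumptions of this example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical notation: each character of kinds describes one argument,
   'i' = integer or pointer, 'f' = floating point. Returns the register
   assumed to hold the argument at position pos, or "stack". */
const char *win64_arg_location(const char *kinds, size_t pos) {
    static const char *const int_regs[4] = { "rcx", "rdx", "r8", "r9" };
    static const char *const flt_regs[4] = { "xmm0", "xmm1", "xmm2", "xmm3" };
    assert(pos < strlen(kinds));
    assert(kinds[pos] == 'i' || kinds[pos] == 'f');
    if (pos >= 4)
        return "stack";  /* only four register slots exist */
    /* The slot is chosen by position alone; the kind only selects
       which register file the slot maps to. */
    return kinds[pos] == 'i' ? int_regs[pos] : flt_regs[pos];
}
```

For a function taking (int, double, int, double), the sketch yields rcx, xmm1, r8 and xmm3: a floating point argument in slot 1 uses xmm1, not xmm0.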

System V (Linux / *BSD / MacOS X)

Registers and register usage
Name Brief description


rax scratch, return value
rbx permanent
rcx scratch, parameter 3 if integer or pointer
rdx scratch, parameter 2 if integer or pointer, return value
rdi scratch, parameter 0 if integer or pointer
rsi scratch, parameter 1 if integer or pointer
rbp permanent, may be used as frame pointer
rsp stack pointer
r8-r9 scratch, parameter 4 and 5 if integer or pointer
r10-r11 scratch
r12-r15 permanent
xmm0 scratch, floating point parameter 0, floating point return value
xmm1-xmm7 scratch, floating point parameters 1-7
xmm8-xmm15 scratch
st0-st1 scratch, 16 byte floating point return value
st2-st7 scratch
Table 24: Register usage on x64 System V (Linux/*BSD)
Parameter passing Return values Stack layout

Stack frame is always 16-byte aligned. Note that there is no spill area. Stack directly after function prolog:

                 |------------------------|
                 | ...                    |
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        |
                 | ...                    | \                      |
  parameter area | ...                    |  | stack parameters    | caller’s frame
                 | ...                    | /                      |
                 |------------------------|                        |
                 | return address         |                       /
                 |------------------------|                       \
  local data     |                        |                        |
                 |------------------------|                        | current frame
  parameter area |                        |                        |
                 |------------------------|                       /
                 | ...                    |
                 |------------------------|
Figure 12: Stack layout on x64 System V (Linux/*BSD)
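
In Table 24 the integer/pointer registers and the floating point registers are numbered separately, so each class of argument consumes its own register sequence. A simplified sketch of that assignment, limited to the scalar types covered here; the function name and the 'i'/'f' notation are assumptions of this example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical notation: each character of kinds describes one argument,
   'i' = integer or pointer, 'f' = floating point. Returns the register
   assumed to hold the argument at position pos, or "stack". */
const char *sysv_arg_location(const char *kinds, size_t pos) {
    static const char *const int_regs[6] =
        { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    static const char *const flt_regs[8] =
        { "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7" };
    size_t n_int = 0, n_flt = 0, i;
    assert(pos < strlen(kinds));
    assert(kinds[pos] == 'i' || kinds[pos] == 'f');
    /* Count earlier arguments per class: the two register files are
       consumed independently of each other. */
    for (i = 0; i < pos; ++i) {
        if (kinds[i] == 'i') ++n_int; else ++n_flt;
    }
    if (kinds[pos] == 'i')
        return n_int < 6 ? int_regs[n_int] : "stack";
    return n_flt < 8 ? flt_regs[n_flt] : "stack";
}
```

For a function taking (int, double, int, double), the sketch yields rdi, xmm0, rsi and xmm1.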

PowerPC (32bit) Calling Convention

Overview dyncall support

Dyncall and dyncallback are supported for PowerPC (32bit) Big Endian (MSB) on Darwin (tested on Apple Mac OS X) and Linux; however, they fail on *BSD.

Mac OS X/Darwin

Registers and register usage
Name Brief description


gpr0 scratch
gpr1 stack pointer
gpr2 scratch
gpr3,gpr4 return value, parameter 0 and 1 for integer or pointer
gpr5-gpr10 parameter 2-7 for integer or pointer parameters
gpr11 permanent
gpr12 branch target for dynamic code generation
gpr13-31 permanent
fpr0 scratch
fpr1 floating point return value, floating point parameter 0 (always double precision)
fpr2-fpr13 floating point parameters 1-12 (always double precision)
fpr14-fpr31 permanent
v0-v1 scratch
v2-v13 vector parameters
v14-v19 scratch
v20-v31 permanent
lr scratch, link-register
ctr scratch, count-register
cr0-cr1 scratch
cr2-cr4 permanent
cr5-cr7 scratch
Table 25: Register usage on Darwin PowerPC 32-Bit
Parameter passing Return values Stack layout

Stack frame is always 16-byte aligned. Stack directly after function prolog:

                 |------------------------|
                 | ...                    |
                 |------------------------|                             \
  local data     |                        |                              |
                 |------------------------|                              |
                 | ...                    | \                            |
                 | ...                    |  | stack parameters          |
                 | ...                    | /                            |
  parameter area | ...                    | \                            |
                 | ...                    |  | spill area (as needed)    |
                 | gpr3 or fpr1           | /                            |
                 |------------------------|                              | caller’s frame
                 | reserved               |                              |
                 | reserved               |                              |
  linkage area   | reserved               |                              |
                 | return address         |                              |
                 | reserved for callee    |                              |
                 | saved by callee        |                             /
                 |------------------------|                             \
  local data     |                        |                              |
                 |------------------------|                              |
  parameter area |                        |                              | current frame
                 |------------------------|                              |
  linkage area   | ...                    |                             /
                 |------------------------|
Figure 13: Stack layout on ppc32 Darwin
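
The 16-byte stack frame alignment stated above amounts to rounding the frame size up to the next multiple of 16. A minimal sketch of that arithmetic; the function names are hypothetical, and the sizes of the individual areas are deliberately left to the caller:

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (a power of two). */
size_t align_up(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Frame size for a given number of payload bytes (linkage area,
   parameter area and local data combined), 16-byte aligned. */
size_t frame_size16(size_t payload_bytes) {
    return align_up(payload_bytes, 16);
}
```

For example, a payload of 40 bytes results in a 48 byte frame.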

System V PPC 32-bit

Status Registers and register usage
Name Brief description


r0 scratch
r1 stack pointer
r2 system-reserved
r3-r4 parameter passing and return value
r5-r10 parameter passing
r11-r12 scratch
r13 Small data area pointer register
r14-r30 Local variables
r31 Used for local variables or environment pointer
f0 scratch
f1 parameter passing and return value
f2-f8 parameter passing
f9-f13 scratch
f14-f31 Local variables
cr0-cr7 Condition register fields, each 4 bits wide (cr0-cr1 and cr5-cr7 are scratch)
lr Link register (scratch)
ctr Count register (scratch)
xer Fixed-point exception register (scratch)
fpscr Floating-point Status and Control Register
Table 26: Register usage on System V ABI PowerPC Processor
Parameter passing Return values Stack layout

Stack frame is always 16-byte aligned. Stack directly after function prolog:

                 |------------------------------------|
                 | ...                                |
                 |------------------------------------|                       \
  local data     |                                    |                        |
                 |------------------------------------|                        |
                 | ...                                | \                      |
  parameter area | ...                                |  | stack parameters    | caller’s frame
                 | ...                                | /                      |
                 |------------------------------------|                        |
                 | saved return address (for callee)  |                        |
                 |------------------------------------|                        |
                 | parent stack frame pointer         |                       /
                 |------------------------------------|                       \
  local data     |                                    |                        |
                 |------------------------------------|                        | current frame
  parameter area |                                    |                        |
                 |------------------------------------|                       /
                 | ...                                |
                 |------------------------------------|



Figure 14: Stack layout on System V ABI for PowerPC 32-bit calling convention

PowerPC (64bit) Calling Convention

Overview dyncall support

Dyncall supports PowerPC (64bit) Big Endian and Little Endian ELF ABIs on System V systems (Linux, etc.), including syscalls. Mac OS X is not supported.

PPC64 ELF ABI

Registers and register usage

@@@ Parameter passing

@@@

Return values

@@@ Stack layout

@@@

ARM32 Calling Convention

Overview

The ARM32 family of processors is based on the Advanced RISC Machines (ARM) processor architecture (32 bit RISC). The word size is 32 bits (and the programming model is ILP32).
Basically, this family of microprocessors can be run in 2 major modes:

Mode Description


ARM 32bit instruction set
THUMB compressed instruction set using 16bit wide instruction encoding


For more details, take a look at the ARM-THUMB Procedure Call Standard (ATPCS) [18], the Procedure Call Standard for the ARM Architecture (AAPCS) [19], as well as the Debian ARM EABI port wiki [22].

dyncall support

Currently, the dyncall library supports the ARM and THUMB mode of the ARM32 family (ATPCS [18] and EABI [22]), excluding manually triggered ARM-THUMB interworking calls. Although it’s quite possible that the current implementation runs on other ARM processor families as well, please note that only the ARMv4t family has been thoroughly tested at the time of writing. Please report if the code runs on other ARM families, too.
It is important to note that dyncall supports the ARM architecture calling convention variant with floating point hardware disabled (meaning that the FPA and the VFP (scalar mode) procedure call standards are not supported). This processor family features some instruction sets accelerating DSP and multimedia applications, such as the ARM Jazelle Technology (direct Java bytecode execution, providing acceleration for some bytecodes while calling software code for others), that are not supported by the dyncall library.

ATPCS ARM mode

Registers and register usage

In ARM mode, the ARM32 processor has sixteen 32 bit general purpose registers, namely r0-r15:

Name Brief description


r0 parameter 0, scratch, return value
r1 parameter 1, scratch, return value
r2-r3 parameters 2 and 3, scratch
r4-r10 permanent
r11 frame pointer, permanent
r12 scratch
r13 stack pointer, permanent
r14 link register, permanent
r15 program counter (note: due to pipeline, r15 points to 2 instructions ahead)
Table 27: Register usage on arm32
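
The note on r15 in the table above can be expressed as simple address arithmetic: reading the program counter yields the address of the current instruction plus two instruction widths. A small sketch, assuming 4 byte instructions in ARM mode and 2 byte instructions in THUMB mode; the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Value observed when reading r15 while executing the instruction at
   insn_addr; insn_size is 4 in ARM mode and 2 in THUMB mode. */
uint32_t pc_as_read(uint32_t insn_addr, uint32_t insn_size) {
    assert(insn_size == 4 || insn_size == 2);
    return insn_addr + 2u * insn_size;  /* two instructions ahead */
}
```

An ARM mode instruction at 0x8000 therefore sees 0x8008 in r15, and a THUMB mode instruction at the same address sees 0x8004.
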
Parameter passing Return values Stack layout

Stack directly after function prolog:

                                           |-----------------------|
                                           | ...                   |
                                           |-----------------------|                             \
  register save area                       |                       |                              |
                                           |-----------------------|                              |
  local data                               |                       |                              | caller’s frame
                                           |-----------------------|                              |
                                           | ...                   | \                            |
                                           | ...                   |  | stack parameters          |
                                           | ...                   | /                           /
                                           |-----------------------|                             \
  parameter area                           | r3                    | \                            |
                                           | r2                    |  | spill area (if needed)    |
                                           | r1                    |  |                           |
                                           | r0                    | /                            |
                                           |-----------------------|                              | current frame
  register save area (with return address) |                       |                              |
                                           |-----------------------|                              |
  local data                               |                       |                              |
                                           |-----------------------|                              |
  parameter area                           | ...                   |                             /
                                           |-----------------------|
Figure 15: Stack layout on arm32

ATPCS THUMB mode

Status Registers and register usage

In THUMB mode, the ARM32 processor family supports eight 32 bit general purpose registers r0-r7 and access to high order registers r8-r15:

Name Brief description


r0 parameter 0, scratch, return value
r1 parameter 1, scratch, return value
r2,r3 parameters 2 and 3, scratch
r4-r6 permanent
r7 frame pointer, permanent
r8-r11 permanent
r12 scratch
r13 stack pointer, permanent
r14 link register, permanent
r15 program counter (note: due to pipeline, r15 points to 2 instructions ahead)
Table 28: Register usage on arm32 thumb mode
Parameter passing Return values Stack layout

Stack directly after function prolog:

                                           |-----------------------|
                                           | ...                   |
                                           |-----------------------|                             \
  register save area                       |                       |                              |
                                           |-----------------------|                              |
  local data                               |                       |                              | caller’s frame
                                           |-----------------------|                              |
                                           | ...                   | \                            |
                                           | ...                   |  | stack parameters          |
                                           | ...                   | /                           /
                                           |-----------------------|                             \
  parameter area                           | r3                    | \                            |
                                           | r2                    |  | spill area (if needed)    |
                                           | r1                    |  |                           |
                                           | r0                    | /                            |
                                           |-----------------------|                              | current frame
  register save area (with return address) |                       |                              |
                                           |-----------------------|                              |
  local data                               |                       |                              |
                                           |-----------------------|                              |
  parameter area                           | ...                   |                             /
                                           |-----------------------|
Figure 16: Stack layout on arm32 thumb mode

EABI (ARM and THUMB mode)

The ARM EABI is very similar to the ABI outlined in the ARM-THUMB Procedure Call Standard (ATPCS) [18]. However, the EABI requires the stack to be 8-byte aligned at function entry, and it requires 64 bit parameters to be 8-byte aligned as well: on the stack they are placed on 8-byte boundaries, and in registers they start on a 2-register boundary. In order to achieve such an alignment, a register might have to be skipped for parameters passed via registers, or 4 bytes on the stack for parameters passed via the stack. Refer to the Debian ARM EABI port wiki for more information [22].
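
The register skipping and stack padding described above can be modelled with a small allocator. This is a simplified sketch for 32 and 64 bit integer parameters only. The type and function names are hypothetical, and it assumes that a parameter which no longer fits into r0-r3 goes to the stack as a whole and that later parameters follow it there:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: reg is the first core register used (0..3),
   or -1 if the parameter lives on the stack at offset stack_off. */
typedef struct { int reg; size_t stack_off; } ArgSlot;

/* Simplified allocator state: next free register and stack offset. */
typedef struct { int next_reg; size_t next_off; } ArgState;

/* Place one parameter of 4 or 8 bytes and update the state. */
ArgSlot eabi_place(ArgState *st, size_t size_bytes) {
    ArgSlot slot = { -1, 0 };
    int regs_needed = (int)(size_bytes / 4);
    assert(size_bytes == 4 || size_bytes == 8);
    if (size_bytes == 8 && (st->next_reg & 1))
        st->next_reg++;                  /* skip a register to align */
    if (st->next_reg + regs_needed <= 4) {
        slot.reg = st->next_reg;
        st->next_reg += regs_needed;
        return slot;
    }
    st->next_reg = 4;                    /* registers are exhausted */
    if (size_bytes == 8)                 /* pad to an 8-byte boundary */
        st->next_off = (st->next_off + 7) & ~(size_t)7;
    slot.stack_off = st->next_off;
    st->next_off += size_bytes;
    return slot;
}
```

For the parameter list (int, long long, int, long long), the sketch puts the first int into r0, skips r1, puts the first long long into r2-r3, and places the remaining two parameters at stack offsets 0 and 8, leaving 4 bytes of padding between them.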

Status

ARM on Apple’s iOS (Darwin) Platform

iOS runs on ARMv6 (iOS 2.0) and ARMv7 (iOS 3.0) architectures. Typically, code is compiled in Thumb mode.

Register usage

Name Brief description


R0 parameter 0, scratch, return value
R1 parameter 1, scratch, return value
R2,R3 parameters 2 and 3, scratch
R4-R6 permanent
R7 frame pointer, permanent
R8 permanent
R9 permanent (iOS 2.0) and scratch (since iOS 3.0)
R10-R11 permanent
R12 scratch, intra-procedure scratch register (IP) used by dynamic linker
R13 stack pointer, permanent
R14 link register, permanent
R15 program counter (note: due to pipeline, r15 points to 2 instructions ahead)
CPSR Program status register
D0-D7 scratch, aliases S0-S15, on ARMv7 also as Q0-Q3. Not accessible from Thumb mode on ARMv6.
D8-D15 permanent, aliases S16-S31, on ARMv7 also as Q4-Q7. Not accessible from Thumb mode on ARMv6.
D16-D31 Only available in ARMv7, aliases Q8-Q15.
FPSCR VFP status register.
Table 29: Register usage on ARM Apple iOS

The ABI is based on the AAPCS but with some important differences listed below:

ARM hard float (armhf)

Most Debian-based Linux systems on ARMv7 (or ARMv6 with FPU) platforms use a calling convention referred to as armhf, which uses sixteen 32-bit floating point registers of the FPU of the VFPv3-D16 extension to the ARM architecture. The instruction set used for armhf is Thumb-2. Refer to the Debian wiki for more information [23].

Code is little-endian; the rest is similar to EABI, with an 8-byte aligned stack, etc.

Register usage

Name Brief description


R0 parameter 0, scratch, non floating point return value
R1 parameter 1, scratch, non floating point return value
R2,R3 parameters 2 and 3, scratch
R4,R5 permanent
R6 scratch
R7 frame pointer, permanent
R8 permanent
R9,R10 scratch
R11 permanent
R12 scratch, intra-procedure scratch register (IP) used by dynamic linker
R13 stack pointer, permanent
R14 link register, permanent
R15 program counter (note: due to pipeline, r15 points to 2 instructions ahead)
CPSR Program status register
S0 floating point argument, floating point return value, single precision
D0 floating point argument, floating point return value, double precision, aliases S0-S1
S1-S15 floating point arguments, single precision
D1-D7 aliases S2-S15, floating point arguments, double precision
FPSCR VFP status register.
Table 30: Register usage on armhf
Parameter passing Return values Stack layout

Stack directly after function prolog:

                                           |-----------------------|
                                           | ...                   |
                                           |-----------------------|                             \
  register save area                       |                       |                              |
                                           |-----------------------|                              |
  local data                               |                       |                              |
                                           |-----------------------|                              |
                                           | r0-r3                 |  } spill area (if needed)    | caller’s frame
                                           |-----------------------|                              |
                                           | ...                   | \                            |
  parameter area                           | ...                   |  | stack parameters          |
                                           | ...                   | /                           /
                                           |-----------------------|                             \
  register save area (with return address) |                       |                              |
                                           |-----------------------|                              |
  local data                               |                       |                              | current frame
                                           |-----------------------|                              |
  parameter area                           | ...                   |                             /
                                           |-----------------------|
Figure 17: Stack layout on arm32 armhf

Architectures

The ARM architecture family contains several revisions with differing capabilities and extensions (such as thumb-interworking, more vector registers, ...). The following table sums up the most important properties of the various architecture standards, from a calling convention perspective.

Arch     Platforms                                  Details

ARMv4
ARMv4T   ARM 7, ARM 9, Neo FreeRunner (OpenMoko)
ARMv5    ARM 9E                                     BLX instruction available
ARMv6                                               No vector registers available in thumb
ARMv7    iPod touch, iPhone 3GS/4, Raspberry Pi 2   VFP throughout available, armhf calling convention on some platforms
ARMv8    iPhone 6 and higher                        64bit support
Table 31: Overview of ARM Architecture, Platforms and Details

ARM64 Calling Convention

Overview

ARMv8 introduced the AArch64 calling convention. ARM64 chips can run in 64-bit or 32-bit mode, but a single process cannot mix the two; interworking is only possible between processes.
A word is defined to be 32 bits, a dword 64 bits. This is for historical reasons (the terminology didn’t change from ARM32).
For more details, take a look at the Procedure Call Standard for the ARM 64-bit Architecture [20].
dyncall support

The dyncall library supports the ARM 64-bit AArch64 PCS ABI, for calls and callbacks.

AAPCS64 Calling Convention

Registers and register usage

ARM64 features thirty-one 64 bit general purpose registers, namely x0-x30. There is also SP, a register with restricted use that serves as the stack pointer, and PC, which is dedicated as the program counter. Additionally, there are thirty-two 128 bit registers v0-v31, used as SIMD and floating point registers; they are referred to as q0-q31 (128 bit), d0-d31 (64 bit) or s0-s31 (32 bit), depending on their use:

Name Brief description


x0-x7 parameters, scratch, return value
x8 indirect result location pointer
x9-x15 scratch
x16 permanent in some cases, can have special function (IP0), see doc
x17 permanent in some cases, can have special function (IP1), see doc
x18 reserved as platform register, advised not to be used for handwritten, portable asm, see doc
x19-x28 permanent
x29 permanent, frame pointer
x30 permanent, link register
SP permanent, stack pointer
PC program counter
Table 32: Register usage on arm64
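
To make the register roles in the table concrete, here is a minimal sketch of how simple scalar arguments could be assigned to locations. It assumes the AAPCS64 rules for the plain case only: integer and pointer arguments go to x0-x7, floating point arguments go to d0-d7 (the low halves of v0-v7), and everything else goes to the stack in 8 byte slots. Aggregates, variadic calls and Apple's divergences are ignored. The function name and the signature string format are invented for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOC_LEN 16

/* Illustrative only (not a dyncall API): assign scalar arguments to
 * AAPCS64 locations. sig uses 'i' for an integer/pointer argument and
 * 'f' for a float/double. out[k] receives a name such as "x0", "d3" or
 * "stack+8". Returns the number of arguments processed. */
int aapcs64_assign(const char *sig, char out[][LOC_LEN]) {
    int ngrn = 0;  /* next general purpose register number */
    int nsrn = 0;  /* next SIMD/floating point register number */
    int nsaa = 0;  /* next stacked argument offset in bytes */
    int n = (int)strlen(sig);
    for (int k = 0; k < n; ++k) {
        assert(sig[k] == 'i' || sig[k] == 'f');
        if (sig[k] == 'i' && ngrn < 8) {
            snprintf(out[k], LOC_LEN, "x%d", ngrn++);
        } else if (sig[k] == 'f' && nsrn < 8) {
            snprintf(out[k], LOC_LEN, "d%d", nsrn++);
        } else {
            snprintf(out[k], LOC_LEN, "stack+%d", nsaa);
            nsaa += 8;  /* assumption: each stacked scalar takes 8 bytes */
        }
    }
    return n;
}
```

With the signature "ifiiiiiiiif", the ninth integer no longer fits into x0-x7 and is placed at stack+0, while the trailing floating point argument still lands in d1, because the two register classes are counted independently.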
Parameter passing Return values Stack layout

Stack directly after function prolog:

|--------------------------------------------|
| ...                                        |
|--------------------------------------------|  -+
| register save area                         |   |
|--------------------------------------------|   |
| local data                                 |   |
|--------------------------------------------|   |  caller’s frame
| parameter area:                            |   |
|   ...   stack parameters                   |   |
|   ...                                      |   |
|   ...                                      |   |
|--------------------------------------------|  -+
|   x0    spill area (if needed)             |   |
|   x1                                       |   |
|   ...                                      |   |
|   x7                                       |   |
|   d0                                       |   |
|   d1                                       |   |
|   ...                                      |   |
|   d7                                       |   |
|--------------------------------------------|   |
| register save area                         |   |  current frame
|--------------------------------------------|   |
| local data                                 |   |
|--------------------------------------------|   |
| link and frame register:                   |   |
|   x30                                      |   |
|   x29                                      |   |
|--------------------------------------------|   |
| parameter area                             |   |
|--------------------------------------------|  -+
| ...                                        |
|--------------------------------------------|
Figure 18: Stack layout on arm64

Apple’s ARM64 Function Calling Conventions

Overview

Apple’s ARM64 calling convention is based on the AAPCS64 standard, but diverges from it in some ways. Only the differences are listed here; for more details, take a look at Apple’s official documentation [21].

MIPS32 Calling Convention

Overview

Multiple revisions of the MIPS instruction set exist, namely MIPS I, MIPS II, MIPS III, MIPS IV, MIPS32 and MIPS64. Nowadays, MIPS32 and MIPS64 are the main ones used for 32-bit and 64-bit instruction sets, respectively.
Given that MIPS processors are often used in embedded devices, several add-on extensions exist for the MIPS family, for example:

MIPS-3D
simple floating-point SIMD instructions dedicated to common 3D tasks.
MDMX
(MaDMaX) more extensive integer SIMD instruction set using 64 bit floating-point registers.
MIPS16e
adds compression to the instruction stream to make programs take up less room (allegedly a response to the THUMB instruction set of the ARM architecture).
MIPS MT
multithreading additions to the system similar to HyperThreading.

Unfortunately, there is no such thing as “The MIPS Calling Convention”. Different environments use different conventions, such as O32[35], O64[36], N32[37], N64[37], EABI[38] and NUBI[39].
dyncall support

For MIPS 32-bit architectures, dyncall currently supports the widely used O32 calling convention (for big- and little-endian targets), as well as EABI (which is used by the Homebrew SDK for the Playstation Portable). dyncall currently does not support MIPS16e (contrary to the like-minded ARM-THUMB, which is supported). Both calls and callbacks are supported.

MIPS EABI 32-bit Calling Convention

Register usage
Name                        Alias        Brief description

$0                          $zero        Hardware zero
$1                          $at          Assembler temporary
$2-$3                       $v0-$v1      Integer results
$4-$11                      $a0-$a7      Integer arguments, or double precision float arguments
$12-$15,$24                 $t4-$t7,$t8  Integer temporaries
$25                         $t9          Integer temporary, holds the address of the called function for all PIC calls (by convention)
$16-$23                     $s0-$s7      Preserved
$26,$27                     $kt0,$kt1    Reserved for kernel
$28                         $gp          Global pointer, preserve
$29                         $sp          Stack pointer, preserve
$30                         $s8          Frame pointer, preserve
$31                         $ra          Return address, preserve
hi, lo                                   Multiply/divide special registers
$f0,$f2                                  Float results
$f1,$f3,$f4-$f11,$f20-$f23               Float temporaries
$f12-$f19                                Single precision float arguments
Table 33: Register usage on MIPS32 EABI calling convention
Parameter passing Stack layout

Stack directly after function prolog:

|--------------------------------------------|
| ...                                        |
|--------------------------------------------|  -+
| register save area                         |   |
|--------------------------------------------|   |
| local data                                 |   |
|--------------------------------------------|   |  caller’s frame
| parameter area:                            |   |
|   ...   stack parameters                   |   |
|   ...                                      |   |
|--------------------------------------------|  -+
| register save area (with return address)   |   |
|--------------------------------------------|   |
| local data                                 |   |  current frame
|--------------------------------------------|   |
| parameter area                             |   |
|--------------------------------------------|  -+
| ...                                        |
|--------------------------------------------|
Figure 19: Stack layout on mips32 eabi calling convention

MIPS O32 32-bit Calling Convention

Register usage
Name                Alias        Brief description

$0                  $zero        hardware zero
$1                  $at          assembler temporary
$2-$3               $v0-$v1      return value, scratch
$4-$7               $a0-$a3      first integer arguments, scratch
$8-$15,$24          $t0-$t7,$t8  temporaries, scratch
$25                 $t9          temporary, holds the address of the called function for all PIC calls (by convention)
$16-$23             $s0-$s7      preserved
$26,$27             $k0,$k1      reserved for kernel
$28                 $gp          global pointer, preserved by caller
$29                 $sp          stack pointer, preserve
$30                 $fp          frame pointer, preserve
$31                 $ra          return address, preserve
hi, lo                           multiply/divide special registers
$f0-$f3                          float return value, scratch
$f4-$f11,$f16-$f19               float temporaries, scratch
$f12-$f15                        first floating point arguments, scratch
$f20-$f31                        preserved
Table 34: Register usage on MIPS O32 calling convention
Parameter passing Stack layout

Stack directly after function prolog:

|--------------------------------------------|
| ...                                        |
|--------------------------------------------|  -+
| local data                                 |   |
|--------------------------------------------|   |
| register save area:                        |   |
|   return address                           |   |
|   s7                                       |   |
|   ...                                      |   |
|   s0                                       |   |
|--------------------------------------------|   |  caller’s frame
| parameter area:                            |   |
|   ...   stack parameters                   |   |
|   ...                                      |   |
|   ...                                      |   |
|   a3    spill area                         |   |
|   a2                                       |   |
|   a1                                       |   |
|   a0                                       |   |
|--------------------------------------------|  -+
| local data                                 |   |
|--------------------------------------------|   |
| register save area (with return address)   |   |  current frame
|--------------------------------------------|   |
| parameter area                             |   |
|--------------------------------------------|  -+
| ...                                        |
|--------------------------------------------|
Figure 20: Stack layout on MIPS O32 calling convention
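
The spill area for a0-a3 in the figure can be illustrated by computing where each argument lives in the parameter area. The following is a simplified sketch, assuming the commonly documented O32 rules for integer arguments only: arguments are laid out in order, 64-bit values are aligned to 8 bytes, the first 16 bytes correspond to $a0-$a3, and the caller always reserves those 16 bytes. Floating point arguments and aggregates are deliberately left out, and the function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only (not a dyncall API): lay out integer arguments of
 * 4 or 8 bytes the way the O32 parameter area is organised. offs[k]
 * receives the byte offset of argument k from the start of the area;
 * offsets 0..15 are the spill slots of $a0-$a3. Returns the size of
 * the parameter area in bytes. */
size_t o32_layout(const size_t *sizes, size_t n, size_t *offs) {
    size_t off = 0;
    for (size_t k = 0; k < n; ++k) {
        assert(sizes[k] == 4 || sizes[k] == 8);
        if (sizes[k] == 8)
            off = (off + 7) & ~(size_t)7;  /* 64-bit values are 8-byte aligned */
        offs[k] = off;
        off += sizes[k];
    }
    if (off < 16)
        off = 16;                          /* room for $a0-$a3 is always reserved */
    return (off + 7) & ~(size_t)7;         /* keep the area 8-byte aligned */
}

/* Index of the argument register (0..3 for $a0..$a3) that carries the
 * word at the given offset, or -1 if that word is passed on the stack. */
int o32_arg_register(size_t off) {
    return off < 16 ? (int)(off / 4) : -1;
}
```

For the argument sizes {4, 8, 4, 4}, the 64-bit value skips $a1 and travels in the pair $a2/$a3 (offset 8), and the remaining two arguments are passed on the stack at offsets 16 and 20.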

MIPS N32 32-bit Calling Convention

@@@

MIPS64 Calling Convention

Overview

There are two main ABIs in use for MIPS64 chips, N64[37] and N32[37]. Both are basically the same, except that N32 uses 32-bit pointers and long integers instead of 64-bit ones. All registers of a MIPS64 chip are considered to be 64 bits wide, even for the N32 calling convention.
A word is defined to be 32 bits, a dword 64 bits. This is for historical reasons (the terminology didn’t change from MIPS32).
Other than that, there are 64-bit versions of the other ABIs found for MIPS32, e.g. the EABI[38] and O64[36].
dyncall support

For MIPS 64-bit machines, dyncall supports the N64 calling convention for calls and callbacks (for big- and little-endian targets). The N32 calling convention might work (it used to), but hasn’t been tested recently.

MIPS N64 Calling Convention

Register usage
Name                        Alias        Brief description

$0                          $zero        Hardware zero
$1                          $at          Assembler temporary
$2-$3                       $v0-$v1      Integer results
$4-$11                      $a0-$a7      Integer arguments, or double precision float arguments
$12-$15,$24                 $t4-$t7,$t8  Integer temporaries
$25                         $t9          Integer temporary, holds the address of the called function for all PIC calls (by convention)
$16-$23                     $s0-$s7      Preserved
$26,$27                     $kt0,$kt1    Reserved for kernel
$28                         $gp          Global pointer, preserve
$29                         $sp          Stack pointer, preserve
$30                         $s8          Frame pointer, preserve
$31                         $ra          Return address, preserve
hi, lo                                   Multiply/divide special registers
$f0,$f2                                  Float results
$f1,$f3,$f4-$f11,$f20-$f23               Float temporaries
$f12-$f19                                Float arguments
$f24-$f31                                Preserved
Table 35: Register usage on MIPS N64 calling convention
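
As an illustration of how the argument registers in the table are used, the sketch below assumes the commonly documented N64 behaviour that the first eight arguments each occupy one positional slot, and that slot k is passed either in the integer register $(4+k) or in the float register $f(12+k), depending on the argument type. Only scalar, non-variadic arguments are considered, and the type and function names are hypothetical.

```c
#include <assert.h>

/* Illustrative only (not a dyncall API). */
typedef enum { N64_GPR, N64_FPR, N64_STACK } n64_class;

typedef struct {
    n64_class cls;  /* where the argument is passed */
    int index;      /* register number for GPR/FPR, byte offset for STACK */
} n64_loc;

/* Location of the argument in positional slot 'slot' (0-based).
 * Assumption: integer and float registers share the same eight slots,
 * so a float in slot 1 uses $f13 and leaves $5 unused. */
n64_loc n64_arg_location(int slot, int is_float) {
    n64_loc loc;
    assert(slot >= 0);
    if (slot < 8) {
        loc.cls = is_float ? N64_FPR : N64_GPR;
        loc.index = (is_float ? 12 : 4) + slot;
    } else {
        loc.cls = N64_STACK;
        loc.index = (slot - 8) * 8;  /* 64-bit stack slots */
    }
    return loc;
}
```

Under these assumptions, a call f(int, double, int) would use $4, $f13 and $6.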
Parameter passing Stack layout

Stack directly after function prolog:
Note: this figure is a work in progress and might be wrong.

|--------------------------------------------|
| ...                                        |
|--------------------------------------------|  -+
| register save area                         |   |
|--------------------------------------------|   |
| local data                                 |   |
|--------------------------------------------|   |  caller’s frame
| parameter area:                            |   |
|   ...   stack parameters                   |   |
|   ...                                      |   |
|--------------------------------------------|  -+
| register save area:                        |   |
|   padding                                  |   |
|   $ra                                      |   |
|   $s8                                      |   |
|   $gp                                      |   |  current frame
|--------------------------------------------|   |
| local data                                 |   |
|--------------------------------------------|   |
| parameter area                             |   |
|--------------------------------------------|  -+
| ...                                        |
|--------------------------------------------|
Figure 21: Stack layout on mips64 n64 calling convention

SPARC Calling Convention

Overview

The SPARC family of processors is based on the SPARC instruction set architecture, which basically comes in three revisions: V7, V8[28][26] and V9[29][27]. The former two are 32-bit, whereas the latter refers to the 64-bit SPARC architecture (see next chapter). SPARC uses big endian byte order.
dyncall support

dyncall fully supports the SPARC 32-bit instruction set (V7 and V8), for calls and callbacks.

SPARC (32-bit) Calling Convention

Register usage
Name                 Alias                       Brief description

%g0                  %r0                         Read-only, hardwired to 0
%g1-%g7              %r1-%r7                     Global
%o0,%o1 and %i0,%i1  %r8,%r9 and %r24,%r25       Output and input argument registers, return value
%o2-%o5 and %i2-%i5  %r10-%r13 and %r26-%r29     Output and input argument registers
%o6 and %i6          %r14 and %r30, %sp and %fp  Stack and frame pointer
%o7 and %i7          %r15 and %r31               Return address (caller writes to o7, callee uses i7)
%l0-%l7              %r16-%r23                   preserve
%f0,%f1                                          Floating point return value
%f2-%f31                                         scratch
Table 36: Register usage on sparc calling convention
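
The pairing of output and input registers in the table (for instance "caller writes to o7, callee uses i7") follows from SPARC's overlapping register windows. The following toy model is a sketch of that mechanism only: the window count of 8 is an arbitrary assumption, window overflow and underflow traps are not modelled, and all names are invented for illustration.

```c
#include <assert.h>

#define NWINDOWS 8  /* assumption: real chips have an implementation-defined count */

/* Toy model of overlapping register windows (not a dyncall API). Each
 * window owns 8 "in" and 8 "local" registers; its "out" registers are
 * the "in" registers of the window that a 'save' switches to. */
typedef struct {
    long regs[NWINDOWS * 16];
    int cwp;  /* current window pointer */
} sparc_windows;

long *win_in(sparc_windows *w, int i) {
    assert(i >= 0 && i < 8);
    return &w->regs[w->cwp * 16 + i];
}

long *win_local(sparc_windows *w, int i) {
    assert(i >= 0 && i < 8);
    return &w->regs[w->cwp * 16 + 8 + i];
}

long *win_out(sparc_windows *w, int i) {
    int next = (w->cwp + NWINDOWS - 1) % NWINDOWS;  /* window entered by 'save' */
    assert(i >= 0 && i < 8);
    return &w->regs[next * 16 + i];
}

/* 'save' in the callee's prolog, 'restore' before returning. */
void win_save(sparc_windows *w)    { w->cwp = (w->cwp + NWINDOWS - 1) % NWINDOWS; }
void win_restore(sparc_windows *w) { w->cwp = (w->cwp + 1) % NWINDOWS; }
```

A value the caller stores through win_out(&w, 0) is what the callee reads through win_in(&w, 0) after win_save(&w), and a return value written to the callee's %i0 shows up in the caller's %o0 after win_restore(&w). The local registers are private to each window, which is why the table lists them as preserved.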
Parameter passing Stack layout

Stack directly after function prolog:

|--------------------------------------------|
| ...                                        |
|--------------------------------------------|  -+
| local data (and padding)                   |   |
|--------------------------------------------|   |
| parameter area:                            |   |
|   argument x              stack parameters |   |
|   ...                                      |   |
|   argument 6                               |   |  caller’s frame
|   input argument 5 spill  spill area       |   |
|   ...                                      |   |
|   input argument 0 spill                   |   |
|   struct/union-return pointer              |   |
|--------------------------------------------|   |
| register save area (%i* and %l*)           |   |
|--------------------------------------------|  -+
| local data (and padding)                   |   |
|--------------------------------------------|   |  current frame
| parameter area                             |   |
|--------------------------------------------|  -+
| ...                                        |
|--------------------------------------------|


Figure 22: Stack layout on sparc32 calling convention

SPARC64 Calling Convention

Overview

The SPARC family of processors is based on the SPARC instruction set architecture, which basically comes in three revisions: V7, V8[28][26] and V9[29][27]. The former two are 32-bit (see previous chapter), whereas the latter refers to the 64-bit SPARC architecture. SPARC uses big endian byte order; V9 also supports little endian byte order, but for data access only, not for instruction access.

There are two proposals, one from Sun and one from Hal, which disagree on how to handle some aspects of this calling convention.
dyncall support

dyncall fully supports the SPARC 64-bit instruction set (V9), for calls and callbacks.

SPARC (64-bit) Calling Convention

Name                 Alias                       Brief description

%g0                  %r0                         Read-only, hardwired to 0
%g1-%g7              %r1-%r7                     Global
%o0-%o3 and %i0-%i3  %r8-%r11 and %r24-%r27      Output and input argument registers, return value
%o4,%o5 and %i4,%i5  %r12,%r13 and %r28,%r29     Output and input argument registers
%o6 and %i6          %r14 and %r30, %sp and %fp  Stack and frame pointer (NOTE: the value points to the stack/frame minus a BIAS of 2047)
%o7 and %i7          %r15 and %r31               Return address (caller writes to o7, callee uses i7)
%l0-%l7              %r16-%r23                   preserve
%d0,%d2,%d4,%d6                                  Floating point arguments, return value
%d8,%d10,...,%d30                                Floating point arguments
%d32,%d36,...,%d62                               scratch (but, according to Hal, %d16,...,%d46 are preserved)
Table 37: Register usage on sparc64 calling convention
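
The stack bias mentioned for %o6 and %i6 is easy to get wrong when inspecting registers in a debugger, so here is the arithmetic as a minimal sketch. It only restates the note in the table (the register holds the frame address minus 2047); the macro and function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SPARC64_STACK_BIAS 2047

/* Illustrative only (not a dyncall API): convert between the biased
 * value held in %sp or %fp and the address it actually refers to. */
uint64_t sparc64_unbias(uint64_t reg_value) {
    return reg_value + SPARC64_STACK_BIAS;
}

uint64_t sparc64_bias(uint64_t address) {
    assert(address >= SPARC64_STACK_BIAS);
    return address - SPARC64_STACK_BIAS;
}
```

For instance, a %sp value of 0x7fffffff0801 refers to the frame at 0x7fffffff1000, since 0x801 + 2047 = 0x1000.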
Parameter passing Stack layout

Stack directly after function prolog:

|--------------------------------------------|
| ...                                        |
|--------------------------------------------|  -+
| local data (and padding)                   |   |
|--------------------------------------------|   |
| parameter area:                            |   |
|   argument x              stack parameters |   |
|   ...                                      |   |
|   argument 6                               |   |  caller’s frame
|   input argument 5 spill  spill area       |   |
|   ...                                      |   |
|   input argument 0 spill                   |   |
|--------------------------------------------|   |
| register save area (%i* and %l*)           |   |
|--------------------------------------------|  -+
| local data (and padding)                   |   |
|--------------------------------------------|   |  current frame
| parameter area                             |   |
|--------------------------------------------|  -+
| ...                                        |
|--------------------------------------------|


Figure 23: Stack layout on sparc64 calling convention

