Discussion
Q:
Why use the MMU? Just put it in thread local storage and don't worry about the
MMU. Using the MMU to remap different instances to the same location means
changing the MMU on every context switch.
A:
TLS is used already. The TLS on x86_64 (well, it's not really thread local
storage yet) is used to fetch SysBase and KernelBase. Keep in mind, however,
that on an SMP system several SysBases would need to exist in parallel. Since
some applications access SysBase fields such as SysBase->ThisTask directly,
each SysBase would need to reflect the state of one CPU core. Now imagine the
following pseudo code:
struct ExecBase *SysBase = TLS_GET(SysBase);
struct Task *me = SysBase->ThisTask;
If a task switch happened between these two lines of code, and the task
migrated to another CPU (that might happen, depending on the scheduler
strategy used), SysBase->ThisTask would no longer point to the right task.
Yes, I do agree that such code sucks most. Only AROS (Amiga!) makes it
possible ;)
Now, imagine we have a SysBase at some fixed virtual address, where each CPU
maps a different portion of the physical address space. If a task switch
happened between the two lines of code mentioned above, nothing particular
(nothing bad!) would happen. No matter which CPU the task runs on, it would
always fetch the SysBase valid for that very CPU.
No, we would not need to change the MMU table on every context switch. Every
CPU would get its own MMU map during the boot process. Moreover, x86_64 needs
the MMU in 64-bit mode anyway.
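The race described in this answer can be sketched as a small simulation. This
is only an illustration under stated assumptions: the struct definitions are
heavily simplified stand-ins for the real AROS types, and per_cpu_sysbase,
current_cpu, migrate_me_to and the two check functions are hypothetical names
invented for this sketch. The per-CPU MMU mapping is modeled as an array
lookup done at the moment of access, which is roughly what the hardware
mapping would achieve.

```c
#include <assert.h>

/* Simplified stand-ins for the real AROS structures (assumption). */
struct Task { const char *name; };
struct ExecBase { struct Task *ThisTask; };

#define NUM_CPUS 2

/* One SysBase per CPU core, each tracking the task running on that core. */
static struct ExecBase per_cpu_sysbase[NUM_CPUS];
/* Simulated index of the core the task "me" is currently running on. */
static int current_cpu;

static struct Task me = { "me" };
static struct Task other = { "other" };

/* Start state: "me" runs on core 0, "other" runs on core 1. */
static void reset(void)
{
    current_cpu = 0;
    per_cpu_sysbase[0].ThisTask = &me;
    per_cpu_sysbase[1].ThisTask = &other;
}

/* Simulated scheduler decision: move "me" to another core. */
static void migrate_me_to(int to_cpu)
{
    assert(to_cpu >= 0 && to_cpu < NUM_CPUS);
    per_cpu_sysbase[current_cpu].ThisTask = &other; /* old core runs another task */
    per_cpu_sysbase[to_cpu].ThisTask = &me;
    current_cpu = to_cpu;
}

/* TLS style: the pointer fetched in line 1 stays bound to core 0. */
int tls_style_sees_wrong_task(void)
{
    reset();
    struct ExecBase *SysBase = &per_cpu_sysbase[current_cpu]; /* line 1 */
    migrate_me_to(1);                        /* task switch in between */
    return SysBase->ThisTask != &me;         /* line 2 reads core 0's state */
}

/* Fixed-address style: the access is resolved on whichever core runs it. */
int fixed_address_sees_right_task(void)
{
    reset();
    migrate_me_to(1);                        /* task switch in between */
    return per_cpu_sysbase[current_cpu].ThisTask == &me;
}
```

In this model the TLS-style pointer ends up describing the core the task has
left, while the fixed-address access follows the task to its new core.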
Introduction
The original API on 68k Amiga computers had variadic functions, like DoMethod.
The implementation and use of those functions relied on the way compilers
placed the arguments to such functions. At the time these API functions
were invented, arguments were placed on the stack right before the function
was called.
Nowadays the placement of function arguments is referred to as the calling
convention and is defined in ABI documents. An ABI defines a common set of
rules so that binaries created by different compilers for the same CPU
architecture, all claiming to support that ABI, can be linked together. Today
68k and x86 are examples of architectures that still place arguments on the
stack.
x86_64 and PowerPC are examples of architectures that place a fixed number
of arguments in registers; any additional arguments are placed on the stack
of the function's caller.
Examples
What does all that mean in practice? Look at the current implementation of
DoMethod for x86:
IPTR DoMethod(Object *obj, IPTR MethodId, ...)
{
    return CALLHOOKPKT((struct Hook *) OCLASS(obj), obj, &MethodId);
}
DoMethod is used to call a method of an object belonging to a certain class
(you need to know a little about OOP to understand all this). It does so by
calling a hook function of that class. This hook function is actually the
class dispatcher, which looks at the method ID and calls the corresponding
method function with the additional arguments given to DoMethod.
A call to DoMethod can look like this:
Object* MyObject;
...
DoMethod(MyObject, MyId, 1, 2, 3, 4, 5, 6, 7, 8);
For x86 all the arguments are placed on the stack. That means they are
linearized in memory, as if they were the elements of an array or a struct.
It can look like this:
Lower address                                      Higher address
MyObject   MyId   1   2   3   4   5   6   7   8
So if we take the address of MethodId inside DoMethod, we can access the
additional arguments by using pointer arithmetic:
*(&MethodId + 2)
references the location where the number 2 is stored. The same can be done
in a method function that received the address of MethodId from the call to
DoMethod.
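The pointer arithmetic above can be tried out portably by modeling the
caller's stack as a plain array. This is a sketch, not the AROS code: the
type IPTR_sketch and the functions nth_extra_argument and demo_linear_stack
are hypothetical names, and the values 0 and 0x100 are arbitrary placeholders
for MyObject and MyId.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for AROS' IPTR: an integer wide enough to hold a pointer. */
typedef uintptr_t IPTR_sketch;

/* Given the address of the method ID slot, fetch the n-th extra argument.
   This only works because the slots are consecutive in memory. */
IPTR_sketch nth_extra_argument(const IPTR_sketch *method_id_slot, int n)
{
    assert(n >= 1);
    return *(method_id_slot + n);
}

IPTR_sketch demo_linear_stack(void)
{
    /* Model of the x86 stack for DoMethod(MyObject, MyId, 1, ..., 8):
       slot 0 stands for MyObject, slot 1 for MyId, the rest are the
       additional arguments in call order. */
    IPTR_sketch stack[] = { 0, 0x100, 1, 2, 3, 4, 5, 6, 7, 8 };
    const IPTR_sketch *method_id = &stack[1];

    /* Equivalent of *(&MethodId + 2) in the text. */
    return nth_extra_argument(method_id, 2);
}
```

Here demo_linear_stack() yields 2, because the second slot after the method
ID holds the second additional argument.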
For x86_64 and PowerPC it is impossible to do it this way. Remember that the
first arguments of DoMethod are passed in registers, up to 8 registers for
PowerPC and 6 for x86_64. So if we take the address of MethodId, the compiler
stores the value of the register holding MethodId in the stack frame of
DoMethod (not in the stack frame of the caller of DoMethod, where the
additional arguments were placed) and returns the address of that stack
location. We then add to that address and expect the number 2 at the
resulting location, although it is actually still in a register. In real
application code, the value we read instead will normally lead to a crash.
Solutions
Normally, if you start searching for solutions to a problem, you first check
what others have done. Well, there are MOS and OS4, both being successors of
the original AmigaOS, running on PowerPC. They have solved this problem, both
of them basically in the same way. But this is only one of the possible
solutions, and ideally you want several to choose from. The following is a
list of solutions proposed on the AROS developer mailing list.
- Throw away the variadic functions and use the ones that accept a pointer to
a structure, i.e. use DoMethodA.
- Use macros. With DoMethod not being a real function but a macro, its
arguments can be serialized into a local array.
- Use a patched compiler that always passes the arguments of specially
attributed functions on the stack.
- Use macros from stdarg.h, i.e. va_start, va_end, va_copy, va_arg.
- Use AROS' already existing SLOWSTACK macros.
- Use assembler stubs that extend the caller's stack frame to store the
register arguments at the right position, then call the function (DoMethod).
Upon function return all stack manipulations are reverted and control goes
back to the function caller. An example stub in PowerPC assembler is shown
further below.
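The macro approach from the list above can be sketched with a C99 compound
literal. All names here (SK_IPTR, SK_Object, SK_MSG_SUM3, sk_DoMethodA,
sk_DoMethod, demo_macro_call) are hypothetical and chosen for this sketch
only; the real DoMethodA takes a message pointer and calls the class
dispatcher, which is replaced here by a trivial switch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t SK_IPTR;                 /* stand-in for IPTR */
typedef struct { int unused; } SK_Object;  /* stand-in for Object */

enum { SK_MSG_SUM3 = 1 };                  /* made-up method ID */

/* Stand-in for DoMethodA: msg[0] is the method ID, followed by the
   arguments, all linearized in memory. */
SK_IPTR sk_DoMethodA(SK_Object *obj, const SK_IPTR *msg)
{
    assert(msg != NULL);
    (void)obj;
    switch (msg[0]) {
    case SK_MSG_SUM3:
        return msg[1] + msg[2] + msg[3];
    default:
        return 0;
    }
}

/* The variadic "function" is a macro: its arguments are serialized into
   an unnamed local array, so no assumption about the ABI is needed. */
#define sk_DoMethod(obj, ...) \
    sk_DoMethodA((obj), (const SK_IPTR[]){ __VA_ARGS__ })

SK_IPTR demo_macro_call(void)
{
    SK_Object o = { 0 };
    return sk_DoMethod(&o, SK_MSG_SUM3, 10, 20, 12);
}
```

Every call site of such a macro creates its own local array, so code that
calls it often pays for the convenience with additional stack space.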
Pros & Cons
The points below follow the order of the solutions listed above.
- Lots of code will have to be rewritten. People will probably not accept this
decision. The original API had those functions, so we should support them as
well.
- Is already used at some places, but especially together with DoMethod there
is a huge usage of stack space.
- This is what MOS and OS4 did, and they failed to get the patches into the
main gcc tree, i.e. the patch has to be redone for every new gcc version in
use. If we used their compiler, we would get this work done for PowerPC for
free, but we would have to do it on our own for all the other architectures
that pass arguments in registers. We also had our own patch for the PowerPC
port for some time and it worked well, but only a few people were working on
the PowerPC port. Maybe this was one of the reasons.
- Is the only clean and standard way to do it. But it also requires a lot of
work. Regarding DoMethod we'll need an additional variadic dispatcher for
every class. Here va_arg and va_copy are used to get the arguments and call
the methods with them.
- Is what x86_64 and PowerPC are using at the moment. It is slow and there
are potential buffer overruns because some of these use fixed size arrays.
- This is a hack, because we are tinkering with stack frames, but it works. It
has already been tested on PowerPC together with DoMethod. An assembler stub
would be needed for each architecture.
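The stdarg.h approach discussed above could look roughly like the following
sketch. It assumes a dispatcher that receives a va_list and pulls out exactly
the arguments each method expects. The names VA_IPTR, VA_Object, VA_MSG_ADD,
VA_MSG_SCALE3, va_dispatch, va_DoMethod and demo_stdarg_call are hypothetical
and not part of AROS.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t VA_IPTR;                 /* stand-in for IPTR */
typedef struct { int unused; } VA_Object;  /* stand-in for Object */

enum { VA_MSG_ADD = 1, VA_MSG_SCALE3 = 2 };  /* made-up method IDs */

/* Variadic dispatcher: only the dispatcher knows how many arguments a
   method takes, so it fetches them one by one with va_arg. */
static VA_IPTR va_dispatch(VA_Object *obj, VA_IPTR method_id, va_list ap)
{
    (void)obj;
    switch (method_id) {
    case VA_MSG_ADD: {
        VA_IPTR a = va_arg(ap, VA_IPTR);
        VA_IPTR b = va_arg(ap, VA_IPTR);
        return a + b;
    }
    case VA_MSG_SCALE3: {
        VA_IPTR x = va_arg(ap, VA_IPTR);
        return 3 * x;
    }
    default:
        return 0;
    }
}

VA_IPTR va_DoMethod(VA_Object *obj, VA_IPTR method_id, ...)
{
    va_list ap;
    VA_IPTR result;

    assert(obj != NULL);
    va_start(ap, method_id);
    result = va_dispatch(obj, method_id, ap);
    va_end(ap);
    return result;
}

VA_IPTR demo_stdarg_call(void)
{
    VA_Object o = { 0 };
    /* Variadic arguments are not converted automatically, so each one is
       cast to the type that va_arg will read. */
    return va_DoMethod(&o, VA_MSG_ADD, (VA_IPTR)40, (VA_IPTR)2);
}
```

This works on any ABI because va_arg knows where the compiler put each
argument, but it shows the cost named above: every class needs a dispatcher
written in this style.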
Example of assembler stub:
/*
Copyright © 2008, The AROS Development Team. All rights reserved.
$Id$
Desc: alib variadic stubs
Lang: english
*/
#include "aros/ppc/asm.h"
#define AROS_VARIADIC_STUB(name, arg) \
.globl name ;\
_FUNCTION(name) ;\
.set ext, 12*4 ; /* extend caller's stack */ \
name: ;\
lwz 11, 0(1) ; /* get caller's stack back chain pointer */ \
sub 11, 11, 1; /* calc current stack frame size */ \
addi 11, 11, ext; /* extend stack frame size */ \
not 11, 11 ; /* create two's complement of */ \
addi 11, 11, 1; /* extended stack frame size */ \
mr 12, 1 ; /* move current stack pointer for later backup */ \
lwz 1, 0(1) ; /* load caller's stack back chain pointer */ \
stwux 1, 1, 11; /* create the extended stack frame */ \
lwz 11, ext+ 4(1) ; /* save caller's link register backup */\
stw 11, ext-24(1) ;\
lwz 11, ext+ 0(1) ; /* save caller's stack pointer */ \
stw 11, ext-28(1) ;\
stw 10, ext+ 4(1) ; /* save argument registers right below */ \
stw 9, ext+ 0(1) ; /* additional arguments passed on the */ \
stw 8, ext- 4(1) ; /* caller's stack frame */ \
stw 7, ext- 8(1) ;\
stw 6, ext-12(1) ;\
stw 5, ext-16(1) ;\
stw 4, ext-20(1) ;\
stw 12, ext-32(1) ; /* save current stack pointer */ \
mflr 12 ;\
stw 12, ext-36(1) ; /* save link register */ \
addi arg+3, 1, (arg+6)*4; /* save pointer to arguments in argument register */ \
lis 11, __##name@ha ; /* load address of variadic function */ \
la 12, __##name@l(11) ;\
mtlr 12 ;\
blrl ; /* call C part of variadic function */ \
lwz 12, ext-36(1) ; /* restore link register */ \
mtlr 12 ;\
lwz 11, ext-28(1) ; /* restore caller's stack back chain pointer */ \
stw 11, ext+ 0(1) ;\
lwz 12, ext-24(1) ; /* restore caller's link register backup */ \
stw 12, ext+ 4(1) ;\
lwz 1, ext-32(1) ; /* restore current stack pointer */ \
blr
.text
_ALIGNMENT
AROS_VARIADIC_STUB(DoMethod, 1)