GCC Inline Assembly - PowerPC constraints - HOWTO?

Belxjander

Can anyone point out a better way to make the following inline assembly block functionally complete, so that the arguments are defined as part of the C wrapper and used properly within the assembly?

I want to keep:

r0-r2,r13 and r31 SYS V4 ABI Reserved
r3-r7 for Function Returns and Arguments
r8-r12 for Polymorph Internal variables
r14-r31 for Plugin Internal variables

So far I can see no reason why I cannot do this.

Can anyone point me at further reading to clarify this, please?

The code in question is the following. The arguments to the function can be in r3-r7 as per normal (to my limited understanding, at least). Is there anything else I should worry about?

    APTR ECALL_ExecOpCode16(USHORT offset,
                            LONG a,
                            LONG b,
                            LONG c,
                            LONG d)
    {
        __asm("lhzu %r8,(%r9);\n\t"          // Load the Short to Interpret and update the IXP
              "addi %r8,%r8,%r3;\n\t"        // Add the OpCodeVector Offset
              "rlwinm %r7,2,%r8,2,18;\n\t"   // Multiply and Mask restrict the Offset range
              "mbar;\n\t"                    // Force DataCache Sync
              "lwz %r8,(%r7);\n\t"           // What OpCodeVector Address to call?
              "isync;\n\t"                   // Instruction Sequence lock
              "bl (%r8);\n\t"                // ECPU OpCodeVector() Execution
              "mbar;\n\t"                    // Force DataCache Sync
              "nop;\n\t"
              "nop;\n\t"
        );
        return;
    }

That is what I have for now, but I am wondering whether I am missing anything needed to make sure all the details are dealt with properly.

hypex

Hello.

BTW is that ABI the same version as the SysV or SystemV ABI?

Once you are in that routine it should be fine. And you should be able to call it like any C function directly from C code, the way you have it set up. Just make sure you compile with optimisation enabled so the parameters stay in registers instead of being copied to the stack. IIRC the first 8 go in registers and the rest are usually stacked. Also be careful of the stack frame. IIRC "-fomit-frame-pointer" stops the compiler keeping a frame pointer in functions that don't need one, which leaves one less thing touching your registers.

Apart from that, perhaps access each parameter by name if possible instead of by its assumed register, to keep your ASM passages clear and to make sure you access the correct parameter. It also makes the resulting code explicit and takes out any guesswork, as well as being easier to edit or update should you need to. By the looks of the code above the parameters look backwards, but I can see it works forwards. See, I don't know the code, and at first it confuses me! :-)

Belxjander

I'm going to add that -fomit-frame-pointer option to the GNUmakefile, thanks...

as for the variables, I'm stepping through this particular section of code in a recursive manner and preferring to avoid stack usage where possible.

Registers r0,r1,r2,r13 and r31 are system-wide usable and untouched in these routines,
Registers r3 through to r7 are usable as volatile variables across calls
Registers r8 through r12 are used as variables through multiple depths of calls as "global" values inside the threads running this routine.
Register r14 is a "Flags" register on a per-thread basis,
Registers r15 through to r30 are also used globally for remapping registers inside a CPU Simulation.

"jsr ((Offset+OpCode)<<2)" is a summary of the address calculation the core routine is supposed to perform to select which routine to call.

that is kind of why I am setting up the registers in a "weird" way as I need to have *multiple* calls affect the same value no matter how many times it occurs recursively or not.
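As a plain-C reference for what the assembly is meant to compute, the "jsr ((Offset+OpCode)<<2)" summary can be sketched as a table lookup followed by an indirect call. This is only a model under stated assumptions: the names `OpCodeFn`, `dispatch16`, `op_nop` and `op_skip` are hypothetical, and so is the handler signature. Only the add-then-scale address calculation comes from the post. In C the scaling by pointer size (the `<<2` on 32-bit PowerPC) is implicit in the array indexing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler type: receives the instruction pointer that follows
 * the opcode and returns the address to continue from. */
typedef const uint16_t *(*OpCodeFn)(const uint16_t *ip);

/* 0x3FFFC after the <<2 corresponds to a 16-bit index before it. */
#define OPCODE_INDEX_MASK 0xFFFFu

/* Two illustrative handlers (not from the thread). */
static const uint16_t *op_nop(const uint16_t *ip)  { return ip; }
static const uint16_t *op_skip(const uint16_t *ip) { return ip + 1; }

/* Fetch one 16-bit opcode, add the table offset, and call the handler. */
static const uint16_t *dispatch16(const OpCodeFn *table, uint16_t offset,
                                  const uint16_t *ip)
{
    assert(table != NULL && ip != NULL);
    uint16_t opcode = *ip++;                                  /* fetch, advance */
    size_t index = ((size_t)opcode + offset) & OPCODE_INDEX_MASK;
    return table[index](ip);                                  /* indirect call */
}
```

A model like this can serve as a cross-check for the assembly version: both should pick the same table entry for the same opcode and offset.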

I can't just let the compiler declare where it will put them as the routines called are not part of the library module containing the calling routine.

and the use of labels inside the Assembly have no relation to variables or symbols outside the assembly from what I have read about the generic GCC Inline Assembly HOWTO material.

hypex

Looks like you have it all neatly bound. Making good use of registers there. And the PowerPC with SYSV ABI would help avoid stack usage.

As to variables or parameters being accessed directly in the assembly, as in my suggestion: it looks like "%0", "%1" and so on can be used to access them, or "%[name]" for better readability, which is what I implied. I should try it myself to see if it works. :-)

There's a little more info here:
http://wiibrew.org/wiki/Inline_Assembler

And:
http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
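A minimal sketch of the "%[name]" operand style mentioned above, using the shift-and-mask step as the example. The function name `scale_index16` is hypothetical. The mask `0x3FFFC` is taken from the comments in the updated listing later in the thread, and the operand order follows the architecture's `rlwinm rA,rS,SH,MB,ME` form, which is an assumption about what the original `rlwinm %r7,2,%r8,2,18` was meant to do. A plain-C fallback is included so the sketch also builds on non-PowerPC hosts.

```c
#include <stdint.h>

/* Returns (value << 2) & 0x3FFFC: turns a 16-bit index into a byte offset
 * into a table of 4-byte entries. */
static uint32_t scale_index16(uint32_t value)
{
#if defined(__GNUC__) && (defined(__powerpc__) || defined(__PPC__))
    uint32_t scaled;
    /* Rotate left by 2, then keep bits 14..29 (big-endian bit numbering),
     * which is the mask 0x0003FFFC. */
    __asm__("rlwinm %[dst],%[src],2,14,29"
            : [dst] "=r"(scaled)    /* output: any general-purpose register */
            : [src] "r"(value));    /* input: any general-purpose register */
    return scaled;
#else
    /* Portable fallback with the same result. */
    return (value << 2) & 0x3FFFCu;
#endif
}
```

With named operands the compiler chooses the registers, so the template never mentions r7 or r8 directly. Pinning a value to one particular register needs a register variable in addition to this.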

Belxjander

Well I am completely "green" to using GCC on AmigaOS 4.x along with the PPC specifics of the Assembly,

I have not previously used the assembly parts of GCC, and beyond what I already have I would like to make sure I don't miss any details or get any unplanned register changes,

the variables themselves are being changed in both the parent and children of this function, as well as during the execution of this function...

all three layers of stacking will be using the registers for the same values; however, when the various "call" and "return" stack frame changes happen I would like to retain the changes that have occurred.

I'm not sure if I have left any details out or need further details specifically declared for this.

How to use the extended assembly features isn't clear to me. I want the variables in question held directly in registers for the entire duration of the function described, along with the function I am calling.

How do I do that without them being pushed back to the stack again?

hypex

Omitting the frame pointer should help to avoid any clobbering of your data registers. But the stack frame can be used to store data and is needed for proper debugging. You can also put the "volatile" keyword after "asm" so the compiler does not remove or reorder the block. To keep it from relying on registers your code overwrites, name those registers in the clobber list.

But I can see that your data in r8 to r12 does overlap with the volatile registers according to the ABI I am looking at (SVR4). As long as nothing overwrites them they will be fine. Of course the compiler expects you to save them first if you need them across a call.

Declaring a C function as you have done and jumping directly into ASM inside it should be fine, as the variables passed to the C routine will be in registers according to the ABI spec. The only caveat is r3 being used as a return value, so the compiler may overwrite it on exit from the C routine. Unless you declare the function as void, in which case it shouldn't touch it.

Unless you need over 8 parameters there shouldn't be any stack pushing, as you would know. And not passing any parameters shouldn't change any volatiles. Of course, when you come out of your ASM routine and back into the main C code you then have to save those parameters if you need to. Or cache them in a register.
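A hedged sketch of how "volatile" and the clobber list divide the work, using the opcode fetch as the example. The name `fetch16` is hypothetical. One assumption worth checking against the architecture manual: the update form `lhzu rD,d(rA)` writes the effective address `rA+d` back into `rA`, so with a displacement of 0 the pointer would not move. The sketch therefore uses `lhz` followed by `addi`. A plain-C fallback keeps it buildable on other hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fetch one 16-bit opcode and advance the instruction pointer past it. */
static uint16_t fetch16(const uint16_t **ip_ref)
{
    assert(ip_ref != NULL && *ip_ref != NULL);
    const uint16_t *ip = *ip_ref;
    uint32_t opcode;
#if defined(__GNUC__) && (defined(__powerpc__) || defined(__PPC__))
    __asm__ volatile(
        "lhz  %[op],0(%[ip])\n\t"   /* load halfword, zero-extended */
        "addi %[ip],%[ip],2"        /* step over the halfword just read */
        : [op] "=&r"(opcode),       /* early-clobber: written before ip is last used */
          [ip] "+b"(ip)             /* read-write; "b" keeps r0 out of the base slot */
        :
        : "memory");                /* the block reads memory behind the compiler's back */
#else
    opcode = *ip++;                 /* portable fallback with the same effect */
#endif
    *ip_ref = ip;
    return (uint16_t)opcode;
}
```

"volatile" only stops the compiler from dropping or moving the block. The operand and clobber lists are what tell it which registers and memory the block reads and writes.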

Belxjander

then the following code for the C wrapping the assembly concerned compiled with -fomit-frame-pointer

    void myfunction(void)
    {
        __asm volatile ();
    };

Will this be enough to contain the specific routines, as long as I use the fixed register assignments within the specific thread model I will be using?
Is that what you are saying? The specific Simulation processes being run will have a standardized use of registers, and this will be known inside the affected plugins concerned.

hypex

Yes, with your code, that should be fine. As long as you keep your ASM to yourself and take care of calling subroutines how you want to. Of course, you would want to check it to make sure, using a test routine to confirm the registers stay put.

By using the C wrapper you should be able to pass your normal arguments to it and have the compiled code save registers before it enters your ASM block. Having said that, if it needs to set up a stack frame, then that being done outside your code should work fine. I think the way it's done is the cleanest approach you could do.

Of course you have got to watch out for having permanent data in volatile classified registers. I know there are methods to declare a register variable and what number to put it in. But I don't know if volatile ones can be reserved should you happen to need the use of one.
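One such method in GCC is a local register variable, written `register type name __asm__("reg")`. The sketch below pins two inputs and one result to the registers the thread's convention names. The function name `add_offset` and the choice of r4, r8 and r5 are assumptions for illustration. As far as I understand the GCC manual, the pinning is only guaranteed at the point where the variable is used as an operand of an extended asm statement, so it does not reserve the register for the rest of the function.

```c
#include <stdint.h>

/* Adds the table offset to the opcode with both values pinned to registers. */
static uint32_t add_offset(uint32_t opcode_value, uint32_t offset_value)
{
#if defined(__GNUC__) && (defined(__powerpc__) || defined(__PPC__))
    register uint32_t offset __asm__("r4") = offset_value;  /* pinned to r4 */
    register uint32_t opcode __asm__("r8") = opcode_value;  /* pinned to r8 */
    register uint32_t sum    __asm__("r5");                 /* result in r5 */
    __asm__("add %[sum],%[op],%[off]"
            : [sum] "=r"(sum)
            : [op] "r"(opcode), [off] "r"(offset));
    return sum;
#else
    /* Portable fallback so the sketch builds on non-PowerPC hosts. */
    return opcode_value + offset_value;
#endif
}
```

Because r4, r5 and r8 are volatile under the ABI, any function call placed between the assignments and the asm statement could overwrite them.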

Belxjander

updated the code...
I forgot to make the calling base relative to the r11 register content.

Hopefully these snippets are understandable (I've deliberately included the full commentary as well for clarity)

2012-06-03 : latest update... if anyone can review and comment ?

    /*
     * ExecOpCode16() and ExecOpCode8() Execute OpCodeVector() functions from outside this library to implement bytecode processing
     * so the external code requires the Internal Information provided from these routines and the InitEmulation() & ExitEmulation() functions
     *
     * r0,r1,r2,r13,r31 are reserved for system usage
     * r3-r7 are defined as volatile
     * r8-r12 are defined as ECPU call restricted access
     * r14 is defined as the ECPU/ESYS combined Flags register
     * r15-r30 are defined as ESYS private state (volatile to use in plugins)
     *
     * r8 = OpCode = Current Instruction Selection
     * r9 = InstructionPtr = Instruction Address to Interpret from
     * r10 = ExceptionVector = Plugin Special Operation Table
     * r11 = OpCodeVector = Decoded Bytecode function table (current)
     * r12 = EprocVector = Polymorph.Library plugin Interface
     * r14 = EprocMachineWord = CPU Flags (Emulator specific)
     */
    APTR ECALL_ExecOpCode16(
        register USHORT offset __asm("%r4"),
        POLYMORPH_REGISTER_MAPPED)
    {
        /*
         * EVERYTHING IS IN THE ASSEMBLY BLOCK
         *
         * Output: %r3 = next address of execution after OpCodeVector Processing
         * Input: %r3 = offset (count of functions to the first OpCodeVector function)
         *
         */
        __asm("lhzu %r8,(%r9);\n\t"          // Load the Short to Interpret and update the IXP
              "rlwinm %r6,2,%r4,2,18;\n\t"   // r6 = (r4 << 2) & 0x3FFFC ARG:offset
              "rlwinm %r7,2,%r8,2,18;\n\t"   // r7 = (r8 << 2) & 0x3FFFC OpCode
              "stwu %r1,-16(%r1);\n\t"       // Push a 16-byte stack frame
              "add %r5,%r6,%r7;\n\t"         // OpCode+Offset after shift and mask
              "mflr %r0;\n\t"                // Copy the link register into r0
              "lwzx %r5,r5,%r11;\n\t"        // Load OpCodeVector[OpCode+Offset]
              "stw %r0,20(%r1);\n\t"         // Save the link register at 20(r1)
              "bl (%r5);\n\t"                // ECPU OpCodeVector() Execution
              "lwz %r0,20(%r1);\n\t"         // Reload the saved link register
              "addi %r1,%r1,16;\n\t"         // Pop the 16-byte stack frame
              "mtlr %r0;\n\t"                // Restore the link register
              "eieio;\n\t"                   // Input & Output Sync
              "isync;\n\t"                   // Instruction Sync
              "mbar;\n\t"                    // Memory Sync
        );
        return;
    }

    APTR ECALL_ExecOpCode8(
        register USHORT offset __asm("%r4"),
        POLYMORPH_REGISTER_MAPPED)
    {
        /*
         * EVERYTHING IS IN THE ASSEMBLY BLOCK
         *
         * Output: %r3 = next address of execution after OpCodeVector Processing
         * Input: %r3 = offset (count of functions to the first OpCodeVector function)
         *
         */
        __asm("lbzu %r8,(%r9);\n\t"          // Load the Octet to Interpret and update the IXP
              "rlwinm %r6,2,%r4,2,18;\n\t"   // r6 = (r4 << 2) & 0x3FFFC ARG:offset
              "rlwinm %r7,2,%r8,2,10;\n\t"   // r7 = (r8 << 2) & 0x3FC OpCode
              "stwu %r1,-16(%r1);\n\t"       // Push a 16-byte stack frame
              "add %r5,%r6,%r7;\n\t"         // OpCode+Offset after shift and mask
              "mflr %r0;\n\t"                // Copy the link register into r0
              "lwzx %r5,r5,%r11;\n\t"        // Load OpCodeVector[OpCode+Offset]
              "stw %r0,20(%r1);\n\t"         // Save the link register at 20(r1)
              "bl (%r5);\n\t"                // ECPU OpCodeVector() Execution
              "lwz %r0,20(%r1);\n\t"         // Reload the saved link register
              "addi %r1,%r1,16;\n\t"         // Pop the 16-byte stack frame
              "mtlr %r0;\n\t"                // Restore the link register
              "eieio;\n\t"                   // Input & Output Sync
              "isync;\n\t"                   // Instruction Sync
              "mbar;\n\t"                    // Memory Sync
        );
        return;
    }

hypex

Well that looks fine. I see you specified where the parameter should be placed. :-)

After rechecking the ABI: you have to watch the volatile registers when you jump back into the C section. Or rather, save yours before you go back to C, and reload them before you go into your ASM again. Outside your code the compiler may use anything defined as volatile, such as for locals, unless it is protected somehow.

You can use the GCC -S switch to enable assembly output and check that the result is okay.

http://www.delorie.com/djgpp/v2faq/faq8_20.html

Belxjander

okay...first off... I am defining my own ABI on top of the Sys V ABI generally used, covering the r8 through r30 register range with a hole at r13 for the near data section reservation,

secondly, NONE of the register contents may be saved/restored in-place within the same routine,

as ExecOpCode(OpCodeVector(ExecOpCode(OpCodeVector()))) recursion is permissible and on exit requires the content of the deepest function to remain as the current state.

If "state save" and "state restore" events occur at any point in that recursion, then the entire recursive call is in effect a null event and gets repeated. This is the logical equivalent of compiler output from "for(;;) do_nothing();" hammering away inside the CPU core and eating it into uselessness,

so any "save/restore" of register content is against the operational requirements of this algorithm.
As for "correcting" for the register argument passing convention, again I need to actively define this as part of the function declaration. I'm not placing long-term variables in any kind of "volatile" register layout; the registers are required to retain variable state for the lifetime of the process that contains these execution constraints.

There are two stacks: the normal minimal process stack, which will barely have enough memory to handle the normal processing, and the wrapped stack, which is the stack of the program contained within the Interpreter/JIT engine.

What I need is specific help in telling the compiler the exact constraints for all the register-held variables concerned.

I know I can use "register atomic_type variable_name regname;" constructs in SAS/C and StormC but I need to know the equivalent to this for GCC to work with these assembly fragments properly,

or some means where I can feed a C language variable into an Assembly fragment to handle these specifics properly.

the functions called in the OpCodeVector[]() reference are an entirely separate library and Interface from the calling function of ExecOpCode8() and ExecOpCode16()

I would appreciate any pointers or reference examples that are usable examples and are not messed up with x86 specific constraints.

I've already looked around and spent at least a couple of weeks with the GCC Documentation along with various IRC channels asking for help.

So far I keep getting "oh assembly, x86 blah blah blah..." responses, or get referred back to the GCC documentation, which as far as I can find says nothing about telling it which register I want to use for which variable. I get the impression of "just leave register handling to the compiler", which is something I can't do: I am crossing an API/ABI boundary between two compiled objects here, and the function making that boundary-crossing call is itself re-used recursively.

I don't know at this point how much I need to explain about this for it to be understood, having tried with blog entries AND direct IRC chats... so far only 2 people have seemed to grasp what I am actually doing...the rest... I am simply unsure if I managed to explain the concept parts well enough.

hypex

I can understand what restrictions you need to place on your code.

I think I may have found something. But you've already touched upon it in your code. However, referring to section 5.38, page 307 of the gcc manual in the SDK should help you in defining global register variables. Similar to your passing of "offset" in register r4.

For locking registers into place and fixing them so they cannot be touched check out the -ffixed-reg option. Page 213. :-)
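A rough sketch of how the two suggestions might combine for the r14 flags register described earlier in the thread. The variable name `eproc_machine_word` follows the thread's register table; the flag bit `EFLAG_ZERO` and the helper names are hypothetical. It assumes every object that shares the thread, plugins included, is built with `-ffixed-r14` so the compiler never allocates r14 for anything else. A plain global stands in on non-PowerPC hosts so the sketch still builds there.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__powerpc__) || defined(__PPC__))
/* File-scope register variable: r14 holds the flags word in every function
 * of this translation unit. Assumed build flag: -ffixed-r14. */
register uint32_t eproc_machine_word __asm__("r14");
#else
/* Portable stand-in for hosts without r14. */
static uint32_t eproc_machine_word;
#endif

#define EFLAG_ZERO 0x1u   /* hypothetical flag bit, for illustration only */

/* Set or clear the hypothetical zero flag in the shared flags word. */
static void set_zero_flag(int is_zero)
{
    if (is_zero)
        eproc_machine_word |= EFLAG_ZERO;
    else
        eproc_machine_word &= ~EFLAG_ZERO;
}

/* Read the hypothetical zero flag back. */
static int zero_flag(void)
{
    return (eproc_machine_word & EFLAG_ZERO) != 0;
}
```

Because r14 is a non-volatile register under the ABI, code built without the same declaration and flag would save and restore it around its own use, which is the behaviour the thread is trying to avoid. That is why the build flag has to apply to every module involved.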

BTW, what's with the cache instructions, why are they there? And two nops together on PPC looks funny.

Belxjander

I'm not sure quite what the GCC documentation means but I will check that out again,

I work much better from an existing sample than from the existing GCC documentation...

I'm sure I am missing something when reading it (I am self taught so I definitely have gaps in what I know)

and the cache instructions are there so that the routines being called during the fragments are not-needing to care about such things (designing some "speed improvements" into the way things are handled as part of the design)

the two-nops placement is literally a debugging marker for that routine to clear and lock pipelining on superscalar CPUs, but this code is PowerPC specific, I made an equivalent routine entirely "in-register" based for the 486DX Emulation some years ago.

It is one particular example where mathematical precision and coding technique are at odds with each other (due to the "one return value" rule employed by C being effectively bypassed),

each register value is a "global" value within the particular processes running these routines, and task-switching by the OS performs the register save/restore as part of process separation.

I've already added the -ffixed-reg option I believe but I will check that and make sure (I also need to sort out some kind of Internet connection from my home here in Japan to update changes further or push from a USB stick with subversion somehow if there is a portable version for windows without needing to be "installed" to the host).

I've got the code commented out for now, since when I uncomment it GCC throws errors at me or behaves badly. I take this to mean I have missed describing something properly somewhere, and the input/output/clobber constraints of the assembly have not made any sense to me just yet.
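For the input/output/clobber sections specifically, here is a small sketch that keeps only the table load in assembly and leaves the indirect call to the compiler. This is a swapped-in approach, not the method used in the listings above: calling through `bl` inside an asm block means handling the link register and stack frame by hand, whereas a C-level call lets the compiler do that. The names `HandlerFn`, `load_handler`, `run_handler`, `op_inc` and `op_double` are hypothetical, and the asm path is limited to 32-bit PowerPC because `lwzx` loads a 4-byte pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*HandlerFn)(uint32_t state);   /* hypothetical signature */

/* Example handlers, for illustration only. */
static uint32_t op_inc(uint32_t state)    { return state + 1; }
static uint32_t op_double(uint32_t state) { return state * 2; }

/* Load the handler stored byte_offset bytes into the table. */
static HandlerFn load_handler(const HandlerFn *table, uint32_t byte_offset)
{
    assert(table != NULL);
#if defined(__GNUC__) && defined(__powerpc__) && !defined(__powerpc64__)
    HandlerFn fn;
    __asm__ volatile("lwzx %[fn],%[base],%[off]"
            : [fn] "=r"(fn)             /* output: the loaded pointer */
            : [base] "b"(table),        /* input: base register, never r0 */
              [off] "r"(byte_offset)    /* input: byte offset into the table */
            : "memory");                /* clobber: memory is read by the asm */
    return fn;
#else
    /* Portable fallback: same lookup in plain C. */
    return *(const HandlerFn *)((const char *)table + byte_offset);
#endif
}

/* The call itself is plain C, so the compiler saves LR and assumes the
 * ABI's volatile registers are overwritten by the callee. */
static uint32_t run_handler(const HandlerFn *table, uint32_t byte_offset,
                            uint32_t state)
{
    return load_handler(table, byte_offset)(state);
}
```

The three colon-separated sections read in order: what the block writes, what it reads, and what else it disturbs. Anything the template touches that appears in none of the three is invisible to the compiler, which is a common source of the kind of misbehaviour described above.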
