Polymorph VM project - The Basic operations of a Virtual Processor...

This is mostly a braindump of thoughts, so it will ramble. Please ask me questions and I will answer as best I can.

I need to start by describing some aspects of the existing PowerPC CPUs on AmigaOS 4.x systems, and also some low-level assembler elements of the 680x0 CPUs. The first detail is the "CPU registers" and what is available.

In the 680x0 series CPUs there are the registers D0 through D7 and A0 through A7. These are the primary "General Purpose" registers: the D registers hold Integer values and the A registers are used for Address values. The presence of an FPU on the 680x0 series adds a further 8 FP registers (FP0 through FP7). There are also a couple of additional registers for specific purposes (CCR and PC), but I do not need to deal with them at this time.

In the PPC CPU there is a set of 32 "R" registers (r0 through r31), along with the FP registers in the FPU.

This leads me towards some considerations that may not seem immediately obvious to others, so please bear with me. There are 16 registers in the 68000 series (24 with the FPU added) and 32 or more in the PPC series processors. I also looked at the x86 series processors and found that there is a specific number of registers for general purpose or FPU operations there as well. This concludes the CPU-specific register layouts I needed to know.

What I looked at next was the size of the "opcode words". In the x86 processor each instruction is a multiple of 8 bits in length, and the 68000 series processors use 16 bits per "opcode word". Looking into the PPC documentation I saw that the same idea applies there too, with every instruction being a single fixed-size 32-bit word.

So regardless of processor we have some common elements: "registers" for working with Integer, Address or Floating-Point numbers, and "opcode words" of a preset size, with at least one used for each instruction. So far, apparently simple...
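To make the register counts above concrete, here is a minimal sketch in C of what an emulator might keep track of for the 680x0. The struct name, field names and helper functions are all made up for illustration, and holding the FP registers as host doubles is a simplifying assumption rather than how the real FPU stores them.

```c
#include <assert.h>
#include <stdint.h>

/* Register counts as listed in the text above. */
enum {
    M68K_DATA_REGS = 8,
    M68K_ADDR_REGS = 8,
    M68K_FP_REGS   = 8,
    PPC_GP_REGS    = 32
};

/* Hypothetical in-memory image of the 680x0 registers an emulator tracks. */
typedef struct {
    uint32_t d[M68K_DATA_REGS];  /* D0..D7: integer values */
    uint32_t a[M68K_ADDR_REGS];  /* A0..A7: address values */
    double   fp[M68K_FP_REGS];   /* FP0..FP7: host doubles (assumption) */
    uint32_t pc;                 /* program counter */
    uint8_t  ccr;                /* condition code register */
} M68kRegs;

/* Number of guest registers that need a home, with or without the FPU. */
int m68k_register_count(int with_fpu)
{
    return M68K_DATA_REGS + M68K_ADDR_REGS + (with_fpu ? M68K_FP_REGS : 0);
}

/* How many of the 32 PPC "R" registers remain once D0-D7 and A0-A7 are placed. */
int ppc_spare_gp_regs(void)
{
    return PPC_GP_REGS - m68k_register_count(0);
}
```

The point of the two helpers is only the arithmetic: 16 integer and address registers fit inside the 32 PPC registers with room to spare.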
Let's add a detail: what does the processor actually do? It reads an instruction, determines what it is, and performs the operation it stands for. This can be done in a loop, but what if the instruction was used to select an array entry? We *could* have an array of addresses, select the "opcode N"th value in that array, and call the function that address points to. This would be quite plausible for creating a CPU Emulating Interpreter. However, I recalled something in the Amiga OS design that might make this faster.

Two details came to mind at this point from my prior experience. The first is that the later M680x0 series processors (the 68060 in particular) can "pipeline" 2 instructions together to work really quickly, where they do not write to the same target register. A small example assembler function is:

    start:
            move.l  d0,d7
            move.l  a0,d6
            moveq   #0,d0                   ; any library version
            move.l  $4.w,a6                 ; ExecBase
            lea     DosLibName,a1           ; OpenLibrary takes the name in a1
            moveq   #0,d1
            jsr     _LVOOpenLibrary(a6)
            tst.l   d0                      ; the library base comes back in d0 (0 on failure)
            beq.w   exit
            nop
            nop
            ; the snippet above works but is very very simplified
    exit:
            rts

    DosLibName:
            dc.b    "dos.library",0,0,0,0,0 ; deliberate hand-alignment to 16 bytes

Each opcode word is 16 bits, and most of these instructions fit in that single word. The exceptions carry extension words: "beq.w" has an additional 16 bits for its branch displacement, "move.l $4.w,a6" and "jsr _LVOOpenLibrary(a6)" each have an additional 16 bits for an address or offset, and "lea" has a 32-bit address following the instruction.

After thinking about such details, I remembered seeing 16 bits of instruction followed by 32 bits of address elsewhere as well. That is the second detail: there is a very common data structure that the Amiga OS depends on, *already existing*, that is usable as such a table and can be used to create a high-speed function table. This is the "Library Vector Offset" information for "Library Jump Tables". Each entry is the instruction and address of a function, and is called directly. Instead of reading an entry from the table, I had the thought of calculating where I would jump for each instruction within a fixed table. Interpreting each instruction could then be made simpler...
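The "array of addresses" idea above could be sketched roughly like this in C. This is only an illustration under my own assumptions: the opcode numbers are made up and are not real 680x0, x86 or PPC encodings, and the Cpu struct, handler names and run() function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest state: one accumulator and a halt flag. */
typedef struct {
    int32_t acc;
    int     halted;
} Cpu;

typedef void (*OpHandler)(Cpu *cpu);

/* Made-up opcode numbers, for illustration only. */
enum { OP_NOP = 0, OP_INC = 1, OP_DOUBLE = 2, OP_HALT = 3, OP_COUNT = 4 };

void op_nop(Cpu *cpu)    { (void)cpu; }
void op_inc(Cpu *cpu)    { cpu->acc += 1; }
void op_double(Cpu *cpu) { cpu->acc *= 2; }
void op_halt(Cpu *cpu)   { cpu->halted = 1; }

/* The "array of addresses": entry N points at the function for opcode N. */
static const OpHandler handlers[OP_COUNT] = {
    op_nop, op_inc, op_double, op_halt
};

/* Read each opcode word, select the Nth table entry, and call it.
   Returns the number of instructions executed. */
size_t run(Cpu *cpu, const uint16_t *code, size_t length)
{
    size_t executed = 0;
    for (size_t pc = 0; pc < length && !cpu->halted; pc++) {
        uint16_t opcode = code[pc];
        assert(opcode < OP_COUNT);  /* no illegal-instruction trap in this sketch */
        handlers[opcode](cpu);
        executed++;
    }
    return executed;
}
```

Running the program { OP_INC, OP_INC, OP_DOUBLE, OP_INC, OP_HALT } against a zeroed Cpu leaves acc at 5. Every instruction costs one table read plus one indirect call, and that table read is the part the calculated jump is meant to remove.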
Read the Instruction "opcode word" value, shift it (this works rather well as a high-speed multiplication), then add the base address of the Emulation function table. This comes with some options as well. One option is to pre-process the string of instructions to be run through the Emulator, making a "JumpTable": each shifted instruction value is added to the base address, a listing of "jsr address" values is written into a memory area, and the first "jsr" instruction of that memory area is then called directly. It is crude but effective in translating from one CPU to another, but it comes at the cost of needing a set of functions based on the original CPU.

While this is not new in any practical sense, it can most definitely be made into a workable technique regardless of the actual processor. It requires one other consideration: remapping the host processor registers based on what operation is being performed within the JumpTable's collective set of functions.

So, what if we defined the usage of the 68000 registers in terms of the PPC registers instead? Looking at kas1e's blog about low-level Hacking, I learned that the first 4 registers are generally system reserved, so let's start in the middle instead. Of the 32 registers, I would start at register 16 and work up for the A registers (r16 through r23), and use registers r8 through r15 for the D registers, leaving 16 registers for other purposes. This has a knock-on effect on the design of the functions addressed by the JumpTable representing the 68000 instructions as well: each "instruction" function would in effect be running only those instructions essential to perform the original operation.

Would this be an effective method of interpreting or JIT processing functional blocks of code? My own opinion is yes, it would be effective, as each entry in the JumpTable can be placed a preset size apart, and we can work out the total length of each function in that table...
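The register layout proposed above (D0 to D7 in r8 to r15, A0 to A7 in r16 to r23) is simple enough to write down as a lookup. A minimal sketch in C, where the type and function names are my own and purely hypothetical:

```c
#include <assert.h>

/* Register classes of the guest 680x0. */
typedef enum { M68K_DATA, M68K_ADDRESS } M68kRegClass;

enum {
    PPC_FIRST_D = 8,   /* D0..D7 -> r8..r15, as proposed in the text  */
    PPC_FIRST_A = 16   /* A0..A7 -> r16..r23, as proposed in the text */
};

/* Return the PPC register number holding guest register `index` (0..7). */
int ppc_reg_for(M68kRegClass cls, int index)
{
    assert(index >= 0 && index <= 7);
    return (cls == M68K_DATA ? PPC_FIRST_D : PPC_FIRST_A) + index;
}

/* Return 1 if PPC register `r` (0..31) is left free by the mapping above. */
int ppc_reg_is_spare(int r)
{
    assert(r >= 0 && r <= 31);
    return r < PPC_FIRST_D || r > PPC_FIRST_A + 7;
}
```

With this mapping A7 (the 68000 stack pointer) lands in r23, and r0 to r7 plus r24 to r31 are the 16 registers left over. Whether all of those are really free depends on the host ABI, which this sketch does not try to model.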
We can in a lot of cases make a simple listing where each "entrypoint" or function label is a fixed distance apart. Calculate the Address from the JumpTable BaseAddress for each "opcode function", and call it. In JIT form, listings such as

    jsr "opcode address"
    jsr "opcode address"
    jsr "opcode address"
    jsr "opcode address"
    jsr "opcode address"
    jsr "opcode address"

would have the speed of the faster PPC processor and the same full instruction set as the 68000 or x86 CPU being Emulated.

A side-note at this point: the Amiga OS on both the 68000 and PPC systems stores values into memory the same way, and the x86 differs from this (the "Big Endian / Little Endian" options). This is true and false, as the PPC also provides for *reversed* reading of Integers from Memory (the byte-reversed load and store instructions such as lwbrx). So a PPC can Emulate an x86 without significant speed loss, and where the Emulation routines stay within the cache on the CPU the speed will be significantly quick.

I have however skipped some details in the way I am presenting this idea, and it is also another "braindump" in effect. I would prefer to at least document materials here instead of only keeping paper notes for various pieces of Polymorph to myself.

For those readers who are well versed in the particular details of assembler, and who have noted the "jsr ..." JumpTable lists presented for the JIT-formatted usage of the above details: can you possibly see something else? As the JIT "in memory" image is going to use a fixed address for the JumpTable BaseAddress, there is a significant processing issue. The above details allow "jsr ..." sequences to replace the original bytecodes in some cases, and due to the nature of the PPC it is also possible to recalculate and "fix" any address differences based on knowing what size the instruction and address values are. There is an added quirk with this: what happens if you *reverse* the Interpretation of the JIT sequences...
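The fixed-distance JumpTable and the "reverse" question can both be sketched in a few lines of C. This is only a sketch under stated assumptions: the 64-byte slot size (a shift of 6) is picked for illustration, the function names are hypothetical, and it handles bare opcode words only, with no extension words or operands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: every opcode function occupies a fixed 64-byte slot, so the
   multiplication by the slot size becomes a shift by 6. */
#define SLOT_SHIFT 6u

/* Forward step: slot address = BaseAddress + (opcode << SLOT_SHIFT). */
uint32_t slot_address(uint32_t base, uint16_t opcode)
{
    return base + ((uint32_t)opcode << SLOT_SHIFT);
}

/* Reverse step: recover the opcode word from a slot address. */
uint16_t slot_opcode(uint32_t base, uint32_t address)
{
    uint32_t offset = address - base;
    assert((offset & ((1u << SLOT_SHIFT) - 1u)) == 0);  /* must be a slot start */
    return (uint16_t)(offset >> SLOT_SHIFT);
}

/* "JIT" a run of opcode words into the list of call targets. */
void translate(uint32_t base, const uint16_t *code, size_t n, uint32_t *targets)
{
    for (size_t i = 0; i < n; i++)
        targets[i] = slot_address(base, code[i]);
}

/* Rebuild the original opcode words from the list of call targets. */
void untranslate(uint32_t base, const uint32_t *targets, size_t n, uint16_t *code)
{
    for (size_t i = 0; i < n; i++)
        code[i] = slot_opcode(base, targets[i]);
}
```

Because the forward step is only a shift and an add, the reverse step is a subtract and a shift. As long as the BaseAddress and slot size are known, the call list carries enough information to rebuild the opcode words it came from.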
Providing a "JIT header" and subsequent "modules", it would be possible to create an ELF binary image for AmigaOS 4.x that starts with the Emulation functions in the first segment, followed by additional code segments for specific functionality based on the original, non-PPC CPU. That would be all well and good, but the "feature" I would personally expose at this point is the option of "reversing the JIT" generator, allowing reconstruction of the original binary sequences from the JIT sequences themselves. This leads to some other options that seem strange possibilities as well. I plan to experiment with this "reversibility" option. Creating the sequences for a loaded copy of a non-Amiga program, so that it works on AmigaOS in a manner similar to Wine for Linux but without actual need of an x86 CPU, may be demonstrated and workable without significantly impacting the resources available.

I have tested the above Interpreter theory in practice once before and was significantly impressed by the speed at which the first written version ran. For the functional sequences produced it was appreciably quick, but it had other issues, mostly to do with non-AmigaOS specifics being imported. It will be more than feasible to put any CPU Emulation into an AmigaOS library in the manner described above. The difficulties depend entirely on how different the OS used alongside the chosen CPU Emulation is.

I do apologise for the rambling nature of these blog entries. I am trying to think of how to explain this better. Hopefully some of the above details will help explain part of what I have in mind for this particular project, after applying a customized loader routine to properly handle the file type and relocation information. Not everything will work cleanly, as any CPU Emulation will also have to be partnered with an OS Simulation. I can not simply choose to Emulate an x86 processor without any support routines. But will I simply fake particular hardware?
Or fake the OS that uses that processor to execute? I don't have specific answers to these questions at this time, as my previous experiment in trying to write my own answer was not completed due to various limitations I was trying to work around.

Anyway, I hope the above blog entry does help clarify something of my intentions and notes. Thank you for your time in reading this.

AbH Belxjander Draconis Serechai
