Dump PPC disassembly on crash
[Michael Schmitt]
Attached is a patch to SheepShaver to fix memory allocation problems when OS X 10.5 is the host. It also relaxes the 512 MB RAM limit on OS X hosts.

Problem
-------
Some users have been unable to run SheepShaver on OS X 10.5 (Leopard) hosts. The symptom is the error "ERROR: Cannot map RAM: File already exists".

SheepShaver allocates RAM at fixed addresses. If it is running in "Real" addressing mode, and can't allocate at address 0, then it was hard-coded to allocate the RAM area at 0x20000000. The ROM area was allocated at 0x40800000.

The normal configuration is for SheepShaver to run under SDL, which is a Cocoa wrapper. By the time SheepShaver does its memory allocations, the Cocoa application has already started. The result is that the SheepShaver address space already contains libraries, fonts, Input Managers, and IOKit areas.

On Leopard hosts these areas can land on the same addresses SheepShaver needs, so SheepShaver's memory allocation fails.

Solution
--------
The approach is to change SheepShaver (on Unix & OS X hosts) to allocate the RAM area anywhere it can find the space, rather than at a fixed address.

This could result in the RAM being allocated higher than the ROM area, which causes a crash. To prevent this from occurring, the RAM and ROM areas are allocated contiguously.

Previously the ROM starting address was a constant ROM_BASE, which was used throughout the source files. The ROM start address is now a variable ROMBase. ROMBase is allocated and set by main_*.cpp just like RAMBase.

A side-effect of this change is that it lifts the 512 MB RAM limit for OS X hosts. The limit existed because the fixed RAM and ROM addresses were such that the RAM could only be 512 MB before it overlapped the ROM area.

Impact
------
The change to make ROMBase a variable affects all hosts & addressing modes.

The RAM and ROM areas will only shift when run on Unix & OS X hosts; otherwise the same fixed allocation address is used as before.

This change is limited to "Real" addressing mode. Unlike Basilisk II, SheepShaver *pre-calculates* the offset for "Direct" addressing mode; the offset is compiled into the program. If the RAM address were allowed to shift, it could result in the RAM area wrapping around address 0.

Changes to main_unix.cpp
------------------------
1. Real addressing mode no longer defines a RAM_BASE constant.

2. The base address of the Mac ROM (ROMBase) is defined and exported by this program.

3. Memory management helper vm_mac_acquire is renamed to vm_mac_acquire_fixed. Added a new memory management helper vm_mac_acquire, which allocates memory at any address.

4. Changed and rearranged the allocation of RAM and ROM areas.

   Before it worked like this:

   - Allocate ROM area
   - If possible, attempt to allocate RAM at address zero
   - If RAM not allocated at 0, allocate at fixed address

   We still want to try allocating the RAM at zero, and if using DIRECT addressing we're still going to use the fixed addresses. So we don't know where the ROM should be until after we do the RAM.

   The new logic is:

   - If possible, attempt to allocate RAM at address zero
   - If RAM not allocated at 0:
       if REAL addressing
           allocate RAM and ROM together. The ROM address is aligned to a 1 MB boundary
       else (direct addressing)
           allocate RAM at fixed address
   - If ROM hasn't been allocated yet, allocate at fixed address

5. Calculate ROMBase and ROMBaseHost based on where the ROM was loaded.

6. There is a crash if the RAM is allocated too high. To try to catch this, check if it was allocated higher than the kernel data address.

7. Change subsequent code from using constant ROM_BASE to variable ROMBase.

Changes to Other Programs
-------------------------
emul_op.cpp, main.cpp, name_registery.cpp, rom_patches.cpp, rsrc_patches.cpp, emul_ppc.cpp, sheepshaver_glue.cpp, ppc-translate.cpp:
Change from constant ROM_BASE to variable ROMBase.

ppc_asm.S: It was setting a register to a hard-coded literal address: 0x40b0d000. Changed to set it to ROMBase + 0x30d000.

ppc_asm.tmpl: It defined a macro ASM_LO16, but the macro assumed that it would always be used with operands that included a register specification. This is not true. Moved the register specification from the macro to the macro invocations.

main_beos.cpp, main_windows.cpp: Since the subprograms all expect a variable ROMBase, all the main_*.cpp programs have to define and export it. The ROM_BASE constant is moved here for consistency. The mains for BeOS and Windows just allocate the ROM at the same fixed address as before, set ROMBaseHost and ROMBase to that address, and then use ROMBase for the subsequent code.

cpu_emulation.h: Removed the ROM_BASE constant. This value is moved to the main_*.cpp modules, to be consistent with RAM_BASE.

user_strings_unix.cpp, user_strings_unix.h: Added new error messages related to errors that occur when the RAM and ROM are allocated anywhere.
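The "allocate RAM and ROM together" step in Real addressing mode can be sketched as follows. This is only an illustration of the idea, assuming a POSIX host with mmap(); the names MacMemory, acquire_ram_and_rom and release_ram_and_rom are hypothetical and are not the helpers in main_unix.cpp.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdint>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON   // older BSD / OS X spelling
#endif

// Hypothetical result type; SheepShaver itself keeps these values in globals.
struct MacMemory {
    uint8_t *ram_host = nullptr;   // start of the RAM area
    uint8_t *rom_host = nullptr;   // start of the ROM area, 1 MB aligned, above RAM
    void    *mapping = nullptr;    // whole mapping, kept for munmap()
    size_t   mapping_size = 0;
};

constexpr uintptr_t kRomAlign = 0x100000;  // 1 MB boundary for the ROM

// Reserve a single anonymous mapping at any address the kernel picks,
// large enough for RAM, alignment padding and ROM. Because both areas
// live in one mapping, the ROM always ends up above the RAM.
bool acquire_ram_and_rom(size_t ram_size, size_t rom_size, MacMemory &out) {
    size_t total = ram_size + kRomAlign + rom_size;
    void *p = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return false;
    uintptr_t base = reinterpret_cast<uintptr_t>(p);
    // Round the first address after RAM up to the next 1 MB boundary.
    uintptr_t rom = (base + ram_size + kRomAlign - 1) & ~(kRomAlign - 1);
    out.ram_host = static_cast<uint8_t *>(p);
    out.rom_host = reinterpret_cast<uint8_t *>(rom);
    out.mapping = p;
    out.mapping_size = total;
    return true;
}

void release_ram_and_rom(MacMemory &m) {
    if (m.mapping)
        munmap(m.mapping, m.mapping_size);
    m = MacMemory();
}
```

A caller would derive its RAM and ROM base variables from ram_host and rom_host after a successful call, and fall back to the fixed addresses otherwise.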
Happy New Year!
Sync with new SIGSEGV API.
Enable JIT in non-constructor so that a user-defined value can be set later
Remove specialised decoders. This will be done differently, if necessary.
Remove use of global register A0 (now aliased to T0). This makes it possible to cache the CPU context pointer in a register and thus make the generated code independent of the CPU context. Not useful to SheepShaver, but it is for another project for threads emulation on plain x86-32. Note: AltiVec performance may drop a little on x86 but this will be restored (and even improved) in the future.
Optimize generated code to NQD & CheckLoad functions. They don't call into 68k or MacOS code, so they don't need to be a termination point. i.e. don't split into two basic blocks and thus avoid a full hash search. Also add missing NQD_unknown_hook NativeOp from previous commit.
NQD dirty boxes, generic code + while we are at it, also rename a few NQD related NativeOps.
Add patches for native GetNamedResource() and Get1NamedResource(). This will be useful to fix a bug in the AppleShare extension (see DRVR .AFPTranslator in Basilisk II). Unrelated improvement: call sheepshaver_cpu::get_resource() directly, don't get it through another global function.
don't trigger interrupt through deleted cpu object (XXX may need locks)
Align PowerPC registers struct manually, i.e. don't depend on non-portable compiler extensions (e.g. GCC __attribute__((aligned(N)))).
Minor tweaks to support compilation of ether.cpp within MacOS. i.e. mostly migrate the Ethernet driver to the MacOS side. This is enabled for DIRECT_ADDRESSING cases. I didn't want to alter much of ether.cpp (as it would have required to support that mode). Of course, in REAL_ADDRESSING mode (the default) and for debugging purposes, the old driver is still available.
Improve idle wait mechanism. Now, the emulator thread can be suspended (idle_wait) until events arrive and are signalled through TriggerInterrupt(), i.e. we no longer sleep a fixed amount of time on platforms that support a thread wait/signal mechanism.
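A wait/signal idle mechanism of this kind can be sketched with a condition variable. This is a minimal illustration rather than the emulator's code: the class IdleGate and the method idle_resume are hypothetical names, and the timeout is an assumption added so the waiter cannot block forever.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: the emulator thread parks in idle_wait() and the
// interrupt source wakes it through idle_resume().
class IdleGate {
public:
    // Block until an event is signalled or the timeout expires.
    // Returns true if an event was pending, false on timeout.
    bool idle_wait(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        bool signalled = cond_.wait_for(lock, timeout,
                                        [this] { return pending_; });
        pending_ = false;  // consume the event
        return signalled;
    }

    // Called from the code path that raises an interrupt.
    void idle_resume() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_ = true;
        }
        cond_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    bool pending_ = false;
};
```

Keeping the pending flag under the mutex means an event raised just before the thread starts waiting is not lost, which a plain fixed-length sleep cannot guarantee.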
Completely avoid any form of nested interrupt processing.
Fix CR save/restore in EmulOp thunk. I don't know what it fixes for real but that was definitely wrong to only preserve CR2 there.
Rework sheepshaver_cpu object allocation and get rid of POSIX'ish functions.
it makes more sense to dump the crash dump header to stderr
Revert to no nested native ppc interrupt processing, also filter out cases where InterruptFlags is 0.
Preserve all necessary registers on interrupt, thus also permitting nested interrupts to occur. SheepShaver locks should now be reduced.
Happy New Year 2005!
add FP regs & state to preserved context on interrupt
ethernet seems to work with sheepnet, even on kernel 2.6/x86_64!
16-byte aligned memory allocator only for sheepshaver_cpu
Use BUILD_SHEEPSHAVER_PROCEDURE to allocate static procedures into the SheepShaver globals. Fix build of sheepshaver_glue.cpp without JIT.
Implement Direct Addressing mode similarly to Basilisk II. This is to get SheepShaver working on OSes that don't support mapping of Low Memory globals at 0x00000000, e.g. Windows.
Revert last change until I can check myself...
Enable ethernet everywhere, several users got it to work. Hangs may be unrelated to ethernet code anyway and ethernet driver should be endian safe nowadays.
Remove "native" EmulOp stuff as it is useless and duplicates functionalities
Performance of VOSF is heuristically determined at run-time, so have to initialize SIGSEGV handlers early, as in Basilisk II. Besides, also add missing call to vm_init() in case host system doesn't have MAP_ANON.
SDL support in SheepShaver too, though it doesn't work in native mode on Linux/ppc as libSDL is pulling in libpthread which conflicts with our sheepthreads.
Don't handle XLM_IRQ_NEST atomically in emulated mode. That's useless since this variable is modified only within a single thread and interrupts are not handled asynchronously.
Always handle interrupt even if InterruptFlags == 0, though it should not really happen in practice.
STATS: Account for all interrupts, but still count native interrupts. It turns out that for a regular bootup sequence to the Finder, fewer than 30% of the interrupts triggered were in native mode. Default EMUL_TIME_STATS to 0, end user probably doesn't want garbage to be printed to his console.
Check for SIGSEGVs from DR Cache code too.
Fix NativeOp code generation, especially in PPC_REENTRANT_JIT mode
Get rid of old (and broken) ASYNC_IRQ / MULTICORE code
Don't allow "recursive" NanoKernel interrupts
Better interrupt context checking code
Make NativeOp() handler a sheepshaver_cpu handler, thus getting rid of ugly GPR macro definition. Make the JIT engine somewhat reentrant. This brings a massive performance boost for applications that cause many Execute68k() calls, e.g. audio in PlayerPRO.
Don't take an EMUL_OP mode switch for Microseconds() and SynchIdleTime()
Handle SAFE_INTERRUPT_PPC to check possible nested calls (and this happens)
Extend NativeOp count to 64 (6-bit value), aka fix NATIVE_FILLRECT opcodes. Translate NQD_{bitblt,fillrect,invrect} to direct native calls. Use Mac2HostAddr() for converting Mac base address to native.
NQD: use ReadMacInt*() and WriteMacInt*() accessors, i.e. code should now be little-endian and 64-bit safe.
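Why going through accessors makes the code endian-safe can be shown with a small sketch. The functions below are illustrative stand-ins, not the emulator's ReadMacInt*()/WriteMacInt*() implementations: they take an explicit host base pointer (an assumption made here to keep the example self-contained) and always store values in big-endian order, as the emulated Mac expects.

```cpp
#include <cstdint>

// Read a 32-bit big-endian value at Mac address `addr`, where `base` is
// the host pointer that corresponds to Mac address 0 (hypothetical setup).
inline uint32_t read_mac_int32(const uint8_t *base, uint32_t addr) {
    const uint8_t *p = base + addr;
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
}

// Write a 32-bit value in big-endian order, independent of host endianness.
inline void write_mac_int32(uint8_t *base, uint32_t addr, uint32_t value) {
    uint8_t *p = base + addr;
    p[0] = uint8_t(value >> 24);
    p[1] = uint8_t(value >> 16);
    p[2] = uint8_t(value >> 8);
    p[3] = uint8_t(value);
}

// 16-bit variants follow the same pattern.
inline uint16_t read_mac_int16(const uint8_t *base, uint32_t addr) {
    const uint8_t *p = base + addr;
    return uint16_t((p[0] << 8) | p[1]);
}

inline void write_mac_int16(uint8_t *base, uint32_t addr, uint16_t value) {
    uint8_t *p = base + addr;
    p[0] = uint8_t(value >> 8);
    p[1] = uint8_t(value);
}
```

Because the Mac address stays a 32-bit integer and is only combined with the host pointer inside the accessor, the same calling code also works when host pointers are 64 bits wide.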
Basic fillrect/invrect NQD. Code may need to be factored out somehow. Verify that bitblt NQD transfer modes are really CopyBits() ones [MB5].
Start Native QuickDraw acceleration
16-byte aligned memory allocator will try the following functions in-order (determined at compile-time): posix_memalign, memalign, valloc, malloc.
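A compile-time fallback of this shape might look like the sketch below. It covers only the first and last options from the list; the macro SKETCH_HAVE_POSIX_MEMALIGN and the function names are hypothetical, standing in for whatever configure-time checks the real build uses.

```cpp
#include <stdlib.h>
#include <cstddef>

// Hypothetical feature test; a real build would normally get this
// from a configure script rather than from platform macros.
#if defined(__unix__) || defined(__APPLE__)
#define SKETCH_HAVE_POSIX_MEMALIGN 1
#else
#define SKETCH_HAVE_POSIX_MEMALIGN 0
#endif

// Return a block of at least `size` bytes, 16-byte aligned when the
// platform offers an aligned allocator. The choice is made at compile time.
void *allocate_aligned16(size_t size) {
#if SKETCH_HAVE_POSIX_MEMALIGN
    void *p = nullptr;
    if (posix_memalign(&p, 16, size) != 0)
        return nullptr;
    return p;
#else
    // Last resort: plain malloc(), which is only as aligned as the
    // platform's allocator happens to make it.
    return malloc(size);
#endif
}

// Memory from posix_memalign() and malloc() is released with free().
void free_aligned16(void *p) {
    free(p);
}
```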
Make SheepShaver work with OS 8.6 out-of-the-box with no extra patch for the time being. i.e. ignore writes to the zero page when faking SCSIGlobals
we have to 16-byte align sheepshaver_cpu object as it contains SSE values that require this alignment.
GCC 3.4 does not allow the lazy_allocator instantiation, the other form is not supported by any GCC but ICC accepts it.
AltiVec emulation! ;-)
Generate PowerPC code wrapping GetResource() replacements. That way, it's a normal PPC function invocation that can be JIT compiled to native code instead of nesting execute() calls, which may lead to using the interpreter (this took around 11% of total execution time on boot, down to 3%). Also, optimize some SheepShaver EmulOps and actually report non-CTI.
Happy New Year! :)
Match Linux/ppc native version better: jump to ROM with EmulatorData in r4, preserve CR & XER registers on EmulOp.
Use an alternate stack base while servicing PowerPC interrupts.
Use a unique ExecuteNative() interface in any case, i.e. native & emulated
Add new thunking system for 64-bit fixes.
Add "jit" prefs item. Fix PPC_DECODE_CACHE version to fill in new min_pc & max_pc members of block info. Increase -finline-limit to 10000 for older gcc
better handling of static translation cache allocation, handle nested execution paths from the cpu core, cleanups for KPX_MAX_CPUS == 1.
Merge in-progress PowerPC "JIT1" engine for AMD64, IA-32, PPC. The merge probably went wrong as there are some problems, probably due to the experiment beginning with CR deferred evaluation. With nbench/ppc, performance improvement was around 2x. With nbench on x86, performance improvement was around 4x on average. Incompatible change: instr_info_t has a new field in the middle. But since insertion of PPC_I(XXX) identifiers is auto-generated, there is no problem.
Fix "ignoresegv" case to actually skip the faulty instruction. Merge conditions to skip instruction on SIGSEGV from PowerPC native mode. The instruction skipper takes care to set the output register to 0.
- XLM_IRQ_NEST is always in native byte order format since any write to this variable goes through {Enable,Disable}Interrupt().
- Add Ether thunks but only for WORDS_BIGENDIAN case since we do need more complicated translation functions.
Add some statistics for interrupt handling, Execute68k/Trap, MacOS & NativeOp
Implement partial block cache invalidation. Rewrite core cached blocks execution loop with a Duff's device. Gather some predecode time statistics. This shows that only around 2% of total emulation time is spent for predecoding the instructions.
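For readers unfamiliar with the construct, a Duff's device applied to a run of predecoded operations can be sketched like this. The Op structure, the append_digit handler and run_block are hypothetical and far simpler than the emulator's cached block loop; the point is only how the switch jumps into the middle of an unrolled do/while.

```cpp
#include <cstddef>

// Hypothetical predecoded operation: a handler plus one operand.
struct Op {
    void (*fn)(int &state, int arg);
    int arg;
};

// Example handler used to make execution order visible.
static void append_digit(int &state, int arg) {
    state = state * 10 + arg;
}

// Execute `count` operations in order, unrolled four at a time.
// The switch enters the loop body at the right offset so that the
// remainder (count % 4) is handled without a separate cleanup loop.
void run_block(const Op *op, size_t count, int &state) {
    if (count == 0)
        return;
    size_t passes = (count + 3) / 4;
    switch (count % 4) {
    case 0: do { op->fn(state, op->arg); ++op;  // falls through
    case 3:      op->fn(state, op->arg); ++op;  // falls through
    case 2:      op->fn(state, op->arg); ++op;  // falls through
    case 1:      op->fn(state, op->arg); ++op;
            } while (--passes > 0);
    }
}
```

With five operations whose operands are 1 to 5, the loop runs one operation on entry and four on the second pass, leaving state equal to 12345 when it starts at 0.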
Optimized pointers to non virtual member functions. This reduces space and overhead since runtime checks are eliminated. Actually, it yields up to 10% performance improvement with specialized decoders.
Integrate spcflags handling code to kpx_cpu core. We can also remove oldish EXEC_RETURN handling with a throw/catch mechanism since we do have a dependency on extra conditions (invalidated cache) that prevents fast execution loops.
Fix ASYNC_IRQ build but locks may still happen. Note that with a predecode cache, checking for pending interrupts may not be the bottleneck nowadays.
Rewrite interrupts handling code so that the emulator can work with a predecode cache. This implies running in interpreted mode only while processing EmulOps or other native (nested) runs. Note that the FLIGHT_RECORDER with a predecode cache gets slower than without caching at all.
- enable multicore cpu emulation with ASYNC_IRQ
- move atomic_* operations to main_unix so that they could use spinlocks or other platform-specific locking mechanisms
Preserve CR in execute_68k(). This enables MacOS 8.6 to work. ;-)
- Handle MakeExecutable() replacement
- Disable predecode cache in CVS for now
- Fix flight recorder ordering in predecode cache mode
- Minor optimization to execute_ppc() as we apparently don't need to move target PC into CTR.
- Fix breakage introduced during little endian fixing. We now assume that MacOS doesn't rely on any PPC register that may have been saved on top of its stack, i.e. register state is saved onto native stack.
little endian fixes, note that trampolines are still not 64-bit clean either
- Share EmulatorData & KernelData struct definitions
- Introduce new SheepShaver data area for alternate stacks, thunks, etc.
- Experimental asynchronous interrupts handling. This improves performance by 30% but some (rare) lockups may occur. To be debugged!
use B2 sigsegv API instead of rewriting yet another sigsegv handler for x86
Try to handle XLM_IRQ_NEST atomically in emulated PPC views. Fix placement of fake SCSIGlobals (disabled for now). Switch back to mono core emulation until things are debugged enough. Implement get_resource() et al.
Merge in old kpx_cpu snapshot for debugging