[Charles Srstka] Attached is a set of patches to port the precise timer that is currently used in the Linux and BeOS builds of SheepShaver to Mac OS X (and any other Mach-based operating systems). Currently, the Linux build uses the clock_gettime() function to get nanosecond-precision time, and falls back on gettimeofday() if it is not present. Unfortunately, Mac OS X does not currently support clock_gettime(), and gettimeofday() has only microsecond granularity. The Mach kernel, however, has a clock_get_time() function that does very nearly the same thing as clock_gettime(). The patches to BasiliskII cause the timing functions such as timer_current_time() to use clock_get_time() instead of gettimeofday() on Mach-based systems that do not support clock_gettime(). The changes to SheepShaver involve the precise timer. The existing code for Linux uses pthreads and real-time signals to handle the timing. Mac OS X unfortunately does not seem to support real-time signals, so Mach calls are again used to suspend and resume the timer thread in order to attempt to duplicate the Linux and BeOS versions of the timer. The code is somewhat ugly right now, as I decided to leave alone the pre-existing style of the source file, which unfortunately involves #ifdefs scattered throughout the file and some duplication of code. A future patch may want to clean this up to separate out the OS-specific code and put it all together at the top of the file. However, for the time being, this seems to work. This has not been extensively tested, because I have not been able to get my hands on a good test-case app for the classic Mac OS that would run inside the emulator and try out the timer. However, performance does seem to be better than with the pre-existing code, and nothing seems to have blown up as far as I can tell. 
I did find a game via a Google search - Cap'n Magneto - that is known to have problems with Basilisk/SheepShaver's legacy 60 Hz timer, and the opening fade-to-color for this game appears to run much more smoothly with the precise timer code in place.
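The clock selection described in this entry can be sketched roughly as follows. This is only an illustration, not the code from the patch: `current_time_ns()` is a hypothetical helper, and the choice of `SYSTEM_CLOCK` on Mach and `CLOCK_MONOTONIC` on POSIX is an assumption made so that both paths return a comparable uptime-style value.

```cpp
#include <cstdint>
#include <ctime>
#include <sys/time.h>

#if defined(__MACH__)
#include <mach/mach.h>
#include <mach/clock.h>
#endif

// Hypothetical helper (not from the patch): current time in nanoseconds.
uint64_t current_time_ns()
{
#if defined(__MACH__)
    // Mach path: fetch the host clock service, read it, release the port.
    clock_serv_t clock_service;
    mach_timespec_t mts;
    host_get_clock_service(mach_host_self(), SYSTEM_CLOCK, &clock_service);
    clock_get_time(clock_service, &mts);
    mach_port_deallocate(mach_task_self(), clock_service);
    return uint64_t(mts.tv_sec) * 1000000000ULL + uint64_t(mts.tv_nsec);
#elif defined(CLOCK_MONOTONIC)
    // POSIX path: clock_gettime() has nanosecond resolution.
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return uint64_t(ts.tv_sec) * 1000000000ULL + uint64_t(ts.tv_nsec);
#else
    // Last resort: gettimeofday() only has microsecond granularity.
    timeval tv;
    gettimeofday(&tv, nullptr);
    return uint64_t(tv.tv_sec) * 1000000000ULL + uint64_t(tv.tv_usec) * 1000ULL;
#endif
}
```

A real implementation would probably fetch the Mach clock service once at startup rather than on every call.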
Don't profile by default - as this is no use to non-developers.
Happy New Year!
merge PPC_PROFILE_REGS_USE fixes from KPX branch
Rearrange powerpc_registers struct and nuke fp_result register which is only needed for JIT (and to be handled differently in the future).
Add more micro asm optimisations to x86{,-64} (mulhw, mulhwu, slw, srw, cntlzw and subf* series). Also now enable the optimizations on x86_64 by default.
Define UNALIGNED_PROFITABLE on x86 platforms
Move likely() definitions to dyngen-exec.h, they are only used in the CPU core where it's most useful (give a stronger hint to gcc4)
Helper macros to annotate likely branch directions. Collateral effect: this also fixes build with GCC 4.1 (ppc-dyngen-ops.cpp) since the branches are re-ordered so that there is now only one exit-point in op_jump_next_A0().
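Branch-hint macros of this kind are commonly built on GCC's `__builtin_expect`. A minimal sketch follows; the exact definitions in dyngen-exec.h may differ, and `count_nonzero()` is a hypothetical function used only to show the macros in use.

```cpp
#include <cstddef>

#if defined(__GNUC__)
// Tell the compiler which way the branch usually goes.
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
// Other compilers: the macros become no-ops.
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

// Hypothetical example: count non-zero bytes, hinting that zeros are rare.
std::size_t count_nonzero(const unsigned char *p, std::size_t n)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; i++) {
        if (unlikely(p[i] == 0))
            continue;
        count++;
    }
    return count;
}
```

The hint only affects code layout; the result is the same with or without it.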
Always use the complete non-stubs Ethernet driver (XXX probably do that only in Emulated PPC mode for performance reasons?)
Do use predecode cache in case the JIT is disabled by the user ("jit" option)
Minor tweaks to support compilation of ether.cpp within MacOS. i.e. mostly migrate the Ethernet driver to the MacOS side. This is enabled for DIRECT_ADDRESSING cases. I didn't want to alter much of ether.cpp (as it would have required to support that mode). Of course, in REAL_ADDRESSING mode (the default) and for debugging purposes, the old driver is still available.
Remove obsolete and broken Cygwin/X11 hacks. Forbid builds of the Windows version from within the Unix/ directory.
Hopefully fix the remaining issue in the High Resolution Timing support code and re-enable it on Linux platforms (they have clock_nanosleep). Why did I trigger an interrupt inside a held lock? Hmmm, we should probably add an _ack semaphore like we do e.g. for ethernet.
Re-enable spinlocks on {i386,x86_64} since they are now used only for really small atomic operations (add/sub). This implementation should be enough for that purpose.
Use fast spinlocks only for small enough atomic operations. Otherwise, you run into some performance problems in e.g. video graphics experience because of busywaits in the current spin_lock() implementation.
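To illustrate why a busy-waiting lock is acceptable only around very short critical sections such as an add or subtract, here is a minimal sketch using `std::atomic_flag`. It is not the SheepShaver implementation (which uses its own testandset()-based `spinlock_t`); `SpinLock`, `guarded_counter` and `guarded_add()` are hypothetical names.

```cpp
#include <atomic>

// Minimal test-and-set spinlock: waiters burn CPU until the flag clears,
// so the protected region must be tiny.
struct SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;

    void lock()
    {
        while (flag.test_and_set(std::memory_order_acquire)) {
            // busy-wait
        }
    }

    void unlock() { flag.clear(std::memory_order_release); }
};

SpinLock guarded_counter_lock;
int guarded_counter = 0;

// A critical section of one addition: short enough for a spinlock.
void guarded_add(int delta)
{
    guarded_counter_lock.lock();
    guarded_counter += delta;
    guarded_counter_lock.unlock();
}
```

For anything longer, such as a video refresh pass, a blocking pthread mutex avoids wasting cycles in the wait loop.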
We HAVE_PTHREADS even if we use our own pthreads implementation, this also induces availability of locking primitives. I will merge the !HAVE_PTHREADS case (a la Basilisk II) for EMULATED_PPC when I get back home.
Extend internal math library from GNU libc to accommodate older systems with glibc 2.2.X or simply no C99-capable C library. Fix vrfiz instruction to really truncate on float values.
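The vrfiz semantics mentioned here (round each float lane toward zero) can be sketched as below. This assumes a C library that provides `trunc`, which is exactly what the internal math library is meant to supply on older systems; `Vec4f` and `round_toward_zero()` are hypothetical names, not the emulator's own.

```cpp
#include <cmath>

// Hypothetical stand-in for a 4-lane AltiVec float vector.
struct Vec4f {
    float f[4];
};

// Round each lane toward zero, i.e. drop the fractional part.
// Unlike floor(), this maps -2.7 to -2.0 rather than -3.0.
Vec4f round_toward_zero(const Vec4f &v)
{
    Vec4f r;
    for (int i = 0; i < 4; i++)
        r.f[i] = std::trunc(v.f[i]);
    return r;
}
```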
Disable high-res timings as it could still hang. The advantage is that we now can use special mutexes to debug deadlocks
Enable high precision timings on POSIX systems supporting clock_nanosleep(). Since pthread_suspend_np() is not available on Linux (though it is on NetBSD 2.0), thread suspend is implemented the same way as in boehm-gc.
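As a rough sketch of the kind of wait clock_nanosleep() makes possible, the helper below sleeps until an absolute deadline, which avoids the drift that repeated relative sleeps accumulate. `sleep_ns()` is a hypothetical name and the use of `CLOCK_MONOTONIC` is an assumption; the actual timer code may use a different clock and structure. It only builds on systems that provide clock_nanosleep().

```cpp
#include <cerrno>
#include <cstdint>
#include <ctime>

// Hypothetical helper: sleep for delay_ns (>= 0) nanoseconds by waiting on
// an absolute CLOCK_MONOTONIC deadline.
// Returns 0 on success or the error number from clock_nanosleep().
int sleep_ns(int64_t delay_ns)
{
    timespec deadline;
    clock_gettime(CLOCK_MONOTONIC, &deadline);

    deadline.tv_sec  += delay_ns / 1000000000LL;
    deadline.tv_nsec += delay_ns % 1000000000LL;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_nsec -= 1000000000L;
        deadline.tv_sec++;
    }

    int rc;
    do {
        // With TIMER_ABSTIME, a signal interruption just means "wait again".
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, nullptr);
    } while (rc == EINTR);
    return rc;
}
```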
Fix native Linux/ppc with recent enough glibc that supports TLS; r2 is used in that case. Tell me if I broke other arches, e.g. r13 is no longer saved in Video and Ethernet stubs, though it seems to be OK. Collateral feature: SheepShaver should now run on Linux/ppc64 with relevant 32-bit runtime. Native Linux/ppc64 support is harder as low mem globals were designed with 32 bits in mind and e.g. the TLS register there is %r13, %r2 is the TOC (PowerOpen/AIX ABI)
Initial support for NetBSD/ppc in native mode (some crashes occur but I could boot MacOS 9.0.4)
Support FreeBSD 5.3: - fix implementation of offsetof() with GCC >= 3.4 and C++ code
Happy New Year 2005!
Cygwin Direct Addressing hack.
Implement Direct Addressing mode similarly to Basilisk II. This is to get SheepShaver working on OSes that don't support mapping of Low Memory globals at 0x00000000, e.g. Windows.
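The idea behind direct addressing is that an emulated Mac address is translated to a host pointer by adding a fixed base, so the guest's address 0 does not have to be mapped at host address 0. A minimal sketch, with hypothetical names (`DirectMemory`, `mac_to_host()`, `host_to_mac()`) that are not the emulator's actual API:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a guest address space backed by a host buffer.
class DirectMemory {
public:
    explicit DirectMemory(uint32_t size) : ram_(size, 0) {}

    // Guest address -> host pointer: base plus offset.
    uint8_t *mac_to_host(uint32_t addr)
    {
        assert(addr < ram_.size());
        return ram_.data() + addr;
    }

    // Host pointer -> guest address: subtract the base.
    uint32_t host_to_mac(const uint8_t *p) const
    {
        return uint32_t(p - ram_.data());
    }

    // The guest is big-endian, so assemble 32-bit values byte by byte.
    uint32_t read32(uint32_t addr)
    {
        assert(uint64_t(addr) + 4 <= ram_.size());
        const uint8_t *p = mac_to_host(addr);
        return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
               (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
    }

    void write32(uint32_t addr, uint32_t value)
    {
        assert(uint64_t(addr) + 4 <= ram_.size());
        uint8_t *p = mac_to_host(addr);
        p[0] = uint8_t(value >> 24);
        p[1] = uint8_t(value >> 16);
        p[2] = uint8_t(value >> 8);
        p[3] = uint8_t(value);
    }

private:
    std::vector<uint8_t> ram_;  // host storage for the guest address space
};
```

With this scheme, Low Memory globals at guest address 0x00000000 simply live at the start of the host buffer.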
Don't bother with predecode cache when using JIT.
Disable testandset() locks, use pthread's as SheepShaver occasionally hangs with spinlocks. Weird as those are derived from x86 linuxthreads.
add and fix testandset for x86_64
Get rid of old (and broken) ASYNC_IRQ / MULTICORE code
Make NativeOp() handler a sheepshaver_cpu handler, thus getting rid of ugly GPR macro definition. Make the JIT engine somewhat reentrant. This brings a massive performance boost for applications that cause many Execute68k() calls, e.g. audio in PlayerPRO.
Use assembly optimizations on x86 for adde/addo/etc. emulation
Direct block chaining works on all supported platforms, enabled by default
direct block chaining, aka faster block dispatcher
fix for SheepThreads (native mode)
Portability fixes: declare Set_pthread_attr() only if HAVE_PTHREADS. Merge add_{serial,ether}_names() from B2 prefs editor for FreeBSD/IRIX.
Use bswap instruction on IA-32 too. Optimize bswap_64 on little-endian (x86 for now) systems.
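For reference, byte swapping of this kind can be sketched as follows. The names `swap32()` and `swap64()` are hypothetical (chosen to avoid clashing with the glibc `bswap_32`/`bswap_64` macros), and the use of `__builtin_bswap32` is an assumption about a modern GCC or Clang; the original code uses inline assembly instead.

```cpp
#include <cstdint>

// Reverse the byte order of a 32-bit value.
uint32_t swap32(uint32_t x)
{
#if defined(__GNUC__)
    // GCC and Clang typically turn this into a single bswap on x86.
    return __builtin_bswap32(x);
#else
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
#endif
}

// Reverse a 64-bit value: swap each 32-bit half, then exchange the halves.
uint64_t swap64(uint64_t x)
{
    return (uint64_t(swap32(uint32_t(x))) << 32) | swap32(uint32_t(x >> 32));
}
```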
Add spinlocks for Darwin/PPC
Happy New Year! :)
Fix IA-32 testandset(), make spinlock_t volatile int.
Add fast X11 display locking routines based on spinlocks, or on pthreads in the worst case. Optimize out GetScrap() case when we already own the selection. i.e. make it smoother. Use our own XDisplay{Un,}Lock() routines.
Declare timing functions from timer_unix.cpp
Add PPC_PROFILE_GENERIC_CALLS, don't enable PPC_PROFILE_COMPILE_TIME by default.
Gather stats about compile time. Define KPX_MAX_CPUS to 1 for allowing allocation of translation cache into .data section on PowerPC.
Optimized bswap_32() for AMD64
Remove obsolete code related to PPC_NO_FPSCR_UPDATE, PPC_LAZY_PC_UPDATE, PPC_LAZY_CC_UPDATE, PPC_HAVE_SPLIT_CR defines.
Fix ASYNC_IRQ build but locks may still happen. Note that with a predecode cache, checking for pending interrupts may not be the bottleneck nowadays.
Rewrite interrupts handling code so that the emulator can work with a predecode cache. This implies to run in interpreted mode only while processing EmulOps or other native (nested) runs. Note that the FLIGHT_RECORDER with a predecode cache gets slower than without caching at all.
- Handle MakeExecutable() replacement - Disable predecode cache in CVS for now - Fix flight recorder ordering in predecode cache mode
- Add support for FLIGHT_RECORDER with predecode cache - Always enable predecode cache & flight recorder for now
Move PPC emulator config to here
Make older & bogus compilers happy. aka. force "static" storage class for SPIN_LOCK_UNLOCKED constant.
- Share EmulatorData & KernelData struct definitions - Introduce new SheepShaver data area for alternate stacks, thunks, etc. - Experimental asynchronous interrupts handling. This improves performance by 30% but some (rare) lockups may occur. To be debugged!
spinlocks from QEMU
Add byteswap routines
Import VOSF from Basilisk II for faster and more accurate video refresh. There may be some bugs left though. Rework sigsegv_handler() a little to accommodate VOSF way of life. TODO: merge video drivers infrastructure from B2.
Sync with changes from cxmon and B2. I have yet to find out why my old disk image (8.1 based) no longer boots completely. :-/
added dummy Set_pthread_attr()
Imported sources