Happy New Year!
Implement CMOV.B and CMOV.W translations. Only the latter has a native x86 equivalent however.
Fix JIT for 68020/68030 emulation mode.
Add support for comma-separated elements in "jitblacklist" item.
Remove the 33-bit addressing hack as it's overly complex for not much gain. Instead, use an address-size override prefix (0x67), even though the Intel Core optimization reference guide says to avoid length-changing prefixes (LCP). In practice, the performance impact is marginal when measured on e.g. Speedometer tests.
Fix for LAZY_FLUSH_ICACHE_RANGE. Blocks are indexed by native addresses.
prefer lower indexes in register allocation; this avoids the REX prefixes that x86_64 requires whenever %r8 - %r15 are used (very light speedup expected)
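A minimal sketch of the idea, not the actual allocator: the free-register bitmask and both function names are hypothetical. It only illustrates that scanning from index 0 upwards picks a register that needs no REX extension bit whenever one is free.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bit i of free_mask is set when native register i is free.
// Scanning upwards from 0 returns the lowest-numbered free register, or -1.
int alloc_lowest_reg(uint32_t free_mask)
{
    for (int r = 0; r < 16; ++r)
        if (free_mask & (1u << r))
            return r;
    return -1;
}

// On x86-64, registers 8..15 (%r8 - %r15) can only be encoded with a REX prefix.
bool needs_rex_extension(int reg)
{
    assert(reg >= 0 && reg < 16);
    return reg >= 8;
}
```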
JIT generated code is not guaranteed to be leaf, e.g. there could be a call to a generic instruction handler (untranslated code). This caused problems on Mac OS X for Intel, where the unaligned stack conditions turned out to be more visible. The performance loss is negligible and this is the right fix now anyway.
fix stack alignment (theoretically broken, but it was OK in practice) in generated functions, move m68k_compile_execute() to compiler/ dir since it's JIT generic and it now depends on USE_PUSH_POP (as it should)
Much improved responsiveness on NetBSD systems. On those systems, it's really hard to get high resolution timings and the system often fails to honour a timeout of less than 20 ms. The idea here is to have an average m68k instruction count (countdown quantum) that triggers real interrupt checks. The quantum is calibrated every 10 ticks and has a 1000 Hz resolution on average.
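A rough sketch of how such a countdown quantum could work, under stated assumptions: calibrate_quantum() and Countdown are hypothetical names, and the calibration window is passed in as a duration in microseconds rather than tied to the emulator's real tick source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical calibration step: given the number of m68k instructions executed
// during a window of elapsed_us microseconds, return how many instructions
// correspond to one period at target_hz (e.g. 1000 for a 1 ms resolution).
uint64_t calibrate_quantum(uint64_t insns_executed, uint64_t elapsed_us, uint64_t target_hz)
{
    assert(elapsed_us > 0 && target_hz > 0);
    uint64_t insns_per_second = insns_executed * 1000000ULL / elapsed_us;
    uint64_t quantum = insns_per_second / target_hz;
    return quantum ? quantum : 1;  // never let the countdown stall at zero
}

// Hypothetical countdown: step() is called once per emulated instruction and
// returns true when a real (expensive) interrupt check is due.
struct Countdown {
    uint64_t quantum;
    uint64_t remaining;
    bool step()
    {
        if (--remaining == 0) {
            remaining = quantum;  // reload for the next period
            return true;
        }
        return false;
    }
};
```

The point of the design is that the per-instruction cost is a single decrement and compare, while the system timer is only consulted during calibration.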
Really make translation through constant jumps functional. This can be disabled with the new prefs item "jitinline". Some rapid Speedometer 4 benchmarks showed only a 4% improvement.
Enable FLIGHT_RECORDER for generated code but don't record registers in that case (yet).
ensure allocated code fits under 32-bit boundaries
Recognize lahf_lm from Dual Core Opterons. This enables use of LAHF/SETO instructions in long mode (64-bit). However, there seems to be another bug in the JIT preventing it from being fully supported. m68k.h & codegen_x86.h are easily fixed but another patch is still needed.
Happy New Year!
add some code to gather stats on m68k registers used in translated blocks
fix inline dispatcher to really generate a cmove on x86-64 (silly bug!)
Merge BSF simulation on P4 from Amithlon. Use 33-bit memory addressing model.
fix protection changes on translation cache + cosmetic fixlet
revive and fix almost two-year old port to x86_64
Happy New Year! :)
Implement lazy icache range invalidation. Disabled for now, until it shows a real benefit beyond the current 2%.
Add "jitblacklist" prefs item so that opcode ranges can be excluded from translation. This should help debugging of (badly) translated code. Usage: jitblacklist xxxx(-yyyy)?(;xxxx(-yyyy)?)* where xxxx/yyyy are hexadecimal numbers
Make sure a 32-bit B2/JIT works reasonably well on AMD64 too. This requires forcing RAMBaseHost < 0x80000000. This is empirically determined to work on Linux/x86 and Linux/amd64.
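The usage string can be read as a small grammar. The following is a minimal sketch of a parser for it, not the code from the emulator: parse_jit_blacklist() and OpcodeRange are hypothetical names, the 16-bit upper bound reflects the width of an m68k opcode word, and accepting ',' as well as ';' is an assumption based on the later comma-separated change.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Inclusive range of 16-bit m68k opcodes (hypothetical type).
using OpcodeRange = std::pair<unsigned, unsigned>;

// Parse "xxxx(-yyyy)?(;xxxx(-yyyy)?)*" with hexadecimal xxxx/yyyy.
// Returns false on malformed input; 'out' then holds the ranges parsed so far.
bool parse_jit_blacklist(const std::string &spec, std::vector<OpcodeRange> &out)
{
    const char *p = spec.c_str();
    if (!*p)
        return false;  // the grammar requires at least one element
    while (*p) {
        if (!std::isxdigit(static_cast<unsigned char>(*p)))
            return false;
        char *end;
        unsigned long lo = std::strtoul(p, &end, 16);
        unsigned long hi = lo;  // a single opcode is a range of one
        p = end;
        if (*p == '-') {
            ++p;
            if (!std::isxdigit(static_cast<unsigned char>(*p)))
                return false;
            hi = std::strtoul(p, &end, 16);
            p = end;
        }
        if (lo > hi || hi > 0xffff)
            return false;
        out.emplace_back(static_cast<unsigned>(lo), static_cast<unsigned>(hi));
        if (*p == ';' || *p == ',') {
            ++p;
            if (!*p)
                return false;  // trailing separator
        } else if (*p != '\0') {
            return false;      // unexpected character
        }
    }
    return true;
}
```

For example, "4e71;a000-afff" yields two ranges: the single opcode 0x4e71 and the span 0xa000 to 0xafff.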
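A small sketch of the kind of check this implies. The function name is hypothetical, and testing the whole mapped range rather than only the base address is an assumption on top of what the entry states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: true when the host range [base, base + size) lies
// entirely below 0x80000000.
bool fits_below_2gb(uint64_t base, uint64_t size)
{
    const uint64_t limit = 0x80000000ULL;
    return size <= limit && base <= limit - size;
}
```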
flags are live after a call to fflags_into_flags_internal()
Remove some dead code. Start implementation of optimized calls to interpretive fallbacks for untranslatable instruction handlers. Disabled for now since call_m_01() is not correctly implemented yet.
Add facility to filter out some opcodes from the compfunctbl[] et al.
Implement a generic setzflg_l() for P4, thus permitting to re-enable translation of ADDX/SUBX/BCLR/BTST/BSET/BCHG instructions. i.e. make it faster. ;-)
Workaround change in flags handling for BSF instruction on Pentium 4. i.e. currently disable translation of ADDX/SUBX/B<CHG,CLR,SET,TST> instructions in that case. That is to say, better (much?) slower than inaccurate. :-(
Fix align_target with a padding of 0 bytes
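For illustration, a sketch of how the padding for an alignment request can be computed so that an already aligned address yields 0 bytes. align_padding() is a hypothetical helper and does not reproduce the actual align_target() code or the bug it had.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of filler bytes needed to round 'addr' up to a multiple of
// 'alignment', which must be a power of two. Returns 0 when 'addr' is
// already aligned, so callers must cope with an empty filler.
size_t align_padding(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (alignment - (addr & (alignment - 1))) & (alignment - 1);
}
```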
Remove obsolete CFLOW_* constants but keep cpuop_{begin,end} for an inline-threaded core.
Add raw_emit_nop_filler() with more efficient no-op fillers stolen from GNU binutils 2.12.90.0.15. Speed bump is marginal (less than 6%). Make it default though, that's conditionalized by tune_nop_fillers constant.
JIT: add copyright notices to make clear that this is a real derivative work of GPL code (UAE-JIT). Additions and improvements are from B2 developers.
- Turn on runtime detection of loop and jump alignment as Aranym people reported they got some improvement with it and larger loops. Small loops are an issue for now until unrolling is implemented for DBcc. - Const jumps are identified in readcpu. I don't want to duplicate code uselessly. Rather, it's the JIT's job to know whether we are doing block inlining and to un-mark those instructions as end-of-block.
Add PROFILE_UNTRANSLATED_INSNS information. Interestingly, the following are the bottleneck now: DIVS, BSR.L (why isn't it translated yet?), bit-field instructions (I need to self-motivate enough for that), and A-Traps.
- Remove dead code in readcpu.cpp concerning CONST_JUMP control flow. - Replace unused fl_compiled with fl_const_jump - Implement block inlining enabled with USE_INLINING && USE_CHECKSUM_INFO. However, this is currently disabled as it doesn't give much and exposes even more of a cache/code generation problem with FPU JIT compiled code. - Actual checksum values are now an integral part of a blockinfo regardless of whether USE_CHECKSUM_INFO is set. This reduces the number of elements in that structure and slightly speeds up checksum calculation for chained blocks. - Don't care about show_checksum() for now.
- Rewrite blockinfo allocator et al. Use a template class so that this can work with other types related to blockinfos. - Add new method to compute checksums. This should permit code inlining and follow-ups of const_jumps without breaking the lazy cache invalidator, i.e. chain infos for checksumming. TODO: Incomplete support thus disabled.
- Optimize use of quit_program variable. This is a real boolean for B2. - Remove unused/dead code concerning surroundings of (debugging). - m68k_compile_execute() is generated and optimized code now.
- Rewrite raw_init_cpu() to match more details, from kernel sources. - Add possibility to tune code alignment to the underlying processor. However, this is turned off as I don't see much improvement and align_jumps = 64 for Athlon looks suspicious to me. - Remove two extra align_target() that are already covered. - Remove unused may_trap() predicate.
Move -DSAHF_SETO_PROFITABLE down in x86 & gas specific block. Also ensure SAHF_SETO_PROFITABLE is defined when compiling the JIT. Aka I don't want to support obsolete and probably bogus code nowadays.
Don't forget to use vm_release() to free up translation cache. Also free the right amount of memory that was previously allocated.
Use vm_acquire() to allocate translation cache
Import JIT compiler