Cope with assembler updates.
Happy New Year!
Implement CMOV.B and CMOV.W translations. Only the latter has a native x86 equivalent however.
Fix CMOV emulation on x86_64 in case the CPU doesn't support that instruction (which is very unlikely).
The older code generator is now deprecated on x86-32 too.
Use SAHF_SETO_PROFITABLE wherever possible on x86-64, as it is faster. This can't be the default because some very old CPUs don't support LAHF in long mode.
Remove the 33-bit addressing hack as it's overly complex for not much gain. Instead, use an address-size override prefix (0x67), even though the Intel Core optimization reference guide says to avoid LCPs (length-changing prefixes). In practice, the measured performance impact is marginal, e.g. on Speedometer tests.
fix FETOX & FTWOTOX translations for x86_64
Fix SAHF_SETO_PROFITABLE code for x86-64 platforms. This was only an experiment. Improvement was marginal: only +3% on AMD64 (an Athlon 64 3200+). However, it may be interesting to test it on EM64T (e.g. newer P4s) since an older P3/800, hence in 32-bit mode, got a +15% improvement in Speedometer 4 benchmarks. Rationale: lahf/seto sequences avoid load/stores to the stack (push/pop) and it was thus hoped to be faster. Anyhow, SAHF_SETO_PROFITABLE can only be enabled manually at this time. Edit your generated Makefile for testing, but first make sure your CPU supports lahf in 64-bit mode (lahf_lm flag in /proc/cpuinfo).
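The lahf_lm check mentioned above can be automated. The following is a minimal sketch, not code from this file: cpu_flags_contain() is a hypothetical helper that looks for a flag as a whole token in one "flags" line as printed by /proc/cpuinfo. Reading the file itself is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the JIT sources): return 1 if `flag`
 * appears as a whole whitespace-separated token in `flags_line`, else 0. */
static int cpu_flags_contain(const char *flags_line, const char *flag)
{
    size_t n;
    const char *p;

    assert(flags_line != NULL && flag != NULL);
    n = strlen(flag);
    if (n == 0)
        return 0;

    p = flags_line;
    while ((p = strstr(p, flag)) != NULL) {
        /* The match must start at a token boundary... */
        int starts = (p == flags_line) || p[-1] == ' ' ||
                     p[-1] == '\t' || p[-1] == ':';
        /* ...and end at one, so "lahf_lmx" or "xlahf_lm" do not count. */
        int ends = p[n] == '\0' || p[n] == ' ' ||
                   p[n] == '\t' || p[n] == '\n';
        if (starts && ends)
            return 1;
        p += 1;
    }
    return 0;
}
```

A caller would read /proc/cpuinfo line by line and pass each line starting with "flags" to this function with "lahf_lm" as the flag.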
more precise callee-saved register set
fix stack alignment (a theoretical problem only; it was OK in practice) in generated functions, move m68k_compile_execute() to compiler/ dir since it's JIT generic and it now depends on USE_PUSH_POP (as it should)
Stop abort()'ing when we fail to recognize the underlying processor, assume an obsolete i386 instead. Keep report on stderr though.
recognize more P4 cores
Recognize lahf_lm from Dual Core Opterons. This enables use of LAHF/SETO instructions in long mode (64-bit). However, there seems to be another bug in the JIT preventing it from being fully supported. m68k.h & codegen_x86.h are easily fixed but another patch is still needed.
Allocate executable space to detect cpu features (cpuid). aka don't crash on non-executable .data sections on x86-64 with NX support enabled.
Happy New Year!
fix tester for BSF flags handling
Merge BSF simulation on P4 from Amithlon. Use 33-bit memory addressing model.
fix JIT FPU for x86_64
preserve r11 as the register used to resolve pointers to functions
- refine need_to_preserve[] to get close to linux/x86_64 ABI
- optimize NOP fillers on x86-64 (based on GNU as implementation)
revive and fix almost two-year old port to x86_64
Happy New Year! :)
Call correct PUSHF/POPF macro
Remove some dead code. Start implementation of optimized calls to interpretive fallbacks for untranslatable instruction handlers. Disabled for now since call_m_01() is not correctly implemented yet.
Detect x86-64
Emulate CMOV in the new code generator for processors that don't support this instruction
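One common way to emulate a register-to-register CMOV is to jump over a plain MOV when the inverted condition holds. The sketch below illustrates that idea only; emit_cmov_rr() and its have_cmov parameter are hypothetical names, and the log does not confirm that the code generator uses exactly this sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emitter (illustration only): write "cmovCC %src, %dst" for
 * 32-bit registers 0..7 into `out`, or an equivalent Jcc + MOV pair when
 * the CPU lacks CMOV. `cc` is the 4-bit x86 condition code.
 * Returns the number of bytes written (at most 4). */
static size_t emit_cmov_rr(uint8_t *out, int have_cmov,
                           unsigned cc, unsigned dst, unsigned src)
{
    uint8_t modrm;

    assert(out != NULL && cc < 16 && dst < 8 && src < 8);
    modrm = (uint8_t)(0xC0 | (dst << 3) | src);  /* mod=11, reg=dst, rm=src */

    if (have_cmov) {
        out[0] = 0x0F;
        out[1] = (uint8_t)(0x40 | cc);           /* CMOVcc r32, r/m32 */
        out[2] = modrm;
        return 3;
    }

    /* Emulation: skip the MOV when the condition is NOT met.
     * Flipping the low bit of an x86 condition code inverts it. */
    out[0] = (uint8_t)(0x70 | (cc ^ 1));         /* Jcc rel8, inverted */
    out[1] = 0x02;                               /* jump over 2-byte MOV */
    out[2] = 0x8B;                               /* MOV r32, r/m32 */
    out[3] = modrm;
    return 4;
}
```

For example, with cc = 4 (equal), dst = 0 (%eax) and src = 1 (%ecx), the native form is `0F 44 C1` and the emulated form is `75 02 8B C1` (jne over `mov %ecx,%eax`). Neither MOV nor Jcc modifies the flags, so the emulated sequence leaves them as CMOV would.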
Add missing wrappers of the new runtime-assembler primitives
Add new backend, disabled until it's proofread and fully functional. Remove obsolete string-related instructions.
clobber "cc" for flags, not "flags". Thanks Milan for noticing it.
Implement a generic setzflg_l() for P4, thus permitting to re-enable translation of ADDX/SUBX/BCLR/BTST/BSET/BCHG instructions. i.e. make it faster. ;-)
Workaround change in flags handling for BSF instruction on Pentium 4. i.e. currently disable translation of ADDX/SUBX/B<CHG,CLR,SET,TST> instructions in that case. That is to say, better (much?) slower than inaccurate. :-(
Some instructions assume offsets are only 1-byte long. I don't think this is 100% correct. Therefore, insert some asserts that will fail when that assumption does not hold.
Add raw_emit_nop_filler() with more efficient no-op fillers stolen from GNU binutils 2.12.90.0.15. Speed bump is marginal (less than 6%). Make it the default though; this is controlled by the tune_nop_fillers constant.
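The idea behind such fillers is to pad with a few long no-op instructions instead of many one-byte NOPs. The sketch below is an illustration, not the actual raw_emit_nop_filler(): emit_nop_filler() is a hypothetical name, and only four well-known patterns for 32-bit mode are included (in 64-bit mode `mov %esi,%esi` would clear the upper half of %rsi, so these are not valid there).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit-mode no-op patterns of increasing length. */
static const uint8_t nop1[] = { 0x90 };                    /* nop */
static const uint8_t nop2[] = { 0x89, 0xF6 };              /* movl %esi,%esi */
static const uint8_t nop3[] = { 0x8D, 0x76, 0x00 };        /* leal 0(%esi),%esi */
static const uint8_t nop4[] = { 0x8D, 0x74, 0x26, 0x00 };  /* leal 0(%esi,1),%esi */
static const uint8_t *const nop_table[] = { nop1, nop2, nop3, nop4 };

/* Hypothetical filler (illustration only): pad `nbytes` bytes at `out`
 * using the longest available patterns first.
 * Returns the number of instructions emitted. */
static size_t emit_nop_filler(uint8_t *out, size_t nbytes)
{
    size_t done = 0, count = 0;

    assert(out != NULL || nbytes == 0);
    while (done < nbytes) {
        size_t chunk = nbytes - done;
        if (chunk > 4)
            chunk = 4;                  /* longest pattern in this sketch */
        memcpy(out + done, nop_table[chunk - 1], chunk);
        done += chunk;
        count++;
    }
    return count;
}
```

Padding 6 bytes this way takes two instructions (a 4-byte and a 2-byte pattern) rather than six single-byte NOPs.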
Don't forget to note CPU detection code mostly comes from Linux kernel.
JIT: add copyright notices to make clear that this is a real derivative work of GPL code (UAE-JIT). Additions and improvements are from B2 developers.
- #include "flags_x86.h" here to get NATIVE_CC_?? helper macros
- Add raw_cmp_b_mi() and raw_call_m_indexed() for generated m68k_compile_execute() function
Fix align_jumps for athlon, that's really "16" and gcc-3.2 sources contained the same error. ;-)
- Rewrite raw_init_cpu() to match more details, from kernel sources.
- Add possibility to tune code alignment to the underlying processor. However, this is turned off as I don't see much improvement and align_jumps = 64 for Athlon looks suspicious to me.
- Remove two extra align_target() that are already covered.
- Remove unused may_trap() predicate.
Optimize runtime assembler with shorter equivalents when the accumulator (%eax) is referenced along with immediates.
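As an illustration of the kind of saving involved, x86 has a dedicated one-byte opcode for `add $imm32, %eax` that drops the ModRM byte required by the generic form. The sketch below is an assumption about what such an optimization can look like, not the runtime assembler's actual code: emit_add_l_ri() is a hypothetical name, and the sign-extended imm8 path is added here only because it is shorter still.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emitter (illustration only): write "addl $imm, %reg" for a
 * 32-bit register 0..7 into `out`, choosing the shortest encoding.
 * Returns the number of bytes written (at most 6). */
static size_t emit_add_l_ri(uint8_t *out, unsigned reg, int32_t imm)
{
    size_t n = 0;
    uint32_t u = (uint32_t)imm;

    assert(out != NULL && reg < 8);

    if (imm >= -128 && imm <= 127) {
        out[n++] = 0x83;                    /* ADD r/m32, imm8 (sign-extended) */
        out[n++] = (uint8_t)(0xC0 | reg);   /* mod=11, /0, rm=reg */
        out[n++] = (uint8_t)u;
        return n;                           /* 3 bytes */
    }

    if (reg == 0) {
        out[n++] = 0x05;                    /* ADD EAX, imm32: no ModRM byte */
    } else {
        out[n++] = 0x81;                    /* ADD r/m32, imm32 */
        out[n++] = (uint8_t)(0xC0 | reg);
    }
    out[n++] = (uint8_t)(u & 0xFF);         /* immediate, little-endian */
    out[n++] = (uint8_t)((u >> 8) & 0xFF);
    out[n++] = (uint8_t)((u >> 16) & 0xFF);
    out[n++] = (uint8_t)((u >> 24) & 0xFF);
    return n;                               /* 5 bytes for %eax, 6 otherwise */
}
```

With a 32-bit immediate such as 0x12345678, the %eax form is `05 78 56 34 12` (5 bytes), while the same addition to %ecx needs `81 C1 78 56 34 12` (6 bytes).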
Import JIT compiler