Remove some dead code. Start implementing optimized calls to the interpretive fallbacks for untranslatable instruction handlers. Disabled for now since call_m_01() is not correctly implemented yet.
Add a facility to filter out some opcodes from compfunctbl[] et al.
Implement a generic setzflg_l() for P4, which allows translation of the ADDX/SUBX/BCLR/BTST/BSET/BCHG instructions to be re-enabled, i.e. makes it faster. ;-)
Work around the change in flags handling for the BSF instruction on Pentium 4, i.e. currently disable translation of ADDX/SUBX/B<CHG,CLR,SET,TST> instructions in that case. That is to say, better (much?) slower than inaccurate. :-(
Fix align_target() when the padding is 0 bytes.
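The commit does not say what the zero-padding bug was, so the following is only a minimal sketch of how an alignment helper can handle that case cleanly. The names padding_for() and align_with_nops() are hypothetical and are not taken from the JIT sources; the alignment is assumed to be a power of two.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bytes needed to advance `addr` to the next
// multiple of `alignment` (assumed to be a power of two).
// The final mask makes the already-aligned case yield 0 rather than `alignment`.
inline std::size_t padding_for(std::uintptr_t addr, std::size_t alignment) {
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    return static_cast<std::size_t>((alignment - (addr & mask)) & mask);
}

// Hypothetical emitter: pad with single-byte x86 NOPs (0x90) up to the next
// aligned address and return the new write pointer. With a padding of 0 the
// loop body never runs and the pointer is returned unchanged.
inline std::uint8_t* align_with_nops(std::uint8_t* p, std::size_t alignment) {
    const std::size_t pad = padding_for(reinterpret_cast<std::uintptr_t>(p), alignment);
    for (std::size_t i = 0; i < pad; ++i)
        *p++ = 0x90;
    return p;
}
```

The point of the sketch is the second mask in padding_for(): without it, an already aligned address would ask for a full `alignment` bytes of filler.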
Remove obsolete CFLOW_* constants but keep cpuop_{begin,end} for an inline-threaded core.
Add raw_emit_nop_filler() with more efficient no-op fillers stolen from GNU binutils 2.12.90.0.15. Speed bump is marginal (less than 6%). Make it the default anyway; it is controlled by the tune_nop_fillers constant.
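A rough sketch of the idea behind multi-byte no-op fillers follows. It is not the code of raw_emit_nop_filler(): the function name emit_nop_filler_sketch() is hypothetical, only sequences of up to 4 bytes are shown, and the byte encodings are the ones I recall from the 32-bit filler tables in gas of that era, so treat them as an assumption. Only the tune_nop_fillers name comes from the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Name taken from the commit message; its type and value here are assumptions.
static const bool tune_nop_fillers = true;

// Multi-byte x86 no-op sequences (assumed encodings, 32-bit mode).
static const std::uint8_t nop1[] = {0x90};                    // nop
static const std::uint8_t nop2[] = {0x89, 0xf6};              // movl %esi,%esi
static const std::uint8_t nop3[] = {0x8d, 0x76, 0x00};        // leal 0(%esi),%esi
static const std::uint8_t nop4[] = {0x8d, 0x74, 0x26, 0x00};  // leal 0(%esi,1),%esi
static const std::uint8_t* const nop_table[] = {nop1, nop2, nop3, nop4};

// Hypothetical filler: cover `nbytes` with as few instructions as possible,
// so the CPU decodes fewer instructions than with a run of single-byte NOPs.
inline std::uint8_t* emit_nop_filler_sketch(std::uint8_t* p, std::size_t nbytes) {
    if (!tune_nop_fillers) {          // fallback: plain single-byte NOPs
        std::memset(p, 0x90, nbytes);
        return p + nbytes;
    }
    while (nbytes > 0) {
        const std::size_t len = nbytes < 4 ? nbytes : 4;  // longest sequence that fits
        std::memcpy(p, nop_table[len - 1], len);
        p += len;
        nbytes -= len;
    }
    return p;
}
```

For a 7-byte gap this emits one 4-byte and one 3-byte sequence, i.e. two instructions instead of seven.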
JIT: add copyright notices to make clear that this is real derivative work from GPL code (UAE-JIT). Additions and improvements are from B2 developers.
- Turn on runtime detection of loop and jump alignment, as the Aranym people reported some improvement with it and with larger loops. Small loops remain an issue until unrolling is implemented for DBcc.
- Const jumps are identified in readcpu. I don't want to duplicate code uselessly. Rather, it's the JIT's job to know whether we are doing block inlining and to un-mark those instructions as end-of-block.
Add PROFILE_UNTRANSLATED_INSNS information. Interestingly, the following are the bottleneck now: DIVS, BSR.L (why isn't it translated yet?), bit-field instructions (I need to self-motivate enough for that), and A-Traps.
- Remove dead code in readcpu.cpp concerning CONST_JUMP control flow.
- Replace unused fl_compiled with fl_const_jump.
- Implement block inlining, enabled with USE_INLINING && USE_CHECKSUM_INFO. However, this is currently disabled as it doesn't gain much and exposes even more of a cache/code generation problem with FPU JIT compiled code.
- Actual checksum values are now an integral part of a blockinfo, regardless of whether USE_CHECKSUM_INFO is set. This reduces the number of elements in that structure and slightly speeds up checksum calculation for chained blocks.
- Don't care about show_checksum() for now.
- Rewrite blockinfo allocator et al. Use a template class so that this can work with other types related to blockinfos.
- Add a new method to compute checksums. This should permit code inlining and follow-ups of const_jumps without breaking the lazy cache invalidator, a.k.a. chain infos for checksumming. TODO: support is incomplete, thus disabled.
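To illustrate what a template-based allocator buys here, the following is a minimal sketch of a chunked free-list pool that works for any default-constructible type. It is not the allocator from the JIT sources; the class name PoolAllocatorSketch, the chunk size, and the acquire()/release() method names are all assumptions made for the example.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical pool: hands out objects of type T from larger chunks and
// recycles released objects through an intrusive free list.
template <class T>
class PoolAllocatorSketch {
    union Slot {
        Slot* next;                                   // valid while the slot is free
        alignas(T) unsigned char storage[sizeof(T)];  // valid while the slot is in use
    };
    static constexpr std::size_t kChunk = 64;         // slots per chunk (arbitrary)

    std::vector<Slot*> chunks_;
    Slot* free_ = nullptr;

public:
    PoolAllocatorSketch() = default;
    PoolAllocatorSketch(const PoolAllocatorSketch&) = delete;
    PoolAllocatorSketch& operator=(const PoolAllocatorSketch&) = delete;
    ~PoolAllocatorSketch() {
        for (Slot* c : chunks_) delete[] c;
    }

    // Take a slot from the free list, growing the pool by one chunk if needed,
    // and value-initialize a T in it.
    T* acquire() {
        if (!free_) {
            Slot* c = new Slot[kChunk];
            chunks_.push_back(c);
            for (std::size_t i = 0; i < kChunk; ++i) {
                c[i].next = free_;
                free_ = &c[i];
            }
        }
        Slot* s = free_;
        free_ = s->next;
        return new (s->storage) T();
    }

    // Destroy the object and push its slot back onto the free list.
    void release(T* obj) {
        obj->~T();
        Slot* s = reinterpret_cast<Slot*>(obj);
        s->next = free_;
        free_ = s;
    }
};
```

Because the pool is parameterized on T, the same code could serve blockinfo objects and any other bookkeeping type related to them, which is the motivation stated in the commit message.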
- Optimize use of quit_program variable. This is a real boolean for B2.
- Remove unused/dead code concerning surroundings of (debugging).
- m68k_compile_execute() is generated and optimized code now.
- Rewrite raw_init_cpu() to match more details, based on the kernel sources.
- Add the possibility to tune code alignment to the underlying processor. However, this is turned off as I don't see much improvement, and align_jumps = 64 for Athlon looks suspicious to me.
- Remove two extra align_target() calls that are already covered.
- Remove unused may_trap() predicate.
Move -DSAHF_SETO_PROFITABLE down into the x86 & gas specific block. Also ensure SAHF_SETO_PROFITABLE is defined when compiling the JIT, i.e. I don't want to support obsolete and probably bogus code nowadays.
Don't forget to use vm_release() to free up the translation cache. Also free the same amount of memory that was previously allocated.
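One way to make sure the released size always matches the allocated size is to keep both in a single owner object. The sketch below uses POSIX mmap()/munmap() as stand-ins, since munmap() likewise needs the original length; it does not call vm_acquire() or vm_release(), and the class name CacheRegionSketch is hypothetical.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Hypothetical owner of a translation-cache-like region. It remembers the
// size passed at allocation so that release always frees exactly that amount.
class CacheRegionSketch {
    void* base_;
    std::size_t size_;

public:
    explicit CacheRegionSketch(std::size_t size) : base_(nullptr), size_(0) {
        // Stand-in for the acquire step: anonymous private mapping (POSIX only).
        void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p != MAP_FAILED) {
            base_ = p;
            size_ = size;
        }
    }
    CacheRegionSketch(const CacheRegionSketch&) = delete;
    CacheRegionSketch& operator=(const CacheRegionSketch&) = delete;
    ~CacheRegionSketch() { release(); }

    // Stand-in for the release step: unmap with the recorded size.
    // Returns true on success or when nothing was mapped.
    bool release() {
        if (!base_) return true;
        const bool ok = munmap(base_, size_) == 0;
        base_ = nullptr;
        size_ = 0;
        return ok;
    }

    void* base() const { return base_; }
    std::size_t size() const { return size_; }
};
```

Keeping the size next to the pointer avoids the class of bug the commit mentions, where the cache is freed with a different length than it was allocated with.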
Use vm_acquire() to allocate translation cache
Import JIT compiler