
HuC questions.

Started by elmer, 05/24/2016, 02:21 PM


touko

Quote from: dshadoff on 05/30/2016, 08:49 PM
Quote from: touko on 05/30/2016, 07:27 AM
Do you know why HuC includes a dummy .dw between each piece of included data?
I'm not 100% sure whether I'm clear on what you're asking, but it could be to force 16-bit alignment on 16-bit word data.  At least, I seem to recall there was something like that.

-Dave
OK, thanks Dave. It's a little annoying: in ASM you could otherwise transfer several pieces of data at once, and you can't because of that.

elmer

Quote from: TailChao on 05/30/2016, 08:47 PM
Giving up a chunk of the ZeroPage for a (zp,X) software stack is not a huge loss. I think that's livable considering the speed improvements over (zp),Y or (zp).

But I think the real question is what people want to use C for on this platform, and how they want to write it.

Making everything static is really the only way to get good performance on the 65x family outside of the 65816, especially for your object system - statically allocated arrays of individual attributes.

Right when you bring any requirement for address + displacement into the equation, performance drops on the 6502. The problem is that many of C's great conveniences depend upon it. If you're stuck writing restricted C in order to cater to the shortcomings of the architecture then (personally) I don't see the benefit over just writing the assembly.
Quote from: dshadoff on 05/30/2016, 11:56 PM
First, I wouldn't want a compiler to tell me that I can no longer write hand-coded assembly which accesses zero page.

Second, don't forget that the stack frame is not used only for parameter passing; it's also used for local variables in a standard C compiler.  So, if somebody decides to have 15 local int variables (not unlikely), that's 30 of your 200 bytes in just one call level.  If somebody wants to allocate a local array or struct, it could be completely gone.

By the way, this is why I have said repeatedly in the past that globals are the way to go for variables in HuC, as they are given a specific address and are accessed with absolute addressing mode (many times faster than stack).  In fact, I would even like the opportunity to selectively promote some of these globals to ZP for faster direct access.
Hmmmm ... the more that I think about this and actually mangle CC65's source code, the more that I'm coming to the conclusion that I need to step back for a while and rethink this.  :-k

As I look at the code, and get past the idea of how much faster that one addressing mode "zp,x" is than "(zp),y" ... I'm thinking more about the actual usage of the stack, and I can see that you're both looking at things from a more experienced and superior perspective.

There's absolutely no way that I'm going to make stack-based access a sensible alternative to static and global variables, and the limitations that I'm imposing with a permanent stack pointer in the X register, and requiring the use of so much zero-page memory, are both too much of a cost for the benefits that they might provide.
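
To make the static-versus-stack distinction concrete, here is a minimal sketch in plain C. The function names are made up for illustration and nothing here is HuC- or CC65-specific. The second version gives its working variables fixed addresses, which is the style being recommended above, at the cost of the function no longer being reentrant.

```c
#include <stdint.h>

/* Stack-based version: "sum" and "i" are automatic variables, so a
   simple compiler places them in the C stack frame. */
uint16_t checksum_auto(const uint8_t *data, uint8_t len)
{
    uint16_t sum = 0;
    uint8_t i;

    for (i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

/* Static version: "sum" and "i" have fixed addresses, so they can be
   reached with absolute addressing. The function is not reentrant. */
uint16_t checksum_static(const uint8_t *data, uint8_t len)
{
    static uint16_t sum;
    static uint8_t i;

    sum = 0;
    for (i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}
```

Both functions return the same result for the same input; only where the working variables live is different.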


Quote from: TailChao on 05/30/2016, 08:47 PM
A compiler that knows to split a statically allocated array of structs into a struct of arrays, then further split each element larger than a byte into individual byte arrays, then access everything that way would be pretty cool (maybe something does this already?). I think this is really the biggest performance gain area - but it's also so contrary to C in general.
Yes, that would be lovely ... but, as you say, it's not really C anymore if the compiler is going to do that.
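
For reference, a minimal sketch of that split done by hand, assuming a made-up object with a 16-bit x position and a one-byte state. All of the names here are illustrative and do not come from HuC or CC65.

```c
#include <stdint.h>

#define MAX_OBJECTS 64

/* "Normal" C: an array of structs. Reaching objects[i].x means
   computing base + i * sizeof(struct object) + offset. */
struct object {
    uint16_t x;
    uint8_t  state;
};
struct object objects[MAX_OBJECTS];

/* Hand-split version: one byte array per attribute byte, so the
   object number can be used directly as an 8-bit index. */
uint8_t obj_x_lo[MAX_OBJECTS];
uint8_t obj_x_hi[MAX_OBJECTS];
uint8_t obj_state[MAX_OBJECTS];

void aos_set_x(uint8_t i, uint16_t x) { objects[i].x = x; }
uint16_t aos_get_x(uint8_t i) { return objects[i].x; }

void soa_set_x(uint8_t i, uint16_t x)
{
    obj_x_lo[i] = (uint8_t)(x & 0xFF);  /* low byte  */
    obj_x_hi[i] = (uint8_t)(x >> 8);    /* high byte */
}

uint16_t soa_get_x(uint8_t i)
{
    return (uint16_t)(obj_x_lo[i] | ((uint16_t)obj_x_hi[i] << 8));
}
```

The two pairs of accessors store and return the same values; the difference is that the split version never needs a multiply or a pointer plus displacement to find a field.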

I think that with the limits of the 65xx, we're really looking at C as more of a semi-familiar structured-assembler.

Trying to write anything that looks like "normal" C code is just going to lead to terrible frustration.

elmer

Quote from: guest on 05/31/2016, 01:36 AM
The perk to having it in C first is, now I have the code for if I want to go plop the bastard on a different platform. 6502 is braindamaged. Converting 6502 to z80 would make me want to shoot myself. Rebuilding C to z80 and re-writing where needed would be much less moronic.
Quick question ... what C compiler are you using on the Z80?  :-k

TurboXray

If you decide on CC65, you might want to look into a 6502 plugin for Eclipse. Would be nice to modify it for 6280.

TurboXray

Also, about this stack optimization stuff: instead of using ZP, why not have a three- or four-stack system? Each stack would be only 256 bytes (because it is indexed directly), but the compiler could assign at compile time which stack each function uses. And in the case of nesting of the same function, there could be 2 or 3 versions which the compiler could choose between to keep the stack usage from going out of bounds.

ABS,y is only +1 cycle more than ZP,y. And you'd get away from the [stack],y mode or, worse, manually building the offset to the stack each time (not sure if HuC does this or not).

elmer

#55
Quote from: TurboXray on 06/06/2016, 06:15 PM
If you decide on CC65, you might want to look into a 6502 plugin for Eclipse. Would be nice to modify it for 6280.
Hahaha ... not Eclipse ... never Eclipse!  :lol:

A 177MB download and fracking Java just for an editor ... not on my computer.

I'll stick with Zeus (http://www.zeusedit.com/index.html), and sometimes the free PSPad (http://www.pspad.com/en/).


Quote from: TurboXray on 06/06/2016, 06:26 PM
Also, about this stack optimization stuff: instead of using ZP, why not have a three- or four-stack system? Each stack would be only 256 bytes (because it is indexed directly), but the compiler could assign at compile time which stack each function uses. And in the case of nesting of the same function, there could be 2 or 3 versions which the compiler could choose between to keep the stack usage from going out of bounds.

ABS,y is only +1 cycle more than ZP,y. And you'd get away from the [stack],y mode or, worse, manually building the offset to the stack each time (not sure if HuC does this or not).
Yes, I'd come to the same conclusion.  :-k

The nice thing about this is that the stack pointer can spend most of its time loaded into the Y register, and only gets kicked out when the Y register is needed to access something through a pointer. That's easy to manage in the peephole optimizer.

I'm part-way through implementing that in CC65, but it may just break things.

However, once you make the design choice to go that route, then it becomes sensible to think about removing all the C-stack pushes and pops within a function, and just calculate the stack space that a function needs and then allocate it all-at-once at the start of the function.

Again, that's something that could potentially be done during/after HuC or CC65's peephole optimizers.

Changing the frame layout would be better handled at the code-generation stage ... but that might be difficult to accomplish in either HuC or CC65.

If you can get a frame pointer that doesn't change during a function, and you use the "abs,y" addressing mode to access the stack, then stack-based variables are often as fast as using statically allocated variables.
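
As a rough C model of that idea (this is not actual HuC or CC65 output): c_stack, frame and sum_bytes are hypothetical names, with frame standing in for the Y register and c_stack[frame + n] standing in for an "abs,y" access. The whole frame is allocated once on entry, and the frame pointer does not move again until the function returns.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 256-byte software stack page. It grows downward from
   0xFF; for simplicity the top byte is never used. */
static uint8_t c_stack[256];
static uint8_t frame = 0xFF;   /* plays the role of the Y register */

/* Offsets of this function's locals inside its frame. */
enum { LOCAL_TOTAL = 0, LOCAL_I = 1, FRAME_SIZE = 2 };

uint8_t sum_bytes(const uint8_t *data, uint8_t count)
{
    uint8_t result;

    assert(frame >= FRAME_SIZE);   /* no room left in this stack page */
    frame -= FRAME_SIZE;           /* prologue: one allocation for all locals */

    c_stack[frame + LOCAL_TOTAL] = 0;
    for (c_stack[frame + LOCAL_I] = 0;
         c_stack[frame + LOCAL_I] < count;
         c_stack[frame + LOCAL_I]++) {
        /* 8-bit total, so the sum wraps modulo 256 */
        c_stack[frame + LOCAL_TOTAL] += data[c_stack[frame + LOCAL_I]];
    }

    result = c_stack[frame + LOCAL_TOTAL];
    frame += FRAME_SIZE;           /* epilogue: release the whole frame */
    return result;
}
```

Because frame stays constant inside the function body, every local is at a fixed offset from it and there are no pushes or pops between the prologue and the epilogue.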

IMHO, that could be a bit of a game-changer.  :wink:

Anyway ... even more interesting than attempting to improve HuC or CC65 is the possibility of actually getting SDCC to support the 6502.

That's just a much superior foundation to build upon than the Small-C roots of both HuC and CC65.

I've got one of the SDCC developers showing some signs of interest in working on a 65C02 code generator for SDCC, and I'll see what I can do to help that process and to try to keep his interest alive.  [-o<

DildoKKKobold

I have a quick question - would the goal be to be able to port HuC code directly to CC65 or SDCC? Or would someone better at this than me actually have to rewrite all of the Turbo-related functions for these compilers?


elmer

#57
Quote from: guest on 06/08/2016, 03:16 PM
I have a quick question - would the goal be to be able to port HuC code directly to CC65 or SDCC? Or would someone better at this than me actually have to rewrite all of the Turbo-related functions for these compilers?
Personally, I don't have enough time/energy invested in HuC to worry too much about compatibility, nor do I have enough knowledge of HuC's quirks to be the right person to try to shoehorn HuC's way of doing things into a different environment.

I'm happy to consider trying not to break things when it doesn't have any effect upon the efficiency of the result ... but getting a "better" C compiler is my primary interest, not compatibility.

Now, if such a theoretical "better" C compiler can be made (which isn't at all certain), then there are definitely some other folks here that might be prompted/pushed into working on a HuC compatibility layer.

But from what I'm seeing ... it's up to me (or someone else that has a similar interest) to prove that something better is available before anyone else will take their time to become involved.

That's not overly surprising (but still a little disheartening).

If you try to change things ... then sometimes, perhaps even often, you'll fail.

But if nobody ever even risks that failure, then things never improve for anyone. :roll: