The new fork of HuC

Started by TurboXray, 08/15/2016, 09:31 PM

elmer

#100
Quote from: TurboXray on 08/18/2016, 07:45 PM
OK. I got HuC compiled with the changes and removal of fmemopen. It compiles a simple source file, but pceas is complaining that it can't find the x86 pseudo reg names (they now have an underscore). This looks to be a backend lib thing and not an executable issue (there were notes mentioning moving and changing the asm lib stuff).

EDIT: Nevermind about the x86 pseudo reg issue. Seems to be from the add32() fastcall that's internal to HuC. No one bothered to change the names of the pseudo regs for the internal fastcalls to match the changes to the backend lib. I'll update those now in HuC source code. I'm not even sure why there are even internal fastcalls. Those should all be in the library.
Quote from: cabbage on 11/04/2016, 08:58 AM
Another project always fails to build, and in the same way each time:
04:A267                  sta     <__fptr
Undefined symbol in operand field!
04:A27D                  sta     <__fptr+1
Undefined symbol in operand field!
04:A27F                  lda     [__fptr]
Undefined symbol in operand field!
Hmmmm ... the problem that Bonknuts was seeing is the same type of error as the problem that cabbage has compiling his program.

Uli's new source files contain an inconsistent mix of "fptr" and "__fptr", just as though he was in the middle of changing things and then stopped part way through.

I've renamed all the instances back to their old HuC names "__fptr", and now cabbage's VWF test program compiles.

But it doesn't work properly, either with the old cygwin build, or with the new mingw build.  ](*,)

...
<hours pass>
...

OK, cabbage's source code now builds and runs properly with the new compiler.  :D

The problem wasn't in his code, it was a change that Uli made in the new compiler.

Uli decided that "char" should be signed instead of unsigned, and that "#incbin" should define its label as a "signed char *".

This was breaking cabbage's program, and lots of other people's HuC programs probably, because that's different to how the old versions of HuC do things (and they were doing the sensible thing IMHO).

I've changed things so that #incbin always defines the label as an "unsigned char *", and I've changed the default char to be unsigned as well.

There are new compiler flags to change that default char setting "-fsigned-char" and "-funsigned-char".
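
To make the signedness issue concrete, here is a minimal sketch in plain C (not HuC-specific) of how the same raw byte from an included binary reads differently under the two conventions. The helper names are made up for illustration, and 0xE0 is just an arbitrary example value.

```c
#include <stdio.h>

/* Hypothetical helpers, not part of HuC: widen one raw data byte to int
 * the way a compiler would under each "char" convention. */

/* char is unsigned: bytes 0x80..0xFF stay in the range 128..255. */
int widen_unsigned(unsigned char raw)
{
    return (int)raw;
}

/* char is signed (two's complement): bytes 0x80..0xFF become -128..-1. */
int widen_signed(unsigned char raw)
{
    return (raw >= 0x80) ? (int)raw - 256 : (int)raw;
}

/* Print both readings of one byte, e.g. an index or length stored in data. */
void show_byte(unsigned char raw)
{
    printf("0x%02X -> unsigned %d, signed %d\n",
           (unsigned)raw, widen_unsigned(raw), widen_signed(raw));
}
```

Calling show_byte(0xE0) prints 224 for the unsigned reading and -32 for the signed one, so code that uses such a byte as an index or a count behaves very differently depending on the default.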

With all of today's changes, cabbage's source code now compiles and runs properly with the latest version of the compiler.

The changes have been checked into github, and here is a new Windows 32-bit version of HuC ...

<EDIT - Removed obsolete build.>

Please let me know if folks still have problems with it.

cabbage

Great work, elmer! With this updated build, the other random error I spoke of is gone and the program compiles and runs just fine.

Since you asked, I found another issue. Here's an example program:
#include "huc.h"
char j;
#incpal(pal,"img.pcx");
#incchr(img,"img.pcx");
main(){
    load_palette(0,pal,1);
    for(j=0;j<64;j++){
        load_vram(0x1000+0x10*j,img+j*0x10,0x10);
        put_raw(0x100+j,j%32,j/32);
    }
}
And here's an img.pcx file to go with it.

HuC "Denki release" handles it just fine, but the new build gives an error:
HuC (v3.98-d31dfaa-dirty, 2016-11-05)
;error: test.c(8)
; load_vram(0x1000+0x10*j,img+j*0x10,0x10);
;                                        ^
;******  can't get farptr  ******
It's due to the img+j*0x10 part -- by deleting +j*0x10, HuC stops complaining.

TurboXray

It's not just +j*0x10, but anything with j in the second argument. Even this causes a problem: load_vram(0x1000+0x10*j , j , 0x10). But somehow load_vram(0x1000+0x10*j , img , 0x10) compiles fine.

elmer

#103
Quote from: TurboXray on 11/06/2016, 10:32 AM
It's not just +j*0x10, but anything with j in the second argument. Even this causes a problem: load_vram(0x1000+0x10*j , j , 0x10). But somehow load_vram(0x1000+0x10*j , img , 0x10) compiles fine.
The code that converts an argument into a farptr, arg_to_fptr(), is in function.c

It's barely changed since the old HuC ... but looking at the information coming into it, I can't see how-on-earth it ever worked with complex values like "img+j*0x10".

It *seems* like the problem is in the code that checks to see if the argument is legal more than the actual code generation itself.

<EDIT>

I *suspect* that both the old & new code that checks the argument is basically wrong, and only worked before because the old code generator output the math in a very specific order, and wasn't very optimized.

Not sure how to fix this without opening up a can of worms.

elmer

#104
Quote from: elmer on 11/06/2016, 12:25 PM
I *suspect* that both the old & new code that checks the argument is basically wrong, and only worked before because the old code generator output the math in a very specific order, and wasn't very optimized.

Not sure how to fix this without opening up a can of worms.
I've rewritten the arg_to_fptr() function to deal with the optimizations that Uli made in the expression parsing.

HuC should now get past all the variations of input code that cabbage was throwing at it.

It can still get confused, and will give errors on some legal-but-unusual ANSI C code, but it should work for all common usage, and it is no worse (IMHO) than the code that was previously in there.

The changes are checked into github, and there's a new build here ...

<Removed link to obsolete version>

TurboXray

I didn't look over load_vram's second argument. Is it expecting a far_ptr as the argument type? Is it just a matter of the expression inside of the second argument not returning a far_ptr type?

elmer

Quote from: TurboXray on 11/07/2016, 04:52 PM
I didn't look over load_vram's second argument. Is it expecting a far_ptr as the argument type? Is it just a matter of the expression inside of the second argument not returning a far_ptr type?
Yes, it was all a problem with converting the second argument into a far_ptr type.

HuC needs to figure out which bank contains the address that you're trying to use for that 2nd parameter.

The code that it uses to figure that out is in arg_to_fptr().

Unfortunately, that code doesn't have full access to all the syntactic information that it needs, and has to guess at which symbol the programmer wants to use as the base address just from looking at the I-CODE stream (the intermediate code representation).

The old HuC code that was in there couldn't handle the changes that Uli made in the expression parsing to produce a more-optimized I-CODE sequence.

I had to experiment to see what kinds of I-CODE sequences are now produced for the common types of C-usage that are going to be written by HuC programmers.

Then I had to figure out how to determine where the base address is, but still reject the common illegal-parameter errors that programmers often make.

The end-result is very similar to the original HuC code, but it's a bit more flexible in its ability to handle the different I-CODE streams that Uli's changes produce.

************

Even though there are still niggles like these to fix, I'm really impressed with all the work that Uli put into improving the HuC compiler.

It's pretty amazing that it's gone from using 54 different I-CODE functions in the old HuC to 94 now.

That's a lot of new codes to allow him to optimize the output that HuC generates!  :shock:

elmer

Just in case I was actually spending some time looking at the HuC code generation to see what improvements Uli had made, then, well, I might just have noticed the processing for the "switch" statement.

WTF ... how come nobody has rewritten that darned thing in real assembly-language in the last decade ... it's a monstrosity!!!  ](*,)

I know that I've ragged on HuC a bit (well, too much, in truth) ... but really ... it's absolutely amazing the amount of work that Dave Shadoff and the others put into HuC; and their love of the PCE shows.

But someone should have re-written that crazy ___case function *years* ago!!!  :shock:

Gredler

Quote from: elmer on 11/10/2016, 12:41 AM
Just in case I was actually spending some time looking at the HuC code generation to see what improvements Uli had made, then, well, I might just have noticed the processing for the "switch" statement.

WTF ... how come nobody has rewritten that darned thing in real assembly-language in the last decade ... it's a monstrosity!!!  ](*,)

I know that I've ragged on HuC a bit (well, too much, in truth) ... but really ... it's absolutely amazing the amount of work that Dave Shadoff and the others put into HuC; and their love of the PCE shows.

But someone should have re-written that crazy ___case function *years* ago!!!  :shock:
Sounds like a volunteer offer to me! :D

elmer

Quote from: Gredler on 11/10/2016, 12:43 AM
Sounds like a volunteer offer to me! :D
Ah ... but if *I* do it, then you'd have to take all the other things that I'm interested in, like the small-but-incredibly-fast zero-page stack.

And that goes along with a re-arrangement of CPU register usage that will piss-off anyone that's got lots of inline assembly, or who calls assembly-language functions that they've written from HuC.  #-o

The benefits of rearranging the registers are too positive to ignore.  :wink:

elmer

#110
Quote from: Gredler on 11/10/2016, 12:43 AM
Quote from: elmer on 11/10/2016, 12:41 AM
But someone should have re-written that crazy ___case function *years* ago!!!  :shock:
Sounds like a volunteer offer to me! :D
OK, you got me, it wasn't hard to change it from the new register layout to the old one, so here's a new build with the faster switch/case processing ...  :)

<Removed link to obsolete version.>

This build also fixes the compiler test suite that Uli added to deal with the changes that I've made in the last week, and all the tests pass (including today's changes to switch/case).

The changes are checked into github as usual.


Quote from: elmer on 11/07/2016, 04:39 PM
I've rewritten the arg_to_fptr() function to deal with the optimizations that Uli made in the expression parsing.
It would be really nice to get some confirmation from HuC users that things are working (or not) with their projects.

Gredler

Wow - I was only teasing with my comment, but thank you for your time and energy to further improve the toolsets for us all. I hope this helps the programmers - I have still not gotten into the coding yet outside of a basic hello world :P

elmer

Quote from: Gredler on 11/11/2016, 01:33 PM
Wow - I was only teasing with my comment, but thank you for your time and energy to further improve the toolsets for us all. I hope this helps the programmers - I have still not gotten into the coding yet outside of a basic hello world :P
I'd rewritten it anyway, and the old-register version is 99% the same as the new one ... so it worked out.  :wink:

I've been meaning to see how Uli's testsuite stuff worked, and it was a convenient time to figure that out, as well.

The real questions come moving forwards ... does anyone actually *want* a better performing version of HuC if it's not 100% identical to the current version (i.e. the register usage changes)?

Then there's the question of the current multiply function which uses the standard slow shift-then-add algorithm.

Would people be willing to give up 2KB of ROM space and 8-bytes of ZP for a super-fast version that's actually usable?

Most 6502 programmers just avoid things like multiplies and divides, so I'm not sure that it's worth it.  :-k
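
For readers who haven't met it, the "shift-then-add" approach mentioned above looks roughly like this when written out in portable C. This is only a sketch of the general algorithm, not HuC's actual library routine, and the function name is an assumption.

```c
#include <stdint.h>

/* Hypothetical name. Classic shift-and-add multiply: one loop pass per
 * bit of the multiplier, which is why it is slow on an 8-bit CPU.
 * The result is truncated to 16 bits. */
uint16_t shift_add_mul16(uint16_t a, uint16_t b)
{
    uint16_t result = 0;

    while (b != 0) {
        if (b & 1) {
            /* This bit of the multiplier is set: add the shifted value. */
            result = (uint16_t)(result + a);
        }
        a = (uint16_t)(a << 1);  /* next bit is worth twice as much */
        b >>= 1;                 /* move on to the next multiplier bit */
    }
    return result;
}
```

The loop can run up to 16 times per multiply, each pass doing a 16-bit shift and a conditional 16-bit add, which is the cost that a table-driven routine tries to avoid.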

TurboXray

Does HuC call a function to do the multiply? If so, maybe it can be handled via backend lib and passing a switch via command line (if <> defined, else - etc). But I think the 2k method should have been in there to begin with (such a brilliant method).

 Is the shift handling in HuC still really slow in the new build? The old build would call a function and do a loop - ugh. A little bit of bloat from a variable length macro in the trade off for speed would be worth it IMO.

DildoKKKobold

Quote from: elmer on 11/11/2016, 05:01 PM
The real questions come moving forwards ... does anyone actually *want* a better performing version of HuC if it's not 100% identical to the current version (i.e. the register usage changes)?

Would people be willing to give up 2KB of ROM space and 8-bytes of ZP for a super-fast version that's actually usable?
As a noob programmer, would I even notice these being used up? I'm pretty sure HuC's target audience would benefit more from increased performance, than they'd ever notice how you are changing these aspects.

Additionally, most HuC programmers don't set what is in the zero page versus not. I just declare my variables, and they go in ram wherever they go. Maybe more advanced HuC users, such as Arkhan or Cabbage have more input.
For a good time with the legendary DarkKobold, email: kylethomson@gmail.com
Dildos provided free of charge, no need to bring your own! :lol:
DoxPhile .com / chat

elmer

Quote from: TurboXray on 11/11/2016, 05:58 PM
Does HuC call a function to do the multiply? If so, maybe it can be handled via backend lib and passing a switch via command line (if <> defined, else - etc). But I think the 2k method should have been in there to begin with (such a brilliant method).
Yep, there are both a mulu, and a muls function.

I agree, if you're going to use a darned multiply, then it *should* be fast, and it *could* probably be included conditionally.


Quote
Is the shift handling in HuC still really slow in the new build? The old build would call a function and do a loop - ugh. A little bit of bloat from a variable length macro in the trade off for speed would be worth it IMO.
Uli already inlined fast code for shifts by a constant 1, 2 and 8. Other immediate values drop through to a routine that's a streamlined version of the general variable-shift code.

So it's about as improved as it can be for the time being.

Seriously ... this new version of HuC has so many improvements over the old version that I'm kinda shocked that there seems to be so little interest in it.

If folks don't kick the tires and give it a spin, then we'll never find out if there's anything else broken like the farptr stuff was (but isn't anymore).

Gredler

DK I'll buy you a beer if you give it a whirl and post feedback about what differences you notice :P

DildoKKKobold

Quote from: elmer on 11/11/2016, 06:28 PM
If folks don't kick the tires and give it a spin, then we'll never find out if there's anything else broken like the farptr stuff was (but isn't anymore).
I'll give it a try this weekend. Did the conflict with squirrel get taken out, or do I still need to solve that before I can continue?

TurboXray

Quote from: elmer on 11/11/2016, 06:28 PM
Seriously ... this new version of HuC has so many improvements over the old version that I'm kinda shocked that there seems to be so little interest in it.
Yeah. I was impressed by the code generation for a lot of stuff in Uli's updated HuC.

elmer

Quote
Did the conflict with squirrel get taken out, or do I still need to solve that before I can continue?
Not by me ... I don't have an example that "breaks".

It should just take a trivial multi-file search-n-replace within the Squirrel directory (or wherever the files are).

If you don't have the tools to do that, then I can do it if you send me the files.

elmer

Quote from: TurboXray on 11/11/2016, 05:58 PM
But I think the 2k method should have been in there to begin with (such a brilliant method).
Dang it, but that's a *really* fast multiply routine!  :shock:

I wish that I'd known about that one back in the 1980s!  #-o

It's such a nice way of dealing with fixed-point numbers, too.

HuCard users wouldn't care about the 2KB of tables, but CD users might prefer the smaller 1.5KB version.
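
The thread doesn't spell out which algorithm the "2k method" is. One well-known table-driven multiply on 6502-family CPUs is the quarter-square method, so the sketch below assumes that is the kind of routine meant; treat that as a guess rather than a description of the actual code being discussed. It relies on the identity a*b = floor((a+b)^2/4) - floor((a-b)^2/4), and all names here are hypothetical.

```c
#include <stdint.h>

/* Table of floor(i*i/4) for i = 0..510, enough for the sum of two bytes.
 * Note: this C table is about 1KB; a real 6502 routine may lay its
 * tables out differently (assumption, not taken from the thread). */
static uint16_t quarter_sq[511];
static int table_ready = 0;

static void build_quarter_sq(void)
{
    uint32_t i;
    for (i = 0; i < 511; i++) {
        quarter_sq[i] = (uint16_t)((i * i) / 4);
    }
    table_ready = 1;
}

/* 8-bit x 8-bit -> 16-bit multiply using two lookups and one subtract. */
uint16_t qs_mul8(uint8_t a, uint8_t b)
{
    unsigned sum  = (unsigned)a + (unsigned)b;
    unsigned diff = (a > b) ? (unsigned)(a - b) : (unsigned)(b - a);

    if (!table_ready) {
        build_quarter_sq();
    }
    /* (a+b) and (a-b) have the same parity, so the two floor()
     * roundings cancel and the difference is exactly a*b. */
    return (uint16_t)(quarter_sq[sum] - quarter_sq[diff]);
}
```

The appeal on an 8-bit CPU is that the per-bit loop disappears entirely: the work becomes an add, a subtract and a couple of indexed table reads, at the cost of ROM space for the tables.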

It's also interesting that the bottom 16-bits of a signed 16x16 multiply is *exactly* the same as an unsigned 16x16 multiply.

So there's no need for an "smul" routine if you're just wanting a 16-bit result.
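
That property is easy to check in plain C. The sketch below (function names are made up for illustration) computes the low 16 bits of a product both ways and compares them for the same bit patterns.

```c
#include <stdint.h>

/* Low 16 bits of an unsigned 16x16 multiply. */
uint16_t mul16_low_unsigned(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a * (uint32_t)b);
}

/* Low 16 bits of a signed 16x16 multiply. */
uint16_t mul16_low_signed(int16_t a, int16_t b)
{
    return (uint16_t)((int32_t)a * (int32_t)b);
}

/* Reinterpret a 16-bit pattern as a two's-complement signed value. */
static int16_t as_signed16(uint16_t bits)
{
    return (int16_t)((bits >= 0x8000u) ? (int32_t)bits - 65536 : (int32_t)bits);
}

/* Returns 1 if both routines give the same low word for these patterns. */
int low_words_match(uint16_t a_bits, uint16_t b_bits)
{
    return mul16_low_unsigned(a_bits, b_bits) ==
           mul16_low_signed(as_signed16(a_bits), as_signed16(b_bits));
}
```

For example, -3 * 7 is -21, which is 0xFFEB as a 16-bit pattern, and 0xFFFD * 7 also ends in 0xFFEB. The two results only differ in the upper 16 bits of the full 32-bit product.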


Quote from: guest on 11/11/2016, 06:21 PM
As a noob programmer, would I even notice these being used up? I'm pretty sure HuC's target audience would benefit more from increased performance, than they'd ever notice how you are changing these aspects.

Additionally, most HuC programmers don't set what is in the zero page versus not. I just declare my variables, and they go in ram wherever they go. Maybe more advanced HuC users, such as Arkhan or Cabbage have more input.
There's so much about *how* people are using HuC in practice that I don't know about.

AFAIK, people are avoiding using local variables and parameters to functions as much as they can, and just use global variables instead.

So I *think* that would put all the variables into main RAM and not in fast zero-page.

Actually ... I've just been looking for zero-page usage in HuC, but can't find it.

Does anyone know how you can specify variables in zero-page in HuC?

TurboXray

Quote from: elmer on 11/12/2016, 02:34 PM
Does anyone know how you can specify variables in zero-page in HuC?
#asm
   .zp
  var1:  .ds 1
  ptr1:  .ds 2
  tinyarray:  .ds 8
 #endasm

 No need to worry about addresses. Just use the label names. This is what I've always done in HuC for ZP variables. And of course .BSS for non ZP variables.

Sunray

Using local variables isn't as bad in this version if you compile with -fno-recursive -msmall.

Sunray

In my game I don't use any explicit multiplications at all, so personally I don't want the 2k table in my ROM. But there are probably a bunch of implicit ones from array accesses though.

I use "16bitvalue >> 4" a lot though, I haven't looked into optimizing the generated code for that. Is that worth doing?

elmer

Quote from: TurboXray on 11/12/2016, 02:54 PM
 #asm
   .zp
  var1:  .ds 1
  ptr1:  .ds 2
  tinyarray:  .ds 8
 #endasm

 No need to worry about addresses. Just use the label names. This is what I've always done in HuC for ZP variables. And of course .BSS for non ZP variables.
Hahaha ... Thanks, that makes sense!  :)

I keep on forgetting that HuC is really just a pre-processor for PCEAS, and that you can just drop down into assembly like that.


Quote from: Sunray on 11/13/2016, 04:11 AM
Using local variables isn't as bad in this version if you compile with -fno-recursive -msmall.
Yep, that's right, "-fno-recursive" just makes every local variable into a global variable, and so they go through the existing semi-fast processing for globals.

Then "-msmall" just drops the high-byte adjustment of the stack pointer, leading to shorter and faster stack code.

But stack usage is still *slow*, and it's used *constantly* for intermediate results and so it would benefit from being faster, even if you don't use true local variables or parameters.
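
The flag name hints at the price of that trade. As a rough illustration in standard C (using "static" to stand in for what the thread describes, not HuC's actual code generation), a local that lives in one fixed location gets overwritten by a recursive call. The function names are hypothetical.

```c
/* Normal automatic local: every call gets its own copy of "saved". */
int sum_to_auto(int n)
{
    int saved = n;
    int rest;

    if (n == 0) {
        return 0;
    }
    rest = sum_to_auto(n - 1);
    return rest + saved;
}

/* Same logic, but "saved" is a single fixed memory location shared by
 * every call, like a global. ("rest" is left automatic only to keep
 * the demonstration small.) */
int sum_to_static(int n)
{
    static int saved;
    int rest;

    saved = n;
    if (n == 0) {
        return 0;
    }
    rest = sum_to_static(n - 1);  /* inner calls overwrite "saved" */
    return rest + saved;          /* "saved" is now 0, not n */
}
```

With n = 3 the first version returns 6 and the second returns 0. Code that never recurses doesn't hit this, which is why the option is a reasonable speed-up for typical HuC programs.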


Quote from: Sunray on 11/13/2016, 04:52 AM
In my game I don't use any explicit multiplications at all, so personally I don't want the 2k table in my ROM. But there are probably a bunch of implicit ones from array accesses though.
It would be interesting to streamline the library so that some functions aren't included if they're not used.

You could easily do a search for "jsr muls" and "jsr mulu" in your output file to see if and where they're used.

OTOH ... not sure why you're so worried about the 2KB table. Are you really bumping up against the 1MB ROM limit?


Quote
I use "16bitvalue >> 4" a lot though, I haven't looked into optimizing the generated code for that. Is that worth doing?
Uli's code in the new HuC is definitely faster than the old code for that, but a shift by 4 is still going through a function and a loop rather than being inlined as fast code, so you can definitely improve the speed if you want to.

Another big thing is whether you're shifting a signed or unsigned value ... the unsigned shift is faster.

Shifts by 4 could certainly be inlined if you do them a lot. The code will be a lot faster, but it'll cost you 11 bytes each time that you use it, so it's a tradeoff.

You can just modify the __asrwi and __lsrwi macros in huc.inc to try it and see how it works for you.
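
As a plain-C model of the trade-off (this is not the contents of the __asrwi or __lsrwi macros, and the function names are invented), here is a looped shift next to an unrolled shift-by-4, plus a signed variant showing the extra sign handling that an arithmetic shift needs.

```c
#include <stdint.h>

/* General case: shift right one bit at a time in a loop. */
uint16_t lsr16_loop(uint16_t value, unsigned count)
{
    while (count-- != 0) {
        value = (uint16_t)(value >> 1);
    }
    return value;
}

/* "Inlined" shift by a constant 4: no loop counter, just four shifts. */
uint16_t lsr16_by4(uint16_t value)
{
    value = (uint16_t)(value >> 1);
    value = (uint16_t)(value >> 1);
    value = (uint16_t)(value >> 1);
    value = (uint16_t)(value >> 1);
    return value;
}

/* Signed (arithmetic) shift: the sign bit has to be preserved on every
 * step, which is the extra work compared with the unsigned version. */
int16_t asr16_loop(int16_t value, unsigned count)
{
    while (count-- != 0) {
        if (value < 0) {
            value = (int16_t)~(~value >> 1);
        } else {
            value = (int16_t)(value >> 1);
        }
    }
    return value;
}
```

The unrolled form trades a few bytes of code at every use for not having to set up and test a loop counter, which is the same size-versus-speed choice described above.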

TurboXray

elmer: This was the very reason why I wanted to do a single inline #asm for HuC. So a macro could be used inside something like this:

 var = index[U_shift_left_int(idx,4)];

 With U_shift_left_int being a macro to inline asm code.

elmer

#126
Quote from: TurboXray on 11/13/2016, 04:10 PM
elmer: This was the very reason why I wanted to do a single inline #asm for HuC. So a macro could be used inside something like this:

 var = index[U_shift_left_int(idx,4)];

 With U_shift_left_int being a macro to inline asm code.
That kind of capability would be nice ... I just have no current idea of how it would be implemented in practice, or how you'd know the current state of the expression parsing so that you didn't get in the way.

Did you get it working? I'd be happy to add the patch for it into the current code.

The easiest way to *currently* get some control of the code generated for those shifts would just be to add a #pragma or something like that to enable/disable the fast-inlined shift-by-4 code.

The current __asrwi and __lsrwi macros could be easily modified to look at a global symbol to decide what to do.

<EDIT>

I'm also a bit peeved at the restrictions of trying to keep everything working for HuCard development.

The 6502/6280 can sometimes really benefit from the self-modifying code that the CD format allows for.  ](*,)

DildoKKKobold

; load_vram(0x7E00,DogHead+0x200,0x80);
;                                    ^
;******  can't get farptr  ******

"#incspr(DogHead,"spr/spr_dog_head.pcx",0,0,2,9)"


getting my own farptr errors. These didn't occur before Cabbage's fix.

Also, the IRQ_TIMER and IRQ_VYSNC, I couldn't find them in any other file in Squirrel.

TurboXray

A nice little macro would fix that: get_far_ptr(label,index)

 That aside, internally if HuC supported 24bit primitives - all labels could be kept as linear addresses. Then when used as a far pointer, simply converted to bank:local address on the fly. In the load_vram case, 0x200 would be added to the 24bit linear address of DogHead, then an internal macro would convert that to bank:local address. In this case, it would be a compile time calculation, but for something like DogHead+j it would still work as I described.
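
The linear-address idea can be sketched in ordinary C. The PC Engine maps memory in 8KB banks, so a linear address splits into a bank number and an offset within the bank. The type and function names below are hypothetical, and the result is only a bank-relative offset: mapping it to an actual CPU address depends on where the bank is paged in, which this sketch leaves out.

```c
#include <stdint.h>

#define BANK_SIZE 0x2000u   /* 8KB banks */

typedef struct {
    uint8_t  bank;    /* bank number */
    uint16_t offset;  /* offset within that bank, 0x0000..0x1FFF */
} far_ptr_t;          /* hypothetical type, not HuC's internal layout */

/* Split a linear address into bank and bank-relative offset. */
far_ptr_t linear_to_far(uint32_t linear)
{
    far_ptr_t p;
    p.bank   = (uint8_t)(linear / BANK_SIZE);
    p.offset = (uint16_t)(linear % BANK_SIZE);
    return p;
}

/* "label + index": do the math on the linear address first, then
 * convert, so that crossing a bank boundary is handled automatically. */
far_ptr_t far_add(uint32_t base_linear, uint32_t index)
{
    return linear_to_far(base_linear + index);
}
```

For instance, a label at linear address 0x13F00 plus 0x200 lands at 0x14100, which is bank 0x0A, offset 0x0100, one bank further on than the label itself.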

elmer

Quote
getting my own farptr errors. These didn't occur before Cabbage's fix.

Also, the IRQ_TIMER and IRQ_VYSNC, I couldn't find them in any other file in Squirrel.
Thanks for testing it out!  :D

Congratulations, you've definitely found a problem.

It looks like the "fix" that I put in for the farptr wasn't good enough to deal with Uli's "symbol+offset" optimization.  #-o

That's the one that was causing a lot of problems with cabbage's code.

I thought that I'd found where Uli was doing that, and that his code was incorrectly setting the symbol type to '0'.

I was wrong ... I've finally found exactly where he's doing the optimization, and it's an "uninitialized-variable" problem.

That's why it's tripping up on different symbols randomly. It certainly shows that the optimization is getting used in a lot of places, which is good.

I've fixed it now, but it's not the most "elegant" fix, so I'm going to see if I can switch it to use Bonknuts's idea of the macro instead.


Quote from: TurboXray on 11/14/2016, 01:12 AM
A nice little macro would fix that: get_far_ptr(label,index)
Yep, Uli is not-really-creating a fake new-symbol with the "symbol+offset" as the new name.

It's certainly one way of doing it, but the macro way would be cleaner, if it can be done.


Quote
That aside, internally if HuC supported 24bit primitives - all labels could be kept as linear addresses. Then when used as a far pointer, simply converted to bank:local address on the fly. In the load_vram case, 0x200 would be added to the 24bit linear address of DogHead, then an internal macro would convert that to bank:local address. In this case, it would be a compile time calculation, but for something like DogHead+j it would still work as I described.
Ahhh ... but that would be a huge change.

HuC doesn't really seem to deal with symbol values, just strings of text.

All of the actual numerical stuff for symbols is deferred until PCEAS.

TurboXray

#130
Quote from: elmer on 11/14/2016, 11:13 AM
Ahhh ... but that would be a huge change.

HuC doesn't really seem to deal with symbol values, just strings of text.

All of the actual numerical stuff for symbols is deferred until PCEAS.
Maybe not that big. If HuC isn't actually handling this, but passing it along to PCEAS, then why not add a linAddrOf() directive to PCEAS? Honestly, it would make dealing with Arcade Card defines much easier as well (of course we still need a .bssAC directive).

 But yeah, let PCEAS handle it with linear address support.

elmer

Quote from: TurboXray on 11/14/2016, 01:43 PM
Maybe not that big. If HuC isn't actually handling this, but passing it along to PCEAS, then why not add a linAddrOf() directive to PCEAS? Honestly, it would make dealing with Arcade Card defines much easier as well (of course we still need a .bssAC directive).

But yeah, let PCEAS handle it with linear address support.
I look forward to receiving your patch to add this capability.  :wink:

In the meantime ... I'm still trying to find a "clean" solution for the current farptr problem.

Using the macro to do it has potential issues that will need some thinking about.

Uli's solution was the "safe" one, if a bit wasteful. I might have to keep on using it.

TurboXray

So if I do this for PCEAS, you'll handle the HuC side? Deal. Best deal ever.

elmer

Quote from: TurboXray on 11/14/2016, 04:10 PM
So if I do this for PCEAS, you'll handle the HuC side? Deal. Best deal ever.
Hahaha!  :lol:

That would be an overly optimistic, and wildly inaccurate interpretation of what I wrote.  :wink:

OldRover

Quote from: elmer on 11/02/2016, 10:57 PM
If anyone downloads this and gives it a try, even perhaps, you know, like the folks involved in a certain high profile Kickstarter ... then I'd love to get some feedback on whether there are any problems (I hope not).  :-"
Said folks will give this all a spin soon enough. ;)
Turbo Badass Rank: Janne (6 of 12 clears)
Conquered so far: Sinistron, Violent Soldier, Tatsujin, Super Raiden, Shape Shifter, Rayxanber II

elmer

#135
Quote from: OldRover on 11/14/2016, 07:38 PM
Said folks will give this all a spin soon enough. ;)
Excellent!  :D

I should have the current bug fixed soon ... I think that the quick-and-dirty fix is safer at this point than changing the macros to allow for an offset parameter. The behavior of those macros is relied upon in too much of the optimizer code.

The next question is ... are you open to a change in the HuC register usage?  :-k

HuC currently uses A:X for the 16-bit accumulator, and Y is used mostly for slow stack accesses (being set to the value 1).

Changing that to have the 16-bit accumulator in Y:A, and using X to be the data-stack-pointer offers a considerable performance and size benefit when matched with a zero-page stack.

That stack can be tiny (probably 8-bytes or less) if you're using all of the current HuC tricks to have global variables and avoid the stack ... but if you give it a 32-byte or 64-byte stack, then you can probably avoid needing to use most of those tricks and just rely upon the compiler to auto-magically use the stack for local variables, and have it run as fast as putting everything into zero-page.

Remember that on the PCE, the "zero-page,x" addressing mode is just as fast as the basic "zero-page" addressing mode.

It'll need some improvement to the HuC optimizer to take full advantage of the capability ... but the main assembly-language code generation and libraries are looking good so far.

elmer

Next question ... does anyone have a HuC project that they would be willing to share with me for testing?

I really need to see a reasonable-sized project in order to test the changes to HuC.

Uli's new testsuite is great ... but it has obviously missed a few problems with the new compiler that I've been having to fix.

If I make more-radical changes, then I'll need a better test.

DildoKKKobold

Quote from: elmer on 11/14/2016, 08:29 PM
Next question ... does anyone have a HuC project that they would be willing to share with me for testing?

I really need to see a reasonable-sized project in order to test the changes to HuC.

Uli's new testsuite is great ... but it has obviously missed a few problems with the new compiler that I've been having to fix.

If I make more-radical changes, then I'll need a better test.
Elmer, greds and I chatted about it. We wouldn't mind sharing Catastrophy, in its current state. That said, there are a few obvious and not so obvious caveats before we just dump our code and such to you.

Let's chat via PM?

Gredler


elmer

Quote from: DildoKKKobold on 11/14/2016, 08:43 PM
Elmer, greds and I chatted about it. We wouldn't mind sharing Catastrophy, in its current state. That said, there are a few obvious and not so obvious caveats before we just dump our code and such to you.

Let's chat via PM?
That would be absolutely perfect ... I hoped that you'd volunteer.  :D

I'll send you a PM.

Arkhan Asylum

Quote from: TheOldMan on 11/04/2016, 07:46 PM
Quote
Maybe if someone pings Arkhan, we can figure this mess out.
<lol> He won't know. I have the original sources for the squirrel player, anyway.
errr.

Those defines can be omitted, or renamed.   They're just convenience defines for calling psgOn in your code for readability from what I remember.

I am pretty sure Atlantean just does psgOn(1) and doesn't even use the define.

what *other* explosions happen?  Rename those defs to anything you want, or omit them.

I don't have the new fork of HuC, so I can't really comment on it.  I'm also balls deep in MSX stuff so I am not really itching to grab it and start swinging at it...

but if you copy paste errors, I can try to help. 
This "max-level forum psycho" (:lol:) destroyed TWO PC Engine groups in rage: one by Aaron Lambert on Facebook "Because Chris 'Shadowland' Runyon!," then the other by Aaron Nanto "Because Le NightWolve!" Him and PCE Aarons don't have a good track record together... Both times he blamed the Aarons in a "Look-what-you-made-us-do?!" manner, never himself nor his deranged, destructive, toxic turbo troll gang!

Arkhan Asylum

Quote from: elmer on 11/10/2016, 12:41 AM
But someone should have re-written that crazy ___case function *years* ago!!!  :shock:
It's coming right after that MOD player comes out next week in 2007 or 8. 

OldRover

Turbo Badass Rank: Janne (6 of 12 clears)
Conquered so far: Sinistron, Violent Soldier, Tatsujin, Super Raiden, Shape Shifter, Rayxanber II

elmer

#143
Quote from: TheOldMan on 11/04/2016, 12:12 AMSystem code and variables, however are supposed to use '__' or '_'  as a preface to indicate it's a system reserved value. :D
I agree ... at least within some reasonable limits.  :-k

Uli's decision to change the old System Card parameter names from "_al", "_bx", etc, so that you could use the variable names "al", "bx", etc in your HuC code, was the correct idea (IMHO), but he went in the wrong direction (also IMHO).

He changed them to "al", "bx", etc in assembly language ... which just means that they conflict with a programmer's own usage of simple names for their own variables.

So I've changed them in the opposite direction ... the System variables are now "__al", "__bx", etc in assembly code, and can be referred to directly in HuC as "_al", "_bx", etc (since C prefixes an '_' onto all variables).

This makes it easier to tell System and HuC internal variables apart from C variables and from assembly-language variables.

elmer

Well, Catastrophy has shown that there's something going wrong somewhere in Uli's new optimizations in HuC.  :(

This C code fails ...

  const char a[6] = {60, 61, 62, 63, 64, 65, 66};

  main()
  {
    char x,y,z;
    x = 2;
    y = 3;
    z = a[x+y];
    if (z != 65)
      exit(1);
    return 0;
  }


This will take some serious debugging to find out what's going wrong!  ](*,)


*************************************

While I'm at it, it gives me an excuse to talk about *why* rearranging the register usage would be an improvement.

The example above generates assembly language code that uses the "__stbps" macro.

That macro stores a byte at the location whose address is on the top of the stack.

That's not uncommon when doing any array indexing or storage through pointers.

Here's what that macro does in Uli's HuC now ...

====================

 pha
 phx
 lda   [__stack]
 tax
 ldy   #1
 lda   [__stack],Y
 stx   <__ptr
 sta   <__ptr+1
 pla
 plx
 sta   [__ptr]
 sax
 inc   <__stack
 inc   <__stack

22 bytes, 62 cycles

====================


And it's even bigger and slower in the old HuC that doesn't have the "-msmall" small stack option!  :shock:


Here's the same thing with the rearranged registers and the zero-page stack that I'd like HuC to use ...

====================

 sta [__stack,x]
 inx
 inx

 4 bytes, 11 cycles

====================

OldMan

Quoteconst char a[6] = {60, 61, 62, 63, 64, 65, 66};
That should be a[7]. There are 7 values.

Quoteif (z != 65)
This may be your problem. The original HuC would see the 65 as an int, and then screw up the compare. IIRC, it would compare two bytes, using the byte after z as the high byte for the compare.

Not sure that's the problem, but you might want to check it. HuC really doesn't like it when you mix types.

elmer

#146
Quote from: TheOldMan on 11/17/2016, 03:49 PMThat should be a[7]. There are 7 values.
Whoops ... that's just a typo in my post.  :oops:

The test array actually has 10 bytes, but I hand-edited it to make it smaller in the post.


QuoteThis may be your problem. The original HuC would see the 65 as an int, and then screw up the compare. IIRC, it would compare two bytes, using the byte after z as the high byte for the compare.
Good point, but AFAIK, it's a bit better about those comparisons now.

The actual problem is that the optimizer turns

z = a[x+y]    into    a[0] = z[x+y]

That's a big screwup!

<EDIT>

It's actually probably just a 1 line typo somewhere that causes the error to propagate into that terrible result.

OldMan

Quotez = a[x+y]    into    a[0] = z[x+y]
Looks like somebody got the macro parameters backwards... :)

Gredler

Quote from: elmer on 11/17/2016, 04:03 PMjust a 1 line typo somewhere that causes the error to propagate into that terrible result.
This is what gets me about programming, the room for error for an error prone person makes a difficult learning curve!

The above convo sounds awesome though, seems like progress is progressing!

Quote from: OldRover on 11/14/2016, 09:52 PMStoned alf.
That's not supposed to be alf, it's supposed to be a cat, my drawing skills need to improve like HuC is!

elmer

Quote from: TheOldMan on 11/17/2016, 04:27 PMLooks like somebody got the macro parameters backwards... :)
Hahaha, yes, that's exactly what it looks like, doesn't it!  :lol:

Actually, I've already found *where* the problem is ... but now I've got to find out *why* it's going wrong.

It's in an incredibly excellent new optimization that Uli added that will be critical (in the future) to getting good performance out of the zero-page stack.

It's moving the wrong address from the stack to the end of an assignment ... it's grabbing the address from one I-CODE too soon.

Now ... *why*???  :-k


Quote from: Gredler on 11/17/2016, 04:36 PMThis is what gets me about programming, the room for error for an error prone person makes a difficult learning curve!
IMHO, access to a good source-level debugger is crucial to learning as a beginner.

Unfortunately, there's very little that we can do about that on the PCE.  :(