RIP to BT Garner of MindRec.com... BT passed away early 2023 from health problems. He was one of the top PCE homebrew developers and founder of the OG Turbo List, then PCECP.com. Condolences to family and friends.
IMG
IMG
Main Menu

Xanadu II Translation Development Blog

Started by elmer, 08/31/2015, 11:50 AM

Previous topic - Next topic

0 Members and 48 Guests are viewing this topic.

elmer

IMG
IMG

This game is a beast!   ](*,)

Just in case I get hit by a bus, I'll try to document some of my hacking progress here.

Who knows, someone might even find this stuff interesting!  :roll:

-----------------------------

The data on the CD is split into a lot of separate "files" at fixed locations.

It looks like there's a simple structure to each "file" block.

-----------------------------


// Xanadu II META_BLOCK data file structure.
//
// The number of DATA_CHUNKS in a META_BLOCK is simply the 1st-offset / 4.
//
// DATA_B = 1 byte of data
// ZERO_B = 1 byte of zero
//

META_BLOCK = HEAD_CHUNK { DATA_CHUNK } ;

HEAD_CHUNK =
{
  DATA_B (* Byte offset to DATA_CHUNK from start of file : lo-byte   *)
  DATA_B (* Byte offset to DATA_CHUNK from start of file : hi-byte   *)
  DATA_B (* Bank offset to DATA_CHUNK from start of file : 8KB banks *)
  ZERO_B
} ;

DATA_CHUNK =
  DATA_B       (* PAYLOAD length in bytes + 1 : lo-byte *)
  DATA_B       (* PAYLOAD length in bytes + 1 : hi-byte *)
  PAYLOAD
  ;

// If 1st byte of PAYLOAD is $00 then data is compressed in NEW_FORMAT, else in OLD-FORMAT.

PAYLOAD = { DATA_B } ; (* Repeated for length of PAYLOAD *)

// Xanadu II Data Search v0.0.1 Sep  1 2015
//
// Processing File "../cd/02 Kaze no Densetsu Xanadu II (J).iso".
//
//   META_BLOCK at 0x0000d800, end 0x000155e8,  16 DATA_CHUNKs,   2 new format.
// * META_BLOCK at 0x00019800, end 0x0001d19c,  11 DATA_CHUNKs,   1 new format.
// * META_BLOCK at 0x0003b800, end 0x00045d4f,  26 DATA_CHUNKs,  13 new format.
//   META_BLOCK at 0x0004b800, end 0x00053fa4,  12 DATA_CHUNKs,   0 new format.
// * META_BLOCK at 0x00057800, end 0x0005fde6,  12 DATA_CHUNKs,   0 new format.
//   META_BLOCK at 0x0006b800, end 0x00088699,  68 DATA_CHUNKs,  39 new format.
//   META_BLOCK at 0x0009b800, end 0x000b2f4d,  89 DATA_CHUNKs,  45 new format.
//   META_BLOCK at 0x000bb800, end 0x000d76c9,  61 DATA_CHUNKs,  34 new format.
//   META_BLOCK at 0x000db800, end 0x000f0299,  55 DATA_CHUNKs,  30 new format.
//   META_BLOCK at 0x000fb800, end 0x0011b721,  81 DATA_CHUNKs,  56 new format.
//   META_BLOCK at 0x0011b800, end 0x0013078f,  51 DATA_CHUNKs,  26 new format.
//   META_BLOCK at 0x0013b800, end 0x00154d51,  63 DATA_CHUNKs,  42 new format.
//   META_BLOCK at 0x0015b800, end 0x001796d5,  70 DATA_CHUNKs,  51 new format.
//   META_BLOCK at 0x0017b800, end 0x00199535,  70 DATA_CHUNKs,  51 new format.
//   META_BLOCK at 0x0019b800, end 0x001b3118,  66 DATA_CHUNKs,  47 new format.
//   META_BLOCK at 0x001bb800, end 0x001c95a4,  34 DATA_CHUNKs,  21 new format.
//   META_BLOCK at 0x001db800, end 0x001ec8a5,  40 DATA_CHUNKs,  23 new format.
//   META_BLOCK at 0x001fb800, end 0x002125cd,  69 DATA_CHUNKs,  44 new format.
//   META_BLOCK at 0x0021b800, end 0x0022a504,  32 DATA_CHUNKs,  18 new format.
//   META_BLOCK at 0x0023b800, end 0x0025b3f4,  80 DATA_CHUNKs,  57 new format.
//   META_BLOCK at 0x0025b800, end 0x0026f2fd,  55 DATA_CHUNKs,  38 new format.
//   META_BLOCK at 0x0027b800, end 0x00289115,  30 DATA_CHUNKs,  17 new format.
//   META_BLOCK at 0x0029b800, end 0x002bb629, 140 DATA_CHUNKs, 120 new format.
//   META_BLOCK at 0x002bb800, end 0x002d8457, 109 DATA_CHUNKs,  62 new format.
//   META_BLOCK at 0x002db800, end 0x002ec373,  55 DATA_CHUNKs,  37 new format.
//   META_BLOCK at 0x002fb800, end 0x0030b140,  37 DATA_CHUNKs,  21 new format.
//   META_BLOCK at 0x0031b800, end 0x00334b7f,  57 DATA_CHUNKs,  36 new format.
//   META_BLOCK at 0x0033b800, end 0x0034f39b,  42 DATA_CHUNKs,  29 new format.
//   META_BLOCK at 0x0035b800, end 0x0036f703,  45 DATA_CHUNKs,  29 new format.
//   META_BLOCK at 0x0037b800, end 0x00396346,  75 DATA_CHUNKs,  58 new format.
//   META_BLOCK at 0x0039b800, end 0x003b8d6f,  72 DATA_CHUNKs,  54 new format.
//   META_BLOCK at 0x003bb800, end 0x003cf3ab,  43 DATA_CHUNKs,  27 new format.
//   META_BLOCK at 0x003db800, end 0x003e6baa,  27 DATA_CHUNKs,  14 new format.
// * META_BLOCK at 0x003e7800, end 0x003f298e,  27 DATA_CHUNKs,  14 new format.
//   META_BLOCK at 0x003fb800, end 0x0041b5a3,  81 DATA_CHUNKs,  56 new format.
//   META_BLOCK at 0x0041b800, end 0x0043b465,  86 DATA_CHUNKs,  58 new format.
//   META_BLOCK at 0x0043b800, end 0x0045adbe,  78 DATA_CHUNKs,  55 new format.
//   META_BLOCK at 0x0045b800, end 0x004705ef,  57 DATA_CHUNKs,  40 new format.
//   META_BLOCK at 0x0047b800, end 0x00488d75,  30 DATA_CHUNKs,  17 new format.
//   META_BLOCK at 0x0049b800, end 0x004b4fad,  84 DATA_CHUNKs,  67 new format.
//   META_BLOCK at 0x004bb800, end 0x004c8c82,  38 DATA_CHUNKs,  25 new format.
//   META_BLOCK at 0x004db800, end 0x004f40f3,  68 DATA_CHUNKs,  55 new format.
//   META_BLOCK at 0x004fb800, end 0x00514032,  60 DATA_CHUNKs,  43 new format.
//   META_BLOCK at 0x0051b800, end 0x0053296f,  52 DATA_CHUNKs,  30 new format.
//   META_BLOCK at 0x0053b800, end 0x00555c48,  69 DATA_CHUNKs,  45 new format.
//   META_BLOCK at 0x0055b800, end 0x0057a4b5,  88 DATA_CHUNKs,  65 new format.
//   META_BLOCK at 0x0057b800, end 0x0059743e, 128 DATA_CHUNKs, 109 new format.
//   META_BLOCK at 0x0059b800, end 0x005b375a,  66 DATA_CHUNKs,  46 new format.
//   META_BLOCK at 0x005bb800, end 0x005daa5b, 118 DATA_CHUNKs, 100 new format.
//   META_BLOCK at 0x005db800, end 0x005edc13,  38 DATA_CHUNKs,  21 new format.
//   META_BLOCK at 0x005fb800, end 0x0060e0c0,  40 DATA_CHUNKs,  23 new format.
//   META_BLOCK at 0x0061b800, end 0x0062df12,  40 DATA_CHUNKs,  21 new format.
//   META_BLOCK at 0x0063b800, end 0x006574ae,  78 DATA_CHUNKs,  55 new format.
//   META_BLOCK at 0x0065b800, end 0x00670a45,  48 DATA_CHUNKs,  32 new format.
//   META_BLOCK at 0x0067b800, end 0x00691c97,  44 DATA_CHUNKs,  27 new format.
//   META_BLOCK at 0x0069b800, end 0x006bb03d, 109 DATA_CHUNKs,  94 new format.
//   META_BLOCK at 0x006bb800, end 0x006d1e0c,  46 DATA_CHUNKs,  27 new format.
//   META_BLOCK at 0x006db800, end 0x006eda86,  39 DATA_CHUNKs,  22 new format.
//   META_BLOCK at 0x006fb800, end 0x00708c05,  29 DATA_CHUNKs,  15 new format.
//   META_BLOCK at 0x0071b800, end 0x00736a93,  76 DATA_CHUNKs,  53 new format.
//   META_BLOCK at 0x0073b800, end 0x0074e9b4,  45 DATA_CHUNKs,  29 new format.
//   META_BLOCK at 0x0075b800, end 0x00771d9b,  44 DATA_CHUNKs,  27 new format.
//   META_BLOCK at 0x0077b800, end 0x0079943b,  97 DATA_CHUNKs,  77 new format.
//   META_BLOCK at 0x0079b800, end 0x007b9d69,  86 DATA_CHUNKs,  66 new format.
//   META_BLOCK at 0x007bb800, end 0x007c6d66,  27 DATA_CHUNKs,  13 new format.
// * META_BLOCK at 0x007c7800, end 0x007d2ab5,  27 DATA_CHUNKs,  13 new format.
//   META_BLOCK at 0x007db800, end 0x007e6735,  27 DATA_CHUNKs,  13 new format.
// * META_BLOCK at 0x007e7800, end 0x007f2735,  27 DATA_CHUNKs,  13 new format.
//   META_BLOCK at 0x007fb800, end 0x00816028,  78 DATA_CHUNKs,  54 new format.
//   META_BLOCK at 0x0081b800, end 0x0082dc75,  45 DATA_CHUNKs,  26 new format.
//   META_BLOCK at 0x0083b800, end 0x0085839a,  65 DATA_CHUNKs,  40 new format.
//   META_BLOCK at 0x0085b800, end 0x0086bcf6,  38 DATA_CHUNKs,  22 new format.
//   META_BLOCK at 0x0087b800, end 0x0089a955, 124 DATA_CHUNKs, 106 new format.
//   META_BLOCK at 0x0089b800, end 0x008ad7f8,  52 DATA_CHUNKs,  31 new format.
//   META_BLOCK at 0x00921800, end 0x009294e1,  11 DATA_CHUNKs,   5 new format.
//   META_BLOCK at 0x00981800, end 0x00988e8c,  11 DATA_CHUNKs,   5 new format.
//   META_BLOCK at 0x009a1800, end 0x009a9032,  11 DATA_CHUNKs,   5 new format.
//   META_BLOCK at 0x009b9800, end 0x009c0c7c,  11 DATA_CHUNKs,   5 new format.
//   META_BLOCK at 0x009d1800, end 0x009d74bd,  11 DATA_CHUNKs,   5 new format.
//   META_BLOCK at 0x009e9800, end 0x009f1a39,  11 DATA_CHUNKs,   5 new format.
//   META_BLOCK at 0x00a01800, end 0x00a09687,  11 DATA_CHUNKs,   5 new format.
//   META_BLOCK at 0x00a19800, end 0x00a20a4d,  11 DATA_CHUNKs,   5 new format.
//   META_BLOCK at 0x00a31800, end 0x00a38b30,  11 DATA_CHUNKs,   5 new format.
//   META_BLOCK at 0x00a49800, end 0x00a51de7,  11 DATA_CHUNKs,   5 new format.
//   META_BLOCK at 0x00a61800, end 0x00a70dff,  29 DATA_CHUNKs,  14 new format.
//   META_BLOCK at 0x00a7d800, end 0x00a8a394,  28 DATA_CHUNKs,  13 new format.
//   META_BLOCK at 0x00a99800, end 0x00aa9fd9,  29 DATA_CHUNKs,  14 new format.
//   META_BLOCK at 0x00ab5800, end 0x00ac3197,  28 DATA_CHUNKs,  15 new format.
//   META_BLOCK at 0x00ad1800, end 0x00ae2ae6,  41 DATA_CHUNKs,  16 new format.
//   META_BLOCK at 0x00aed800, end 0x00aff421,  32 DATA_CHUNKs,  15 new format.
//   META_BLOCK at 0x00b09800, end 0x00b19ef9,  30 DATA_CHUNKs,  14 new format.
//   META_BLOCK at 0x00b25800, end 0x00b36efa,  32 DATA_CHUNKs,  15 new format.
//   META_BLOCK at 0x00b41800, end 0x00b4f5a9,  29 DATA_CHUNKs,  15 new format.
//   META_BLOCK at 0x00b5d800, end 0x00b6a379,  27 DATA_CHUNKs,  14 new format.
//   META_BLOCK at 0x00b79800, end 0x00b8cf39,  47 DATA_CHUNKs,  14 new format.
//   META_BLOCK at 0x00b95800, end 0x00ba7dec,  31 DATA_CHUNKs,  13 new format.
//   META_BLOCK at 0x00bb1800, end 0x00bc35c4,  45 DATA_CHUNKs,  18 new format.
//   META_BLOCK at 0x00bcd800, end 0x00be06eb,  50 DATA_CHUNKs,  48 new format.
//   META_BLOCK at 0x00c0b800, end 0x00c413e8, 181 DATA_CHUNKs, 180 new format.
//   META_BLOCK at 0x00c41800, end 0x00c50e3a,  18 DATA_CHUNKs,  17 new format.
//   META_BLOCK at 0x00c53800, end 0x00c6081a,  19 DATA_CHUNKs,   0 new format.
//   META_BLOCK at 0x00c6b800, end 0x00c735e8,  16 DATA_CHUNKs,   2 new format.
//   META_BLOCK at 0x00c8b800, end 0x00c98042,  21 DATA_CHUNKs,   0 new format.
// * META_BLOCK at 0x00ca3800, end 0x00cab5e8,  16 DATA_CHUNKs,   2 new format.
//   META_BLOCK at 0x00cc3800, end 0x00cf4460,  57 DATA_CHUNKs,   0 new format.
//   META_BLOCK at 0x00cfb800, end 0x00d0e34e,  33 DATA_CHUNKs,   0 new format.
// * META_BLOCK at 0x00d13800, end 0x00d1b5e8,  16 DATA_CHUNKs,   2 new format.
//   META_BLOCK at 0x00d33800, end 0x00d4d13c,  50 DATA_CHUNKs,   0 new format.
//   META_BLOCK at 0x00d6b800, end 0x00d8598b,  35 DATA_CHUNKs,   0 new format.
//   META_BLOCK at 0x00da3800, end 0x00db84c2,  36 DATA_CHUNKs,   0 new format.
// * META_BLOCK at 0x00dbb800, end 0x00dc35e8,  16 DATA_CHUNKs,   2 new format.
//   META_BLOCK at 0x00ddb800, end 0x00df4bcd,  30 DATA_CHUNKs,   0 new format.
//   META_BLOCK at 0x00e13800, end 0x00e47dfe,  70 DATA_CHUNKs,   0 new format.
//   META_BLOCK at 0x00e4b800, end 0x00e6e49f,  64 DATA_CHUNKs,   0 new format.
//   META_BLOCK at 0x00e83800, end 0x00ea1a16,  34 DATA_CHUNKs,   0 new format.
// * META_BLOCK at 0x00ea7000, end 0x00eaede8,  16 DATA_CHUNKs,   2 new format.
// * META_BLOCK at 0x00eb3000, end 0x00eb699c,  11 DATA_CHUNKs,   1 new format.

elmer

#1

//
// Xanadu II OLD_FORMAT for PAYLOAD.
//
// Simple compressed data, just a stream of "command" and "parameter"
// bytes.
//
// I suspect that this is used for graphics data.
//

Command $00     (+0) : STOP ... All Done, end of PAYLOAD!
     params          : -

Command $01-$1f (+0) : Copy ($01..$1f) bytes from file to output
     params          : -

Command $20-$3f (+1) : Copy ($0000..$1fff) bytes from file to output
     params          : count-lo

Command $40-$4f (+1) : Fill ($00..$0f + 4) bytes with value
     params          : value

Command $50-$5f (+2) : Fill ($0000..$0fff + 4) bytes with value
     params          : count-lo, value

Command $60-$7f (+0) : Copy ($01..$1f, or 256) more bytes from earlier
     params          : -

Command $80-$9f (+1) : Copy 4 bytes from earlier in output (offset $0000-$1fff)
     params          : offset-lo

Command $a0-$bf (+1) : Copy 5 bytes from earlier in output (offset $0000-$1fff)
     params          : offset-lo

Command $c0-$df (+1) : Copy 6 bytes from earlier in output (offset $0000-$1fff)
     params          : offset-lo

Command $e0-$ff (+1) : Copy 7 bytes from earlier in output (offset $0000-$1fff)
     params          : offset-lo


-----------------------------

Now this was an interesting find if you were wanting to hack newer Falcom titles, but it doesn't work with Xanadu 2 data.

http://www.pokanchan.jp/dokuwiki/software/falcnvrt/start

elmer

#2
Here's the "C" decompression code for the NEW_FORMAT.

It is easier to show the code than to figure out how to explain it!  :wink:

I've been using another variation on this same basic concept for about 25 years, now. It is simple to code, fast, and gives pretty decent compression (usually beating LZ4).

///
// DecompressNewFalcom() - Decompress a block in Falcom's Xanadu II NEW_FORMAT.
//
// This is basically the next step beyond a traditional 4/12 LZSS format.
//
// If you analyze the length/offset pairs and run them through Huffman
// compression you figure out that you can improve the compression by
// using variable bit-oriented encodings for the "length" and "offset".
//
// Falcom are using one of the many variants on that scheme with interleaved
// bit-oriented and byte-oriented "command" and "data" values.
//

uint8_t * pByteStream;
unsigned   uBitBuffer;
unsigned   uBufferLen;

//

unsigned ReadByte ( void )
{
   return *pByteStream++;
}

//

unsigned ReadBit ( void )
{
   if (uBufferLen == 0)
   {
      uBitBuffer  = ReadByte();
      uBitBuffer += ReadByte() << 8;
      uBufferLen  = 16;
   }

   unsigned uBit = uBitBuffer & 1;

   uBitBuffer >>= 1;
   uBufferLen  -= 1;

   return uBit;
}

//

unsigned ReadBits ( unsigned uCount )
{
   unsigned uValue = 0;

   while (uCount--)
   {
      uValue = (uValue << 1) + ReadBit();
   }

   return uValue;
}

//

unsigned ReadLength ( void )
{
   if (ReadBit() != 0) return 2;
   if (ReadBit() != 0) return 3;
   if (ReadBit() != 0) return 4;
   if (ReadBit() != 0) return 5;
   if (ReadBit() != 0)
   {
      return ReadBits(3) + 6;
   }
   else
   {
      return ReadByte() + 14;
   }
}

//

uint8_t * DecompressNewFalcom ( uint8_t * pInput, uint8_t * pOutput, uint8_t * pInputEnd )
{
   uint8_t * pWindow;
   unsigned   uOffset;
   unsigned   uLength;
   unsigned  uFlag;
   uint8_t   uByte;

   pByteStream = ++pInput; // Skip leading zero.

   uBitBuffer = ReadByte();
   uBufferLen = 8;

   for (;;)
   {
      // Sanity Check!

      if (pByteStream > pInputEnd)
      {
         printf( "The compressed data is corrupt!\n" );
         break;
      }

      // Is this a BYTE?

      if (ReadBit() == 0)
      {
         // Immediate byte
   
         *pOutput++ = ReadByte();

         continue;
      }

      // Get LZSS window offset.

      if (ReadBit() == 0)
      {
         // Get 8-bit offset (always COPY).

         uOffset = ReadByte();
      }
      else
      {
         // Get 13-bit offset.

         uOffset = ReadBits( 5 );
         uOffset = (uOffset * 256) + ReadByte();

         // uOffset = 0 : END
         // uOffset = 1 : FILL (i.e. RLE)
         // uOffset > 1 : COPY (i.e. LZSS)

         if (uOffset == 0) break;

         if (uOffset == 1)
         {
            // FILL (RLE)

            uFlag   = ReadBit();
            uLength = ReadBits( 4 );

            if (uFlag != 0)
            {
               uLength = (uLength * 256) + ReadByte();
            }

            uLength = uLength + 14;

            uByte = ReadByte();

            while (uLength--)
            {
               *pOutput++ = uByte;
            }

            continue;
         }
      }

      // COPY (LZSS)

      pWindow = pOutput - uOffset;
      uLength = ReadLength();

      while (uLength--)
      {
         *pOutput++ = *pWindow++;
      }

      continue;
   }

   return pOutput;
}

elmer

#3
The game seems to use a fairly static memory mapping ...

$2000-$3fff Zero-Page, Stack, Fixed code segment (file handling, decompression, font drawing, etc)
$4000-$9fff Overlay code (Main Menu, Game, etc)
$a000-$bfff Streamed chunks (HuC6280 code with-or-without script language data)
$c000-$dfff General switched-bank for decompressing/uploading graphics to VRAM
$e000-$ffff CD BIOS


All game text (that I've seen, so far) is within script-language "strings".

The scripting language is pretty sophisticated, and they seem to have been nice enough to try to keep it located at the end of overlays/streamed-chunks ... but all the addresses are hard-coded, so scripts could actually be anywhere.

The PC for the scripting language is in zero-page at $37,$38 (or $3a,$3b in Xanadu I).

Unfortunately, they were a bit "undisciplined" in how/where they set the script-pc so it's an unholy mess to be sure where scripts are referenced.

Each code overlay contains it's own version of the script interpreter, with functions to modify the script-pc in different places (and potentially with different byte-codes for the actual scripting language).

The streamed-chunks either modify the script-pc directly, or call functions in the overlay to do it for them (the locations of which depend upon which overlay the script-chunk is used by).

There appear to be 570 different streamed-chunks in the game, of which only 236 actually contain script-language (as so may contain text).

That probably just means that I've not identified all of the many functions that set the script-pc, yet!

[EDIT]

That's definitely the case ... I'm currently able to detect "script" in 534 out of the 570 streamed-chunks. Getting better!

SamIAm


TurboXray

Impressive work :D

 So the game keeps the blocks/files for that area/level compressed, and then on call decompresses them into memory and provides a pointer for the routine for access it? When I was looking over Xanadu I, it had this style of setup.

elmer

#6
Quote from: TurboXray on 09/11/2015, 02:05 PMSo the game keeps the blocks/files for that area/level compressed, and then on call decompresses them into memory and provides a pointer for the routine for access it? When I was looking over Xanadu I, it had this style of setup.
Yep, major code overlays get loaded at $4000-$9fff, but it looks like complete META_BLOCKs often get loaded into the rest of SCD RAM and then decompressed as needed.

The decompression code lives in permanent memory $2000-$3fff, and maps the 2 8KB banks from the META_BLOCK  into $8000-$9fff & $a000-$bfff, and then decompresses the 8KB DATA_CHUNK into a buffer at $c000-$dfff. It then restores $8000-$bffff back to their original banks (i.e. the game overlay code).

That decompressed data is then either uploaded from $c000 to VRAM, or mapped back into $a000 and used as "script".

From what I can see, Xanadu 1 uses the same META_BLOCK format, the same DATA_CHUNK format (with everything in old-style compression), the same memory layout, and the same scripting language (with a few less commands).

It looks like the big change between the two was that Xanadu 2 also allows META_BLOCKs to be "streamed" from CD into SCD RAM as needed allowing for much more expansive and detailed graphics in the levels.

So ... once Xanadu 2 is hacked, Xanadu 1 should be pretty easy. It would be really stupid not to do it after all the work that's going to go into Xanadu 2.  :wink:

NecroPhile

Quote from: elmer on 09/11/2015, 03:06 PMSo ... once Xanadu 2 is hacked, Xanadu 1 should be pretty easy. It would be really stupid not to do it after all the work that's going to go into Xanadu 2.  :wink:
My pants are tight!
Ultimate Forum Bully/Thief/Saboteur/Clone Warrior! BURN IN HELL NECROPHUCK!!!

esteban

IMGIMG IMG  |  IMG  |  IMG IMG

TurboXray

Quote from: guest on 09/11/2015, 03:18 PMMy pants are tight!
.. but that's ok. (Beastie Boys reference?)

esteban

IMGIMG IMG  |  IMG  |  IMG IMG

NecroPhile

Quote from: TurboXray on 09/11/2015, 06:04 PM.. but that's ok. (Beastie Boys reference?)
David Cross reference -
(at 1:33)
Ultimate Forum Bully/Thief/Saboteur/Clone Warrior! BURN IN HELL NECROPHUCK!!!

TurboXray

Xanandu 1 might be easier for testing, because the levels/areas are isolated (IIRC you can't go back). So you could do a linear progression of the game (translation/testing/etc).

elmer

Quote from: TurboXray on 09/11/2015, 07:33 PMXanandu 1 might be easier for testing, because the levels/areas are isolated (IIRC you can't go back). So you could do a linear progression of the game (translation/testing/etc).
Honestly ... this one is going to be an unpleasant slog. There are too many hard-coded script addresses in the code, the script and in various unidentified tables to make it anything else.

I'd rather get the "nastier" game out of the way first. It's easier to approach the steep hill at the start of a project than at the end.

That's unless I can just keep everything in it's current location and just add new "overflow" strings at the end of the existing "script" data. That would make the whole thing much easier.

I won't know that for a while, yet.

Definitely Xanadu 2 first ... but we can always hold back the release until Xanadu 1 is done, if that'll make people happier!  :wink:

It's not a matter of "figuring" things out anymore ... that's already been done. It's a really well written little scripting language. But it was obviously built around having a linker fix up the addresses. Coming at things "post-mortem", 20+ years later, we've lost a lot of original information.

It can still be done ... but it's going to be a little bit "fragile" and probably involve a lot of debugging.

That's not much fun.  :(

SamIAm

QuoteXanandu 1 might be easier for testing, because the levels/areas are isolated (IIRC you can't go back). So you could do a linear progression of the game (translation/testing/etc).
You can't go back in Xanadu II, either, if you mean what I think you mean. It's just that there are two chapters that take place in the same town. I think all the NPCs say different things, too.

Xanadu I should be a little easier because each chapter loads once, and that's it. Xanadu II loads different things as you move around each chapter.

But the script is already done for Xanadu II (at least, it's ready to go in for play-testing) so it's definitely the easier one to start with for me.  :P

If elmer gets the Xanadu I script extracted...that will be spectacular, because I've always wanted to translate that particular script, and I know it will be super-helpful to people who want to play the game.

elmer

I'm happy to report that the compression code for both the FALCOM1 and FALCOM2 algorithms is now working!  :D

They code still needs some refinement in order to handle RLE data, but what's there now will work quite happily in the game.

I've run some tests against 6 of the large code/data overlays that I think are used for the cutscenes.

I've included tests with my own SWD3 compression scheme, as that's what the all the code is based on.

2 things to note ...
  • Those "test" overlays must already contain a lot of compressed data.
  • There's still some room for me to improve my old SWD3 code!  :wink:

Test FALCOM1 ... original 294912 bytes, compressed 284241 bytes.
Test FALCOM2 ... original 294912 bytes, compressed 299736 bytes.
Test SWD3    ... original 294912 bytes, compressed 306785 bytes.

Test FALCOM1 ... original 229376 bytes, compressed 190377 bytes.
Test FALCOM2 ... original 229376 bytes, compressed 188628 bytes.
Test SWD3    ... original 229376 bytes, compressed 185190 bytes.

Test FALCOM1 ... original 229376 bytes, compressed 212515 bytes.
Test FALCOM2 ... original 229376 bytes, compressed 208487 bytes.
Test SWD3    ... original 229376 bytes, compressed 210073 bytes.

Test FALCOM1 ... original 229376 bytes, compressed 191549 bytes.
Test FALCOM2 ... original 229376 bytes, compressed 187772 bytes.
Test SWD3    ... original 229376 bytes, compressed 206422 bytes.

Test FALCOM1 ... original 229376 bytes, compressed 207758 bytes.
Test FALCOM2 ... original 229376 bytes, compressed 203940 bytes.
Test SWD3    ... original 229376 bytes, compressed 208467 bytes.

Test FALCOM1 ... original 229376 bytes, compressed 210301 bytes.
Test FALCOM2 ... original 229376 bytes, compressed 205523 bytes.
Test SWD3    ... original 229376 bytes, compressed 202607 bytes.

TurboXray

SWD3? So you're replacing the original compression scheme or that's your compressor to repack the files for the original decompressor?

elmer

--------------

If you look at Xanadu 1 (released 1994), the whole game is compressed with what I'm calling the "FALCOM1" compression.

FALCOM1 is basically a nice-and-simple mix of bytecodes for COPY/FILL/LZSS (so, at its heart, another LZSS varient).

In Xanadu 2 (released 1995), they're compressing some things with FALCOM1, and some things with a new "FALCOM2" compression.

FALCOM2 is still a mix of codes for COPY/FILL/LSZZ, but the actual encoding is an interleaved bitstream/bytestream with variable-length codes. It's much more sophisticated, and does a better job than FALCOM1 (except in a few rare edge cases).

From the dates, you can guess that they wrote FALCOM2 sometime in 1994/1995, and probably hadn't transitioned their entire toolchain over to using it, which would be why Xanadu 2 uses both.

The alternative is that they're choosing the best compressor on a chunk-by-chunk basis, but I find that explanation rather unlikely.

--------------

SWD was the not-quite-LZSS-style compressor that I wrote in the early 1990s when Unisys started litigating against people for using LZW.

I guess that I went through the usual series of compressors for those days, first Huffman, then LZW, then not-quite-LZSS (SWD1).

The earliest version of SWD1 that I still have archived goes back to 1992 when it was used on some Genesis and SNES games.

IIRC, SWD2 added interleaved bytes for the uncompressed COPY bytes, and then SWD3 in 1997 added the interleave-as-much-as-possible scheme that FALCOM2 is using.

SWD3 was used on a couple of N64 and GB/GBC games, but was made obsolete in the early 2000's by ZLIB.

--------------

So, now that the background is out of the way, the new FALCOM1 and FALCOM2 compressors that I've written are based on me updating my old SWD3 code.

Once I refactored the code so that the basic LZSS tree searching was generic enough, it's now really easy to code up any LZSS varient scheme pretty quickly.

I wanted to get to the point where it's going to be easy to try out LZ4's concept of LZSS-always-follows-COPY to see if I can get another couple of % of compression.

--------------

Where this is all leading, is I'd like to be able to rewrite Xanadu 2's entire data in just one compression format.

The could be FALCOM2, or it could be SWD3, or even a new SWD4.

The idea is that I want to free up some space in the "permanent" code at $2000-$3fff by removing one or both original decompressors, so that I have enough room to add some new font code, and perhaps even the font data itself.

The primary advantage of SWD3 over FALCOM2 is that the decompressor is smaller.

Now ... I don't know, yet, if I'll be able to do it ... but it seems like a good plan at the moment.

So ... TMI, or just plain TLDR?  :wink:

elmer

Quote from: elmer on 02/09/2015, 11:18 AMAnyway, the SWD source should be on github today and I'll send you both links.
So, Bonknuts ... I'm not actually sure if you ever looked at the old SWD3 compression code that I made available, but if you did, you'll be able to appreciate that the newly-refactored code is one heck of a lot easier to follow ...


// **************************************************************************
// * CompressFALCOM2 ()
// **************************************************************************
// * Compress data in Falcom's Xanadu 2 NEW format.
// **************************************************************************

uint8_t * CompressFALCOM2 (
  uint8_t * pSrcBuffer, unsigned uSrcLength,
  uint8_t * pDstBuffer, unsigned uDstLength )

{
  // Local Variables.

  int       iMatchLength;
  int       iMatchOffset;

  int       iSkipCount;
  int       iCopyCount = 0;
  uint8_t * pCopyArray = NULL;

  // Initialize the LZSS parameters for this compression scheme.

  InitLzss(2, 269, 0x1fff, 0);

  InitTree(pSrcBuffer);

  // Write the FALCOM2 header (there isn't one!).

  g_pDstBuffer = pDstBuffer;
  g_uDstLength = uDstLength;

  BitIO_Init();

  // Write 8-bits of zero to distinguish FALCOM2 from FALCOM1.

  BitIO_WordSend(8, 0);

  // Loop around encoding strings until the buffer is empty.

  uint8_t * pSrcFinish = pSrcBuffer + uSrcLength;

  for (;;)
  {
    // Update the window and calc how much data is left to compress.

    RmvString(pSrcBuffer - g_iLzssWindowLen);

    g_iThisMaxRepeat =
    g_iThisMatchSize = min(g_iLzssMaxRepeat, (pSrcFinish - pSrcBuffer));

    if (g_iThisMaxRepeat < g_iLzssBreakEven) break;

    // Add the current "string" to the LZSS tree, and find longest match.

    AddString(pSrcBuffer);

    // Convert g_iThisMatchSize into the real match length.

    iMatchLength = (g_iThisMaxRepeat - 1) - g_iThisMatchSize;

    // Output result of last string match.

    if (iMatchLength < g_iLzssBreakEven)
    {
      iSkipCount  = 0;
      iCopyCount += 1;
    }
    else
    {
      // First, write any COPY bytes.

      if (iCopyCount)
      {
        pCopyArray = pSrcBuffer - iCopyCount;

        while (iCopyCount--) TokenToBits_FAL2( 1, *pCopyArray++ );
      }

      iCopyCount = 0;

      // Then write the LZSS "match".

      iSkipCount   = iMatchLength - 1;
      iMatchOffset = (pSrcBuffer - g_pThisMatchTree->window);

      TokenToBits_FAL2( iMatchLength, iMatchOffset );
    }

    pSrcBuffer += 1;

    // Skip passed "matched" LZSS or RLE bytes, adding the strings to the tree.

    while (iSkipCount--)
    {
      RmvString(pSrcBuffer - g_iLzssWindowLen);

      g_iThisMaxRepeat =
      g_iThisMatchSize = min(g_iLzssMaxRepeat, (pSrcFinish - pSrcBuffer));

      AddString(pSrcBuffer);

      pSrcBuffer += 1;
    }

  } // End of "while (g_iThisMaxRepeat > 0)"

  // Encode the last COPY byte(s) (if there are any).

  pSrcBuffer += g_iThisMaxRepeat;
  iCopyCount += g_iThisMaxRepeat;

  if (iCopyCount)
  {
    pCopyArray = pSrcBuffer - iCopyCount;
    while (iCopyCount--) TokenToBits_FAL2( 1, *pCopyArray++ );
  }

  // Send EOF token.

  TokenToBits_FAL2(0, 0);

  // Flush out the last few bits.

  BitIO_WordFlush();

  // All done.

  return (g_pDstBuffer);
}

elmer

I know that this will bore anyone that doesn't care about data compression, but it's interesting (to me) that a small change in the LZSS window size and Match Offset encoding can totally shuffle the results around.

// SWD3 LZSS Match Offset encoding ...
//
//   $0000-$0020 : 00        x xxxx
//
//   $0021-$00A0 : 01      xxx xxxx
//
//   $00A1-$02A0 : 10   x xxxx xxxx
//
//   $02A1-$06A0 : 11  xx xxxx xxxx


// SWD4 LZSS Match Offset encoding ...
//
//   $0001-$0020 : 00        x xxxx
//
//   $0021-$0120 : 01     xxxx xxxx
//
//   $0121-$1120 : 1 xxxx xxxx xxxx



Making that change and testing the same 12 Xanadu 2 overlays as before gives ...

Test FALCOM1 ... 284241 bytes.
Test SWD4    ... 298888 bytes.
Test FALCOM2 ... 299736 bytes.
Test SWD3    ... 306785 bytes.

Test SWD4    ... 183938 bytes.
Test SWD3    ... 185190 bytes.
Test FALCOM2 ... 188628 bytes.
Test FALCOM1 ... 190377 bytes.

Test SWD4    ... 203654 bytes.
Test FALCOM2 ... 208487 bytes.
Test SWD3    ... 210073 bytes.
Test FALCOM1 ... 212515 bytes.

Test FALCOM2 ... 187772 bytes.
Test FALCOM1 ... 191549 bytes.
Test SWD4    ... 205852 bytes.
Test SWD3    ... 206422 bytes.

Test SWD4    ... 200557 bytes.
Test FALCOM2 ... 203940 bytes.
Test FALCOM1 ... 207758 bytes.
Test SWD3    ... 208467 bytes.

Test SWD4    ... 201652 bytes.
Test SWD3    ... 202607 bytes.
Test FALCOM2 ... 205523 bytes.
Test FALCOM1 ... 210301 bytes.

Test SWD4    ... 204856 bytes.
Test SWD3    ... 208391 bytes.
Test FALCOM2 ... 209128 bytes.
Test FALCOM1 ... 212447 bytes.

Test SWD4    ... 204738 bytes.
Test SWD3    ... 206652 bytes.
Test FALCOM2 ... 208568 bytes.
Test FALCOM1 ... 212772 bytes.

Test SWD4    ... 206013 bytes.
Test SWD3    ... 208234 bytes.
Test FALCOM2 ... 210217 bytes.
Test FALCOM1 ... 214201 bytes.

Test SWD3    ... 203045 bytes.
Test SWD4    ... 203110 bytes.
Test FALCOM2 ... 205951 bytes.
Test FALCOM1 ... 209932 bytes.

Test SWD4    ... 192798 bytes.
Test FALCOM2 ... 195037 bytes.
Test FALCOM1 ... 200050 bytes.
Test SWD3    ... 205955 bytes.

Test SWD3    ... 190289 bytes.
Test SWD4    ... 190366 bytes.
Test FALCOM2 ... 190741 bytes.
Test FALCOM1 ... 195937 bytes.

ccovell

Quote from: elmer on 09/14/2015, 04:35 PMSo ... TMI, or just plain TLDR?  :wink:
No way, this is interesting stuff.

dshadoff

QuoteSWD was the not-quite-LZSS-style compressor that I wrote in the early 1990s when Unisys started litigating against people for using LZW.
Heh, seems like you and I are probably about matched at being the old men of the board.

QuoteI know that this will bore anyone that doesn't care about data compression, but it's interesting (to me) that a small change in the LZSS window size and Match Offset encoding can totally shuffle the results around.
Not at all - I'm interested.

QuoteMaking that change and testing the same 12 Xanadu 2 overlays as before gives ...

Test FALCOM1 ... 284241 bytes.
Test SWD4    ... 298888 bytes.
Test FALCOM2 ... 299736 bytes.
Test SWD3    ... 306785 bytes.
.
.
.
...But you're overlooking one very important thing:

The data that you will be recompressing later will NOT be the same as the data in those blocks today, so a test on SWD3/SWD4 compressibility of existing data is not very relevant.

For one thing, text is usually encoded in these things as 2-byte SJIS for kanji, and 1-byte JIS for kana.  It doesn't compress anywhere near as well as English (though it is generally slightly more dense to begin with).

I'm not sure whether graphics and other data is mixed in with those blocks you're analyzing, but that may also be a factor in compressibility.

I would recommend getting one "average" block extracted, a "sample" preliminary English translation written up, and validate compression on that.  I put everything in quotation marks, since these are not going to be average or even close translations - but rather a representative test.

-Dave

elmer

Quote from: dshadoff on 09/14/2015, 09:33 PMHeh, seems like you and I are probably about matched at being the old men of the board.
Haha ... someone needs to remind these young'uns that us old folks aren't totally useless, yet!  :wink:


Quote...But you're overlooking one very important thing:

The data that you will be recompressing later will NOT be the same as the data in those blocks today, so a test on SWD3/SWD4 compressibility of existing data is not very relevant.
I'm definitely not overlooking that, I'm just enjoying tweaking the SWD parameters since it's so easy now.

At the end-of-the-day, the SWD3 encoding scheme is remarkable similar to the FALCOM2 encoding scheme, and I'm just having some fun.

The current test is totally bogus. From what I can tell, these test overlays probably already contain compressed data, so this is a massively unrealistic test, and only really shows degenerate worst-case performance.

The next thing to write will be a better test that actually decompresses the existing META_BLOCKs and re-compresses them.

I also still need to add RLE support to the FALCOM1 and FALCOM2 compressors, and at the same time, add "long" matches to SWD4 (which will do basically the same thing).

Xanadu 1 and 2 use a mix of 2-byte SJIS + 1-byte "common-glyphs".

I don't think that the "common-glyphs" are JIS, it looks like too-specialized a list, but I haven't confirmed that, yet. My memory of the kanji and JIS codes could be wrong.


QuoteI'm not sure whether graphics and other data is mixed in with those blocks you're analyzing, but that may also be a factor in compressibility.
Yep, META_BLOCKs contain graphics, script-data (incl text) and code.


QuoteI would recommend getting one "average" block extracted, a "sample" preliminary English translation written up, and validate compression on that.
Absolutely!

One step at a time. I'm just "blogging" about what I'm finding.  :)

elmer

Since we're talking about compressing the script code DATA_CHUNKs, here's the basic format of the scripting language.

Script code DATA_CHUNKs contain HuC6280 code, this script code, and data. The script code can actually be inlined inside regular HuC6820 code.

That annoying capability, together with the multitude of hard-coded address and uncertain indirect table accesses, all mix together to make this rather a PITA to hack.

There is no clear separation between code and script, and this may turn out to be unhackable in a practical sense (particularly with 570 of these script chunks to go through).  :(


==============================================================================
XANADU II - SCRIPT BYTECODE
==============================================================================
Script code $00 (+0)  _end_of_string
Script code $01 (+0)  _wait_for_keypress_then_end
Script code $02 (+0)  _wait_for_keypress
Script code $03 (+0)  _end_of_line
Script code $04 (+0)  _clear_dialog_box
Script code $05 (+7)  _conditional_jump_script_address
Script code $06 (+2)  _jump_script_address
Script code $07 (+2)  _call_script_address
Script code $08 (+1)  _wipe_out_text
Script code $09 (+0)  _set_9ca9
Script code $0a (+0)  _clr_9ca9
Script code $0b (+4)  _call_script_lookup_table
Script code $0c (+0)  _wait_for_keypress_then_clear
Script code $0d (+4)  _tst_2b03_x_beq_script_address
Script code $0e (+4)  _tst_2b03_x_bnz_script_address
Script code $0f (+2)  _set_bits_2b03_x
Script code $10 (+2)  _clr_bits_2b03_x
Script code $11 (+3)  _set_pen_call_script_address
Script code $12 (+3)  _set_pen_call_script_address_then_eol
Script code $13 (+2)  _move_cursor_yx
Script code $14 (+2)  _call_game_func_from_script
Script code $15 (+2)  _modify_script_variable
Script code $16 (+0)  _wait_for_keypress_then_eol
Script code $17 (+n)  _extended_codes ($00-$14)
Script code $18 (+0)  _set_pen_color_0
Script code $19 (+0)  _set_pen_color_1
Script code $1a (+0)  _set_pen_color_2
Script code $1b (+0)  _set_pen_color_3
Script code $1c (+0)  _set_pen_color_4
Script code $1d (+0)  _set_pen_color_5
Script code $1e (+0)  _set_pen_color_6
Script code $1f (+0)  _set_pen_color_7

Script code $20-$ff   printable glyph
==============================================================================


==============================================================================
XANADU II - MAP GLYPH CODE TO SJIS GLYPH (only for 12x12, 8x12 is different)
==============================================================================
N.B. Xanadu 2 and Xanadu 1 use identical lookup tables!
==============================================================================
Maps input byte $80-$98 -> 2-byte SJIS

$20 -> ($2862 + $00 = $2862)
$7f -> ($2862 + $be = $2920)
$99 -> ($2862 + $f2 = $2954)
$9f -> ($2862 + $fe = $2960)
$a0 -> ($2962 + $00 = $2962)
$ff -> ($2962 + $be = $2a20)

Xanadu 1 Table @ $28b8 :
Xanadu 2 Table @ $2862 :

0x8140, 0x8149, 0x8177, 0x8178, 0x81a8, 0x81a9, 0x81aa, 0x81ab,
0x819b, 0x819d, 0x81a0, 0x81a2, 0x81a4, 0x815c, 0x819e, 0x8199,
0x824f, 0x8250, 0x8251, 0x8252, 0x8253, 0x8254, 0x8255, 0x8256,
0x8257, 0x8258, 0x8163, 0x8164, 0x81a6, 0x8160, 0x8158, 0x8148,
0x82a0, 0x82a2, 0x82a4, 0x82a6, 0x82a8, 0x82a9, 0x82ab, 0x82ad,
0x82af, 0x82b1, 0x82b3, 0x82b5, 0x82b7, 0x82b9, 0x82bb, 0x82bd,
0x82bf, 0x82c2, 0x82c4, 0x82c6, 0x82c8, 0x82c9, 0x82ca, 0x82cb,
0x82cc, 0x82cd, 0x82d0, 0x82d3, 0x82d6, 0x82d9, 0x82dc, 0x82dd,
0x82de, 0x82df, 0x82e0, 0x82e2, 0x82e4, 0x82e6, 0x82e7, 0x82e8,
0x82e9, 0x82ea, 0x82eb, 0x82ed, 0x82f0, 0x82f1, 0x829f, 0x82a1,
0x82a3, 0x82a5, 0x82a7, 0x82e1, 0x82e3, 0x82e5, 0x82c1, 0x82aa,
0x82ac, 0x82ae, 0x82b0, 0x82b2, 0x82b4, 0x82b6, 0x82b8, 0xeefc,
0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc,
0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc,
0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc, 0xeefc,
0xeefc, 0x82ba, 0x82bc, 0x82be, 0x82c0, 0x82c3, 0x82c5, 0x82c7,
0x82cf, 0x8142, 0x8175, 0x8176, 0x8141, 0x8145, 0x8392, 0x8340,
0x8342, 0x8344, 0x8346, 0x8348, 0x8383, 0x8385, 0x8387, 0x8362,
0x815b, 0x8341, 0x8343, 0x8345, 0x8347, 0x8349, 0x834a, 0x834c,
0x834e, 0x8350, 0x8352, 0x8354, 0x8356, 0x8358, 0x835a, 0x835c,
0x835e, 0x8360, 0x8363, 0x8365, 0x8367, 0x8369, 0x836a, 0x836b,
0x836c, 0x836d, 0x836e, 0x8371, 0x8374, 0x8377, 0x837a, 0x837d,
0x837e, 0x8380, 0x8381, 0x8382, 0x8384, 0x8386, 0x8388, 0x8389,
0x838a, 0x838b, 0x838c, 0x838d, 0x838f, 0x8393, 0x834b, 0x834d,
0x834f, 0x8351, 0x8353, 0x8355, 0x8357, 0x8359, 0x835b, 0x835d,
0x835f, 0x8361, 0x8364, 0x8366, 0x8368, 0x8370, 0x8373, 0x8376,
0x8379, 0x837c, 0x836f, 0x8372, 0x8375, 0x8378, 0x837b, 0x82d2,
0x82d5, 0x82d8, 0x82db, 0x82ce, 0x82d1, 0x82d4, 0x82d7, 0x82da,
==============================================================================

SamIAm

Quote from: elmer on 09/15/2015, 01:55 PMSince we're talking about compressing the script code DATA_CHUNKs, here's the basic format of the scripting language.

Script code DATA_CHUNKs contain HuC6280 code, this script code, and data. The script code can actually be inlined inside regular HuC6820 code.

That annoying capability, together with the multitude of hard-coded address and uncertain indirect table accesses, all mix together to make this rather a PITA to hack.

There is no clear separation between code and script, and this may turn out to be unhackable in a practical sense (particularly with 570 of these script chunks to go through).  :(
Uh oh. Are you feeling pessimistic about it at this point? How much longer until you know if it's unhackable? :(

elmer

Quote from: SamIAm on 09/15/2015, 10:45 PMUh oh. Are you feeling pessimistic about it at this point? How much longer until you know if it's unhackable? :(
Haha ... I'm not pessimistic, yet. I just don't want to make people think that "it's-just-a-matter-of-time".

With all the inline-script and the table accesses, it's probably going to be safest to try to keep every piece of script starting exactly where it starts now, and then have them "jump" to "overflow" space if the new translations are longer than the old ones (which is very, very likely).

That's going to depend upon whether there is enough space in each script chunk (looks likely, so far), and how well everything compresses again in order to fit on the CD (thus my current obsession with the compression code).

A saner option might be to only do that for script chunks that cause trouble.

Either way, you're talking about some fairly complex custom-tool work.

Basically ... there are still a lot of "unknowns". We're not going to find out a lot of them until we're much further along.

There are good reasons why things started "breaking" when EsperKnight got stuff re-inserting into the game back in 2012/2013.

SamIAm

All right, that's good. I ask not only because I would be heart-broken if this turned out to be impossible (and I would), but because I would feel terrible if all this effort you've been putting in were in vain.

Keep fighting the good fight!

elmer

Following on from yesterday, here's a much better test of the compression.  :)

First of all, RLE/FILL "compression" has now been implemented on the FALCOM1 and FALCOM2 compressors, so that they're pretty much doing as well as they can.

SWD4 has been modified to handle "long" LZSS matches, which gives it similar capabilities to the other two.

Actually, that's a bit of a "win" for SWD4, since it can now handles long LZSS matches better than FALCOM1 and FALCOM2.

Here is a test with the 117 META_BLOCKs (containing a total of 5619 DATA_CHUNKs) that I've identified on the Xanadu 2 CD.

Each DATA_CHUNK is decompressed, and then re-compressed with the different compresssors.

The data is then decompressed again and checked to make sure that it matches, and that I haven't messed anything up.  :wink:


BLK $00d800  16 CHKs ( 32136 / 104960), Fal1  29851, Fal2  28259, Swd4  27511
BLK $019800  11 CHKs ( 14682 /  74752), Fal1  13839, Fal2  12582, Swd4  12328
BLK $03b800  26 CHKs ( 42163 / 123742), Fal1  39486, Fal2  36824, Swd4  36082
BLK $04b800  12 CHKs ( 34652 /  80890), Fal1  32424, Fal2  31004, Swd4  30008
BLK $057800  12 CHKs ( 34206 /  80410), Fal1  32007, Fal2  30614, Swd4  29632
BLK $06b800  68 CHKs (118017 / 328663), Fal1 116595, Fal2 110270, Swd4 107674
BLK $09b800  89 CHKs ( 95543 / 379309), Fal1  91707, Fal2  85034, Swd4  83184
BLK $0bb800  61 CHKs (114011 / 286011), Fal1 110462, Fal2 104885, Swd4 102774
BLK $0db800  55 CHKs ( 84303 / 254333), Fal1  82260, Fal2  78447, Swd4  77424
BLK $0fb800  81 CHKs (130363 / 330030), Fal1 127165, Fal2 121401, Swd4 119357
BLK $11b800  51 CHKs ( 85597 / 245597), Fal1  83983, Fal2  80039, Swd4  78129
BLK $13b800  63 CHKs (103383 / 277783), Fal1 104068, Fal2  99123, Swd4  97654
BLK $15b800  70 CHKs (122161 / 271474), Fal1 122143, Fal2 119055, Swd4 116976
BLK $17b800  70 CHKs (121745 / 271275), Fal1 121974, Fal2 118957, Swd4 116853
BLK $19b800  66 CHKs ( 96140 / 277360), Fal1  95517, Fal2  90947, Swd4  89213
BLK $1bb800  34 CHKs ( 56536 / 155385), Fal1  56478, Fal2  54107, Swd4  53104
BLK $1db800  40 CHKs ( 69557 / 180831), Fal1  68331, Fal2  65379, Swd4  64030
BLK $1fb800  69 CHKs ( 93231 / 272209), Fal1  91319, Fal2  86370, Swd4  84909
BLK $21b800  32 CHKs ( 60484 / 141896), Fal1  59357, Fal2  56619, Swd4  55538
BLK $23b800  80 CHKs (129556 / 335726), Fal1 131210, Fal2 123992, Swd4 122083
BLK $25b800  55 CHKs ( 80307 / 233746), Fal1  82202, Fal2  78762, Swd4  77099
BLK $27b800  30 CHKs ( 55393 / 147838), Fal1  55616, Fal2  53342, Swd4  52221
BLK $29b800 140 CHKs (129761 / 423581), Fal1 134560, Fal2 128675, Swd4 127112
BLK $2bb800 109 CHKs (117193 / 308241), Fal1 118403, Fal2 114808, Swd4 113539
BLK $2db800  55 CHKs ( 68137 / 188797), Fal1  67646, Fal2  64933, Swd4  63961
BLK $2fb800  37 CHKs ( 63586 / 173649), Fal1  63782, Fal2  61346, Swd4  60654
BLK $31b800  57 CHKs (102953 / 262028), Fal1 104137, Fal2  99915, Swd4  98074
BLK $33b800  42 CHKs ( 80543 / 190547), Fal1  83853, Fal2  80057, Swd4  78708
BLK $35b800  45 CHKs ( 81397 / 205559), Fal1  83247, Fal2  79854, Swd4  78715
BLK $37b800  75 CHKs (108932 / 303722), Fal1 112090, Fal2 106919, Swd4 104913
BLK $39b800  72 CHKs (119743 / 285443), Fal1 121991, Fal2 117705, Swd4 115610
BLK $3bb800  43 CHKs ( 80553 / 199276), Fal1  80202, Fal2  77686, Swd4  76112
BLK $3db800  27 CHKs ( 45832 / 126274), Fal1  45226, Fal2  43768, Swd4  43143
BLK $3e7800  27 CHKs ( 45292 / 126273), Fal1  44736, Fal2  43250, Swd4  42634
BLK $3fb800  81 CHKs (129981 / 328700), Fal1 126683, Fal2 121036, Swd4 118959
BLK $41b800  86 CHKs (129633 / 345309), Fal1 128780, Fal2 122215, Swd4 119987
BLK $43b800  78 CHKs (127978 / 333822), Fal1 129608, Fal2 122260, Swd4 120286
BLK $45b800  57 CHKs ( 85145 / 243184), Fal1  87370, Fal2  83771, Swd4  81973
BLK $47b800  30 CHKs ( 54465 / 145565), Fal1  54622, Fal2  52439, Swd4  51315
BLK $49b800  84 CHKs (103861 / 316767), Fal1 107355, Fal2 103001, Swd4 101418
BLK $4bb800  38 CHKs ( 54174 / 170327), Fal1  55353, Fal2  52368, Swd4  51026
BLK $4db800  68 CHKs (100187 / 273218), Fal1 104027, Fal2  98509, Swd4  95581
BLK $4fb800  60 CHKs (100042 / 258064), Fal1 101616, Fal2  97623, Swd4  95691
BLK $51b800  52 CHKs ( 94263 / 228393), Fal1  92668, Fal2  88265, Swd4  87048
BLK $53b800  69 CHKs (107178 / 277385), Fal1 105825, Fal2 100535, Swd4  98659
BLK $55b800  88 CHKs (125605 / 343422), Fal1 127549, Fal2 120043, Swd4 117933
BLK $57b800 128 CHKs (112958 / 395702), Fal1 115980, Fal2 107681, Swd4 106458
BLK $59b800  66 CHKs ( 97742 / 282299), Fal1 101377, Fal2  96609, Swd4  94937
BLK $5bb800 118 CHKs (126871 / 404564), Fal1 129726, Fal2 124624, Swd4 121988
BLK $5db800  38 CHKs ( 74543 / 197711), Fal1  74957, Fal2  71748, Swd4  70257
BLK $5fb800  40 CHKs ( 75728 / 180817), Fal1  74091, Fal2  71435, Swd4  70171
BLK $61b800  40 CHKs ( 75298 / 189088), Fal1  74536, Fal2  70936, Swd4  69542
BLK $63b800  78 CHKs (113370 / 318880), Fal1 115843, Fal2 109262, Swd4 107101
BLK $65b800  48 CHKs ( 86309 / 211542), Fal1  87805, Fal2  84286, Swd4  82623
BLK $67b800  44 CHKs ( 91023 / 200434), Fal1  91073, Fal2  88399, Swd4  86894
BLK $69b800 109 CHKs (128431 / 388810), Fal1 134333, Fal2 128397, Swd4 125629
BLK $6bb800  46 CHKs ( 91384 / 215513), Fal1  90738, Fal2  86942, Swd4  85311
BLK $6db800  39 CHKs ( 74140 / 176094), Fal1  72773, Fal2  69912, Swd4  68582
BLK $6fb800  29 CHKs ( 54103 / 136192), Fal1  53980, Fal2  52070, Swd4  51150
BLK $71b800  76 CHKs (110795 / 305754), Fal1 112685, Fal2 106517, Swd4 104385
BLK $73b800  45 CHKs ( 77990 / 201057), Fal1  79188, Fal2  76068, Swd4  74602
BLK $75b800  44 CHKs ( 91283 / 200485), Fal1  91371, Fal2  88729, Swd4  87198
BLK $77b800  97 CHKs (121333 / 340831), Fal1 124561, Fal2 119741, Swd4 117219
BLK $79b800  86 CHKs (123749 / 357193), Fal1 128921, Fal2 122206, Swd4 120659
BLK $7bb800  27 CHKs ( 46276 / 128389), Fal1  45615, Fal2  43296, Swd4  42963
BLK $7c7800  27 CHKs ( 45587 / 128376), Fal1  44970, Fal2  42617, Swd4  42296
BLK $7db800  27 CHKs ( 44691 / 128330), Fal1  44343, Fal2  41919, Swd4  41637
BLK $7e7800  27 CHKs ( 44691 / 128330), Fal1  44343, Fal2  41919, Swd4  41637
BLK $7fb800  78 CHKs (108116 / 339025), Fal1 110204, Fal2 104610, Swd4 102672
BLK $81b800  45 CHKs ( 74599 / 202875), Fal1  73912, Fal2  70720, Swd4  68997
BLK $83b800  65 CHKs (117268 / 270719), Fal1 112604, Fal2 107952, Swd4 106079
BLK $85b800  38 CHKs ( 66578 / 173030), Fal1  65038, Fal2  62433, Swd4  61302
BLK $87b800 124 CHKs (126573 / 423598), Fal1 131986, Fal2 123929, Swd4 122159
BLK $89b800  52 CHKs ( 73408 / 214494), Fal1  71951, Fal2  68429, Swd4  67084
BLK $921800  11 CHKs ( 31903 /  50190), Fal1  30335, Fal2  29016, Swd4  28317
BLK $981800  11 CHKs ( 30282 /  50250), Fal1  28443, Fal2  27026, Swd4  26299
BLK $9a1800  11 CHKs ( 30704 /  50714), Fal1  28845, Fal2  27404, Swd4  26659
BLK $9b9800  11 CHKs ( 29754 /  50675), Fal1  27832, Fal2  26344, Swd4  25665
BLK $9d1800  11 CHKs ( 23675 /  50668), Fal1  22001, Fal2  20721, Swd4  20169
BLK $9e9800  11 CHKs ( 33271 /  50600), Fal1  31069, Fal2  29868, Swd4  29150
BLK $a01800  11 CHKs ( 32325 /  50654), Fal1  30736, Fal2  29392, Swd4  28675
BLK $a19800  11 CHKs ( 29195 /  50675), Fal1  27606, Fal2  26376, Swd4  25680
BLK $a31800  11 CHKs ( 29422 /  50670), Fal1  26738, Fal2  25389, Swd4  24793
BLK $a49800  11 CHKs ( 34213 /  50638), Fal1  31693, Fal2  30353, Swd4  29520
BLK $a61800  29 CHKs ( 62801 / 135656), Fal1  60947, Fal2  58124, Swd4  57057
BLK $a7d800  28 CHKs ( 51948 / 135231), Fal1  50843, Fal2  48506, Swd4  48028
BLK $a99800  29 CHKs ( 67371 / 133491), Fal1  65509, Fal2  63748, Swd4  62418
BLK $ab5800  28 CHKs ( 55535 / 119050), Fal1  54953, Fal2  52932, Swd4  52193
BLK $ad1800  41 CHKs ( 70128 / 152329), Fal1  68245, Fal2  66051, Swd4  64870
BLK $aed800  32 CHKs ( 72545 / 152782), Fal1  70123, Fal2  67215, Swd4  65903
BLK $b09800  30 CHKs ( 67141 / 142584), Fal1  65537, Fal2  63174, Swd4  62208
BLK $b25800  32 CHKs ( 71226 / 153727), Fal1  69607, Fal2  67199, Swd4  66427
BLK $b41800  29 CHKs ( 56571 / 126640), Fal1  54882, Fal2  52881, Swd4  51801
BLK $b5d800  27 CHKs ( 51927 / 113543), Fal1  50107, Fal2  48409, Swd4  47577
BLK $b79800  47 CHKs ( 79391 / 181462), Fal1  76536, Fal2  73437, Swd4  72405
BLK $b95800  31 CHKs ( 75058 / 155482), Fal1  72784, Fal2  70114, Swd4  69454
BLK $bb1800  45 CHKs ( 72886 / 171103), Fal1  71697, Fal2  68215, Swd4  66870
BLK $bcd800  50 CHKs ( 77247 / 334769), Fal1  86828, Fal2  80564, Swd4  79402
BLK $c0b800 181 CHKs (219050 / 874272), Fal1 239061, Fal2 226151, Swd4 221642
BLK $c41800  18 CHKs ( 62926 / 120832), Fal1  65629, Fal2  64597, Swd4  63185
BLK $c53800  19 CHKs ( 53160 / 103360), Fal1  49296, Fal2  47661, Swd4  46336
BLK $c6b800  16 CHKs ( 32136 / 104960), Fal1  29852, Fal2  28262, Swd4  27513
BLK $c8b800  21 CHKs ( 51140 / 112128), Fal1  48184, Fal2  46493, Swd4  45498
BLK $ca3800  16 CHKs ( 32136 / 104960), Fal1  29852, Fal2  28262, Swd4  27513
BLK $cc3800  57 CHKs (199434 / 397056), Fal1 188925, Fal2 184372, Swd4 180648
BLK $cfb800  33 CHKs ( 76424 / 172544), Fal1  71708, Fal2  68987, Swd4  67235
BLK $d13800  16 CHKs ( 32136 / 104960), Fal1  29852, Fal2  28262, Swd4  27513
BLK $d33800  50 CHKs (104464 / 243040), Fal1  98147, Fal2  94999, Swd4  92681
BLK $d6b800  35 CHKs (106681 / 181696), Fal1 100150, Fal2  98901, Swd4  96782
BLK $da3800  36 CHKs ( 84970 / 174880), Fal1  79267, Fal2  77221, Swd4  75319
BLK $dbb800  16 CHKs ( 32136 / 104960), Fal1  29852, Fal2  28262, Swd4  27513
BLK $ddb800  30 CHKs (103193 / 184320), Fal1  96153, Fal2  94429, Swd4  92195
BLK $e13800  70 CHKs (214106 / 403328), Fal1 200108, Fal2 195444, Swd4 190255
BLK $e4b800  64 CHKs (142111 / 407712), Fal1 132204, Fal2 125784, Swd4 122085
BLK $e83800  34 CHKs (123210 / 199680), Fal1 115101, Fal2 113519, Swd4 110386
BLK $ea7000  16 CHKs ( 32136 / 104960), Fal1  29851, Fal2  28259, Swd4  27511
BLK $eb3000  11 CHKs ( 14682 /  74752), Fal1  13839, Fal2  12582, Swd4  12328

Found  117 total META_BLOCKs.
Found 5619 total DATA_CHUNKs.



The results ...

It looks like my compression code is often getting better results, even with FALCOM1 encoding scheme, than Falcom's original compression code. While that seems kinda nice ... it needs to be investigated.  :-k

FALCOM2 is always beating FALCOM1, and SWD4 is always beating FALCOM2.

Whatever else may be going on, the relative performance between the 3 compression schemes should be valid.  :D


The conclusion ...

The re-compressed data badly needs to be tested in the game.

These 3 META_BLOCKs need to be seriously examined to see why they're smaller on CD than I seem to be able to re-compress them ...  :-k

BLK $bcd800  50 CHKs ( 77247 / 334769), Fal1  86828, Fal2  80564, Swd4  79402
BLK $c0b800 181 CHKs (219050 / 874272), Fal1 239061, Fal2 226151, Swd4 221642
BLK $c41800  18 CHKs ( 62926 / 120832), Fal1  65629, Fal2  64597, Swd4  63185

SamIAm

Very interesting. I'll be curious to learn what it is that's causing the difference in compression that you mentioned at the end.

As I look through the script now, I see a great many lines that would not suffer greatly for having their character count reduced. We'll still probably want to have as much space as possible for the English text, but compromises can be made. It's just nice to know that there's a good programmer on the job.

Ted Woolsey, the guy who translated so many famous SNES RPGs from Squaresoft, lamented in an interview that his original draft of Chrono Trigger was cut down by as much as 60% in order to fit in the game...and yet fan hackers later discovered that with better compression and use of space, they could fit a lot more text in without expanding the ROM size. There were unusued assets in the game, too, which I'm not even sure if they stripped out.

That Woolsey was able to do a good job with that kind of limitation is really incredible.

NightWolve

#29
Quote from: elmer on 09/14/2015, 04:35 PMSo ... TMI, or just plain TLDR?  :wink:
It is designated a "blog" - the choice to be here and read any of it is the reader's, and no, I did read it despite the length so not TLDR for me... ;)

Gotta say, you're a rare breed - this work log is impressive and shows someone that can organize their code analysis very well!

That fix in the Zeroigar batch file regarding the FOR loop command and knowing the updated features to it for the NT version of the Command Console indicates you had access to Professional versions of Windows because the documentation for the batch file language is not included in Home versions of the Operating Systems. That's one thing I noticed, of course you may have just had an interest in it and picked up on things like that with Google and what not. ;)

Only reason I had some knowledge of that is from my old jobs, the access to the MSDN library I was given allowing me to take home many Windows Operating Systems like the Server and Professional versions, up to 2000/XP Professional, etc. Part of my work at times back then was testing executables on ALL available Windows versions and catching bugs which required having PCs with multiple partitions, Windows installs, and a boot manager allowing boot up to whichever version, etc. I remember once I even had to install a German version of Windows 98SE because our international clients were having trouble with an app the IT consultants had developed.

Gotta say, that big yearly MSDN Developers library case with ALL the CDs/DVDs of most all of Microsoft's products sure was a lot of fun! Of course if your company paid the $X,XXX subscription to be in it, you're given an account to log in the website to download almost anything you could think of also! This was a time of slow ISPs for residential connections and too high-priced business lines if you wanted something better, so the CD/DVDs were more valuable of course!

Anyway, did I read correctly back there somewhere, you say you wrote a compression algorithm that was used in professional games long ago ?? I do think a codec is something that separates the men from the boys in a certain way when it comes to software developers. It's a certain level of brain teaser and most coders will never get there... I think the best I ever got in that brain teaser dept was coding the binary tree sorting algorithm and a basic Pascal compiler back in the day when I was working on my Bachelors. ;) But no, a codec, still am not good enough for that...

P.S.

Did Falcom actually code for the PC Engine here ? All of the Ys games were actually developed by Hudson Soft under a Falcom partnership/license. Falcom provided the story and the music of course, but the rest was all Hudson Soft. It was even a great Hudson programmer's insistent on the idea to combine both Ys I&II as one game which worked out brilliantly!

Anyway, it appears the first Xanadu game was a partnership with Falcom and NEC (maybe just a developer/publisher team-up), but part II may actually all be Falcom by the looks of it. I thought they never actually coded for console systems and only really ever coded for the PC platform, starting with the Japanese PC-88 or whatever and then on to the Windows PC with their DirectX-based games (of which I hacked many :))...

elmer

#30
Quote from: NightWolve on 09/16/2015, 04:54 AMThat fix in the Zeroigar batch file regarding the FOR loop command and knowing the updated features to it for the NT version of the Command Console indicates you had access to Professional versions of Windows because the documentation for the batch file language is not included in Home versions of the Operating Systems.
Haha ... I have a hard time even finding the batch file documentation in Windows anymore. Microsoft have buried it deeper and deeper with every new release.

Just like the VBScript documentation ... where it's actually easier to find the documentation in old copies of Microsoft Office than it is to find it in new copies of Windows itself.

These days ... it's just quickest to go to http://ss64.com/

Anyway, "yes", I've been writing batch files for a long time, back to the DOS days when "those-in-the-know" replaced COMMAND.COM with JPSoft's wonderful 4DOS.COM.

4DOS still exists today in the form of TCC, which is my everyday-use replacement for the current Windows "Command Prompt".


QuoteGotta say, that big yearly MSDN Developers library case with ALL the CDs/DVDs of most all of Microsoft's products sure was a lot of fun! Of course if your company paid the $X,XXX subscription to be in it, you're given an account to log in the website to download almost anything you could think of!
If you ever want access again for cheap, just come up with a "business" name and apply for Microsoft's BizSpark program.  :wink:


QuoteAnyway, did I read correctly back there somewhere, you say you wrote a compression algorithm that was used in professional games long ago ?? I do think a codec is something that separates the men from the boys in a certain way when it comes to software developers.
I'm afraid that I'm not actually Lempel&Ziv smart!   :(

I just wrote various compressors based on other people's ideas and then used them in games because we needed to pack more data onto the disk/cart/CD.

If you look at the core concept of LZ77/LZSS compression, it is really easy to understand.

https://en.wikipedia.org/wiki/LZ77_and_LZ78

You can code up the algorithm with a brute-force search in only a few lines of C code.

It's only the optimizations that you can add to short-cut the search that make the code look complex and ugly.

That was why I posted the FALCOM2 compressor earlier ... so that people can see that the basic loop isn't that difficult to understand. The nasty search optimization is hidden in the 2 AddString() and RmvString() routines.

If you ever want to learn some more, I'd recommend Mark Nelson's "The Data Compression Book".

http://www.amazon.com/Data-Compression-Book-Mark-Nelson/dp/1558514341

My SWD compressor is really just LZSS, with the tree-search-optimization, and a custom static-encoding of output.

Wikipedia says "As of 2008, the most popular LZ77 based compression method is DEFLATE; it combines LZ77 with Huffman coding."

That's all that SWD is (and FALCOM2, as well) ... but ZLIB's dynamic Huffman coding of the match lengths and match offsets has been replaced with a static Huffman-like coding that was produced  by running lots of test data through a dynamic Huffman setup, and then by the hand optimizing the result to make decompression easy on 8-bit processors.

SWD is even named after Chapter 8 of Mark's book ... "Sliding Window Dictionary".


QuoteBut no, a codec, still am not good enough for that...
Don't be frightened by how ugly some of the stuff can look at first ... the core ideas to most of these things are pretty easy. It's just the nasty implementation details that can sometimes get in the way.

That's also why I posted the Zeroigar VWF code. So that you and others could take a look at it.

A lot of people seem to have some undue anxiety about writing a VWF routine that I just don't understand.

If you look at the inner-loop, it really isn't that difficult to trace through the data flow in your mind and see what it is doing. IMHO, it is actually easier to see that flow in the V810 code than it would be in 6280 code because all of the critical stuff can be kept in registers.

At the end-of-the-day, a VWF is really just another software sprite-drawing routine, and they're really not that tough.

Like most code ... it's the edge-cases that make things complex.

If you take a little time to trace through it, you'll have a lightbulb moment, and then be off writing your own. I'll give you the address in Zeroigar to put a breakpoint if you want to see it all "live".


QuoteDid Falcom actually code for the PC Engine here?
I believe so. To quote SamIAm, our local expert ...

Xanadu I, however, was created entirely by Falcom as an original PCE exclusive game. IIRC, it was the first game that they themselves made on any CD format. There was a huge buzz about it from the very first announcement, and happily, most would say that they lived up to expectations.


QuoteI thought they never actually coded for console systems and only really ever coded for the PC platform, starting with the Japanese PC-88 or whatever and then on to the Windows PC with their DirectX-based games (of which I hacked many :))...
Do you recognize the Xanadu 2 compression schemes?

I looked on RomHacking, and someone there was asking about hacking one of Falcom's PSP games, and it still seemed to be using the FALCOM2 compression scheme.

NightWolve

#31
Quote from: elmer on 09/16/2015, 04:58 PMDo you recognize the Xanadu 2 compression schemes?

I looked on RomHacking, and someone there was asking about hacking one of Falcom's PSP games, and it still seemed to be using the FALCOM2 compression scheme.
Well, here's a high-level C version of Hudson's codec I am using for Emerald Dragon thanks to David (dshadoff, one in the same in the thread). It's the same codec Hudson used in Ys IV, but with one line of code change and no processing of a 3-byte header in a text block, as ED text blocks are headerless. Coincidentally enough, he did similar compression tests back in 2004 when we were working on Ys IV to show improvements he made.

A bit of neat history here: So with the Ys IV project, I was left with just the decompression code from Neill Corlett and together we dumped the script. But, he quit the project and I took it over. I did at some point offer him $250 to help me complete it, along with a font hack, but while he said he respected the offer, he was just too interested in working on his current projects like the Playstation Sound Format stuff he was doing.

David however was good enough and studied the decompression code to reverse it. His first version boasted slightly better compression results. So he gave it to me, I got to work on converting it to a DLL so I could call it from VBA code in my Microsoft Access XP database "Translation Station" software... I screwed up though, and thought his code was bad... He designed it under the assumption of a command line tool and so all variable resets would occur upon start and exit. Since I made it into a DLL, all global variables remained static until application exit/unload so the 2nd and further calls to compress text blocks of the rest of the script resulted in garbage...

So yeah, just a dumb matter of zero'ing out variables per DLL call, but I missed it...  #-o That's what happens when you take code from multiple people without truly understanding it. Anyway, before realizing that, I asked him just for sanity's sake, to code it so that it produces exactly the same compressed output as the version Hudson used... I zero'ed in on his claims that he coded it slightly better and thought the problem might have something to do with that... He did so a few days or a week or so later and that's ultimately what I used with Ys IV. I realized what my mistake was, but since all the text blocks fit with the pure Hudson version, I went with that! However, 11 years later with Emerald Dragon, I used his better version because we're definitely gonna need it, at least for one of the biggest text blocks and it seems to work fine, so!!!

So yeah, here's what that code looks like for comparison:

/*
Credit & Thanks goes to Neill Corlett for providing notes and for
reverse-engineering the Hudson Ys IV text decompression algorithm!!!!
And David Shadoff who reversed it for recompression, completing the codec!
DLL interface mod by: Nicolas Livaditis (NightWolve)

Version: Emerald Dragon 1/26/2015
Code is updated from Ys IV to specifically work for Emerald Dragon
1-2 lines of code changes, minor! Also, text blocks don't have
3 bytes of header info - code removed to handle that!

For decompress, append "+= 0x11" after below line:
    winmptr[state] = src[si++];
winmptr[state] += 0x11;

For compress, find first line 2 times, append "-= 0x11":
    dest[di] = (uint8)pos;
dest[di++] -= 0x11;
*/
uint8 window[256]; /* holds sliding window data */
uint8 winptr;      /* input pointer into window buffer */
int   winptr_max;  /* max value of winptr (up to 256) */
int   winmptr;     /* output pointer from window buffer */

/* src     = pointer within source buffer
 * pos     = return value for index within window with best fit
 * maxmtch = max # of bytes to allow for checking
 * ret val = # of bytes matched at pos (0 = no matches; >=2 is match)
 
 Version 2.0 = David's better version!
*/

__forceinline int check_match(uint8 *src, uint32 *pos, int maxmtch) {
int len = 0;
int best = 0;
int temp = 0;
int i;

for ( i = winptr; i >= 0; i-- ) {
temp = is_match(src, (uint8)i, maxmtch);
if ( temp > 1 && temp > len ) {
best = i;
len = temp;
}
}
if ( winptr_max == 256 ) {
for ( i = 255; i > winptr+1; i-- ) {
temp = is_match(src, (uint8)i, maxmtch);
if ( temp > 1 && temp > len ) {
best = i;
len = temp;
}
}
}
*pos = best;
return len;
}

/***************************************************************************/
/* Version 2.0: David's better version! */

__forceinline int is_match(uint8 *src, uint8 win_index, int maxmtch) {
int i = 0, meet_head = 0;
uint8 test_byte;

while (1) {
if ( i >= maxmtch )
break;
if ( (i+win_index) == winptr )
meet_head = 1;
if ( meet_head == 1 )
test_byte = window[winptr];
else
test_byte = window[(uint8)(i+win_index)];
if ( *(src+i) != test_byte ) {
if (meet_head == 1)
i--;
break;
}
if ( meet_head )
break;
i++;
}
return i;
}

/***************************************************************************/

/*
 * ED format:
 * cc    = control byte
 *         each bit describes a byte in the upcoming stream
 *         1 = take byte as literal
 *         0 = use byte to describe a copy from previous sliding window;
 * xx    = byte
 *
 *
 */

DllExport long EncodeLZSS(LPBYTE src, LPBYTE dest, uint32 *srcl_out, uint32 *dstl_out) {
int control, control_loc, dup, i, j;
int len_loc, dest_window, iter;
uint32 pos, srcl, si, di;
uint8 len, len_half, db;

fmemzero(window, 256);
__asm {
XOR EAX, EAX
MOV len_loc, EAX
MOV dest_window, EAX
MOV iter, EAX
MOV len, AL
MOV len_half, AL
MOV db, AL
MOV srcl, EAX
MOV winptr, AL
MOV winmptr, EAX
MOV winptr_max, EAX
}
si = di = 0;
srcl = *srcl_out;
for ( ;; ) {
control = 0x00;
dup = 0;
control_loc = di++;
for ( i = 0; i < 8; i++ ) {
#if _TRACE
fprintf(stdout, "si = %X, di = %X, ", si, di);
#endif
if ( si >= srcl )
break;
//* Search for duplication */
dup = check_match(&src[si], &pos, 16);
if ( dup >= (srcl - si) )
dup = (srcl - si);
if ( dup < 2 ) {
#if _TRACE
fprintf(stdout, "literal  = %02X", src[si]);
#endif
control |= (0x80 >> i);
window[winptr++] = src[si];
winptr_max++;
if ( winptr_max > 256 )
winptr_max = 256;
dest[di++] = src[si++];
} else {
for ( j = 0; j < dup; j++ ) {
db = window[winptr++] = src[si++];
winptr_max++;
if ( winptr_max > 256 )
winptr_max = 256;
#if _TRACE
fprintf(stdout, "%02X continuing match\n", db);
fprintf(stdout, "si = %X, di = %X, ", si, di);
#endif
}
dup -= 2;
if ( !len_half ) {
len = ((dup << 4) & 0xF0);
len_half = 1;
// Emerald Dragon upgrade from Ys IV
dest[di] = (uint8)pos;
dest[di++] -= 0x11; //ED diff
len_loc = di++;
#if _TRACE
fprintf(stdout, "duplicate, loc = %02X, len = %02X ", pos, len);
fprintf(stdout, "(first half)");
#endif
} else {
len |= (dup & 0x0F);
len_half = 0;
// Emerald Dragon upgrade from Ys IV
dest[di] = (uint8)pos;
dest[di++] -= 0x11; //ED diff
dest[len_loc] = len;
#if _TRACE
fprintf(stdout, "duplicate, loc = %02X, len = %02X ", pos, len);
fprintf(stdout, "(second half) - loc = %X, byte = %02X", len_loc, len);
#endif
}
}
}
#if _TRACE
fprintf(stdout, "fixing control loc %X = %02X\n", control_loc, control);
#endif
dest[control_loc] = control;
if ( si >= srcl )
break;
}
dest[len_loc] = len;
if ( dstl_out )
*dstl_out = di;
return 0;
}

My __asm block is just quick zero'ing and could be replaced with variable = 0; of course to regain ANSI compiling compatibility, along with the fmemzero macro I wrote.

fmemzero translates to this:

// One of the fastest ASM functions to zero out memory
// For arrays/structs (value-defined, not via pointer) - Careful, use of wrong one will crash!
#define fmemzero(aBuffer, dwBytes)\
__asm CLD \
__asm XOR   EAX, EAX \
__asm LEA   EDI, aBuffer \
__asm MOV   ECX, dwBytes \
__asm MOV   BL, CL \
__asm AND   BL, 3 \
__asm SHR   ECX, 2 \
__asm REP   STOSD \
__asm MOV   CL, BL \
__asm REP   STOSB

Instead of 1,000 x86 Assembly instructions that the MS VC++ compiler will link your code to if you use functions like memset() or the WinAPI of ZeroMemory(), I do it the 80386 way cause by gosh, I remember my 8086 Assembly classes from the 90's (I first learned in 16-bit) and every 32-bit (or better) Intel processor or compatible since the 70's will support those particular x86 instructions!! :)

Another good one I'm proud of is memcpy which easily converts to strcpy as well. The x86 that the VC++ Compiler links into your executable is humongous, but yet it can be done with a handful of built-in Intel x86 instructions from day one!

// One of the fastest ASM functions to copy one buffer to another
// Use SIZE asm operand if need be for count: e.g. SIZE char_array
#define fmemcpy(dest, src, count)\
__asm CLD \
__asm MOV   ESI, src \
__asm MOV   EDI, dest \
__asm MOV   ECX, count \
__asm MOV   AL, CL \
__asm AND   AL, 3 \
__asm SHR   ECX, 2 \
__asm REP   MOVSD \
__asm MOV   CL, AL \
__asm REP   MOVSB

That's the right way to do it, using the Source and Destination index registers for operations they were meant to perform... If you disassemble what the compiler produces, you'd see it does basic register-indirect mode moves with registers holding the memory address. Something like this, just as a quick example I did on the fly here:

  MOV EDX, src
  MOV EBX, dest
  MOV ECX, count
LOOP_IT:
  MOV AL, BYTE PTR [EDX]
  MOV BYTE PTR [EBX], AL
  INC EDX
  INC EBX
  DEC ECX
  CMP ECX, 0
  JNZ LOOP_IT

Basic as can be. Neither do I recall seeing any cleverness to use say DWORD PTR and move 4 bytes at a time 32-bit style, and then catch the remaining 3 or less bytes if Count is not an even multiple, but yeah! And by the time you actually get to that code, you will have passed hundreds of instructions doing checks of sorts whose value is questionable, but I get that many coders aren't careful and so they need library functions with some universal aspect to them with OS safeguards, etc.

elmer

Thanks for sharing that. Nice!  :)

A classic brute-force search. Easy to write, and easy to understand!  :wink:

So, if I'm getting this right ...

YsIV was using an LZSS-varient with a 4-bit length (2..17), and an 8-bit offset (1..256).

Emerald Dragon is basically the same, but changes the 8-bit offset to (17..272).

Errr ... that's a bit silly of Hudson IMHO since, if nothing else, it kills the LZSS look-forward into the uncompressed data that allows it to compress RLE and other simple pattern-based data.

elmer

#33
Quote from: SamIAm on 09/16/2015, 02:43 AMVery interesting. I'll be curious to learn what it is that's causing the difference in compression that you mentioned at the end.
Me, too!


QuoteAs I look through the script now, I see a great many lines that would not suffer greatly for having their character count reduced. We'll still probably want to have as much space as possible for the English text, but compromises can be made.
That's really good to know.

I think that most of the  script chunks have plenty of spare space, but there are some that look pretty full. There's also the worry of expanding the size of the META_BLOCK on CD and overwriting something else, but we'll only deal with that when we have to!

-----------------

Back to the investigations ...

One of the "big ideas" in the currently-popular LZ4 compression scheme is that you can COPY multiple bytes at a time, rather than using a bit for each one to say that it's a COPY and not an LZSS match.

This also allows LZ4 to guarantee that a COPY is followed by an LZSS match, which saves another bit.

Unfortunately, if an LZSS match directly follows another LZSS match, then 4 bits are wasted to indicate a COPY length of zero.


So I decided to look at the Xanadu 2 data and get some stats on how often an LZSS match is followed by another LZSS or a COPY.


After one LZSS, number of times COPY is next ...   867,693
After one LZSS, number of times LZSS is next ... 1,290,909



I also decided to take a look at how often each COPY length crops up, and to see if Huffman encoding them would help.


COPY   1 occurs 291,422 times.
COPY   2 occurs 152,959 times.
COPY   3 occurs 101,741 times.
COPY   4 occurs  70,629 times.
COPY   5 occurs  48,244 times.
COPY   6 occurs  37,597 times.
COPY   7 occurs  27,232 times.
COPY   8 occurs  23,353 times.
COPY   9 occurs  16,682 times.
COPY  10 occurs  16,159 times.
COPY  11 occurs  12,960 times.
COPY  12 occurs  11,062 times.
COPY  13 occurs   8,580 times.
COPY  14 occurs   8,237 times.
COPY  15 occurs   5,806 times.
COPY >15 occurs  40,045 times.

Number Of Copies 872,708, Total Bytes Copied 3,882,406.


COPY Length   Huffman Encoding

   1 00
   2 01
   3 100
   4 1010
   5 1011
 >15 1100
   6 11010
   7 11011
   8 11100
   9 111010
  10 111011
  11 111100
  12 111101
  13 111110
  14 1111110
  15 1111111

elmer

#34
Quote from: NightWolve on 09/16/2015, 08:34 PMWell, here's a high-level C version of Hudson's codec I am using for Emerald Dragon thanks to David (dshadoff, one in the same in the thread). It's the same codec Hudson used in Ys IV, but with one line of code change and no processing of a 3-byte header in a text block, as ED text blocks are headerless.
OK, for anyone that finds that all the window code in NightWolve's example just a little confusing ...

I thought that I'd just knock-up the brute-force version of the search code in a form that's easier to understand.

No "window" to update, and nothing that needs to be reset in a DLL. LZSS is very simple!  :wink:

#if defined( forEmeraldDragon )
 const unsigned g_uMinimumOffset  = 17;
 const unsigned g_uMaximumOffset  = 272;
#else
 const unsigned g_uMinimumOffset  = 1;
 const unsigned g_uMaximumOffset  = 256;
#endif

const unsigned g_uMaximumLength   = (15 + 2);

unsigned g_uBestMatchLength = 0;
unsigned g_uBestMatchOffset = 0;

// Parameters ...
//
// uint8_t * pSourceStarts /* Ptr to 1st byte of data in the array */
// uint8_t * pSourceFinish /* Ptr to 1st byte beyond last of array */
// uint8_t * pSourceBuffer /* Ptr to current byte of data in array to compress */

void FindBestMatch( uint8_t * pSourceStarts,  uint8_t * pSourceFinish, uint8_t * pSourceBuffer )
{
  g_uBestMatchLength = 0;
  g_uBestMatchOffset = 0;

  for (unsigned uTestOffset = g_uMinimumOffset; uTestOffset <= g_uMaximumOffset; uTestOffset += 1)
  {
    uint8_t * pTestString = pSourceBuffer;
    uint8_t * pStringStop = min( pSourceBuffer + g_uMaximumLength, pSourceFinish );
    uint8_t * pLzssWindow = pSourceBuffer - uTestOffset;

    if (pLzssWindow < pSourceStarts) break;

    while (pTestString < pStringStop)
    {
      if (*pTestString != *pLzssWindow) break;
      pTestString++;
      pLzssWindow++;
    }

    unsigned uMatchLength = pTestString - pSourceBuffer;

    if (uMatchLength > g_uBestMatchLength)
    {
      g_uBestMatchLength = uMatchLength;
      g_uBestMatchOffset = uTestOffset;
    }
  }
}


[EDIT]

Made even simpler at the potential cost of a compiler-optimization opportunity.

dshadoff

The difference between the original Hudson compression and the way I wrote it was:
- Hudson used the "first match" of greater than x characters (probably 2)
- I kept brute-forcing for all matches in the buffer, and used the "best match"

- So, if the search string was "grandmother"
- and if the buffer contained:
gracious grandiose grandmother

Hudson's search would reference "gracious" for 3 characters, whereas my search would get the complete match.

...Hard to believe that such a rookie mistake would be in there, but you'd be surprised at what people use as their "trusted" code.

-Dave

elmer

Quote from: dshadoff on 09/17/2015, 09:31 AMThe difference between the original Hudson compression and the way I wrote it was:
- Hudson used the "first match" of greater than x characters (probably 2)

...Hard to believe that such a rookie mistake would be in there, but you'd be surprised at what people use as their "trusted" code.
Wow, that's a pretty silly mistake for them to make!  #-o

It looks like your code also stops the matching when you reach the current string position.

Is that another decision that they made that you've had to "simulate" in order to get exactly the same output stream and keep NightWolve happy?

It would certainly explain why they changed the minimum offset to 17 in the Emerald Dragon version of the codec.

dshadoff

Quote from: elmer on 09/17/2015, 11:55 AMWow, that's a pretty silly mistake for them to make!  #-o

It looks like your code also stops the matching when you reach the current string position.

Is that another decision that they made that you've had to "simulate" in order to get exactly the same output stream and keep NightWolve happy?

It would certainly explain why they changed the minimum offset to 17 in the Emerald Dragon version of the codec.
I didn't have access to the Hudson encoder - I just had their decoder, which I wrote backwards in order to make the encoder.  Any remaining mistakes are my own =)

I found it odd that my encoder compressed the same expanded data better than the data in the game itself, which confused me at first - but then I "suboptimized" the code in the way I described, and the data block lengths matched so I concluded that was the problem.

NightWolve

#38
Nicely written, reminds me of Adaptec programmers (and some Microsoft ones) and their code samples in the ASPISDK: perfect indentation, elegant use of "C", disciplined use of variable naming, etc.

I've come to hate C++ and I think they should've probably just went right to Java for object-oriented enhancement rather than having given birth to that abomination... ;)

"C" is what you want when you want to keep the language close to the target machine language you're compiling to I think. Fast and compact with a good enough high-level aspect to making building/maintaining/organizing easier as opposed to going full-on ASM and not caring about some level of possible portability to other OS platforms if interested.

I just can't in good conscience use C++ after disassembling executables with IDA Pro or DASM and seeing what the compilers produced to implement it eons ago... 10,000 instructions for this, that or the other ?? Screw that! And all the crazy syntax and notation schemes they kept coming up with in the language... Whatever... I'm happy to have avoided it in the business world for the most part. I'd rather use Visual Basic's object-oriented style if I'm looking for faster development time for an app at the cost of performance.

Anyway, as to your code sample, I can replace the call to "dup = check_match(&src[si], &pos, 16);" with your best match version and you say I can then eliminate the 256-byte sliding window buffer ? I'd say finish your version of what EncodeLZSS() should look like also. I'll have to go through breaking working code to try to upgrade to your FindBestMatch(), work I don't need. But yeah, definitely interested in a more optimized and/or simplified expression of the algorithm.

elmer

#39
For historical amusement only, here's the 8086 assembly language version of the brute-force search from the 1993 version of the SWD compressor. Yay, rah-rah for that old 16-bit MS-DOS!  :wink:

                ASSUME  CS:_TEXT,DS:DGROUP

_TEXT           SEGMENT BYTE PUBLIC 'CODE'

_DoSearch       PROC    NEAR

                PUSH    BP
                MOV     BP,SP
                PUSH    SI
                PUSH    DI

                MOV     AX,DS
                MOV     ES,AX

                CLD

#Search1st:     MOV     DI,WORD PTR DGROUP:_SearchPtr
                MOV     AL,BYTE PTR DGROUP:_StringChr

                MOV     CX,WORD PTR DGROUP:_SearchLen
                OR      CX,CX
                JZ      #End

                REPNE   SCASB

                JNE     #End

#Found1st:      MOV     WORD PTR DGROUP:_SearchLen,CX
                MOV     WORD PTR DGROUP:_SearchPtr,DI

                MOV     DX,DI

                MOV     SI,WORD PTR DGROUP:_StringPtr

                MOV     CX,WORD PTR DGROUP:_StringLen
                OR      CX,CX
                JZ      #FoundAll

                XOR     AL,AL

                REPE    CMPSB

                JNE     #Search1st

#FoundAll:      MOV     AX,WORD PTR DGROUP:_StringLen
                MOV     BX,WORD PTR DGROUP:_StringMax

                CMP     AX,BX
                JE      #NewString

#EnlargeString: CMPSB
                JNE     #NewString

                INC     AX

                CMP     AX,BX
                JNE     #EnlargeString

#NewString:     MOV     WORD PTR DGROUP:_StringLen,AX
                MOV     WORD PTR DGROUP:_BestYetLen,AX
                MOV     WORD PTR DGROUP:_BestYetPtr,DX

                CMP     AX,BX
                JNE     #Search1st

#End:           DEC     WORD PTR DGROUP:_BestYetPtr

                POP     DI
                POP     SI
                POP     BP
                RET

_DoSearch       ENDP


It was much faster than the C version back then, but was later blown-away by the more sophisticated tree search technique that Haruhiko Okumura used in the famous LZSS.exe DOS compressor (from 1989 ... I was way late to the table in adding a "tree" search!).

It's Okumura's old public-domain source code that people still go to today, and walk away totally confused, and thinking that LZSS must be complex and scary.  #-o

It isn't.

It is the optimizations, like the tree that makes the search blindingly fast, that are scary, but they're not a part of the core LZSS algorithm, and aren't really needed on today's fast processors for small datasets like PCE games.

You can remove those old optimizations and just end up left with a clear-and-simple brute-force search, and still produce exactly the same compressed output.

But yet, you still see hangovers from Okumura's old source, like the ring-buffer that's in Dave's YsIV compressor, that really aren't needed anymore in a modern 32-bit LZSS compressor, and that only serve to complicate the code and obfuscate problems.

I literally only removed the ring-buffer from my SWD compressor a few days ago, so I totally understand why it's still there in a lot of code, and I'm not in any position to point fingers.

I'm just still trying to get the algorithm clarified to the point that NightWolve will have that "lightbulb" moment and realize just how simple it all is, and that he really doesn't need my help to modify the ED compression code.

I'm just not very good as a "teacher".  :(

SamIAm

#40
Got me to read about Harukiho Okumura. :)

Keep posting!

elmer

#41
Now that I'm pretty-much-done with the compression side of things, I thought that I'd report the results.

First of all, the reorganization of my compression code has allowed me to write an "SWD5" codec that incorporates the same flow as LZ4 (or FALCOM1 actually, just to show that there really is nothing new about LZ4).

If you're not familiar with the difference, let me try to explain.

-----------------------

Traditional LZSS compressors use a one-bit flag to say if the next thing in the compressed data is either a single byte to copy to the output, or a "match" with some data that's appeared earlier in the decompressed output.

The "matches" are where all the compression comes in. That extra 1-bit flag actually increases the size of the data by 1/8, which is why "uncompressible" files can be up to 1/8 larger when compressed with a traditional LZSS compressor.

LZ4 and FALCOM1 do things a little differently, in that instead of using a 1-bit flag to indicate that 1 byte should be copied from the compressed data to the output data, they instead store the number of bytes to copy from the compressed data to the output data.

That's it ... that's the big difference. It's not difficult to understand, even though it has a subtle-but-significant effect on the way that the code works.

So the next question is, "Does this improve the compression?".

Well, from my tests, the answer is "yes, but not by much".

The big advantage is that if you've got a lot of "uncompressible" bytes to copy, then storing the number can be a win.

It certainly reduces the "worst case" situation from adding 1/8 of the original file size, to just adding 2 or 4 bytes.

But in practical conditions in a game, you're not dealing with lots of "uncompressible" data, because if you find a "difficult" file, then you can just store it uncompressed.

-----------------------

So now there's SWD4 (using the traditional 1-bit flag) and SWD5 (using a count of "uncompressible" bytes to copy).

Here's how they stack up against FALCOM2 on some of the Xanadu 2 data ...

BLK $00d800  16 CHKs ( 32136 / 104960)  Fal2  28259  Swd4  27347  Swd5  27124
BLK $019800  11 CHKs ( 14682 /  74752)  Fal2  12582  Swd4  12215  Swd5  12140
BLK $03b800  26 CHKs ( 42163 / 123742)  Fal2  36836  Swd4  35828  Swd5  35580
BLK $04b800  12 CHKs ( 34652 /  80890)  Fal2  31004  Swd4  29886  Swd5  29665
BLK $057800  12 CHKs ( 34206 /  80410)  Fal2  30614  Swd4  29511  Swd5  29290
BLK $06b800  68 CHKs (118017 / 328663)  Fal2 110270  Swd4 106985  Swd5 106152
BLK $09b800  89 CHKs ( 95543 / 379309)  Fal2  85034  Swd4  82264  Swd5  81596


From those results it looks like SWD5 is "better" than SWD4, but the difference is tiny, an approximately 0.25% improvement.

It also looks like SWD5 is "better" than FALCOM2, but once again, the difference is hardly significant with an approximately 1% improvement.

So, my conclusion from all this is that while I just about win against Falcom's data-compression programmer in the bragging-rights contest ... it's not by enough of a margin to really matter.

IMHO, Falcom did a really good job with the FALCOM2 compression code back in 1994/1995!  :wink:

There's no reason to switch Xanadu 2 from FALCOM2 to SWD4/SWD5 from a "compression" point-of-view. It will only make sense if the decompression code is a significantly smaller, and I actually need to free up that memory.

-----------------------

For anyone that's curious in seeing how SWD4/5 stacks up against LZ4, here are the results from running the graphics test files from this site ...

http://www.brutaldeluxe.fr/products/crossdevtools/lz4/

Fal1  6,490  Fal2  5,844  Swd4  5,546  Swd5  5,556  LZ4  6,508  ANGELFISH.BIN
Fal1 23,379  Fal2 20,771  Swd4 20,617  Swd5 20,665  LZ4 23,520  ASTRONUT.BIN
Fal1 15,087  Fal2 14,027  Swd4 13,844  Swd5 13,863  LZ4 14,802  BEHEMOTH.BIN
Fal1  3,406  Fal2  2,519  Swd4  2,446  Swd5  2,449  LZ4  2,803  BIG.BIN
Fal1  9,329  Fal2  8,003  Swd4  7,933  Swd5  7,962  LZ4  8,865  BUTTERFLY.BIN
Fal1  7,039  Fal2  6,206  Swd4  6,102  Swd5  6,113  LZ4  6,654  CD.BIN
Fal1 18,368  Fal2 16,874  Swd4 16,335  Swd5 16,369  LZ4 18,876  CLOWN.BIN
Fal1  7,982  Fal2  7,215  Swd4  7,235  Swd5  7,260  LZ4  7,662  COGITO.BIN
Fal1 14,871  Fal2 12,748  Swd4 12,560  Swd5 12,581  LZ4 15,300  COTTAGE.BIN
Fal1 13,389  Fal2 11,873  Swd4 11,711  Swd5 11,733  LZ4 13,102  FIGHTERS.BIN
Fal1 13,431  Fal2 12,260  Swd4 11,936  Swd5 11,954  LZ4 13,220  FLOWER.BIN
Fal1 10,198  Fal2  9,029  Swd4  8,804  Swd5  8,813  LZ4  9,972  JAZZ.BIN
Fal1 14,834  Fal2 13,403  Swd4 13,182  Swd5 13,206  LZ4 14,810  KNIFE.BIN
Fal1 19,485  Fal2 17,932  Swd4 17,727  Swd5 17,730  LZ4 20,261  LORI.BIN
Fal1  8,724  Fal2  8,143  Swd4  7,893  Swd5  7,905  LZ4  8,643  MAX.BIN
Fal1 17,921  Fal2 15,114  Swd4 14,798  Swd5 14,803  LZ4 18,474  OWL.BIN
Fal1 20,625  Fal2 18,564  Swd4 18,285  Swd5 18,307  LZ4 20,595  RED.DRAGON.BIN
Fal1 15,496  Fal2 13,993  Swd4 13,483  Swd5 13,500  LZ4 16,306  TAJ.BIN
Fal1 12,907  Fal2 11,205  Swd4 11,128  Swd5 11,164  LZ4 12,551  TUT.BIN

elmer

Quote from: NightWolve on 09/17/2015, 07:46 PMI've come to hate C++ and I think they should've probably just went right to Java for object-oriented enhancement rather than having given birth to that abomination... ;)
C++ definitely sucks in practice ... but then, there's a lot of people that think the same about Java.

[RANT] I do NOT want a "virtual environment" chewing up my processor power just because some idiots don't know how to call free()! [END RANT]

C was designed by engineers to solve a problem.

C++ was designed by an academic, and later passed off to a committee of "architecture astronauts".

I've met a few programmers over the years that will proudly over-complicate the simplest task to the point of writing a fragile, un-maintainable mess.

I suspect that there's a lot of people like that on the C++ Standards Committee.

I really enjoyed watching Scott Meyer's talk at the D conference last year, it's a fascinating talk about some of the strange language design edge-cases in C++ ...
QuoteAnyway, as to your code sample, I can replace the call to "dup = check_match(&src[si], &pos, 16);" with your best match version and you say I can then eliminate the 256-byte sliding window buffer?
It's certainly possible to rewrite the code that you've got there so use a simpler version of the "brute-force" search.


QuoteI'll have to go through breaking working code to try to upgrade to your FindBestMatch(), work I don't need.
No problem, the point wasn't to get you to use my code ... it was to get you to want to understand the code that you're using.

We've both got plenty of work to keep us busy!  :wink:

Speaking of which ... it was really nice to find out today that the Xanadu 2 "Prologue" cinema is just written in the regular game engine, and uses script blocks that I've already identified.

I think that that's going to be the best point-of-attack in trying to get the first script chunks translated and re-inserted into the game!

TurboXray

Offtopic:

 I learned C++ first, or C with OOP (back in '96), and then switched over to straight C. To this day, I still don't know what all the fuss is about with classes and objects. I should know this, being a computer science major, but I won't be taking any CS courses into after my gen ed are out of the way.

 The only thing that annoyed me with C (C99), was that I couldn't created a struct, which contained an array and a set of pointers with offsets into that array. The runtime thingy in C, or whatever it's called, won't initialize the pointers. I don't see why not; those pointers should be relative to the array in which they all belong to (the struct).

Dicer

I have no idea what most of this means, but it's interesting reading...

My programming skills stopped at Basic and Cobol :/

elmer

Quote from: TurboXray on 09/24/2015, 12:38 PMOfftopic:
Not really, I'd say ... it's a "blog" about the programmer's side of trying to get a translation done.  :)

I'm trying to give people an insight into what-it-is that the programmer has to do to get the "translator's" work into a game.

That might give people a better idea of why most translations need a programmer's help, and why these things often get stalled or abandoned.

IMHO anything programming-related is fair-game. Even if it doesn't directly apply to a game, it'll probably apply to the tools that may need to be written.


QuoteI learned C++ first, or C with OOP (back in '96), and then switched over to straight C. To this day, I still don't know what all the fuss is about with classes and objects. I should know this, being a computer science major, but I won't be taking any CS courses into after my gen ed are out of the way.
I'm sure that the you'll be taught all sorts of wonderful stories about how OOP is wonderful, and how to program "properly" ... usually by someone that's never worked on a large software project, or had to maintain someone else's code.

Be skeptical!  :wink:

IMHO, there are some really good things that you can do with OOP techniques, but if you get "religious" about it, you can easily over complicate things.

This site is a fun read after you've had someone evangelizing the benefits of C++ to you ... http://yosefk.com/c++fqa/

Personally, I mostly write C code with the C++ compiler, but there are just some things that are much easier to express with classes.

But, like most old game devs, I avoid std:: and templates like the plague!


QuoteThe only thing that annoyed me with C (C99), was that I couldn't created a struct, which contained an array and a set of pointers with offsets into that array. The runtime thingy in C, or whatever it's called, won't initialize the pointers. I don't see why not; those pointers should be relative to the array in which they all belong to (the struct).
I'd be curious to see what you were trying to accomplish with that struct?

TailChao

Quote from: elmer on 09/24/2015, 09:25 PMNot really, I'd say ... it's a "blog" about the programmer's side of trying to get a translation done.  :)
Just wanted to chime in and say I'm really enjoying this.

It's extremely helpful to "compare notes" with how others structure their games, especially in regards to scripting and compression.

dshadoff

Quote from: elmer on 09/24/2015, 09:25 PMPersonally, I mostly write C code with the C++ compiler, but there are just some things that are much easier to express with classes.
Same here.
A good enough programmer is going to write code that has a beginning, a middle and an end, and is divided into all the necessary pieces neatly - no matter what language.  Kind of like the dream that C++ tries to sell.

On the other hand, a careless C++ programmer will write mangled crap anyway, no matter how much of the Kool-Aid they drank.

QuoteBut, like most old game devs, I avoid std:: and templates like the plague!
Or the clap.  "STD" only meant "sexually transmitted disease" when I was younger.  Then C++ came along and made it mean something equally unpalatable.

-Dave

NightWolve

Quote from: dshadoff on 09/25/2015, 08:39 PMOr the clap.  "STD" only meant "sexually transmitted disease" when I was younger.  Then C++ came along and made it mean something equally unpalatable.

-Dave
HAHAHAHAHAHA!

flame

Quote from: elmer on 09/16/2015, 04:58 PMI looked on RomHacking, and someone there was asking about hacking one of Falcom's PSP games, and it still seemed to be using the FALCOM2 compression scheme.
I guess I did that. It's like halfway between a straight copy of the MIPS code and a functional copy of the algorithm. I'm not smart enough to do anything more. I worked on Nayuta which used a little more complicated version of what you're calling FALCOM2. Trails in the Sky 3 PC was using straight FALCOM2.

I don't claim to be a good programmer. I can only do simple stuff. I like Python and can't understand C code. It has all those brackets {} and things. Also you have to understand C builtin functions and I'm not sure I do. I get that Python is slow. Unless you're doing a decryption algorithm though, it doesn't have to be fast, at least for Romhacking work.

Beginning, middle and end: My programs tend to follow: input, data processing, output. Is that what's meant?