Translation Patch – POST 1 – High-level Overview

Started by dshadoff, 12/03/2018, 09:37 PM

dshadoff

So you want to create a translation patch, and you'd like to know how somebody does such a thing ?

I'll be putting together a few articles on the subject; I'm hoping that the forum-post format (thread per major subject) will allow a lively supplemental discussion to follow each post and explore a little deeper into areas in which people are interested. In the event that an initial post on a topic is too large, I'll try to break it into sections and post consecutively to the same thread.

Today, I'll just give an overview...forgive me if today's post seems too short on details (even though it's a pretty big post); the next post will provide plenty of details.

If you're thinking of doing a translation patch, you'll need to consider all the following:
  • Choosing a game to translate
  • Somebody to do the translation
  • Somebody to do the technical work
  • Since these two roles are rarely performed by the same person, the two people involved will need to share a few things:
    • A method of communication
    • A way of keeping track of versions, and ensuring the back-and-forth remains cohesive (i.e. version control/management)

Before you even start, you should know that translations are marathons, not sprints. For a CD-ROM scale game, be prepared for months or even years to elapse between the start of the project and when you're ready to release.


The Game:

Choosing the game is all about trade-offs... the factors are basically:
  • Desirability of the game
    • Is anybody (especially yourself) going to actually play it ?
    • Are you going to get sick of it during play-test ?
  • Complexity – which can mean:
    • Size of the script
    • Complexity of the language in use (technical terms ? poetry ? jokes ?)
    • Whether the script is easily identified, and/or stored in compressed format, etc.
  • Whether there is enough space... and by this, I mean two things:
    • Whether English will fit comfortably on the screen. For example, it's easier to fit "FIGHT" (or similar word) into a choice box if the original Japanese is "たたかう", rather than "戦う". English generally takes more characters to do the same job, but each character may be narrower.
    • Whether there appears to be space (i.e. in memory) to reinsert the script into the game itself. The more compressed it is, the more difficult it could be. Raw ASCII English (1 byte per character) and raw SJIS Japanese (2 bytes per character) are going to be relatively similar in size, because of the density of the languages themselves (note that French/Spanish may require more space to convey the same thoughts). If the Japanese stores kana as 1 byte each, or if there is an efficient compression scheme for the Japanese, you will need to worry about whether there is extra unused space available in memory (and/or on disk). Hopefully, you won't have to create a compression mechanism for a game which isn't currently stored as compressed text.
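
As a rough illustration of the size comparison above, here is a minimal Python sketch that counts characters and bytes for the "FIGHT" example. It assumes Python's built-in "shift_jis" codec is a fair stand-in for what a game stores; a real game may use its own character table, in which case the byte counts will differ.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))

# The choice-box example from the text, in three forms
samples = [
    ("FIGHT", "ascii"),         # 1 byte per character
    ("たたかう", "shift_jis"),   # kana, 2 bytes per character in SJIS
    ("戦う", "shift_jis"),       # kanji + kana, 2 bytes per character in SJIS
]

for text, enc in samples:
    print(f"{text!r}: {len(text)} characters, {encoded_size(text, enc)} bytes as {enc}")
```

The kana spelling gives you 8 bytes (and four on-screen cells) to work with for a 5-letter English word, while the kanji spelling gives you only 4 bytes and two cells.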

By the above standards, HuCards are probably going to be simpler projects than CDROMs...but you may not enjoy HuCards as much as CDROMs. On the other hand, CDROMs tend to use the System Card routines for printing kanji, so it may be technically easier to find the print function and the script on a CDROM game, because they are less likely to use compression or a non-standard character set than a HuCard game.

My example will be a CDROM game, so I'll show some techniques which are specific to that format.


The Programmer

The programmer doesn't need to know much Japanese, but it definitely helps. They should be able to recognize hiragana, katakana, and kanji on sight. An important factor is whether they can identify whether a word (or sentence/phrase) is complete, or whether some other odd encoding is in place. When I first looked at Emerald Dragon, I located partial words in the data track, but couldn't figure out why entire sentences couldn't be found...until I realized that there was compression involved.
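
One way to run this kind of "is the word actually in there?" check is to search the raw data for the SJIS bytes of a word you saw on screen. The following is only a sketch: the function name `find_sjis` and the sample data are made up for illustration, and it assumes the game stores plain, uncompressed SJIS. If whole sentences can't be found but fragments can, that is a hint that compression or a custom encoding is in play.

```python
def find_sjis(data: bytes, text: str) -> list:
    """Return every offset at which `text`, encoded as Shift-JIS, occurs in `data`."""
    needle = text.encode("shift_jis")
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets

# Stand-in for a chunk of a data track (a real one would be read from the disc image)
sample = (b"\x00\x01" + "たたかう".encode("shift_jis")
          + b"\xff" + "にげる".encode("shift_jis"))

print(find_sjis(sample, "たたかう"))  # [2]
print(find_sjis(sample, "まほう"))    # [] (not present in this sample)
```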

The programmer should be familiar with the memory-mapping model of the PC Engine, assembly language, and 8-bit programming in general.

This shouldn't need to be mentioned, but the programmer should also be comfortable with hexadecimal, since they will be looking at hexadecimal listings in order to figure out where pointers are and where they point to, what the special codes mean, and so on.
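
To give a feel for what that looks like in practice, here is a small Python sketch that prints a hex listing and reads a table of pointers from it. The table contents and the base address $4000 are invented for the example, and the 16-bit little-endian layout is an assumption: it is the natural format for a 6502-family CPU like the PC Engine's, but each game is free to store its pointers differently.

```python
import struct

def hex_dump(data: bytes, base: int = 0, width: int = 16) -> str:
    """Format `data` as rows of hex bytes, each prefixed with its address."""
    lines = []
    for off in range(0, len(data), width):
        row = data[off:off + width]
        hex_part = " ".join(f"{b:02X}" for b in row)
        lines.append(f"{base + off:06X}: {hex_part}")
    return "\n".join(lines)

def read_pointers(data: bytes, start: int, count: int) -> list:
    """Read `count` 16-bit little-endian values starting at offset `start`."""
    return list(struct.unpack_from(f"<{count}H", data, start))

# Hypothetical 3-entry pointer table (low byte first, then high byte)
table = bytes.fromhex("00 40 1A 40 37 40")

print(hex_dump(table, base=0x4000))
print([f"${p:04X}" for p in read_pointers(table, 0, 3)])
```

Reading the listing by eye, "1A 40" is the address $401A, not $1A40; getting used to that byte order is most of the battle.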

The programmer will need to find and/or write various tools, because a lot of this work is so specialized that there aren't many tools flexible enough to do what you need them to.

Of course, a project may have more than one programmer, but that would be distinctly rare. Such a team would be better equipped to survive the loss of a team member, but communication and division of responsibilities would need to be considered.

Special notes to the programmer(s):
Even games which appear to play without issues will possess bugs which you may uncover. Games with apparent bugs – even rare ones – should be treated as poison.

You will likely find oddities when you are doing your text extract program; because of this, you will likely spend time cross-checking mis-referenced and unreferenced text, trying to infer why these issues exist. As a result, don't expect your extract program(s) to simply run one pass, and be complete. In the text extract post, I'll provide examples with some conclusions I have drawn.


The Translator

The translator needs to be able to deal with the fact that there are special characters in the script which will need to remain, because they are there for a purpose:
  • Change text color
  • Wait for a key
  • Speed up/slow down text
  • Short form for a character's or item's name
  • Switch between hiragana/katakana
  • Etc.
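
A simple automated check can catch special codes that went missing during translation. The sketch below is hypothetical throughout: it assumes the extracted script writes codes in a made-up `<NAME>` notation (such as `<WAIT>` or `<COLOR2>`), which is not taken from any particular game or tool. It compares the codes as an unordered collection, since word order (and therefore code order) often changes between Japanese and English.

```python
import re

# Hypothetical notation for special codes in an extracted script, e.g. <WAIT>, <COLOR2>
CODE_PATTERN = re.compile(r"<[A-Z0-9_]+>")

def control_codes(line: str) -> list:
    """Return the special codes found in `line`, in order of appearance."""
    return CODE_PATTERN.findall(line)

def codes_preserved(original: str, translated: str) -> bool:
    """True if both lines contain the same codes the same number of times."""
    return sorted(control_codes(original)) == sorted(control_codes(translated))

original = "<NAME1>は<ITEM>をてにいれた<WAIT>"
good = "<NAME1> got the <ITEM>!<WAIT>"
bad = "<NAME1> got the <ITEM>!"   # the <WAIT> code was dropped

print(codes_preserved(original, good))
print(codes_preserved(original, bad))
```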

The translator will ideally be able to write well (i.e. literary, not just literate) in the target language, to be succinct where needed in order to make the text fit, and to maintain the correct "voice" for particular characters (e.g. a Southern accent on a character should remain consistent throughout the game). As such, the translator will in most cases need to keep track of which lines are being uttered by which characters, and what 'persona' those characters will project.


A brief note on choosing team members

Of course, skill is important, but first and foremost, team members should be both patient and committed to the project. As I mentioned, any such project is a long-term project, and there will definitely be interruptions (i.e. "life"). If a project is going to succeed, everybody must be in it for the same reasons, and everybody must be persistent.


Assets to translate

There's almost certainly going to be more material requiring translation than it first seems. Also, there's a difference between making a patch "good enough to make a game playable", and making a full translated version (this is another decision to be made, and team members should be in agreement).

Here is a list of the necessary assets; a list of easy-to-overlook ones follows it:
  • Text – the script of the game which includes the narrative and the dialogue
  • Choice boxes (fight/run/status/etc.) – These are generally stored separately from the text, and are often stored in kana rather than kanji
  • Lists – you can consider these 'pronouns', as they are often in separate lists in order to save space. These may include items, magic spells, bestiary, potential fight outcomes (e.g. 'took a hit'/'missed'/'gained a level'), etc.
  • Scenery graphics which include signs/graffiti/etc.

Things which may not be top-of-mind include (leaving any of these out may make the translation seem less enjoyable – and sometimes even less playable):
  • Title screen graphics
  • Ending credits (could be text, and/or BG graphics and/or sprite graphics)
  • Incidental "text stored as graphics" (think 1960's Batman – "BAM !", "POW !" balloons). These graphics (often sprites) become more common in later games for the PC Engine (1993/1994 and onward)
  • Voices stored in ADPCM during normal gameplay (which often, but not universally, have accompanying text).
  • Narrative/dialogue in cinema scenes which normally doesn't have accompanying text; these could be stored as ADPCM or redbook audio. For any speech which doesn't have accompanying text, you'll need to consider whether (a) it's good enough to have a written translation in a text file, (b) you want to dub a version, or (c) you believe that it's technically possible and even easier to subtitle any graphics (note: this last choice is not likely)

A few hints:
  • Text is easier to manage than graphics (which often need an artist to modify).
  • Graphics are easier to deal with than audio. Audio doesn't just require a translated script, but also performers, sound mixing expertise, and... if you're a perfectionist, some additional programming skills to alter the pace of the lip movement. Or conversely, performers who can match lip movement. But anybody who has ever watched a dubbed live-action movie knows that this just can't be expected in a dub.


Programming Tools Needed

As I mentioned before, you may end up writing your own tools. I prefer to use off-the-shelf software where possible, but it isn't always possible...

You will need:
  • Hex editor which is capable of displaying SJIS. Even the best choices here have issues, like when the first byte of a 2-byte SJIS character occurs at the end of a line – so you can't clearly see the character.
    • Personally, I use a really old copy of UltraEdit that I bought 10 or 15 years ago, on a Windows 7 virtual machine, using AppLocale to set the locale to Japan. Windows 10 doesn't seem to like AppLocale, so if you use this setup there, you will have to change the whole machine's locale to Japan (which probably isn't as bad as you think).
  • Debugging emulator which is capable of properly playing the game.
    • These days, Mednafen is pretty much ideal for this; it was really hard to get anything done before this. For the truly ambitious, you can also modify the source code to add your own features.
  • Programming tools, including an IDE. This is really more about what you're comfortable using. Keep in mind that Microsoft tools get refreshed every 3 years or so, and they pretty much always require some investment in updating your code in order to continue using their 'new, improved' tools. This can really creep up on you if you don't refresh your computer often – and suddenly you find yourself stuck with a huge amount of work modifying your code if you're 2 (or more) major versions behind.
    • I personally use Eclipse/CDT because I am comfortable with 'C' and like the IDE. It's also free and quite functional.
  • Graphics browsers and extractors, for graphic assets
    • There was once a program called TMOD2 which I had written PC Engine sprite and background plugins for, but the program won't run on modern Windows systems. You may be able to find other graphics browsers which can view tile formats, but I am currently not aware of any which can view sprites or accept plugins for your own sprite-viewing code.
    • There are graphics extractors, but they can be esoteric... I am using FEIDIAN for my current project.
  • If you are planning to extract/re-implant ADPCM, I wrote a couple of tools for the Emerald Dragon project. But I have also heard that a program called 'SoX' is just as good (or perhaps better). I won't be able to provide any good advice on recording or mixing audio in general though – I'll leave that to somebody with more expertise than myself.
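
On the hex editor point above (a 2-byte SJIS character whose first byte lands at the end of a line), a home-made dump can sidestep the problem by letting a row run one byte long rather than splitting the character. This is a minimal sketch, not a replacement for a real hex editor: it assumes the scan starts on a character boundary and that the region is mostly text, since arbitrary binary data will throw the lead-byte detection out of sync. The function names are made up for the example.

```python
def is_sjis_lead(byte: int) -> bool:
    """True if `byte` is in the Shift-JIS lead-byte ranges (0x81-0x9F, 0xE0-0xFC)."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC

def sjis_rows(data: bytes, width: int = 16) -> list:
    """Split `data` into rows of roughly `width` bytes, never splitting a 2-byte character.

    A row may be one byte longer than `width` when a 2-byte character straddles the limit.
    """
    rows = []
    row = bytearray()
    i = 0
    while i < len(data):
        # Take two bytes for a lead byte (if a second byte exists), otherwise one
        size = 2 if is_sjis_lead(data[i]) and i + 1 < len(data) else 1
        row += data[i:i + size]
        i += size
        if len(row) >= width:
            rows.append(bytes(row))
            row = bytearray()
    if row:
        rows.append(bytes(row))
    return rows

# The leading 'A' pushes every 2-byte character onto an odd offset
sample = b"A" + "たたかう".encode("shift_jis")
for r in sjis_rows(sample, width=4):
    print(r.hex(" ").upper(), "|", r.decode("shift_jis", errors="replace"))
```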

You'll also need to consider file formats for the extracted script, pointer listings, disk locations, etc.
This is one of those things where you're going to have to think about what works best for you, and it will probably depend on the script itself.

I originally started doing this by extracting each text block to a separate textfile, storing it in the original SJIS format, but I found some problems doing it this way, such as:
  • Not everybody can view SJIS easily anymore – it has been falling out of favor for about 10 years now, replaced by UTF-8.
  • When the script was extracted as multiple files, the number of files was generally large (> 60). These rapidly became unmanageable between translator and programmer.
  • While the extracted text files were strictly formatted with line separators, special codes, etc., more than one of my translator counterparts had difficulty maintaining that strict formatting, leaving a lot of work to clean up those files for reinsertion.
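
For the first problem in that list, converting extracted SJIS text to UTF-8 is straightforward if the block is clean SJIS. The sketch below is a generic conversion, not the approach described in the later 'extract' post. It assumes Python's built-in "shift_jis" codec matches the game's text (the "cp932" codec is an alternative when Microsoft's extensions are involved), and it will raise an error on bytes that aren't valid SJIS, such as raw special codes left in the stream.

```python
from pathlib import Path

def sjis_to_utf8(raw: bytes) -> bytes:
    """Re-encode Shift-JIS bytes as UTF-8 (raises UnicodeDecodeError on invalid input)."""
    return raw.decode("shift_jis").encode("utf-8")

def convert_file(src: Path, dst: Path) -> None:
    """Write a UTF-8 copy of the Shift-JIS file `src` to `dst`."""
    dst.write_bytes(sjis_to_utf8(src.read_bytes()))

# Stand-in for one extracted text block
sjis_block = "たたかう".encode("shift_jis")
utf8_block = sjis_to_utf8(sjis_block)
print(len(sjis_block), "bytes ->", len(utf8_block), "bytes")
```

Keep in mind that reinsertion needs the reverse trip, so any character the translator types that has no SJIS equivalent (curly quotes are a common culprit) will need to be caught before it goes back into the game.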

In the 'extract' post of this series, I'll show a different way that I decided to use on my current project.


Keeping Things in Sync

Of course, communication is the most important aspect here – the programmer and translator need to be on the same wavelength, and the tools need to be built to support the workflow.

But there's still the problem of keeping track of where you are in the overall project, and being able to start up again after a gap in the schedule. For this, you'll need a couple more things:
  • Comments in the code, and notes files, detailing what you have completed, what is pending, and some challenges faced/how you overcame them. Human memory isn't perfect (as I can attest to).
  • You should keep "snapshot" backups of your work, at points in time which are meaningful (e.g. "finished first pass of city #1 'Ergot' text and bestiary").
    • Rather than keep a series of ZIP files with dates in them, it's a better idea to get used to a version control system as used by programmers. Git seems to be the most popular choice currently, but others include Subversion, CVS, SCCS and RCS. Don't forget to clearly comment every commit, and keep a TODO list.
    • If you use Git, it is pretty easy to keep all of the files (programs, scripts, assets) in sync among multiple team members, so if you are thinking about a larger project with people who can work together, this is probably the way to go.

So, the above summarizes the basic 'first principles' in terms of starting and managing the project.

Of course, when we talk about Project Management, talk inevitably turns to the things which PMs are expected to manage:
  • expectations
  • scope
  • budgets
  • dependencies
  • delivery dates
  • quality
Since we're talking about something which would be made available for free, expectations are directly related to any advertising which the team may have done, and budgets are effectively zero (or whatever you are willing to contribute from your own pocket). So... not much to manage there.

Scope and quality are determined by the participants (see above, assets to translate, and 'are you going to get sick of it during play-test ?')

...Which leaves us with dependencies and delivery dates. These are determined by the availability of the participants, and the overall size of the project.
  • The technical part can vary so much that it boggles the mind
  • Play-testing is going to be related to the number of bugs, the size of the script, and how long the game usually takes to play. A digital comic, while it may have a similar (or even larger) script compared to an RPG, is almost certainly going to be faster to play-test, because there is very little complexity – it's all about adjusting text and formatting, not about actually progressing in the game in order to see the line of script.
  • Translation of the script, however, seems to be the one component which should be possible to estimate. Of course, one 2500-line script can be a lot harder than another 2500-line script, but I wish I could give some sort of loose guess as to the size of the effort – if only to help people realize which projects are achievable with the resources they have. Maybe we can see some discussion on this point below.

I hope I have given you some things to think about in this introduction. Please provide feedback, comments, questions and discussion of this post as replies to this thread, below.

Upcoming topic: isolating the main print function and text

Continued: Part II