In my endeavor to make an open source[1] RTS engine capable of playing EA SAGE engine games, such as The Battle for Middle Earth series, I will be reverse engineering the binary files involved[2]. I intend to outline the structure of such files as I work on them so that anyone else interested doesn’t have to do it all over again, and also because there will probably be some things that don’t make sense to me, or whose purpose I simply can’t find. Please feel free to pitch in when such a situation arises. 🙂
.BIG files are storage containers, sometimes holding gigabytes[3] of game assets. The format is actually rather simple and doesn’t compress the files at all[4]. Here is a screenshot of one of BFME2’s smaller .BIG files in a hex editor:
The file can be split into two parts. First comes the header, which identifies and enumerates all the files contained in the archive. Second comes all the data contained in the files themselves. These are just lumped together one after another with no metadata in between, so the only part that is really of interest is the header.
The first 4 bytes of the header are really simple. I’ve seen two types, “BIG4” and “BIGF”. All this does is tell you that this is, in fact, a .big file. At the moment I don’t know if there are any differences between the two. The next 4 bytes are the size of the file as a little-endian 32-bit unsigned integer, e.g. 0B 00 00 00 equals 11. All other numbers in the header are big-endian, e.g. 00 00 00 0B equals 11.[5] Next come the number of files contained in the .big file and the address at which the header ends.
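To make the mixed byte order concrete, here is a rough Python sketch of reading those first four fields. It is not the engine's code: the function name `parse_big_header` and the dictionary keys are made up for illustration, and it assumes the header-end address is a 4-byte field like the file count, which puts the fixed part of the header at 16 bytes.

```python
import struct

def parse_big_header(data: bytes) -> dict:
    """Read the fixed fields at the start of a .BIG archive (hypothetical helper)."""
    if len(data) < 16:
        raise ValueError("need at least 16 bytes for the fixed header fields")

    magic = data[:4]
    if magic not in (b"BIG4", b"BIGF"):
        raise ValueError(f"not a .BIG file, magic was {magic!r}")

    # Archive size is the odd one out: little-endian ("<").
    (archive_size,) = struct.unpack_from("<I", data, 4)
    # File count and header end are big-endian (">").
    # Assumption: both are 4-byte unsigned integers.
    file_count, header_end = struct.unpack_from(">II", data, 8)

    return {
        "magic": magic.decode("ascii"),
        "archive_size": archive_size,
        "file_count": file_count,
        "header_end": header_end,
    }

# Hand-built example bytes: size 11 (little-endian), 11 files (big-endian),
# header ending at 0x200. These values are invented for the demo.
sample = b"BIGF" + bytes.fromhex("0B000000 0000000B 00000200")
print(parse_big_header(sample))
```

Note how the same value 11 is spelled `0B 00 00 00` in the size field and `00 00 00 0B` in the count field; the `<` and `>` prefixes in the `struct` format strings are what select between the two.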
After this come the individual file descriptors. These consist of 4 bytes for the address in the archive at which the file data starts[6], 4 bytes holding the length of the file[7], and a null-terminated string for the file path. Immediately after the termination character of the string comes the next entry. After the last entry there are 4 bytes of text in the form of “LXXX”, where XXX is some number or other, usually 200-something. After that it’s just null data for the rest of the header[8].
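The descriptor layout can be walked with a simple loop. The following is a sketch under a few assumptions: the names `BigEntry` and `parse_big_entries` are hypothetical, the table is assumed to start at byte 16 (right after the four fixed fields), and paths are decoded as Latin-1 since I don't know what encoding the games actually use.

```python
import struct
from typing import List, NamedTuple, Tuple

class BigEntry(NamedTuple):
    path: str
    start: int   # bytes from the beginning of the archive
    length: int  # bytes, counted from `start`

def parse_big_entries(data: bytes, file_count: int,
                      offset: int = 16) -> Tuple[List[BigEntry], int]:
    """Read `file_count` descriptors; return them plus the offset just past the last one."""
    entries = []
    for _ in range(file_count):
        # Two big-endian 32-bit unsigned integers: start address, then length.
        start, length = struct.unpack_from(">II", data, offset)
        offset += 8
        # The path runs up to the next null byte.
        end = data.index(b"\x00", offset)
        # Assumption: Latin-1 never fails to decode, so it is a safe default here.
        path = data[offset:end].decode("latin-1")
        offset = end + 1  # skip the terminator; the next entry starts here
        entries.append(BigEntry(path, start, length))
    return entries, offset

# Tiny hand-made table with two entries, followed by an "LXXX" marker.
table = (struct.pack(">II", 64, 3) + b"data\\a.ini\x00"
         + struct.pack(">II", 67, 5) + b"b.txt\x00")
blob = bytes(16) + table + b"L231"
entries, after = parse_big_entries(blob, 2)
for entry in entries:
    print(entry)
print(blob[after:after + 4])
```

Returning the offset after the last entry makes it easy to peek at the “LXXX” marker that follows the table.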
The rest of the file should then be raw file data, with each file entry in the header owning the bytes from its start address (inclusive) up to its start address + its length (exclusive). As already stated there is no metadata, and the files are typically lumped one after the other. As such the start address + the length of one file is typically the start address of the next.
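In Python terms, pulling one file out is just a slice, since slices already include the first index and exclude the last. A minimal sketch, with `read_entry` being a made-up name and the archive bytes below invented for the demo:

```python
def read_entry(archive: bytes, start: int, length: int) -> bytes:
    """Return the bytes a descriptor owns: archive[start : start + length]."""
    end = start + length
    if start < 0 or length < 0 or end > len(archive):
        raise ValueError("entry points outside the archive")
    return archive[start:end]

# Fake archive: 4 bytes standing in for the header, then two files back to back.
archive = b"HEAD" + b"abc" + b"defgh"
first = read_entry(archive, 4, 3)
second = read_entry(archive, 4 + 3, 5)  # starts where the first one ends
print(first, second)
```

The bounds check is there because the start address and length come straight from the file, so a damaged or truncated archive would otherwise silently yield short data.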
Opening and reading from .BIG files is basically the only thing I have done for the engine so far, other than a logging system and the start of a parser for the text .INI files used to define game data. I may not get much further on the project, but at the very least I will have learned some things. I only recently made the code public, and I was hesitant to do so at such an early stage, but I have been maintaining a commit streak on GitHub[9], and putting it there means I’ll be more likely to devote time to it.
UPDATE: Thanks to this one person writing a .BIG explorer program, I found out what the first 4 bytes after the “BIG4”/“BIGF” are for. Post has been modified to describe it.
1. My early work on the engine is available on GitHub, under the MIT licence.
2. Or finding someone else who already did…
3. Hence “big” 🙂
4. It’s a good thing too. Back in 2010, when I reverse engineered this format to make a basic command-line extractor, it was the first time I had tried doing anything with a hex editor, and it wasn’t all that long after I first started programming. If it had been anything complicated I probably would’ve given up on it. And then where would I be?
5. In case you’re wondering, this is the number of files contained in the file open in the screenshot.
6. In bytes from the beginning of the archive file.
7. Also in bytes, but from the start address rather than the beginning of the archive file.
8. This isn’t typically very long, just a byte or two.
9. 640+ days, not too shabby if I do say so myself…
10. I think I overdid the footnotes… And this one isn’t even referenced in the article!