The .BIG File Format

In my endeavor to make an open source [1] RTS engine capable of playing EA SAGE engine games, such as The Battle for Middle Earth series, I will be reverse engineering the binary files involved [2]. I intend to outline the structure of such files as I work on them so that anyone else interested doesn’t have to do it all over again, and also because there will probably be some things that don’t make sense to me, or whose purpose I simply can’t find. Please feel free to pitch in when such a situation arises. 🙂

.BIG files are storage containers, sometimes holding gigabytes [3] of game assets. It is actually a rather simple format that doesn’t compress the files at all [4]. Here is a screenshot of one of BFME2’s smaller .BIG files in a hex editor:

[Image: BFME2’s English.big in Okteta]

The file can be split into two parts. First comes the header, which identifies and enumerates all the files contained in the archive. Second comes all the data contained in the files themselves. These are just lumped together one after another with no metadata in between, so the only part that is really of interest is the header.

The first 4 bytes of the header are really simple. I’ve seen two variants, “BIG4” and “BIGF”. All this does is tell you that this is, in fact, a .big file. At the moment I don’t know if there are any differences between the two. The next 4 bytes are the size of the archive file as a little-endian 32-bit unsigned integer, e.g. 0B 00 00 00 equals 11. All other numbers in the header are big-endian, e.g. 00 00 00 0B equals 11 [5]. Next come the number of files contained in the .big file and the address at which the header ends.
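As a rough illustration of the layout just described, here is a minimal Python sketch that decodes the fixed fields at the start of the header. It is not the engine’s actual code: the function name `parse_big_header` and the dictionary keys are made up for this example, and it assumes the file count and the header-end address are each 4 bytes wide, which would make the fixed part of the header 16 bytes long.

```python
import struct

def parse_big_header(data: bytes) -> dict:
    """Decode the fixed fields at the start of a .BIG archive (illustrative sketch)."""
    if len(data) < 16:
        raise ValueError("too short to hold a .BIG header")
    magic = data[:4]
    if magic not in (b"BIG4", b"BIGF"):
        raise ValueError(f"not a .BIG archive, magic is {magic!r}")
    # Archive size: the one little-endian number in the header.
    (archive_size,) = struct.unpack_from("<I", data, 4)
    # Assumption: file count and header end are two 4-byte big-endian integers.
    file_count, header_end = struct.unpack_from(">II", data, 8)
    return {
        "magic": magic.decode("ascii"),
        "archive_size": archive_size,
        "file_count": file_count,
        "header_end": header_end,
    }
```

Only the first 16 bytes of an archive need to be read and passed in. The mixed byte order is the one thing that is easy to get wrong, which is why the two `unpack_from` calls use different format prefixes (`<` for little-endian, `>` for big-endian).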

After this come the individual file descriptors. These consist of 4 bytes for the address in the archive at which the file data starts [6], 4 bytes holding the length of the file [7], and a null-terminated string for the file path. Immediately after the terminating null of the string comes the next entry. After the last entry there are 4 bytes of text in the form of “LXXX”, where XXX is some number or other, usually 200-something. After that it’s just null data for the rest of the header [8].
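The descriptor table can be walked with a simple loop. The following Python sketch is one way to do it, under a few assumptions that are mine rather than facts about the format: the descriptors begin at byte 16 (right after the fixed fields), the path bytes can be decoded as Latin-1, and the hypothetical helper `read_entries` is handed the file count from the header. It stops after the given number of entries, so the trailing “LXXX” text and null padding are never parsed.

```python
import struct

def read_entries(data: bytes, file_count: int, start: int = 16) -> list:
    """Return (offset, length, path) tuples from a .BIG header (illustrative sketch)."""
    entries = []
    pos = start
    for _ in range(file_count):
        # Two big-endian 32-bit integers: start address, then length in bytes.
        offset, length = struct.unpack_from(">II", data, pos)
        pos += 8
        # The path runs up to the next null byte.
        end = data.index(b"\x00", pos)
        # Assumption: Latin-1 is used here only because it never fails to decode.
        path = data[pos:end].decode("latin-1")
        entries.append((offset, length, path))
        # The next descriptor starts immediately after the terminator.
        pos = end + 1
    return entries
```

Relying on the file count rather than scanning for the “LXXX” marker keeps the loop independent of whatever that marker turns out to mean.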

The rest of the file then should be raw file data, with each file entry in the header owning the bytes from its start address up to, but not including, its start address + its length. As already stated there is no metadata, and the files are typically lumped one after the other. As such the start address + the length of one file is typically the start address of the next.
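In code, pulling one file out of the archive is just a slice. Here is a small Python sketch, assuming the whole archive is already in memory as `bytes` and that entries are `(offset, length, path)` tuples; both function names are hypothetical. The second function checks the “one after the other” layout described above, which is typical but not something I would rely on.

```python
def slice_file(archive: bytes, offset: int, length: int) -> bytes:
    """Return the bytes in the half-open range [offset, offset + length)."""
    end = offset + length
    if offset < 0 or length < 0 or end > len(archive):
        raise ValueError("entry points outside the archive")
    return archive[offset:end]

def is_tightly_packed(entries: list) -> bool:
    """True when every file starts exactly where the previous one ends."""
    ordered = sorted(entries)  # tuples sort by offset first
    for current, following in zip(ordered, ordered[1:]):
        if current[0] + current[1] != following[0]:
            return False
    return True
```

The bounds check matters because a plain Python slice silently returns fewer bytes when the range runs past the end of the data, which would hide a corrupt or misread descriptor.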

Opening and reading from .BIG files is basically the only thing I have done for the engine so far, other than a logging system and the start of a parser for the text .INI files used to define game data. I may not get much further on the project, but at the very least I will have learned some things. I only recently made the code public, and I was hesitant to do so at such an early stage, but I have been maintaining a commit streak on GitHub [9], and putting it there means I’ll be more likely to devote time to it.

UPDATE: Thanks to this one person writing a .BIG explorer program, I found out what the first 4 bytes after the “BIG4”/“BIGF” are for. Post has been modified to describe it.

‘Till later.

-chip

Footnotes:

  1. My early work on the engine is available on GitHub, under the MIT licence.
  2. Or finding someone else who already did…
  3. Hence “big” 🙂
  4. It’s a good thing too: back in 2010, when I reverse engineered this format to make a basic command-line extractor, it was the first time I had tried doing anything with a hex editor, and it wasn’t all that long after I first started programming. If it had been anything complicated I probably would’ve given up on it. And then where would I be?
  5. In case you’re wondering, this is the number of files contained in the file open in the screenshot.
  6. In bytes from the beginning of the archive file.
  7. Also in bytes, but from the start address rather than the beginning of the archive file.
  8. This isn’t typically very long, just a byte or two.
  9. 640+ days, not too shabby if I do say so myself…
  10. I think I overdid the footnotes… And this one isn’t even referenced in the article! o_O

6 thoughts on “The .BIG File Format”

  1. Hi, David.

    I’ve recently been struggling with a specific .BIG file and came across your blog post in my search for a solution. Specifically, I’m trying to extract the music from a .BIG file from the game “WCW Mayhem” for the PS1.

    I have tried various programs that are supposed to be able to open .BIG files with no luck. At best, one program (PSound) was able to see there were files in the .BIG, but I couldn’t do anything with them. I know it’s all possible, as others have posted the music rips online, but I would like to be able to do it myself with my own game.

    I have no experience with programming, so admittedly the code stuff goes over my head, but I seem to be able to understand most of your post, which makes me feel like the solution to my problem is probably a lot easier than I’ve been making it out to be. I just don’t know where to start.

    If you could please lend a hand in helping me figure this all out, I would absolutely appreciate it.

    Thanks!

    1. It’s possible that it isn’t the same format even though it has the same extension. The simplest way to tell is to check the first four bytes of the file. Open the file in a hex editor and see whether it starts with either “BIG4” or “BIGF”; if it doesn’t, the file has a different format that happens to use the same extension.

      1. Hey, David. Many thanks for replying, and sorry for taking so long to get back to you!

        I seem to have made progress, but I’m still a bit stuck.

        I opened the .BIG file in a hex editor, and it is indeed a “BIGF” file. Strangely though, “BIGF” is not the first four bits; it is preceded by 24 other bits. Everything that follows “BIGF” is similar to your example (clear file names, followed by all the individual file data). It’s really just the extra bits at the beginning that differs.

        When I made a copy of the .BIG file and removed the extra bits, one of the programs I had tried was now able to read and extract the files contained within. All 26 .ASF files are there, they all seem to have accurate file names and sizes…all good! The only problem is, I can’t open any of the files. I know I SHOULD be able to open ASF files in Windows, but none of the media programs I tried work.

        So, I’m still stuck. I’m assuming the extra bits are supposed to be there, but then it means no program can open the .BIG. Removing the extra bits lets programs open the file, but then the extracted files don’t seem to work.

        Any help or advice would be greatly appreciated! Thanks again!

      2. I am not familiar with ASF files, although from what I saw on Google and Wikipedia it’s a container format. The codec used inside the container may not be one of the standard codecs. Either that, or it’s a completely different format that happens to have the same extension. To check for that, the Wikipedia page for ASF files (https://en.wikipedia.org/wiki/Advanced_Systems_Format) gives a magic number that should be at the start of each file.

  2. Thanks. I’m still confused over the extra bytes at the beginning of the .BIG file, but I think what I need to focus on instead is just tinkering with things as I have them since I’m much closer to a solution than when I started looking for one.

    Thanks again for all your help. I hope I don’t have to bug you anymore.

  3. Hey David,

    I’m pleased to say that I figured it all out! (Albeit on a different game, but they are both “BIGF” so I anticipate it will work the same.)

    Basically, instead of using any programs that were supposed to work on .BIG files, I took a little time to learn about scripts and utilized one to extract the ASF files. Then, all I had to do to get those to play was download the K-Lite codec pack and…voila! All the music is there and it all plays as it should.

    I want to thank you again for helping me last month. It helped direct me onto the right path to figuring it out, and now that I have I’m stoked.

    Thanks again!
