Difference between revisions of "Packages"

From SimsWiki
(Auto-inserted from WakkaWikki)
 
m (Massivly cleaned. Could someone who knows this better please clean up some more?)
Line 1: Line 1:
{{OldWikiEntry}} ====Package File Format====
+
This article describes the specifics of the package format used in [[SimCity 4]] and [[The Sims 2]].
  
This will be Broken into 3 parts. DATS/Holes/Records, Compression, and DIR files
+
=Format=
 
+
(see [[DBPF]])
 
+
===Format of Simcity4 Archive data files ===
+
 
+
 
+
Most simcity Data is as you probably know stored in various types of archive files that all have the same internal format such as DAT, SC4, SC4lot, etc. This format is proprietary to maxis and has been used in the sims online as well as this one.
+
Header (96 bytes)
+
File 1
+
File 2
+
File n
+
Index Entry 1 (20 bytes)
+
Index Entry 2
+
Index Entry n
+
 
+
Header first, individual files following with no filenames and a small file header area at the beginning of each followed by an index of all the files in the archive at 20 bytes per entry.
+
 
+
Header:
+
 
+
Offset 00 - Identifier (DBPF - the type of dat)  
+
Offset 04 - Version Major (1 in SC4/Ts2 Dats)
+
Offset 08 - Version Minor (0 in sc4 dats, 1 in most TS2 packages)
+
Offset 12 - 3 DWORDS of unused data. RESERVED for maxis use.
+
Offset 24 - Date Created in Hex (Unused in Version 1.1)
+
Offset 28 - Date Modified in Hex (Unused in Version 1.1)
+
Offset 32 - Index Type/Version (Always 7 in SC4/TS2 dats)
+
Offset 36 - Number of entries in the index
+
Offset 40 - Location of first index entry
+
Offset 44 - Size of index
+
Offset 48 - Number of Hole entries in the Hole Record
+
Offset 52 - Location of the hole Record
+
Offset 56 - Size of the hole Record
+
Offset 60 - Number of Instances in Index (0x01 or 0x02; Version 1.1+ in TS2 only)
+
Offset 64 - 32 Bytes reserved for Future Use in other versions.
+
 
+
File: (File starts at offset 0 if uncompressed or 9 if compressed with this header)
+
 
+
Offset 00 - Size of file
+
Offset 04 - Compression signature (if file is compressed or not)
+
Offset 06 - Uncompressed filesize (Big Endian)
+
Offset 09 - Start of compressed/uncompressed file data
+
 
+
 
+
The size of an index entry with 1 Instance is 20 bytes, like this:
+
 
+
Offset 00 - Type ID (main type of file. picture, texture, model etc)
+
Offset 04 - Group ID (group of the file by purpose or type)
+
Offset 08 - Instance ID (marker used by the format for finding a specific file)
+
Offset 12 - Location of the file in the archive
+
Offset 16 - Size of the file
+
 
+
New 2 Instance Index Entry 24 bytes
+
 
+
Offset 00 - Type ID (main type of file. picture, texture, model etc)
+
Offset 04 - Group ID (group of the file by purpose or type)
+
Offset 08 - Instance ID (marker used by the format for finding a specific file)
+
Offset 12 - Instance2/Resource ID
+
Offset 16 - Location of the file in the archive
+
Offset 20 - Size of the file
+
 
+
 
+
One of the problems with this dat format is that filenames and filetypes are not preserved. Therefore the only way of knowing the type of file is by its IDS and then you don't have a true extension. It is for this reason that some of the filetypes to follow have names created by us instead of the true Maxis names. They simply aren't available.
+
 
+
A Hole record contains the location and size of all holes in a DAt file. Its format is as follows
+
 
+
(repeating)
+
DWORD - Hole Location
+
DWORD - Hole size
+
 
+
a Hole is simply garbage data that can be filled in with useful data at a later point 
+
 
+
 
+
===Compression===
+
  
 +
=Compression=
 
The idea behind the compression is to reuse previously decoded strings. For example, if the word "heureka" occurs twice in a file, the second occurence would be encoded by pointing to the first.  
 
The idea behind the compression is to reuse previously decoded strings. For example, if the word "heureka" occurs twice in a file, the second occurence would be encoded by pointing to the first.  
  
 
The compression is done by defining control characters that tells three things:  
 
The compression is done by defining control characters that tells three things:  
 
+
#How many characters of plain text that follows that should be appended to the output.  
How many characters of plain text that follows that should be appended to the output.  
+
#How many characters that should be read from the already decoded text (and appended to the output)  
2) How many characters that should be read from the already decoded text (and appended to the output)  
+
#At which offset in the already decoded text to read the characters.  
3) At which offset in the aldready decoded text to read the characters.  
+
  
 
Thus, the algorithm to decompress these files goes like this:  
 
Thus, the algorithm to decompress these files goes like this:  
  
Read file size at offset 0  
+
Read file size at offset 0  
Seek to offset 9  
+
Seek to offset 9  
while not end of file is reached do  
+
while not end of file is reached do  
{  
+
{  
- Read next control character.  
+
- Read next control character.  
- (Depending on control character read 0-3 more bytes that are a part of the control character.)  
+
- (Depending on control character read 0-3 more bytes that are a part of the control character.)  
- Figure out how many characters that should be read and from where by inspecting the control character.  
+
- Figure out how many characters that should be read and from where by inspecting the control character.  
- Read 0-n characters from source and append them to the output.  
+
- Read 0-n characters from source and append them to the output.  
- Copy 0-n characters from somewhere in the output to the end of the output.  
+
- Copy 0-n characters from somewhere in the output to the end of the output.  
}  
+
}  
  
There are 4 types of control characters which are used with different restrictions of how many characters that can be read and from how far behind these can be read. The following conventions are used to describe them:  
+
There are 4 types of control characters which are used with different restrictions of how many characters that can be read and from how far behind these can be read. The following conventions are used to describe them:
  
CC length - Length of control character.  
+
;CC length
Num plain text - Number of chars immediately after the control character that should be read and appended to output.  
+
:Length of control character.
Num to copy - Number of chars that should be copied from somewhere in the already decoded output and added to the end of the output.  
+
;Num plain text
Copy offset - Where to start reading characters when copying from somewhere in the already decoded output.  
+
:Number of chars immediately after the control character that should be read and appended to output.
This is given as an offset from the current end of the output buffer, i.e. an offset of 0 means that you should copy the last character in the output and append it to the output. And offset of 1 means that you should copy the second-to-last character.  
+
;Num to copy
byte0 - first byte of control character.  
+
:Number of chars that should be copied from somewhere in the already decoded output and added to the end of the output.
Bits - Bits of the control character. p = num plain text, c = num to copy, o = copy offset, i = identifier.  
+
;Copy offset
 +
:Where to start reading characters when copying from somewhere in the already decoded output.
 +
:This is given as an offset from the current end of the output buffer, i.e. an offset of 0 means that you should copy the last character in the output and append it to the output. And offset of 1 means that you should copy the second-to-last character.
 +
;byte0
 +
:first byte of control character.
 +
;Bits
 +
:Bits of the control character.
 +
:*p - num plain text
 +
:*c - num to copy
 +
:*o - copy offset
 +
:*i - identifier.
  
 
Note: It can sometimes be confusing when a control character states that you should copy for example 10 characters 5 steps from the end of the output. Clearly, you cannot read more than 5 characters before you reach the end of the buffer. The solution is to read and write one character at the time. Each time you read a character you copy it to the end thereby increasing the size of the output. By doing this, even offset 0 is possible and would result in duplicating the last character a number of times. This is utilized by the compression to recreate repeating text, for example bars of repeating dashes
 
Note: It can sometimes be confusing when a control character states that you should copy for example 10 characters 5 steps from the end of the output. Clearly, you cannot read more than 5 characters before you reach the end of the buffer. The solution is to read and write one character at the time. Each time you read a character you copy it to the end thereby increasing the size of the output. By doing this, even offset 0 is possible and would result in duplicating the last character a number of times. This is utilized by the compression to recreate repeating text, for example bars of repeating dashes
 
 
'''0xE0 - 0xFF'''
 
 
CC length: 1 byte
 
Num plain text: ((byte0 & 0x1F) < < 2 ) + 4
 
Num to copy: 0
 
Copy offset: -
 
 
Bits: 111ppppp
 
Num plain text limit: 4-128
 
Num to copy limit: 0
 
Maximum Offset: -
 
  
 
This is the simplest form of control character. The only thing it does is telling how many plain text characters that follows. The formula for this is: (C - 0x7F) * 4. Thus a value of 0xE0 means that you should read 4 characters of plain text and append to the output.
 
This is the simplest form of control character. The only thing it does is telling how many plain text characters that follows. The formula for this is: (C - 0x7F) * 4. Thus a value of 0xE0 means that you should read 4 characters of plain text and append to the output.
  
 +
==0x00 - 0x7F==
  
'''0x00 - 0x7F'''
+
CC length: 2 bytes
 
+
Num plain text: byte0 & 0x03
CC length: 2 bytes  
+
Num to copy: ( (byte0 & 0x1C) > > 2) + 3
Num plain text: byte0 & 0x03  
+
Copy offset: ( (byte0 & 0x60) < < 3) + byte1 + 1
Num to copy: ( (byte0 & 0x1C) > > 2) + 3  
+
Copy offset: ( (byte0 & 0x60) < < 3) + byte1 + 1
+
 
+
Bits: 0oocccpp oooooooo
+
Num plain text limit: 0-3
+
Num to copy limit: 3-11
+
Maximum Offset: 1023
+
  
 +
Bits: 0oocccpp oooooooo
 +
Num plain text limit: 0-3
 +
Num to copy limit: 3-11
 +
Maximum Offset: 1023
  
'''0x80 - 0xBF'''
 
  
CC length: 3 bytes
+
==0x80 - 0xBF==
Num plain text: ((byte1 & 0xC0) > > 6 ) & 0x03
+
Num to copy: (byte0 & 0x3F) + 4
+
Copy offset: ( (byte1 & 0x3F) < < 8 ) + byte2 + 1
+
  
Bits: 10cccccc ppoooooo oooooooo
+
CC length: 3 bytes
 +
Num plain text: ((byte1 & 0xC0) > > 6 ) & 0x03
 +
Num to copy: (byte0 & 0x3F) + 4
 +
Copy offset: ( (byte1 & 0x3F) < < 8 ) + byte2 + 1
  
Num plain text limit: 0-3  
+
Bits: 10cccccc ppoooooo oooooooo
Num to copy limit: 4-67  
+
Num plain text limit: 0-3
Maximum Offset: 16383  
+
Num to copy limit: 4-67
 +
Maximum Offset: 16383
  
  
'''0xC0 - 0xDF'''
+
==0xC0 - 0xDF==
 +
This format differes depending on the game.
  
CC length: 4 bytes  
+
===SimCity 4===
Num plain text: byte0 & 0x03  
+
CC length: 4 bytes
Num to copy: ( (byte0 & 0x1C) < < 6 )  + byte3 + 5  
+
Num plain text: byte0 & 0x03
Copy offset: (byte1 < < 8) + byte2  
+
Num to copy: ( (byte0 & 0x1C) < < 6 )  + byte3 + 5
 +
Copy offset: (byte1 < < 8) + byte2
  
Bits: 110cccpp oooooooo oooooooo cccccccc  
+
Bits: 110cccpp oooooooo oooooooo cccccccc
 +
Num plain text limit: 0-3
 +
Num to copy limit: 5-2047
 +
Maximum Offset: 65535
  
Num plain text limit: 0-3
+
===Sims 2===
Num to copy limit: 5-2047
+
CC length: 4 bytes
Maximum Offset: 65535
+
Num plain text: byte0 & 0x03
 +
Num to copy: ( (byte0 & 0x0C) < < 6 )  + byte3 + 5
 +
Copy offset: ((byte0 & 0x10) < < 12 ) + (byte1 < < 8 ) + byte2 + 1
  
Note: Sims2 uses a slightly different variation here:
+
Bits: 110occpp oooooooo oooooooo cccccccc
CC length: 4 bytes
+
Num plain text limit: 0-3
Num plain text: byte0 & 0x03
+
Num to copy limit: 5-1028
Num to copy: ( (byte0 & 0x0C) < < 6 )  + byte3 + 5  
+
Maximum Offset: 131072
Copy offset: ((byte0 & 0x10) < < 12 ) + (byte1 < < 8 ) + byte2 + 1
+
  
Bits: 110occpp oooooooo oooooooo cccccccc
 
  
Num plain text limit: 0-3
+
==0xE0 - 0xFF==
Num to copy limit: 5-1028
+
Maximum Offset: 131072
+
  
 +
CC length: 1 byte
 +
Num plain text: ((byte0 & 0x1F) < < 2 ) + 4
 +
Num to copy: 0
 +
Copy offset: -
  
 +
Bits: 111ppppp
 +
Num plain text limit: 4-128
 +
Num to copy limit: 0
 +
Maximum Offset: -
  
===Directory Files explained ===
+
=Directory Files=
 +
Directory files are one of the newer filetypes I found about the beginning of May. Their purpose is to spead up the loading of a DAT file by showing exactly what is compressed inside it. They are directories of all the compressed files in an archive. This luckily makes their structure fairly simple.
  
Directory files are one of the newer filetypes I found about the beginning of May. Their purpose is to spead up the loading of a DAT file by showing exactly what is compressed inside it. They are directories of all the compressed files in an archive. This luckily makes their structure fairly simple. 4 DWORDS repeated over and over
+
;repeated
 +
;DWORD
 +
:Type ID of the file
 +
;DWORD
 +
:Group ID of the file
 +
;DWORD
 +
:Instance ID of the file
 +
;DWORD
 +
:Instance2/Resource ID (only in new index format dbpfs)
 +
;DWORD
 +
:Size of the decompressed file in Hex
  
(Repeated Chunk)
+
These files are found as DIR/Directory by the reader and are automatically modified by it during changes to dat files.
  
DWORD - Type ID of the file  
+
It is HIGHLY reccommended that you modify the Directory after editing any compressed file before saving the dat if you are working manually in hex or making your own program.
DWORD - Group ID of the file
+
DWORD - Instance ID of the file
+
DWORD (only in new index format dbpfs) - Instance2/Resource ID
+
DWORD - Size of the decompressed file in Hex
+
  
These files are found as DIR/Directory by the reader and are automatically modified by it during changes to dat files.  
+
See [[E86B1EEF]].
  
It is HIGHLY reccommended that you modify the Directory after editing any compressed file before saving the dat if you are working manually in Hex or making your own program.
+
{{OldWikiEntryCleaned}}
  
If identifying Directory files manually by Hex, their Type ID in the index will be (EF 1E 6B E8)  [[Category:Modding]]
+
[[Category:Modding]]
 +
[[Category:InternalFormats]]

Revision as of 04:44, 12 July 2006

This article describes the specifics of the package format used in SimCity 4 and The Sims 2.

Format

(see DBPF)

Compression

The idea behind the compression is to reuse previously decoded strings. For example, if the word "heureka" occurs twice in a file, the second occurrence is encoded by pointing to the first.

The compression works by defining control characters that tell three things:

  1. How many characters of plain text follow and should be appended to the output.
  2. How many characters should be read from the already decoded text (and appended to the output).
  3. At which offset in the already decoded text to read those characters.

Thus, the algorithm to decompress these files goes like this:

Read file size at offset 0 
Seek to offset 9 
while end of file is not reached do 
{ 
	- Read the next control character. 
	- (Depending on the control character, read 0-3 more bytes that are part of it.) 
	- Work out how many characters should be read, and from where, by inspecting the control character. 
	- Read 0-n characters from source and append them to the output. 
	- Copy 0-n characters from somewhere in the output to the end of the output. 
} 

There are 4 types of control characters, each with different restrictions on how many characters can be read and how far back they can be read from. The following conventions are used to describe them:

CC length: Length of the control character.
Num plain text: Number of chars immediately after the control character that should be read and appended to the output.
Num to copy: Number of chars that should be copied from somewhere in the already decoded output and added to the end of the output.
Copy offset: Where to start reading characters when copying from somewhere in the already decoded output. This is given as an offset from the current end of the output buffer, i.e. an offset of 0 means that you should copy the last character in the output and append it to the output. An offset of 1 means that you should copy the second-to-last character.
byte0: First byte of the control character.
Bits: Bits of the control character.
  • p - num plain text
  • c - num to copy
  • o - copy offset
  • i - identifier

Note: It can be confusing when a control character states that you should copy, for example, 10 characters starting 5 steps from the end of the output. Clearly, you cannot read more than 5 characters before you reach the end of the buffer. The solution is to read and write one character at a time. Each time you read a character you copy it to the end, thereby increasing the size of the output. By doing this, even offset 0 is possible and results in duplicating the last character a number of times. The compression uses this to recreate repeating text, for example bars of repeating dashes.
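The byte-at-a-time copy described in the note can be sketched as follows. This is only an illustration: the helper name `copy_from_output` is hypothetical, and `distance` here counts bytes back from the end of the output with 1 meaning the last byte, which is an assumption about how the article's offset values map onto a buffer index.

```python
def copy_from_output(out: bytearray, distance: int, count: int) -> None:
    """Append `count` bytes to `out`, reading `distance` bytes back from its end.

    Assumption: distance 1 refers to the last byte already in `out`.
    """
    if distance < 1 or distance > len(out):
        raise ValueError("copy distance points outside the decoded output")
    for _ in range(count):
        # The source index is recomputed on every step because the buffer
        # grows by one byte each time, so the copy may overlap itself.
        out.append(out[len(out) - distance])


# Copy 10 characters starting 5 steps from the end of a 5-character buffer.
buf = bytearray(b"abcde")
copy_from_output(buf, 5, 10)
print(buf.decode())  # abcdeabcdeabcde
```

Copying whole slices instead would fail whenever `count` exceeds `distance`, which is exactly the case the note warns about.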

The 0xE0 - 0xFF control character, listed last below, is the simplest form. The only thing it does is tell how many plain text characters follow. The formula for this is (C - 0xDF) * 4, which is the same as ((byte0 & 0x1F) << 2) + 4. Thus a value of 0xE0 means that you should read 4 characters of plain text and append them to the output.

0x00 - 0x7F

CC length: 2 bytes
Num plain text: byte0 & 0x03
Num to copy: ((byte0 & 0x1C) >> 2) + 3
Copy offset: ((byte0 & 0x60) << 3) + byte1 + 1
Bits: 0oocccpp oooooooo
Num plain text limit: 0-3
Num to copy limit: 3-10
Maximum Offset: 1024
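As a worked example of the 0x00 - 0x7F formulas, the sketch below decodes one 2-byte control character. The function name `decode_short` and the sample bytes are made up for illustration; only the three formulas come from the table above.

```python
def decode_short(byte0: int, byte1: int) -> tuple:
    """Decode a 2-byte control character (bits 0oocccpp oooooooo)."""
    num_plain = byte0 & 0x03                         # pp
    num_copy = ((byte0 & 0x1C) >> 2) + 3             # ccc
    copy_offset = ((byte0 & 0x60) << 3) + byte1 + 1  # oo + second byte
    return num_plain, num_copy, copy_offset


# 0x2D = 0 01 011 01: one plain byte, then copy 3 + 3 = 6 bytes
# from offset 0x100 + 0x10 + 1 = 273.
print(decode_short(0x2D, 0x10))  # (1, 6, 273)
```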


0x80 - 0xBF

CC length: 3 bytes
Num plain text: ((byte1 & 0xC0) >> 6) & 0x03
Num to copy: (byte0 & 0x3F) + 4
Copy offset: ((byte1 & 0x3F) << 8) + byte2 + 1
Bits: 10cccccc ppoooooo oooooooo
Num plain text limit: 0-3
Num to copy limit: 4-67
Maximum Offset: 16384
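The 0x80 - 0xBF layout can be checked the same way. In this sketch `decode_medium` is a hypothetical name and the sample bytes are invented; the formulas are the ones listed above.

```python
def decode_medium(byte0: int, byte1: int, byte2: int) -> tuple:
    """Decode a 3-byte control character (bits 10cccccc ppoooooo oooooooo)."""
    num_plain = ((byte1 & 0xC0) >> 6) & 0x03         # pp, top two bits of byte1
    num_copy = (byte0 & 0x3F) + 4                    # cccccc
    copy_offset = ((byte1 & 0x3F) << 8) + byte2 + 1  # 14 offset bits
    return num_plain, num_copy, copy_offset


# 0x85 0x41 0x02: one plain byte, copy 5 + 4 = 9 bytes
# from offset 0x100 + 0x02 + 1 = 259.
print(decode_medium(0x85, 0x41, 0x02))  # (1, 9, 259)
```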


0xC0 - 0xDF

This format differs depending on the game.

SimCity 4

CC length: 4 bytes
Num plain text: byte0 & 0x03
Num to copy: ((byte0 & 0x1C) << 6) + byte3 + 5
Copy offset: (byte1 << 8) + byte2
Bits: 110cccpp oooooooo oooooooo cccccccc
Num plain text limit: 0-3
Num to copy limit: 5-2052
Maximum Offset: 65535

Sims 2

CC length: 4 bytes
Num plain text: byte0 & 0x03
Num to copy: ((byte0 & 0x0C) << 6) + byte3 + 5
Copy offset: ((byte0 & 0x10) << 12) + (byte1 << 8) + byte2 + 1
Bits: 110occpp oooooooo oooooooo cccccccc
Num plain text limit: 0-3
Num to copy limit: 5-1028
Maximum Offset: 131072
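To make the difference between the two 0xC0 - 0xDF variants concrete, the sketch below decodes the same four bytes both ways. The function name `decode_long` and its `sims2` switch are hypothetical; the formulas are copied from the two tables above as given, including the missing `+ 1` in the SimCity 4 offset.

```python
def decode_long(byte0: int, byte1: int, byte2: int, byte3: int,
                sims2: bool) -> tuple:
    """Decode a 4-byte control character in either documented variant."""
    num_plain = byte0 & 0x03
    if sims2:
        # Bits 110occpp oooooooo oooooooo cccccccc
        num_copy = ((byte0 & 0x0C) << 6) + byte3 + 5
        copy_offset = ((byte0 & 0x10) << 12) + (byte1 << 8) + byte2 + 1
    else:
        # SimCity 4 variant: bits 110cccpp oooooooo oooooooo cccccccc
        num_copy = ((byte0 & 0x1C) << 6) + byte3 + 5
        copy_offset = (byte1 << 8) + byte2
    return num_plain, num_copy, copy_offset


print(decode_long(0xD5, 0x01, 0x00, 0x0A, sims2=False))  # (1, 1295, 256)
print(decode_long(0xD5, 0x01, 0x00, 0x0A, sims2=True))   # (1, 271, 65793)
```

The same bytes give very different results, so a reader has to know which game a file came from before decoding this control character.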


0xE0 - 0xFF

CC length: 1 byte 
Num plain text: ((byte0 & 0x1F) << 2) + 4
Num to copy: 0 
Copy offset: - 
Bits: 111ppppp 
Num plain text limit: 4-128 
Num to copy limit: 0 
Maximum Offset: - 
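Putting the four control characters together, the decompression loop from the start of this section might look like the sketch below. It is not a definitive implementation. It assumes the Sims 2 variant of the 4-byte control character, it assumes the computed copy offset counts bytes back from the end of the output with 1 meaning the last byte, and it takes the uncompressed size from the three big-endian bytes at offset 6 of the file header. It follows the tables literally and has no end-of-stream handling beyond running out of input, so real files may need more care. The name `decompress` is hypothetical.

```python
def decompress(data: bytes) -> bytes:
    """Decode one compressed file, including its 9-byte header."""
    expected_size = int.from_bytes(data[6:9], "big")
    out = bytearray()
    pos = 9  # compressed data starts after the header
    while pos < len(data):
        b0 = data[pos]
        if b0 <= 0x7F:    # 2-byte control character
            b1 = data[pos + 1]
            plain = b0 & 0x03
            copy = ((b0 & 0x1C) >> 2) + 3
            offset = ((b0 & 0x60) << 3) + b1 + 1
            pos += 2
        elif b0 <= 0xBF:  # 3-byte control character
            b1, b2 = data[pos + 1], data[pos + 2]
            plain = (b1 & 0xC0) >> 6
            copy = (b0 & 0x3F) + 4
            offset = ((b1 & 0x3F) << 8) + b2 + 1
            pos += 3
        elif b0 <= 0xDF:  # 4-byte control character, Sims 2 variant
            b1, b2, b3 = data[pos + 1:pos + 4]
            plain = b0 & 0x03
            copy = ((b0 & 0x0C) << 6) + b3 + 5
            offset = ((b0 & 0x10) << 12) + (b1 << 8) + b2 + 1
            pos += 4
        else:             # 1-byte control character, plain text only
            plain = ((b0 & 0x1F) << 2) + 4
            copy = 0
            offset = 0
            pos += 1
        # Plain text first, then the copy from the already decoded output.
        out += data[pos:pos + plain]
        pos += plain
        if copy and offset > len(out):
            raise ValueError("copy offset points before the start of the output")
        for _ in range(copy):
            out.append(out[len(out) - offset])  # one byte at a time
    if len(out) != expected_size:
        raise ValueError("decoded size does not match the header")
    return bytes(out)


# Hand-built stream: 0xE0 + "abcd", then 0x05 0x04 + "e" (1 plain, copy 4, offset 5).
body = bytes([0xE0]) + b"abcd" + bytes([0x05, 0x04]) + b"e"
header = (9 + len(body)).to_bytes(4, "little") + b"\x10\xFB" + (9).to_bytes(3, "big")
print(decompress(header + body))  # b'abcdeabcd'
```

The signature bytes and the little-endian size field in the example header are filler for the demonstration; the sketch never inspects them.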

Directory Files

Directory files are one of the newer filetypes I found around the beginning of May. Their purpose is to speed up the loading of a DAT file by showing exactly what is compressed inside it. They are directories of all the compressed files in an archive. Luckily, this makes their structure fairly simple.

(repeated for each compressed file)
DWORD - Type ID of the file
DWORD - Group ID of the file
DWORD - Instance ID of the file
DWORD - Instance2/Resource ID (only in new index format DBPFs)
DWORD - Size of the decompressed file in hex
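A directory file laid out this way can be read with a few lines of Python. The sketch below is only an illustration: `parse_directory` and its `has_resource_id` flag are hypothetical names, and the DWORDs are assumed to be little-endian, which matches the Type ID E86B1EEF appearing as the bytes EF 1E 6B E8 in a hex view.

```python
import struct


def parse_directory(data: bytes, has_resource_id: bool = False) -> list:
    """Split a directory file into one dict per compressed file entry."""
    # 4 DWORDs per entry, or 5 when the Instance2/Resource ID is present.
    fmt = "<5I" if has_resource_id else "<4I"
    entry_size = struct.calcsize(fmt)
    entries = []
    for start in range(0, len(data) - entry_size + 1, entry_size):
        fields = struct.unpack_from(fmt, data, start)
        entries.append({
            "type_id": fields[0],
            "group_id": fields[1],
            "instance_id": fields[2],
            "resource_id": fields[3] if has_resource_id else None,
            "uncompressed_size": fields[-1],
        })
    return entries


sample = struct.pack("<4I", 0xE86B1EEF, 0x1, 0x2, 300)
print(parse_directory(sample)[0]["uncompressed_size"])  # 300
```

Trailing bytes that do not fill a whole entry are ignored here; a stricter reader might treat them as an error.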

These files are found as DIR/Directory by the reader and are automatically modified by it during changes to dat files.

It is HIGHLY recommended that you update the Directory after editing any compressed file and before saving the dat, if you are working manually in hex or making your own program.

See E86B1EEF.

This article is imported from the old MTS2 wiki. Its original page, with comments, can be found at http://old_wiki.modthesims2.com/Packages
