Game Card PNG Specification
An ero.dance Game Card is a standard PNG image with embedded binary data stored in private ancillary chunks. All private chunks are inserted immediately before the PNG IEND chunk. The canonical file extension is .ero.png, but the .ero part of the extension is not required by the format itself.
1. PNG Chunk Binary Layout
Every private chunk follows the standard PNG chunk format:
┌──────────────┬──────────────┬────────────────┬───────────────┐
│ Length (4 B) │ Type (4 B) │ Data (N B) │ CRC32 (4 B) │
│ big-endian │ ASCII │ │ over Type+ │
│ uint32 │ │ │ Data │
└──────────────┴──────────────┴────────────────┴───────────────┘
| Field | Size | Description |
|---|---|---|
| Length | 4 bytes | Number of bytes in the Data field (big-endian uint32) |
| Type | 4 bytes | ASCII chunk name (4 letters, A–Z or a–z) |
| Data | N bytes | Chunk payload (length may be 0) |
| CRC32 | 4 bytes | CRC-32 computed over Type + Data (big-endian uint32) |
All multi-byte integers are big-endian (network byte order).
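The layout above can be exercised with a short parser. This is a minimal sketch, not the reference implementation: the function name `parse_chunk` is hypothetical, and it assumes the whole file is already in memory as `bytes`. It relies on `zlib.crc32`, which computes the same CRC-32 that PNG uses.

```python
import struct
import zlib


def parse_chunk(buf: bytes, offset: int = 0):
    """Parse one PNG chunk starting at `offset` (hypothetical helper).

    Returns (chunk_type, data, next_offset); raises ValueError on bad input.
    """
    # Length (big-endian uint32) and Type (4 ASCII bytes) form the first 8 bytes.
    length, chunk_type = struct.unpack_from(">I4s", buf, offset)
    data_start = offset + 8
    data_end = data_start + length
    if data_end + 4 > len(buf):
        raise ValueError("truncated chunk")
    data = buf[data_start:data_end]
    # The stored CRC follows the data, also as a big-endian uint32.
    (stored_crc,) = struct.unpack_from(">I", buf, data_end)
    # The CRC covers Type + Data, but not the Length field.
    if (zlib.crc32(chunk_type + data) & 0xFFFFFFFF) != stored_crc:
        raise ValueError(f"CRC mismatch in chunk {chunk_type!r}")
    return chunk_type, data, data_end + 4
```

Calling `parse_chunk` repeatedly with the returned `next_offset`, starting at offset 8 (just past the PNG signature), walks every chunk in the file.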
2. Chunk Table
| Chunk Name | Encoding | Description | Required |
|---|---|---|---|
| bpMx | gzip-compressed binary (Uint8Array) | 3D model data (BPMX format) | No |
| bvMd | gzip-compressed binary (Uint8Array) | Animation data (BVMD format) | No |
| weBm | raw binary (Uint8Array, no compression) | Audio data (WebM container) | No |
| auRl | UTF-8 string (typically a YouTube URL) | Audio URL reference | No |
| erOv | JSON string literal (UTF-8) | Card format version (semver) | Yes |
| uiNf | JSON object (UTF-8) | Creator information | No |
| fbTn | JSON array (UTF-8) | Favourited morph buttons | No |
| moAi | JSON object (UTF-8) | AI model metadata & voice sample index | No |
| vcAu | var-length array of raw binary Uint8Array entries | Voice clone audio samples (WebM format) | No |
Chunk naming convention: The first byte's bit 5 controls the ancillary-bit (lowercase = ancillary, uppercase = critical). All Game Card chunks are ancillary (lowercase first byte). See §8 – Chunk Naming for details.
Pre-0.0.5 chunk names (historical, no longer used)
Before card version 0.0.5, chunk names used a different letter-case convention. These old names are no longer written and can be safely ignored by modern implementations. For reference only:
| Current name (v0.0.5+) | Old name (pre-0.0.5) |
|---|---|
| bpMx | bPMX |
| bvMd | bVMD |
| weBm | webM |
| auRl | aURL |
| erOv | eroV |
| uiNf | uInf |
| fbTn | fBtn |
| moAi | moAi |
| vcAu | vcAu |
3. Detailed Chunk Specifications
3.1 bpMx — 3D Model Data
| Field | Value |
|---|---|
| Encoding | gzip-compressed binary |
| Content | A 3D model in the BPMX format |
Readers must decompress with gzip. The BPMX binary follows after decompression. Gzip magic bytes in the raw chunk data: 1F 8B.
Example hex (first bytes of compressed chunk data):
1F 8B 08 00 ... → gzip header
3.2 bvMd — Animation Data
| Field | Value |
|---|---|
| Encoding | gzip-compressed binary |
| Content | An animation in the BVMD format |
Same compression scheme as bpMx. Decompress with gzip first.
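For both bpMx and bvMd, the decompression step might look like the following sketch. The function name `decompress_chunk_data` is an assumption made for illustration; the magic-byte check is an optional sanity test, not something the format requires readers to perform.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def decompress_chunk_data(data: bytes) -> bytes:
    """Return the raw BPMX/BVMD bytes from a bpMx or bvMd chunk payload."""
    # Fail early with a clear message if the payload is not gzip at all.
    if data[:2] != GZIP_MAGIC:
        raise ValueError("chunk data does not start with gzip magic bytes 1F 8B")
    return gzip.decompress(data)
```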
3.3 weBm — Audio Data
| Field | Value |
|---|---|
| Encoding | raw binary (no compression) |
| Content | Audio in WebM container format |
The chunk data is a complete WebM file. No decompression needed. The WebM container magic bytes 1A 45 DF A3 should appear at the start of the data.
3.4 auRl — Audio URL Reference
| Field | Value |
|---|---|
| Encoding | UTF-8 encoded string |
| Content | URL pointing to an audio source (e.g. YouTube) |
When written by the reference implementation, this is always a plain UTF-8 string (typically a YouTube URL). The data does not include a null terminator.
Example:
68 74 74 70 73 3A 2F 2F 77 77 77 2E 79 6F 75 74 75 62 65 2E 63 6F 6D 2F ...
→ "https://www.youtube.com/..."
TIP
The serialization code also supports a var-length array format (see §4), but this path is not used in practice. Implementers may choose to detect and handle it for robustness.
3.5 erOv — Card Version
| Field | Value |
|---|---|
| Encoding | JSON string literal (UTF-8) |
| Content | Semantic version of the card format (semver) |
The data is the result of JSON.stringify(versionString). Because the input is a plain string like "0.0.5", the resulting bytes contain the JSON string literal including the surrounding double-quote characters (0x22).
To extract the version, apply JSON.parse() to the UTF-8 decoded bytes:
# Python example
import json

raw_bytes = chunk_data  # data field of the erOv chunk
version = json.loads(raw_bytes.decode('utf-8'))  # → "0.0.5"
Hex example:
22 30 2E 30 2E 35 22 → JSON literal "0.0.5"
│ │
└─ opening quote └─ closing quote (0x22)
| Offset | Byte | Character |
|---|---|---|
| 0 | 0x22 | " |
| 1 | 0x30 | 0 |
| 2 | 0x2E | . |
| 3 | 0x30 | 0 |
| 4 | 0x2E | . |
| 5 | 0x35 | 5 |
| 6 | 0x22 | " |
Important
Always use JSON.parse() (or equivalent) to decode this chunk. The surrounding quotes are part of the stored data.
3.6 uiNf — Creator Information
| Field | Value |
|---|---|
| Encoding | JSON object (UTF-8) |
| Content | Information about the creator |
TypeScript interface:
interface UiNf {
  auth: string; // creator name / author
  ch: string[]; // social channel URLs (may be empty)
  info: string; // free-text description / model info
}
Example chunk data (after UTF-8 decode):
{
  "auth": "ArtistName",
  "ch": ["https://twitter.com/artist", "https://patreon.com/artist"],
  "info": "Custom model with 12 morphs"
}
3.7 fbTn — Favourite Morph Buttons
| Field | Value |
|---|---|
| Encoding | JSON array (UTF-8) |
| Content | List of favourited morph slider buttons |
TypeScript interface (array of objects):
interface FbTnEntry {
  name: string; // morph display name
  action: string; // always "morph"
  morph: number; // zero-based morph index in the model
}
type FbTn = FbTnEntry[];
Example:
[
  { "name": "Smile", "action": "morph", "morph": 3 },
  { "name": "Blink", "action": "morph", "morph": 7 }
]
3.8 moAi — AI Model Metadata
| Field | Value |
|---|---|
| Encoding | JSON object (UTF-8) |
| Content | AI/chat metadata for the model, including morph descriptions and voice sample index |
TypeScript interface:
interface MoAi {
  name: string; // character name
  gender: string; // character gender
  info: string; // free-text character description
  morphs?: MoAiMorph[]; // optional, list of morph descriptions for AI
  voiceSamples?: MoAiVoiceSample[]; // optional, metadata index for vcAu audio samples
}
interface MoAiMorph {
  index: number; // zero-based morph index
  name: string; // morph name
  desc: string; // natural-language description of what this morph does
}
interface MoAiVoiceSample {
  index: number; // zero-based sample index (matches vcAu entry order)
  fileName: string; // original file name (e.g. "sample_01.webm")
  sampleName: string; // human-readable sample label
  locale?: string; // BCP-47 language tag (e.g. "en-US")
}
Example:
{
  "name": "Hatsune Miku",
  "gender": "female",
  "info": "Virtual singer character",
  "morphs": [
    { "index": 3, "name": "Smile", "desc": "A gentle smile expression" },
    { "index": 7, "name": "Wink", "desc": "Right eye wink" }
  ],
  "voiceSamples": [
    {
      "index": 0,
      "fileName": "voice_01.webm",
      "sampleName": "Greeting",
      "locale": "ja-JP"
    },
    {
      "index": 1,
      "fileName": "voice_02.webm",
      "sampleName": "Question",
      "locale": "ja-JP"
    }
  ]
}
Coupling with vcAu
When voiceSamples is present in moAi, the corresponding audio binaries are stored in the vcAu chunk. The index field in each voiceSamples entry maps directly to the array position in the var-length array inside vcAu. See §3.9.
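The index mapping can be illustrated with a small sketch. It assumes the moAi JSON has already been parsed into a `dict` and the vcAu payload has already been split into a list of `bytes` entries; the function name `pair_voice_samples` and the choice to raise on an out-of-range index are assumptions made for illustration, not behaviour defined by the format.

```python
from typing import List, Tuple


def pair_voice_samples(mo_ai: dict, vc_au_entries: List[bytes]) -> List[Tuple[dict, bytes]]:
    """Pair each moAi.voiceSamples entry with its audio bytes from vcAu."""
    pairs = []
    # voiceSamples is optional, so a missing key simply yields no pairs.
    for meta in mo_ai.get("voiceSamples", []):
        index = meta["index"]
        if not 0 <= index < len(vc_au_entries):
            raise ValueError(f"voice sample index {index} has no vcAu entry")
        # The index is the position of the sample inside the vcAu array.
        pairs.append((meta, vc_au_entries[index]))
    return pairs
```

Each returned pair carries the metadata (file name, label, locale) next to the WebM bytes, which is enough to write the sample back out under its original file name.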
3.9 vcAu — Voice Clone Audio Samples
| Field | Value |
|---|---|
| Encoding | var-length array of raw binary Uint8Array entries |
| Content | One or more WebM audio samples for voice cloning |
Each entry in the array is a complete WebM audio file (no compression applied to the audio data). The array is encoded using the var-length format described in §4.
Relationship to moAi: The metadata for each sample (file name, label, locale) is stored in the moAi chunk's voiceSamples array, not in vcAu. The index field in each voiceSamples entry corresponds to the position within the vcAu var-length array.
moAi.voiceSamples[0].index == 0 → vcAu var-length array entry 0
moAi.voiceSamples[1].index == 1 → vcAu var-length array entry 1
...
4. Var-Length Array Encoding
Several chunks (vcAu, and optionally auRl) encode an array of binary items using a variable-length quantity (VLQ) prefix for each element.
Format
The data consists of N consecutive entries, each encoded as:
┌─────────────────────┬───────────────────────┐
│ Length prefix (VLQ) │ Content (length bytes)│
└─────────────────────┴───────────────────────┘
VLQ Length Prefix
The length of each entry is encoded as a variable-length quantity (7 value bits per byte, least-significant group first; this is the same scheme as unsigned LEB128):
- If the value fits in 7 bits (0–127): encoded as a single byte.
- If the value requires more than 7 bits: the high bit (bit 7) of every byte except the last is set to 1 to indicate continuation, and the low 7 bits of each byte carry the value, least-significant group first.
| Byte pattern | Meaning |
|---|---|
| 0xxxxxxx | Final (or only) byte; carries the most significant 7-bit group |
| 1xxxxxxx ... | More bytes follow; see below |
Decoding algorithm:
length = 0
shift = 0
do:
    byte = read next byte
    length |= (byte & 0x7F) << shift
    shift += 7
while (byte & 0x80) != 0
Encoding algorithm:
while length > 127:
    emit byte: (length & 0x7F) | 0x80
    length >>= 7
emit byte: (length & 0x7F)
Examples
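The decoding and encoding pseudocode translates fairly directly to Python. The sketch below uses hypothetical function names and makes one assumption the text does not spell out: the array has no element count, so entries are read until the chunk data is exhausted.

```python
from typing import Iterable, List, Tuple


def encode_vlq(value: int) -> bytes:
    """Encode a non-negative integer, least-significant 7-bit group first."""
    if value < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while value > 127:
        out.append((value & 0x7F) | 0x80)  # continuation bit set
        value >>= 7
    out.append(value & 0x7F)  # final byte, continuation bit clear
    return bytes(out)


def decode_vlq(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one length prefix; returns (value, offset after the prefix)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated length prefix")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            return value, offset


def encode_array(entries: Iterable[bytes]) -> bytes:
    """Concatenate entries, each preceded by its VLQ length prefix."""
    return b"".join(encode_vlq(len(entry)) + entry for entry in entries)


def decode_array(buf: bytes) -> List[bytes]:
    """Split a var-length array into entries (assumes entries run to the end)."""
    entries = []
    offset = 0
    while offset < len(buf):
        length, offset = decode_vlq(buf, offset)
        end = offset + length
        if end > len(buf):
            raise ValueError("entry extends past end of data")
        entries.append(buf[offset:end])
        offset = end
    return entries
```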
Single entry (256 bytes):
80 02 <256 bytes of content>
│ │
│ └─ value bits: 0000010 → 2
└─ continuation bit set, value bits: 0000000 → 0
Decoded length: (0 << 0) | (2 << 7) = 256
Two entries (5 bytes and 12 bytes):
05 <5 bytes> 0C <12 bytes>
5. Chunk Insertion Position
All private chunks are inserted immediately before the IEND chunk of the PNG file. When writing, any previously existing private chunks of the same type are removed first (to allow re-saving).
The PNG file structure looks like:
PNG Signature (8 bytes)
IHDR
...
IDAT (one or more)
bpMx ─┐
bvMd │
weBm │ ← all private chunks here
auRl │
erOv │
uiNf │
fbTn │
moAi │
vcAu ─┘
IEND
6. Reading a Card
Quick identification
To check whether a PNG file is a Game Card, it is sufficient to scan for the presence of the erOv chunk. If it exists, the file is an ero.dance Game Card.
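A quick identification pass only needs the chunk types, so it can skip over each payload without decoding it. The following is a minimal sketch with hypothetical names (`iter_chunks`, `is_game_card`); it assumes the file is fully loaded into memory and, for brevity, does not verify CRCs.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG file signature


def iter_chunks(png: bytes):
    """Yield (chunk_type, data) for every chunk after the signature."""
    offset = 8
    # Each chunk needs at least 12 bytes: length + type + CRC.
    while offset + 12 <= len(png):
        length, chunk_type = struct.unpack_from(">I4s", png, offset)
        yield chunk_type, png[offset + 8 : offset + 8 + length]
        offset += 12 + length


def is_game_card(png: bytes) -> bool:
    """True if the data is a PNG that contains an erOv chunk."""
    if png[:8] != PNG_SIGNATURE:
        return False
    return any(chunk_type == b"erOv" for chunk_type, _ in iter_chunks(png))
```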
To fully extract all embedded data, follow these steps:
- Parse the PNG and iterate over all chunks.
- For each chunk, check the 4-byte type against the known names (see §2).
- For each recognized chunk, extract the data field (skip the 4-byte length prefix, 4-byte type, and 4-byte CRC suffix).
- Decode the data according to the chunk's encoding:
| Encoding | Decoding steps |
|---|---|
| gzip-compressed | Decompress with gzip → raw binary |
| raw binary | Use as-is |
| UTF-8 string | Decode bytes as UTF-8 |
| JSON string literal | Decode UTF-8, then JSON.parse() to get the value |
| JSON object / array | Decode UTF-8, then JSON.parse() to get the object |
| var-length array | Decode using the VLQ algorithm (see §4) |
- To reconstruct voice clone files, cross-reference moAi.voiceSamples[] metadata with the vcAu binary entries by matching the index field to the array position.
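The decoding table in the steps above can be sketched as a single dispatch function. The grouping of chunk names by encoding follows the chunk table in §2; the function names are hypothetical, unknown chunk types return `None` by choice, and auRl is treated as a plain UTF-8 string (the var-length variant mentioned in §3.4 is not handled here).

```python
import gzip
import json

# Chunk names grouped by encoding, as listed in the chunk table (§2).
GZIP_CHUNKS = {b"bpMx", b"bvMd"}
RAW_CHUNKS = {b"weBm"}
TEXT_CHUNKS = {b"auRl"}
JSON_CHUNKS = {b"erOv", b"uiNf", b"fbTn", b"moAi"}
ARRAY_CHUNKS = {b"vcAu"}


def decode_var_length_array(buf: bytes) -> list:
    """Split VLQ-prefixed entries (see §4); reads until the data ends."""
    entries, offset = [], 0
    while offset < len(buf):
        length, shift = 0, 0
        while True:
            byte = buf[offset]
            offset += 1
            length |= (byte & 0x7F) << shift
            shift += 7
            if (byte & 0x80) == 0:
                break
        entries.append(buf[offset : offset + length])
        offset += length
    return entries


def decode_chunk(chunk_type: bytes, data: bytes):
    """Decode one chunk's data field according to its encoding."""
    if chunk_type in GZIP_CHUNKS:
        return gzip.decompress(data)
    if chunk_type in RAW_CHUNKS:
        return data
    if chunk_type in TEXT_CHUNKS:
        return data.decode("utf-8")
    if chunk_type in JSON_CHUNKS:
        return json.loads(data.decode("utf-8"))
    if chunk_type in ARRAY_CHUNKS:
        return decode_var_length_array(data)
    return None  # not a Game Card chunk
```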
7. Writing a Card — Quick Start
- Create or load the base PNG image.
- Strip any existing private chunks of the types you intend to write.
- Encode each data payload according to its chunk encoding.
- Build each private chunk: [4-byte length BE][4-byte type ASCII][data][4-byte CRC32 BE].
  - CRC32 is computed over type + data (the 4 type bytes followed by the data bytes).
- Insert all private chunks immediately before the IEND chunk.
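The writing steps can be sketched as follows. This is an illustration under stated assumptions rather than the reference writer: `build_chunk` and `write_private_chunks` are hypothetical names, the payloads are assumed to be already encoded (gzip, JSON, and so on), and the base PNG is assumed to be well-formed and fully in memory.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def write_private_chunks(png: bytes, payloads: dict) -> bytes:
    """Insert `payloads` (chunk type -> encoded data) just before IEND.

    Existing chunks of the same types are dropped so a card can be re-saved.
    """
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    out = bytearray(png[:8])
    offset = 8
    while offset + 12 <= len(png):
        length, chunk_type = struct.unpack_from(">I4s", png, offset)
        end = offset + 12 + length
        if chunk_type == b"IEND":
            # Emit the new private chunks, then IEND itself.
            for new_type, data in payloads.items():
                out += build_chunk(new_type, data)
            out += png[offset:end]
            return bytes(out)
        if chunk_type not in payloads:
            out += png[offset:end]  # keep every chunk we are not replacing
        offset = end
    raise ValueError("IEND chunk not found")
```

For example, `write_private_chunks(base_png, {b"erOv": b'"0.0.5"'})` yields a file that passes the quick identification check in §6.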
8. Chunk Naming Conventions
PNG chunk names are 4-byte ASCII strings. The case of each letter carries metadata per the PNG specification:
| Byte | Lowercase means | Uppercase means |
|---|---|---|
| 1st byte | Ancillary | Critical |
| 2nd byte | Private | Public |
| 3rd byte | (reserved; not used in conforming files) | Required in files conforming to the current PNG specification |
| 4th byte | Safe to copy | Unsafe to copy |
Game Card chunk names follow the pattern: lowercase, lowercase, uppercase, lowercase (e.g. bpMx, weBm). This marks them as ancillary (non-critical; viewers that don't understand them can safely display the image), private, and safe to copy.
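The case of each letter is stored in bit 5 (value 0x20) of the byte, so the four properties can be read with a bit mask. A small sketch, with the function name `chunk_properties` and the dictionary keys chosen purely for illustration:

```python
def chunk_properties(name: bytes) -> dict:
    """Read the four PNG property bits from a 4-letter chunk name."""
    if len(name) != 4 or not name.isalpha():
        raise ValueError("chunk names are exactly 4 ASCII letters")
    # Bit 5 (0x20) is set for lowercase ASCII letters and clear for uppercase.
    lower = [bool(byte & 0x20) for byte in name]
    return {
        "ancillary": lower[0],         # 1st byte lowercase
        "private": lower[1],           # 2nd byte lowercase
        "reserved_bit_set": lower[2],  # 3rd byte lowercase (not expected)
        "safe_to_copy": lower[3],      # 4th byte lowercase
    }
```

Running it over the chunk names in §2 gives the same result for each: ancillary, private, reserved bit clear, safe to copy.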
9. Visual Card Markers
Game Cards embed colored indicator strips at the bottom edge of the image to show which data types are present:
| Color | Meaning |
|---|---|
| Blue | 3D model data (bpMx) |
| Green | Animation data (bvMd) |
| Red | Audio data (weBm/auRl) |
The strip height is 2 pixels. When multiple types are present, the bottom edge is divided into equal-width segments, one per type, with 2px spacing between them.
10. Reference CRC-32
The CRC-32 algorithm used is the standard PNG CRC-32 (ISO 3309 / ITU-T V.42), also known as CRC-32/ISO-HDLC. Polynomial: 0xEDB88320 (bit-reversed representation of 0x04C11DB7). The CRC is computed over the chunk type bytes concatenated with the chunk data bytes, and stored as a 4-byte big-endian unsigned integer.
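For environments without a built-in CRC-32, a table-driven version is short. The sketch below uses the reflected polynomial 0xEDB88320 with the usual CRC-32/ISO-HDLC parameters (initial value 0xFFFFFFFF, final XOR with 0xFFFFFFFF); the function name `png_crc32` is hypothetical. In Python itself, `zlib.crc32` gives the same result and is the simpler choice.

```python
def _make_crc_table() -> list:
    """Precompute the CRC of every possible byte value."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            # Shift right one bit; XOR in the polynomial if a 1 bit fell off.
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


_CRC_TABLE = _make_crc_table()


def png_crc32(data: bytes) -> int:
    """CRC-32 over `data`; pass chunk type + chunk data for a PNG chunk."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF
```

As a check, the CRC of the four bytes `IEND` (an empty IEND chunk) is 0xAE426082, which is the trailer found at the end of every PNG file.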