
Game Card PNG Specification

An ero.dance Game Card is a standard PNG image with binary data embedded in private ancillary chunks. All private chunks are inserted immediately before the PNG IEND chunk. The canonical file extension is .ero.png, but the .ero part of the extension is not required by the format itself.


1. PNG Chunk Binary Layout

Every private chunk follows the standard PNG chunk format:

┌──────────────┬──────────────┬────────────────┬───────────────┐
│ Length (4 B) │ Type  (4 B)  │ Data   (N B)   │ CRC32 (4 B)   │
│  big-endian  │  ASCII       │                │ over Type+    │
│  uint32      │              │                │ Data          │
└──────────────┴──────────────┴────────────────┴───────────────┘

| Field | Size | Description |
|---|---|---|
| Length | 4 bytes | Number of bytes in the Data field (big-endian uint32) |
| Type | 4 bytes | Chunk name (4 ASCII letters, A–Z or a–z) |
| Data | N bytes | Chunk payload (length may be 0) |
| CRC32 | 4 bytes | CRC-32 computed over Type + Data (big-endian uint32) |

All multi-byte integers are big-endian (network byte order).
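
The layout above can be sketched in Python as follows. This is a minimal illustration, not the reference implementation; the function names `build_chunk` and `parse_chunk` are hypothetical, and the CRC comes from the standard library's `zlib.crc32`, which implements the same CRC-32 that PNG uses.

```python
import struct
import zlib


def build_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    if len(chunk_type) != 4:
        raise ValueError("chunk type must be exactly 4 bytes")
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def parse_chunk(buf: bytes, offset: int = 0):
    """Return (type, data, next_offset) for the chunk starting at offset."""
    (length,) = struct.unpack_from(">I", buf, offset)
    chunk_type = buf[offset + 4 : offset + 8]
    data = buf[offset + 8 : offset + 8 + length]
    (stored_crc,) = struct.unpack_from(">I", buf, offset + 8 + length)
    if stored_crc != zlib.crc32(chunk_type + data) & 0xFFFFFFFF:
        raise ValueError("CRC mismatch in chunk %r" % chunk_type)
    return chunk_type, data, offset + 12 + length


# Example: an erOv chunk carrying the JSON string literal "0.0.5"
chunk = build_chunk(b"erOv", b'"0.0.5"')
```

Every chunk costs 12 bytes of overhead (length, type, CRC) on top of its payload, so the example chunk is 19 bytes long.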


2. Chunk Table

| Chunk Name | Encoding | Description | Required |
|---|---|---|---|
| bpMx | gzip-compressed binary (Uint8Array) | 3D model data (BPMX format) | No |
| bvMd | gzip-compressed binary (Uint8Array) | Animation data (BVMD format) | No |
| weBm | raw binary (Uint8Array, no compression) | Audio data (WebM container) | No |
| auRl | UTF-8 string (typically a YouTube URL) | Audio URL reference | No |
| erOv | JSON string literal (UTF-8) | Card format version (semver) | Yes |
| uiNf | JSON object (UTF-8) | Creator information | No |
| fbTn | JSON array (UTF-8) | Favourited morph buttons | No |
| moAi | JSON object (UTF-8) | AI model metadata & voice sample index | No |
| vcAu | var-length array of raw binary Uint8Array entries | Voice clone audio samples (WebM format) | No |

Chunk naming convention: Bit 5 of the first byte is the ancillary bit, which shows up as the letter's case (lowercase = ancillary, uppercase = critical). All Game Card chunks are ancillary (lowercase first byte). See §8 (Chunk Naming Conventions) for details.

Pre-0.0.5 chunk names (historical, no longer used)

Before card version 0.0.5, chunk names used a different letter-case convention. These old names are no longer written and can be safely ignored by modern implementations. For reference only:

| Current name (v0.0.5+) | Old name (pre-0.0.5) |
|---|---|
| bpMx | bPMX |
| bvMd | bVMD |
| weBm | webM |
| auRl | aURL |
| erOv | eroV |
| uiNf | uInf |
| fbTn | fBtn |
| moAi | moAi |
| vcAu | vcAu |

3. Detailed Chunk Specifications

3.1 bpMx — 3D Model Data

| Field | Value |
|---|---|
| Encoding | gzip-compressed binary |
| Content | A 3D model in the BPMX format |

Readers must decompress the chunk data with gzip to obtain the BPMX binary. The raw (still compressed) chunk data starts with the gzip magic bytes 1F 8B.

Example hex (first bytes of compressed chunk data):

1F 8B 08 00 ...    → gzip header
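
A minimal sketch of this step, using Python's standard `gzip` module. The function name `decode_gzip_chunk` is hypothetical, and the payload is a stand-in because the BPMX format itself is not described here.

```python
import gzip


def decode_gzip_chunk(chunk_data: bytes) -> bytes:
    """Check the gzip magic bytes, then decompress the chunk data."""
    if chunk_data[:2] != b"\x1f\x8b":
        raise ValueError("chunk data does not start with the gzip magic bytes 1F 8B")
    return gzip.decompress(chunk_data)


# Round trip with a stand-in payload (not real BPMX data)
payload = b"stand-in for BPMX bytes"
compressed = gzip.compress(payload)
round_trip = decode_gzip_chunk(compressed)
```

The same function applies unchanged to bvMd, which uses the same compression scheme.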

3.2 bvMd — Animation Data

| Field | Value |
|---|---|
| Encoding | gzip-compressed binary |
| Content | An animation in the BVMD format |

Same compression scheme as bpMx. Decompress with gzip first.

3.3 weBm — Audio Data

| Field | Value |
|---|---|
| Encoding | raw binary (no compression) |
| Content | Audio in WebM container format |

The chunk data is a complete WebM file. No decompression needed. The WebM container magic bytes 1A 45 DF A3 should appear at the start of the data.

3.4 auRl — Audio URL Reference

| Field | Value |
|---|---|
| Encoding | UTF-8 encoded string |
| Content | URL pointing to an audio source (e.g. YouTube) |

When written by the reference implementation, this is always a plain UTF-8 string (typically a YouTube URL). The data does not include a null terminator.

Example:

68 74 74 70 73 3A 2F 2F 77 77 77 2E 79 6F 75 74 75 62 65 2E 63 6F 6D 2F ...
→ "https://www.youtube.com/..."

TIP

The serialization code also supports a var-length array format (see §4), but this path is not used in practice. Implementers may choose to detect and handle it for robustness.

3.5 erOv — Card Version

| Field | Value |
|---|---|
| Encoding | JSON string literal (UTF-8) |
| Content | Semantic version of the card format (semver) |

The data is the result of JSON.stringify(versionString). Because the input is a plain string like "0.0.5", the resulting bytes contain the JSON string literal including the surrounding double-quote characters (0x22).

To extract the version, apply JSON.parse() to the UTF-8 decoded bytes:

python
# Python example
import json

chunk_data = b'"0.0.5"'  # stand-in for the Data field read from the erOv chunk
version = json.loads(chunk_data.decode('utf-8'))  # → "0.0.5"

Hex example:

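22 30 2E 30 2E 35 22   → JSON literal "0.0.5"
│                 │
│                 └─ closing quote (0x22)
└─ opening quote (0x22)

| Offset | Byte | Character |
|---|---|---|
| 0 | 0x22 | " |
| 1 | 0x30 | 0 |
| 2 | 0x2E | . |
| 3 | 0x30 | 0 |
| 4 | 0x2E | . |
| 5 | 0x35 | 5 |
| 6 | 0x22 | " |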

Important

Always use JSON.parse() (or equivalent) to decode this chunk. The surrounding quotes are part of the stored data.

3.6 uiNf — Creator Information

| Field | Value |
|---|---|
| Encoding | JSON object (UTF-8) |
| Content | Information about the creator |

TypeScript interface:

ts
interface UiNf {
  auth: string; // creator name / author
  ch: string[]; // social channel URLs (may be empty)
  info: string; // free-text description / model info
}

Example chunk data (after UTF-8 decode):

json
{
  "auth": "ArtistName",
  "ch": ["https://twitter.com/artist", "https://patreon.com/artist"],
  "info": "Custom model with 12 morphs"
}

3.7 fbTn — Favourite Morph Buttons

| Field | Value |
|---|---|
| Encoding | JSON array (UTF-8) |
| Content | List of favourited morph slider buttons |

TypeScript interface (array of objects):

ts
interface FbTnEntry {
  name: string; // morph display name
  action: string; // always "morph"
  morph: number; // zero-based morph index in the model
}

type FbTn = FbTnEntry[];

Example:

json
[
  { "name": "Smile", "action": "morph", "morph": 3 },
  { "name": "Blink", "action": "morph", "morph": 7 }
]

3.8 moAi — AI Model Metadata

| Field | Value |
|---|---|
| Encoding | JSON object (UTF-8) |
| Content | AI/chat metadata for the model, including morph descriptions and voice sample index |

TypeScript interface:

ts
interface MoAi {
  name: string; // character name
  gender: string; // character gender
  info: string; // free-text character description
  morphs?: MoAiMorph[]; // optional, list of morph descriptions for AI
  voiceSamples?: MoAiVoiceSample[]; // optional, metadata index for vcAu audio samples
}

interface MoAiMorph {
  index: number; // zero-based morph index
  name: string; // morph name
  desc: string; // natural-language description of what this morph does
}

interface MoAiVoiceSample {
  index: number; // zero-based sample index (matches vcAu entry order)
  fileName: string; // original file name (e.g. "sample_01.webm")
  sampleName: string; // human-readable sample label
  locale?: string; // BCP-47 language tag (e.g. "en-US")
}

Example:

json
{
  "name": "Hatsune Miku",
  "gender": "female",
  "info": "Virtual singer character",
  "morphs": [
    { "index": 3, "name": "Smile", "desc": "A gentle smile expression" },
    { "index": 7, "name": "Wink", "desc": "Right eye wink" }
  ],
  "voiceSamples": [
    {
      "index": 0,
      "fileName": "voice_01.webm",
      "sampleName": "Greeting",
      "locale": "ja-JP"
    },
    {
      "index": 1,
      "fileName": "voice_02.webm",
      "sampleName": "Question",
      "locale": "ja-JP"
    }
  ]
}

Coupling with vcAu

When voiceSamples is present in moAi, the corresponding audio binaries are stored in the vcAu chunk. The index field in each voiceSamples entry maps directly to the array position in the var-length array inside vcAu. See §3.9.

3.9 vcAu — Voice Clone Audio Samples

| Field | Value |
|---|---|
| Encoding | var-length array of raw binary Uint8Array entries |
| Content | One or more WebM audio samples for voice cloning |

Each entry in the array is a complete WebM audio file (no compression applied to the audio data). The array is encoded using the var-length format described in §4.

Relationship to moAi: The metadata for each sample (file name, label, locale) is stored in the moAi chunk's voiceSamples array, not in vcAu. The index field in each voiceSamples entry corresponds to the position within the vcAu var-length array.

moAi.voiceSamples[0].index == 0  →  vcAu var-length array entry 0
moAi.voiceSamples[1].index == 1  →  vcAu var-length array entry 1
...
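
The index mapping above can be sketched as follows. This is an illustration only: `pair_voice_samples` is a hypothetical helper, it assumes the vcAu chunk has already been split into a list of byte strings (see §4), and the sample bytes are placeholders rather than real WebM audio.

```python
import json


def pair_voice_samples(mo_ai_data: bytes, vc_au_entries: list) -> list:
    """Pair each moAi.voiceSamples entry with the vcAu entry at its index."""
    mo_ai = json.loads(mo_ai_data.decode("utf-8"))
    pairs = []
    for sample in mo_ai.get("voiceSamples", []):  # voiceSamples is optional
        index = sample["index"]
        if not 0 <= index < len(vc_au_entries):
            raise ValueError(f"voice sample index {index} has no vcAu entry")
        pairs.append((sample["fileName"], vc_au_entries[index]))
    return pairs


# Placeholder inputs: moAi chunk data and two already-split vcAu entries
mo_ai_data = json.dumps({
    "name": "Hatsune Miku",
    "gender": "female",
    "info": "Virtual singer character",
    "voiceSamples": [
        {"index": 0, "fileName": "voice_01.webm", "sampleName": "Greeting"},
        {"index": 1, "fileName": "voice_02.webm", "sampleName": "Question"},
    ],
}).encode("utf-8")
vc_au_entries = [b"\x1a\x45\xdf\xa3 first", b"\x1a\x45\xdf\xa3 second"]

pairs = pair_voice_samples(mo_ai_data, vc_au_entries)
```

Looking entries up by the `index` field, rather than by position in `voiceSamples`, keeps the pairing correct even if the metadata array is reordered.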

4. Var-Length Array Encoding

Two chunks (vcAu and, optionally, auRl) encode an array of binary items, prefixing each element with its length as a variable-length quantity (VLQ).

Format

The data consists of N consecutive entries, each encoded as:

┌─────────────────────┬───────────────────────┐
│ Length prefix (VLQ) │ Content (length bytes)│
└─────────────────────┴───────────────────────┘

VLQ Length Prefix

The length of each entry is encoded as a little-endian 7-bit variable-length quantity (the same scheme as unsigned LEB128):

  • If the value fits in 7 bits (0–127): encoded as a single byte.
  • If the value requires more than 7 bits: the high bit (bit 7) of each byte is set to 1 to indicate continuation, and the remaining 7 bits carry the value, least-significant group first.

| Byte pattern | Meaning |
|---|---|
| 0xxxxxxx | Final byte of the prefix (the only byte when the value is ≤ 127) |
| 1xxxxxxx ... | More bytes follow; see below |

Decoding algorithm:

length = 0
shift  = 0
do:
    byte = read next byte
    length |= (byte & 0x7F) << shift
    shift  += 7
while (byte & 0x80) != 0

Encoding algorithm:

while length > 127:
    emit byte: (length & 0x7F) | 0x80
    length >>= 7
emit byte: (length & 0x7F)
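
The two algorithms above translate directly into Python. This is a sketch for illustration; the names `encode_vlq` and `decode_vlq` are hypothetical and not taken from the reference implementation.

```python
def encode_vlq(length: int) -> bytes:
    """Encode a non-negative integer as a little-endian 7-bit VLQ."""
    if length < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while length > 127:
        out.append((length & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        length >>= 7
    out.append(length & 0x7F)  # final byte, continuation bit clear
    return bytes(out)


def decode_vlq(buf: bytes, offset: int = 0):
    """Decode one VLQ starting at offset; return (value, next_offset)."""
    length = 0
    shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        length |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            return length, offset
```

For example, `encode_vlq(256)` yields the two bytes 80 02, and `decode_vlq(b"\x80\x02")` returns the value 256 together with the offset of the first content byte.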

Examples

Single entry (256 bytes):

80 02 <256 bytes of content>
│  │
│  └─ value bits: 0000010 → 2
└─ continuation bit set, value bits: 0000000 → 0
Decoded length: (0 << 0) | (2 << 7) = 256

Two entries (5 bytes and 12 bytes):

05 <5 bytes> 0C <12 bytes>
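
Both examples can be checked with a small array decoder. The sketch below assumes that the array carries no entry count and simply runs to the end of the chunk data, which is how the format description above reads; `decode_var_array` is a hypothetical name.

```python
def decode_var_array(data: bytes) -> list:
    """Split var-length array data into a list of byte-string entries."""
    entries = []
    pos = 0
    while pos < len(data):
        # Read the VLQ length prefix, least-significant 7-bit group first.
        length = 0
        shift = 0
        while True:
            if pos >= len(data):
                raise ValueError("truncated length prefix")
            byte = data[pos]
            pos += 1
            length |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        if pos + length > len(data):
            raise ValueError("entry runs past the end of the data")
        entries.append(data[pos : pos + length])
        pos += length
    return entries


# Two entries (5 bytes and 12 bytes), as in the second example
two = decode_var_array(b"\x05hello" + b"\x0c" + b"twelve bytes")

# Single 256-byte entry, as in the first example
one = decode_var_array(b"\x80\x02" + bytes(256))
```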

5. Chunk Insertion Position

All private chunks are inserted immediately before the IEND chunk of the PNG file. When writing, any previously existing private chunks of the same type are removed first (to allow re-saving).

The PNG file structure looks like:

PNG Signature (8 bytes)
IHDR
...
IDAT (one or more)
bpMx  ─┐
bvMd   │
weBm   │  ← all private chunks here
auRl   │
erOv   │
uiNf   │
fbTn   │
moAi   │
vcAu  ─┘
IEND

6. Reading a Card

Quick identification

To check whether a PNG file is a Game Card, it is sufficient to scan for the presence of the erOv chunk. If it exists, the file is an ero.dance Game Card.
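
A quick check along these lines might look like the sketch below. `iter_chunks`, `is_game_card`, and `_chunk` are hypothetical names, and the demo byte strings are chunk-level stand-ins (zero-filled IHDR, no IDAT), not viewable images.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(png: bytes):
    """Yield (type, data) for each chunk, stopping after IEND."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    pos = 8
    while pos + 12 <= len(png):
        (length,) = struct.unpack_from(">I", png, pos)
        chunk_type = png[pos + 4 : pos + 8]
        yield chunk_type, png[pos + 8 : pos + 8 + length]
        if chunk_type == b"IEND":
            break
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def is_game_card(png: bytes) -> bool:
    """True if the chunk stream contains an erOv chunk."""
    return any(chunk_type == b"erOv" for chunk_type, _ in iter_chunks(png))


def _chunk(chunk_type: bytes, data: bytes) -> bytes:
    # Helper used only to build the demo inputs below.
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


plain = PNG_SIGNATURE + _chunk(b"IHDR", bytes(13)) + _chunk(b"IEND", b"")
card = (PNG_SIGNATURE + _chunk(b"IHDR", bytes(13))
        + _chunk(b"erOv", b'"0.0.5"') + _chunk(b"IEND", b""))
print(is_game_card(plain), is_game_card(card))  # prints: False True
```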

To fully extract all embedded data, follow these steps:

  1. Parse the PNG and iterate over all chunks.
  2. For each chunk, check the 4-byte type against the known names (see §2).
  3. For each recognized chunk, extract the data field (skip the 4-byte length prefix, 4-byte type, and 4-byte CRC suffix).
  4. Decode the data according to the chunk's encoding:

| Encoding | Decoding steps |
|---|---|
| gzip-compressed | Decompress with gzip → raw binary |
| raw binary | Use as-is |
| UTF-8 string | Decode bytes as UTF-8 |
| JSON string literal | Decode UTF-8, then JSON.parse() to get the value |
| JSON object / array | Decode UTF-8, then JSON.parse() to get the object |
| var-length array | Decode using the VLQ algorithm (see §4) |

  5. To reconstruct voice clone files, cross-reference moAi.voiceSamples[] metadata with the vcAu binary entries by matching the index field to the array position.
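
The decoding step can be sketched as a lookup table from chunk name to decoder. This is one possible arrangement, not the reference implementation: `DECODERS`, `decode_chunk`, `parse_json`, and `split_var_array` are hypothetical names, and auRl is handled only in its plain UTF-8 form.

```python
import gzip
import json


def split_var_array(data: bytes) -> list:
    """Split a var-length array (§4) into its entries."""
    entries, pos = [], 0
    while pos < len(data):
        length = shift = 0
        while True:
            byte = data[pos]
            pos += 1
            length |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        entries.append(data[pos : pos + length])
        pos += length
    return entries


def parse_json(data: bytes):
    """Decode UTF-8, then parse as JSON (string literal, object, or array)."""
    return json.loads(data.decode("utf-8"))


DECODERS = {
    b"bpMx": gzip.decompress,                    # gzip-compressed binary
    b"bvMd": gzip.decompress,                    # gzip-compressed binary
    b"weBm": bytes,                              # raw binary, used as-is
    b"auRl": lambda data: data.decode("utf-8"),  # plain UTF-8 string
    b"erOv": parse_json,                         # JSON string literal
    b"uiNf": parse_json,                         # JSON object
    b"fbTn": parse_json,                         # JSON array
    b"moAi": parse_json,                         # JSON object
    b"vcAu": split_var_array,                    # var-length array
}


def decode_chunk(chunk_type: bytes, data: bytes):
    """Decode a recognized chunk; return None for any other chunk type."""
    decoder = DECODERS.get(chunk_type)
    return None if decoder is None else decoder(data)
```

Unrecognized chunk types, including the standard PNG chunks, fall through to `None`, so the function can be applied to every chunk in the file.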

7. Writing a Card — Quick Start

  1. Create or load the base PNG image.
  2. Strip any existing private chunks of the types you intend to write.
  3. Encode each data payload according to its chunk encoding.
  4. Build each private chunk: [4-byte length BE][4-byte type ASCII][data][4-byte CRC32 BE].
    • CRC32 is computed over type + data (the 4 type bytes followed by the data bytes).
  5. Insert all private chunks immediately before the IEND chunk.
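
Steps 2 to 5 can be sketched as a single pass over the chunk stream. The names `make_chunk` and `write_card` are hypothetical, the payloads are assumed to be already encoded (step 3), and the demo base is a chunk-level stand-in without IDAT rather than a viewable image.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """[4-byte length BE][4-byte type][data][4-byte CRC32 BE over type + data]."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def write_card(png: bytes, payloads: dict) -> bytes:
    """Return a copy of png with payloads (type -> encoded data) before IEND."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    out = bytearray(PNG_SIGNATURE)
    pos = 8
    while pos + 12 <= len(png):
        (length,) = struct.unpack_from(">I", png, pos)
        chunk_type = png[pos + 4 : pos + 8]
        end = pos + 12 + length
        if chunk_type == b"IEND":
            # Insert the new private chunks immediately before IEND.
            for new_type, new_data in payloads.items():
                out += make_chunk(new_type, new_data)
            out += png[pos:end]
            return bytes(out)
        if chunk_type not in payloads:
            out += png[pos:end]  # keep every chunk that is not being rewritten
        pos = end
    raise ValueError("no IEND chunk found")


ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)  # 1x1, 8-bit greyscale
base = PNG_SIGNATURE + make_chunk(b"IHDR", ihdr) + make_chunk(b"IEND", b"")
card = write_card(base, {b"erOv": b'"0.0.5"'})
resaved = write_card(card, {b"erOv": b'"0.0.6"'})  # old erOv is stripped first
```

Because existing chunks of the same type are dropped while copying, saving the same card twice leaves exactly one copy of each private chunk.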

8. Chunk Naming Conventions

PNG chunk names are 4-byte ASCII strings. The case of each letter carries metadata per the PNG specification:

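| Byte | Lowercase means | Uppercase means |
|---|---|---|
| 1st byte | Ancillary | Critical |
| 2nd byte | Private | Public |
| 3rd byte | (reserved) | (reserved; uppercase is required by the current PNG specification) |
| 4th byte | Safe to copy | Unsafe to copy |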

Game Card chunk names follow the pattern: lowercase, lowercase, uppercase, lowercase (e.g. bpMx, weBm). This marks them as ancillary (non-critical; viewers that don't understand them can still display the image), private, and safe to copy.
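
As an illustration, the property bits can be read off a chunk name by testing bit 5 (0x20) of each byte. `chunk_properties` is a hypothetical helper, not part of any PNG library.

```python
def chunk_properties(name: str) -> dict:
    """Decode the four property bits carried by a PNG chunk name's letter case."""
    raw = name.encode("ascii")
    if len(raw) != 4 or not name.isalpha():
        raise ValueError("chunk names are exactly 4 ASCII letters")
    lower = [bool(byte & 0x20) for byte in raw]  # bit 5 set means lowercase
    return {
        "ancillary": lower[0],
        "private": lower[1],
        "reserved_ok": not lower[2],  # third letter is expected to be uppercase
        "safe_to_copy": lower[3],
    }


current = chunk_properties("bpMx")  # current Game Card name
legacy = chunk_properties("bPMX")   # pre-0.0.5 name
```

Run on a pre-0.0.5 name such as bPMX, the same function reports a public, unsafe-to-copy chunk, which shows how the letter case of the old names carried different property bits.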


9. Visual Card Markers

Game Cards embed colored indicator strips at the bottom edge of the image to show which data types are present:

| Color | Meaning |
|---|---|
| Blue | 3D model data (bpMx) |
| Green | Animation data (bvMd) |
| Red | Audio data (weBm/auRl) |

The strip height is 2 pixels. When multiple types are present, the bottom edge is divided into equal-width segments, one per type, with 2px spacing between them.


10. Reference CRC-32

The CRC-32 algorithm used is the standard PNG CRC-32 (ISO 3309 / ITU-T V.42), also known as CRC-32/ISO-HDLC. Polynomial: 0xEDB88320 (bit-reversed representation of 0x04C11DB7). The CRC is computed over the chunk type bytes concatenated with the chunk data bytes, and stored as a 4-byte big-endian unsigned integer.
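
For environments without a ready-made CRC-32, a bit-by-bit version is short enough to write out. The sketch below uses the reflected polynomial given above together with the standard initial value and final XOR of 0xFFFFFFFF; `crc32_png` is a hypothetical name, and Python's `zlib.crc32` is used only as a cross-check.

```python
import zlib


def crc32_png(data: bytes) -> int:
    """Bitwise CRC-32 (reflected polynomial 0xEDB88320), as used by PNG."""
    crc = 0xFFFFFFFF  # initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; apply the polynomial if the dropped bit was 1.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # final XOR


# CRC over type + data for an erOv chunk carrying "0.0.5"
crc = crc32_png(b"erOv" + b'"0.0.5"')
matches_zlib = crc == zlib.crc32(b"erOv" + b'"0.0.5"')
```

A table-driven variant is faster for large payloads such as bpMx or weBm, but produces the same values.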