Optical Disc and Disk Image Formats

Comprehensive reference for disk image formats covering ISO 9660, Joliet, Rock Ridge, UDF, El Torito, WIM, and hybrid disc structures. All struct layouts and edge cases documented here are verified against the Sector Sorcery parser implementation.

ISO 9660: Mastering the Standard

ISO 9660 (also known as ECMA-119) appears simple on the surface. However, real-world implementations reveal a rich ecosystem of vendor-specific extensions and creative solutions.

Note: ISO 9660 and ECMA-119 are technically the same standard, with ECMA-119 being freely available from Ecma International.

Basic Structure

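Every ISO starts with a 16-sector (32 KiB) System Area reserved for system use. The standard does not define its contents, and on plain data discs it is usually all zeroes. The real action begins at sector 16 with the Volume Descriptors: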

Primary Volume Descriptor
-- At offset 0x8000 (sector 16 * 2048 bytes)

struct PrimaryVolumeDescriptor {
  type              : u8          -- 0x01 = Primary VD
  identifier        : char[5]     -- "CD001" (magic)
  version           : u8          -- always 1
  _unused           : u8
  system_id         : char[32]    -- system identifier (often garbage)
  volume_id         : char[32]    -- volume label
  _unused2          : u8[8]
  volume_space_size : both32      -- total sectors (LE + BE)
  _unused3          : u8[32]
  volume_set_size   : both16      -- usually 1
  volume_seq_number : both16      -- usually 1
  logical_block_size: both16      -- almost always 2048
  path_table_size   : both32
  path_table_le     : u32le       -- LBA of LE path table
  opt_path_table_le : u32le
  path_table_be     : u32be       -- LBA of BE path table
  opt_path_table_be : u32be
  root_directory    : DirectoryRecord -- 34 bytes, inline
  -- ... 1858 more bytes of metadata (fields above end at offset 190 of 2048)
}

-- "both32" = same value stored LE then BE (8 bytes total)
struct both32 {
  le : u32le
  be : u32be
}
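
A minimal sketch of reading this layout in Python, assuming a raw image with 2048-byte sectors held in memory. The function name `parse_pvd` and the dictionary keys are illustrative and not taken from any parser mentioned here; only the little-endian half of each both-endian field is read.

```python
import struct

SECTOR_SIZE = 2048
PVD_OFFSET = 16 * SECTOR_SIZE  # 0x8000


def parse_pvd(image: bytes) -> dict:
    """Extract a few Primary Volume Descriptor fields from a raw image."""
    pvd = image[PVD_OFFSET:PVD_OFFSET + SECTOR_SIZE]
    if len(pvd) < SECTOR_SIZE:
        raise ValueError("image too short to hold a volume descriptor")
    if pvd[0] != 0x01 or pvd[1:6] != b"CD001":
        raise ValueError("no Primary Volume Descriptor at sector 16")

    # Offsets follow the struct above; both-endian fields start with the LE copy.
    (volume_space_size,) = struct.unpack_from("<I", pvd, 80)
    (logical_block_size,) = struct.unpack_from("<H", pvd, 128)
    (path_table_size,) = struct.unpack_from("<I", pvd, 132)
    (path_table_le,) = struct.unpack_from("<I", pvd, 140)
    (path_table_be,) = struct.unpack_from(">I", pvd, 148)

    return {
        "system_id": pvd[8:40].decode("ascii", "replace").rstrip(),
        "volume_id": pvd[40:72].decode("ascii", "replace").rstrip(),
        "volume_space_size": volume_space_size,
        "logical_block_size": logical_block_size,
        "path_table_size": path_table_size,
        "path_table_le": path_table_le,
        "path_table_be": path_table_be,
        "root_directory": pvd[156:190],  # 34-byte inline DirectoryRecord
    }
```

Calling `parse_pvd(open("disc.iso", "rb").read())` on a small image is enough for experiments; a real parser would seek and read sector by sector instead of loading the whole file.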

Quirks and Edge Cases

> Insight #1: Both-Endian Numbers

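ISO 9660 stores most multi-byte integers in BOTH little-endian and big-endian format. A 32-bit integer takes 8 bytes: 4 for LE, 4 for BE. Any host can then read its native half without byte swapping. The exceptions are fields that already come in per-endianness pairs, such as the path table locations, which are stored once in a single byte order.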

This is specified in ECMA-119 sections 7.2.3 (16-bit) and 7.3.3 (32-bit) as the "both-byte orders" format.

0x00000050: 2A 00 00 00 00 00 00 2A 43 44 30 30 31 01 00 00 *......*CD001...
Same value (42) stored as both LE and BE
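
Since both copies are supposed to agree, a parser can use the redundancy as a cheap corruption check. The sketch below is one way to do that; `read_both32` and its `strict` parameter are hypothetical names chosen for illustration.

```python
import struct


def read_both32(buf: bytes, offset: int = 0, strict: bool = True) -> int:
    """Read an 8-byte both-endian field and optionally verify both halves."""
    (le,) = struct.unpack_from("<I", buf, offset)
    (be,) = struct.unpack_from(">I", buf, offset + 4)
    if le != be and strict:
        raise ValueError(f"both-endian mismatch: LE={le} BE={be}")
    # When not strict, trust the little-endian copy.
    return le


sample = bytes.fromhex("2A000000 0000002A")
print(read_both32(sample))  # prints 42
```

Whether to reject a mismatch or fall back to one half is a policy decision; damaged or deliberately malformed images may need the lenient mode.
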
> Insight #2: Directory Record Padding

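Directory records always have an even length. When the file identifier length is even, a single padding byte follows the name so that the System Use area, and therefore the next record, starts on an even offset. This padding byte is counted in the record's length field.

ECMA-119 section 9.1.12 defines the padding field and requires it to be 0x00. Not every mastering tool honors that, so a parser should skip the byte without checking its value.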

Directory Record
struct DirectoryRecord {
  length            : u8          -- total record length
  ext_attr_length   : u8          -- extended attribute length
  extent_lba        : both32      -- starting sector
  data_length       : both32      -- file size in bytes
  recording_date    : u8[7]       -- years-since-1900, month, day, h, m, s, tz
  flags             : u8          -- bit 1 = directory, bit 0 = hidden
  file_unit_size    : u8          -- interleave (usually 0)
  interleave_gap    : u8
  volume_seq        : both16
  name_length       : u8
  name              : char[name_length]
  _pad              : u8          -- if name_length is even (align to word)
  -- system use area follows (Rock Ridge lives here)
}

-- Next record starts at: offset + length
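-- The padding byte after the name is already counted in length, so no extra skip is needed
-- Records never span a sector boundary: length == 0 means "continue at the next sector"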
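
The record walk can be sketched as follows, assuming the whole directory extent has already been read into memory and that sectors are 2048 bytes. `iter_directory_records` and the dictionary keys are illustrative names. The sketch treats a zero length byte as "skip to the next sector", which is how records that would cross a sector boundary are handled in practice.

```python
import struct
from typing import Iterator

SECTOR_SIZE = 2048


def iter_directory_records(extent: bytes) -> Iterator[dict]:
    """Yield the directory records stored in one directory extent."""
    offset = 0
    while offset < len(extent):
        length = extent[offset]
        if length == 0:
            # Rest of this sector is unused; continue at the next one.
            offset = (offset // SECTOR_SIZE + 1) * SECTOR_SIZE
            continue
        if length < 34 or offset + length > len(extent):
            raise ValueError(f"bad record length {length} at offset {offset}")

        record = extent[offset:offset + length]
        name_length = record[32]
        name_end = 33 + name_length
        if name_end > length:
            raise ValueError(f"name overruns record at offset {offset}")
        # One padding byte follows the name when name_length is even.
        system_use_start = name_end + (1 if name_length % 2 == 0 else 0)

        yield {
            "extent_lba": struct.unpack_from("<I", record, 2)[0],
            "data_length": struct.unpack_from("<I", record, 10)[0],
            "is_hidden": bool(record[25] & 0x01),
            "is_directory": bool(record[25] & 0x02),
            "name": record[33:name_end],
            "system_use": record[system_use_start:],
        }
        offset += length
```

The first two records of every directory carry the single-byte names 0x00 and 0x01, which stand for the directory itself and its parent.
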
! Case Study: PlayStation 1 Games

Many PS1 games implement copy protection by intentionally crafting malformed ISO 9660 structures. They reference non-existent sectors, create circular directory structures, or include file entries pointing to the disc's lead-out area.

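Protection schemes from that era include LibCrypt and APv1/APv2. Note that LibCrypt relies on modified subchannel data rather than on filesystem structures, so it does not survive in a plain 2048-byte-per-sector image. See the PSXDev forums for documented cases.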

Joliet: Unicode Done Right

Joliet adds Unicode support to ISO 9660 using UCS-2 encoding (not UTF-16!). Names are always stored big-endian and carry no byte order mark, so a parser must not expect or strip one.

Joliet Supplementary Volume Descriptor
-- Joliet uses a Supplementary VD (type 0x02) with special escape sequences

struct JolietVolumeDescriptor : PrimaryVolumeDescriptor {
  type              : u8          -- 0x02 = Supplementary VD
  identifier        : char[5]     -- "CD001"
  version           : u8          -- 1
  _flags            : u8
  -- ...
  escape_sequences  : u8[32]      -- at offset 88
  -- Joliet level detection:
  --   bytes [0..2] = 0x25 0x2F 0x40  ->  Level 1 (%/@)
  --   bytes [0..2] = 0x25 0x2F 0x43  ->  Level 2 (%/C)
  --   bytes [0..2] = 0x25 0x2F 0x45  ->  Level 3 (%/E)
}

-- All strings in Joliet records use UCS-2 Big Endian encoding
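-- Filename limit: 64 UCS-2 characters (128 bytes) at every level
-- The levels select the UCS-2 conformance level, not a name length
-- Key insight: some tools count bytes, others count characters
> Filename Limits
  • Level 1: 64 characters
  • Level 2: 64 characters
  • Level 3: 64 characters

Key insight: That's 64 Unicode characters, which means 128 bytes. Some implementations count bytes, others count characters. Mastering tools can also relax the limit (mkisofs -joliet-long allows up to 103 characters), so a parser should tolerate longer names.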
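
As a rough sketch, Joliet detection and name decoding might look like this in Python. The function names are hypothetical. Python has no dedicated UCS-2 codec, so the example uses `utf-16-be`, which decodes identically for characters in the Basic Multilingual Plane.

```python
from typing import Optional

# Escape sequences found at offset 88 of a Supplementary Volume Descriptor.
JOLIET_ESCAPES = {b"%/@": 1, b"%/C": 2, b"%/E": 3}


def joliet_level(descriptor: bytes) -> Optional[int]:
    """Return the Joliet level of a volume descriptor, or None if not Joliet."""
    if len(descriptor) < 120:
        return None
    if descriptor[0] != 0x02 or descriptor[1:6] != b"CD001":
        return None
    return JOLIET_ESCAPES.get(bytes(descriptor[88:91]))


def decode_joliet_name(raw: bytes) -> str:
    """Decode a Joliet file identifier stored as big-endian UCS-2."""
    if raw == b"\x00":
        return "."   # the directory itself
    if raw == b"\x01":
        return ".."  # the parent directory
    return raw.decode("utf-16-be")


encoded = "héllo.txt".encode("utf-16-be")
print(len(encoded), decode_joliet_name(encoded))  # prints 18 héllo.txt
```

The printed length shows the byte versus character distinction directly: nine characters occupy eighteen bytes on disc.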

See also: OSDev Joliet page.

Rock Ridge: POSIX Power

Rock Ridge adds POSIX attributes to ISO 9660 through System Use fields in directory entries. Implementation variations across systems create an interesting challenge for parser developers.

Field            Signature   Purpose                  Notes
SUSP Indicator   SP          Indicates SUSP in use    First directory entry only
Rock Ridge ID    RR          Rock Ridge in use        Version field varies
POSIX Name       NM          Real filename            Can span multiple NM entries
Symlink          SL          Symbolic link target     Component area parsing
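
Each System Use entry starts with a two-byte signature, a one-byte total length, and a one-byte version, followed by its payload. The sketch below walks those entries and reassembles a name split across several NM entries. It is a simplification under stated assumptions: the function names are made up for illustration, CE continuation areas are not followed, and the name bytes are assumed to be UTF-8 compatible.

```python
from typing import Iterator, Optional, Tuple

NM_CONTINUE = 0x01  # NM flag bit 0: name continues in the next NM entry


def iter_susp_entries(system_use: bytes) -> Iterator[Tuple[bytes, int, bytes]]:
    """Yield (signature, version, payload) for each System Use entry."""
    offset = 0
    while offset + 4 <= len(system_use):
        signature = system_use[offset:offset + 2]
        length = system_use[offset + 2]
        version = system_use[offset + 3]
        if length < 4 or offset + length > len(system_use):
            break  # padding or a damaged entry; stop rather than guess
        yield signature, version, system_use[offset + 4:offset + length]
        offset += length


def rock_ridge_name(system_use: bytes) -> Optional[str]:
    """Join the NM entries of one directory record into a filename."""
    parts = []
    for signature, _version, payload in iter_susp_entries(system_use):
        if signature == b"NM" and payload:
            flags = payload[0]
            parts.append(payload[1:])
            if not flags & NM_CONTINUE:
                break
    if not parts:
        return None
    return b"".join(parts).decode("utf-8", "replace")
```

The input is the system use area of a single directory record, that is, everything after the name and its optional padding byte.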

UDF: Universal Disk Format

UDF aimed to replace ISO 9660 with a more flexible format. The result is a sophisticated system where different implementations support different feature subsets, creating compatibility challenges.

Version Compatibility

! UDF Version Matrix
  • 1.02: DVD-ROM (most compatible)
  • 1.50: CD-R/RW, DVD-R/RW (adds VAT and sparable partitions)
  • 2.00: Added Named Streams
  • 2.01: Fixed 2.00 bugs
  • 2.50: Blu-ray (metadata partition)
  • 2.60: Blu-ray recordable (adds pseudo-overwrite for BD-R)

Windows XP reads up to 2.01, macOS has best support for 1.50, Linux varies by kernel version.

See UDF compatibility matrix for details.

Bridge Discs: Dual Format

Bridge discs contain both ISO 9660 and UDF filesystems. While the spec suggests they should share data, real-world implementations take creative approaches.

UDF Anchor Volume Descriptor Pointer
-- UDF Anchor at sector 256, plus N - 256 and/or N (N = last sector); two of the three must exist

struct AnchorVolumeDescriptorPointer {
  tag               : DescriptorTag  -- tag.id = 2
  main_vds_extent   : Extent         -- location + length of main VDS
  reserve_vds_extent: Extent         -- backup VDS location
}

struct DescriptorTag {
  id                : u16le        -- 1=PVD, 2=AVDP, 5=Partition, 6=LogicalVol
  version           : u16le
  checksum          : u8           -- sum of bytes 0..3 and 5..15, modulo 256
  _reserved         : u8
  serial            : u16le
  crc               : u16le
  crc_length        : u16le
  location          : u32le        -- sector number of this descriptor
}

struct Extent {
  length            : u32le        -- in bytes
  location          : u32le        -- sector number
}

-- Bridge disc detection: check for both "CD001" at sector 16
-- AND a valid AVDP tag (id=2) at sector 256
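
A hedged sketch of that detection in Python, assuming an in-memory image with 2048-byte sectors. The function names are illustrative. The tag is unpacked as 16 bytes, with one reserved byte after the checksum, and only the one-byte tag checksum is verified; the descriptor CRC is left unchecked.

```python
import struct
from typing import Optional

SECTOR_SIZE = 2048
TAG_ID_AVDP = 2


def parse_descriptor_tag(buf: bytes) -> Optional[dict]:
    """Parse a 16-byte UDF descriptor tag; return None if the checksum fails."""
    if len(buf) < 16:
        return None
    (tag_id, version, checksum, _reserved,
     serial, crc, crc_length, location) = struct.unpack_from("<HHBBHHHI", buf, 0)
    # Checksum covers bytes 0..3 and 5..15, skipping the checksum byte itself.
    expected = (sum(buf[0:4]) + sum(buf[5:16])) & 0xFF
    if checksum != expected:
        return None
    return {"id": tag_id, "version": version, "serial": serial,
            "crc": crc, "crc_length": crc_length, "location": location}


def is_bridge_image(image: bytes) -> bool:
    """True if the image has both an ISO 9660 descriptor and a UDF anchor."""
    iso_offset = 16 * SECTOR_SIZE
    has_iso = image[iso_offset + 1:iso_offset + 6] == b"CD001"

    avdp_offset = 256 * SECTOR_SIZE
    tag = parse_descriptor_tag(image[avdp_offset:avdp_offset + 16])
    has_udf = (tag is not None
               and tag["id"] == TAG_ID_AVDP
               and tag["location"] == 256)
    return has_iso and has_udf
```

Comparing `location` against 256 guards against stale data that happens to carry a valid checksum. A more thorough check would also look for the anchor near the end of the image.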

Hybrid Formats: Multi-Platform

Mac/PC hybrid discs, HFS+/ISO hybrids, and other multi-format discs demonstrate creative engineering to support multiple platforms seamlessly.

> Apple Partition Map

Mac hybrid discs start with an Apple Partition Map, followed by HFS+, with ISO 9660 structures carefully positioned to avoid conflicts. The first 16 sectors contain partition data instead of the zeroes found on plain ISO 9660 discs.

This does not violate ISO 9660: the standard reserves the System Area for system use and leaves its contents undefined, which is exactly what makes hybrids possible.

El Torito: Bootable CD Engineering

El Torito enables bootable CDs by embedding floppy or hard disk images. Implementation requires careful attention to detail for compatibility.

Boot Record: 00 43 44 30 30 31 01 45 4C 20 54 4F 52 49 54 4F .CD001.EL TORITO
El Torito Boot Record
-- Boot Record Volume Descriptor (type 0x00) among the VD set

struct ElToritoBootRecord {
  type              : u8          -- 0x00 = Boot Record
  identifier        : char[5]     -- "CD001"
  version           : u8          -- 1
  boot_system_id    : char[32]    -- "EL TORITO SPECIFICATION" (padded)
  _unused           : u8[32]
  catalog_sector    : u32le       -- LBA of boot catalog
}

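-- Initial/Default Entry: the second 32-byte entry in the catalog, after the Validation Entry
struct BootCatalogEntry {
  boot_indicator    : u8          -- 0x88 = bootable, 0x00 = not
  boot_media_type   : u8          -- 0=no emulation, 1=1.2M floppy, 2=1.44M, 3=2.88M, 4=HDD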
  load_segment      : u16le       -- 0x0000 = default 0x07C0
  system_type       : u8
  _unused           : u8
  sector_count      : u16le       -- sectors to load (512-byte)
  load_lba          : u32le       -- start sector of boot image
}
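
Locating the boot image can be sketched as below, assuming 2048-byte sectors and an in-memory image. The function names and dictionary keys are hypothetical. The sketch relies on the El Torito catalog layout in which a 32-byte Validation Entry (header ID 0x01, key bytes 0x55 0xAA at offsets 30 and 31) precedes the Initial/Default Entry.

```python
import struct
from typing import Optional

SECTOR_SIZE = 2048
MEDIA_TYPES = {0: "no emulation", 1: "1.2M floppy", 2: "1.44M floppy",
               3: "2.88M floppy", 4: "hard disk"}


def find_boot_catalog_lba(image: bytes) -> Optional[int]:
    """Scan the volume descriptor set for an El Torito Boot Record."""
    sector = 16
    while True:
        start = sector * SECTOR_SIZE
        vd = image[start:start + SECTOR_SIZE]
        if len(vd) < SECTOR_SIZE or vd[1:6] != b"CD001":
            return None
        if vd[0] == 0xFF:  # Volume Descriptor Set Terminator
            return None
        if vd[0] == 0x00 and vd[7:39].rstrip(b"\x00 ") == b"EL TORITO SPECIFICATION":
            return struct.unpack_from("<I", vd, 71)[0]
        sector += 1


def parse_initial_entry(catalog: bytes) -> dict:
    """Parse the Initial/Default Entry that follows the Validation Entry."""
    if len(catalog) < 64 or catalog[0] != 0x01 or catalog[30:32] != b"\x55\xAA":
        raise ValueError("missing or invalid validation entry")
    (indicator, media, load_segment, system_type,
     _unused, sector_count, load_lba) = struct.unpack_from("<BBHBBHI", catalog, 32)
    return {
        "bootable": indicator == 0x88,
        "media_type": MEDIA_TYPES.get(media & 0x0F, "unknown"),
        "load_segment": load_segment or 0x07C0,  # 0 selects the default segment
        "system_type": system_type,
        "sector_count": sector_count,            # in 512-byte virtual sectors
        "load_lba": load_lba,                    # in 2048-byte disc sectors
    }
```

The catalog sector itself is read from `find_boot_catalog_lba(image) * 2048`. Note the unit mismatch between `sector_count` and `load_lba`, a frequent source of off-by-four bugs.
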
! UEFI Boot

UEFI bootable ISOs require an EFI System Partition image inside the ISO, formatted as FAT (FAT12, FAT16, or FAT32 depending on its size). This creates a FAT filesystem within an ISO 9660 filesystem, referenced by a no-emulation El Torito boot catalog entry with platform ID 0xEF.

See Rod Smith's guide or xorriso documentation.

Windows Imaging Format

WIM files showcase elegant design: deduplicated file data, XML metadata, and solid compression.

WIM Header
struct WIMHeader {
  magic             : char[8]     -- "MSWIM\0\0\0"
  header_size       : u32le       -- size of this header
  version           : u32le
  flags             : u32le       -- see compression flags below
  chunk_size        : u32le       -- compression chunk (usually 32768)
  guid              : u8[16]
  part_number       : u16le
  total_parts       : u16le
  image_count       : u32le
  offset_table      : ResourceEntry -- offset table location
  xml_data          : ResourceEntry -- XML metadata location
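  boot_metadata     : ResourceEntry
  boot_index        : u32le       -- 1-based index of the bootable image, 0 = none
  integrity_table   : ResourceEntry
  _unused           : u8[60]      -- pads the header to 208 bytes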
}

-- Compression is signalled by individual bits in flags:
--   0x00000002 = resources are compressed
--   0x00020000 = XPRESS  (fast, moderate ratio)
--   0x00040000 = LZX     (slow, best ratio)
--   0x00080000 = LZMS    (Win8+, solid compression)

struct ResourceEntry {
  size_and_flags    : u64le       -- low 56 bits = compressed size, high 8 = flags
  offset            : u64le       -- absolute file offset
  original_size     : u64le       -- uncompressed size
}
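
A minimal Python sketch of reading the start of the header and one resource entry. The function names are illustrative. One assumption to flag: compression is decoded from individual flag bits (0x20000 XPRESS, 0x40000 LZX, 0x80000 LZMS), which are the values used in wimlib's header definitions, rather than from a small enumerated field.

```python
import struct
from typing import Optional

WIM_MAGIC = b"MSWIM\x00\x00\x00"
# Assumed flag bits, matching wimlib's header definitions.
COMPRESSION_FLAGS = {0x00020000: "XPRESS", 0x00040000: "LZX", 0x00080000: "LZMS"}


def parse_resource_entry(buf: bytes, offset: int) -> dict:
    """Decode one 24-byte ResourceEntry."""
    size_and_flags, file_offset, original_size = struct.unpack_from("<QQQ", buf, offset)
    return {
        "size": size_and_flags & ((1 << 56) - 1),  # low 56 bits
        "flags": size_and_flags >> 56,             # high 8 bits
        "offset": file_offset,
        "original_size": original_size,
    }


def compression_name(flags: int) -> Optional[str]:
    """Map header flags to a compression name, or None if uncompressed."""
    for mask, name in COMPRESSION_FLAGS.items():
        if flags & mask:
            return name
    return None


def parse_wim_header(buf: bytes) -> dict:
    """Parse the fixed fields at the start of a WIM header."""
    if len(buf) < 72 or buf[:8] != WIM_MAGIC:
        raise ValueError("not a WIM file")
    header_size, version, flags, chunk_size = struct.unpack_from("<IIII", buf, 8)
    part_number, total_parts, image_count = struct.unpack_from("<HHI", buf, 40)
    return {
        "header_size": header_size,
        "version": version,
        "compression": compression_name(flags),
        "chunk_size": chunk_size,
        "guid": buf[24:40],
        "part_number": part_number,
        "total_parts": total_parts,
        "image_count": image_count,
        "offset_table": parse_resource_entry(buf, 48),
    }
```

The sketch stops at the first resource entry on purpose; the remaining entries can be decoded with the same `parse_resource_entry` helper once their offsets are known.
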
i Validation Strategy

The XML metadata in WIM files may contain inconsistencies. File counts, compression types, and timestamps should be verified against actual resource entries for robust parsing.

Further Reading

Implementations

  • libcdio: GNU's CD-ROM I/O library
  • 7-Zip source: Igor Pavlov's implementation handles many edge cases
  • Linux kernel: fs/isofs/ and fs/udf/ for real-world implementations