[Simh] Of CD-ROMs

Timothe Litt litt at ieee.org
Mon Sep 3 09:31:42 EDT 2018


As is often the case, terminology seems to have begotten confusion.
CD-ROM media have a (usable) sector size of 2048 bytes + ECC/overhead.
SCSI allows exposing this as variable logical block sizes.  Device
drivers may present the SCSI LB size to their clients - or they may
reblock the data into larger or smaller units.  CD drives generally use
the SCSI command set - though it may be transported over other buses,
from ATAPI to USB.

For detailed information on CD-ROM operation, see the SCSI2 spec
(X3.131), specifically chapter 14.  A free copy is available at
http://www.staff.uni-mainz.de/tacke/scsi/SCSI2.html (chapter 14:
http://www.staff.uni-mainz.de/tacke/scsi/SCSI2-14.html).  You'll find
references to other chapters, which are necessary for a full
understanding.  Or see SCSI3 (X3.304-1997 in particular, though it
references another 17 documents).

Here are the highlights -- despite some simplifications/omissions I'm
afraid it's still rather long.

The terminology used for CD media differs from that used for magnetic
disks and is at best internally semi-consistent.  CD-ROM format is a
hybrid of audio and data storage - the latter having been grafted on top
of the former.  And a given disk may be pure audio, pure data, or a
mixture.  This note is directed toward pure data disks - mostly.

>
> CD-ROM is a unique SCSI device in the respect that some logical blocks
> on a disc may not be accessible by all commands. SEEK commands may be
> issued to any logical block address within the reported capacity of
> the disc. READ commands can NOT be issued to logical blocks that occur
> in some transition areas, or to logical blocks within an audio track.
> PLAY commands can NOT be issued to logical blocks within a data track.

("Track" isn't what you think of on a traditional hard disk - the set of
sectors on a single surface at a given head position.  It's more like
"partition" - or per its heritage, the "song" or "piece", or "track" of
an LP record.)

(CD-ROMs do not have geometry in the sense of traditional disk drives;
in this respect, they are more like MSCP devices.  OS drivers that
report geometry fabricate sectors/track, tracks/cyl, and cyl/volume -
there is no standard (or perfect) way to do this.  Results vary -
generally the reported medium size is more accurate than multiplying the
"geometry".  The OS driver may also lie about bytes/sector.  Ordinary
user mode programs live happily in these alternate realities, but they
create challenges for emulation.)
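To make the geometry point concrete, here is a minimal C sketch of one way a driver might fabricate a geometry from the medium size. The struct and function names and the 32 sectors/track x 64 tracks/cylinder values are hypothetical; nothing here is taken from a real driver. Because cylinders are truncated, multiplying the geometry back out can come up short of the real size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fabricated geometry. A CD-ROM has no cylinders, heads or
 * sectors/track, so a driver that must report them invents values. */
struct fake_geometry {
    uint32_t sectors_per_track;
    uint32_t tracks_per_cyl;      /* "heads" */
    uint32_t cylinders;
};

/* Derive a geometry from the medium size (in blocks), given arbitrary
 * sectors/track and tracks/cylinder. Cylinders are truncated, so any
 * blocks past the last whole cylinder are not represented. */
static struct fake_geometry fabricate_geometry(uint64_t medium_blocks,
                                               uint32_t spt, uint32_t tpc)
{
    struct fake_geometry g;

    assert(spt > 0 && tpc > 0);
    g.sectors_per_track = spt;
    g.tracks_per_cyl    = tpc;
    g.cylinders         = (uint32_t)(medium_blocks / ((uint64_t)spt * tpc));
    return g;
}

/* Capacity implied by multiplying out the geometry; it can be smaller
 * than the real medium size, never larger. */
static uint64_t geometry_blocks(struct fake_geometry g)
{
    return (uint64_t)g.sectors_per_track * g.tracks_per_cyl * g.cylinders;
}
```

For example, a 333000-block medium with 32 sectors/track and 64 tracks/cylinder yields 162 cylinders, or 331776 blocks, silently losing the last 1224 blocks. That is why the reported medium size is the number to trust.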

> The physical format defined by the CD-ROM media standards provides
> 2352 bytes per sector. For usual computer data applications, 2048
> bytes are used for user data, 12 bytes for a synchronization field, 4
> bytes for a sector address tag field and 288 bytes - the auxiliary
> field - for L-EC (CD-ROM data mode 1). In less critical applications,
> the auxiliary field may also be used for user data (CD-ROM data mode
> 2). A CD-ROM physical sector size is 2048, 2336 or 2340 bytes per
> sector. These values correspond to user data field only, user data
> plus auxiliary data, the 4 byte address tag plus user data plus
> auxiliary data.

(Mode 2 is intended for applications where a few error bits matter less
than capacity - e.g. audio or video streams.  I'm not aware of any OS
that uses it for data, though reading in Mode 2 may provide access to
the ECC bits.  I believe some "copy protection" schemes [used by games,
encyclopedias, etc.] wrote intentionally bad ECC bits so that only
proprietary software could [easily] read the disks; the OS would report
uncorrectable errors if read with its driver.)
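The three selectable sizes in the quote above are just different windows into the 2352-byte raw sector whose layout is quoted further down (12-byte sync field, 4-byte header, then user and auxiliary data). Here is a minimal sketch of that correspondence, assuming that layout; the function and macro names are hypothetical. Note that in mode 1 the "auxiliary" bytes of a 2336-byte transfer are the EDC, zero and L-EC fields, not extra user data.

```c
#include <assert.h>
#include <stddef.h>

#define CD_RAW_SECTOR   2352  /* full sector as recorded */
#define CD_SYNC_BYTES     12  /* synchronization field */
#define CD_HEADER_BYTES    4  /* address tag: M, S, F, mode */

/* Offset within a raw 2352-byte sector at which a transfer of the given
 * size starts, or -1 if the size is not one of the three quoted sizes. */
static long cd_transfer_offset(size_t transfer_size)
{
    switch (transfer_size) {
    case 2048:                 /* user data only */
    case 2336:                 /* user data + auxiliary data */
        return CD_SYNC_BYTES + CD_HEADER_BYTES;
    case 2340:                 /* address tag + user data + auxiliary */
        return CD_SYNC_BYTES;
    default:
        return -1;
    }
}

/* Pointer to the bytes a drive would return for this transfer size,
 * or NULL if the size is unsupported. */
static const unsigned char *cd_transfer_slice(const unsigned char *raw,
                                              size_t transfer_size)
{
    long off = cd_transfer_offset(transfer_size);

    if (off < 0)
        return NULL;
    assert((size_t)off + transfer_size <= CD_RAW_SECTOR);
    return raw + off;
}
```

The 2340- and 2336-byte views both run to the end of the raw sector; the 2048-byte view stops before the mode 1 error-correction fields.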

> A CD-ROM small frame consists of:
> a) 1 synchronization pattern (24+3 bits)
> b) 1 byte of sub-channel data (14+3 bits)
> c) 24 bytes of data (24 x (14+3) bits)
> d) 8 bytes of CIRC code (8 x (14+3) bits)
> Total: 588 bits.
>
> For data: the data bytes of 98 small frames comprise the physical unit
> of data referred to as a sector. (98 small frames times 24 bytes per
> small frame equal 2 352 bytes of data per sector.)
>
> A sector that contains CD-ROM data mode one data has the following format:
>
> a) 12 bytes Synchronization field
> b) 4 bytes CD-ROM data header
>    Absolute M field in bcd format
>    Absolute S field in bcd format
>    Absolute F field in bcd format
>    CD-ROM data mode field
> c) 2048 bytes User data field
> d) 4 bytes Error detection code
> e) 8 bytes Zero
> f) 276 bytes Layered error correction code
>
> A sector that contains CD-ROM Data Mode two data has the following format:
>
> a) 12 bytes Synchronization field
> b) 4 bytes CD-ROM data header
>    Absolute M field in bcd format
>    Absolute S field in bcd format
>    Absolute F field in bcd format
>    CD-ROM data mode field
> c) 2 336 bytes User data field (2048 bytes of mode 1 data plus 288
>    bytes of auxiliary data)
>
(This is what is physically recorded on the medium.  It's not
necessarily what the driver or user sees.)
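The two quoted layouts can be written down as byte offsets into the raw 2352-byte sector and checked at compile time. This is a sketch; the enum names are made up for illustration, and the header decoder assumes each of M, S and F is one packed-BCD byte, as the quote says, without validating the digits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for field offsets/lengths in a raw sector. */
enum {
    CD_SYNC_OFF = 0,    CD_SYNC_LEN = 12,
    CD_HDR_OFF  = 12,   CD_HDR_LEN  = 4,
    CD_USER_OFF = 16,
    M1_USER_LEN = 2048,
    M1_EDC_OFF  = 2064, M1_EDC_LEN  = 4,
    M1_ZERO_OFF = 2068, M1_ZERO_LEN = 8,
    M1_ECC_OFF  = 2076, M1_ECC_LEN  = 276,
    M2_USER_LEN = 2336,
    CD_RAW_LEN  = 2352
};

/* Each layout must tile the sector exactly. */
static_assert(CD_USER_OFF + M1_USER_LEN == M1_EDC_OFF, "mode 1 user data");
static_assert(M1_EDC_OFF + M1_EDC_LEN == M1_ZERO_OFF, "mode 1 EDC");
static_assert(M1_ZERO_OFF + M1_ZERO_LEN == M1_ECC_OFF, "mode 1 zero fill");
static_assert(M1_ECC_OFF + M1_ECC_LEN == CD_RAW_LEN, "mode 1 fills sector");
static_assert(CD_USER_OFF + M2_USER_LEN == CD_RAW_LEN, "mode 2 fills sector");

struct cd_header {
    unsigned minute, second, frame;  /* absolute address, converted to binary */
    unsigned mode;                   /* 1 or 2 for data sectors */
};

/* Convert one packed-BCD byte (two decimal digits) to binary. */
static unsigned bcd_to_bin(uint8_t b)
{
    return (b >> 4) * 10u + (b & 0x0Fu);
}

/* Decode the 4-byte header that follows the sync field. */
static struct cd_header cd_decode_header(const uint8_t *raw)
{
    const uint8_t *h = raw + CD_HDR_OFF;
    struct cd_header hdr;

    hdr.minute = bcd_to_bin(h[0]);
    hdr.second = bcd_to_bin(h[1]);
    hdr.frame  = bcd_to_bin(h[2]);
    hdr.mode   = h[3];
    return hdr;
}
```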

> Logical addressing of CD-ROM information may use any logical block
> length. When the specified logical block length is an exact divisor or
> integral multiple of the selected number of bytes per CD-ROM sector,
> the device shall map (one to one) the bytes transferred from CD-ROM
> sectors to the bytes of logical blocks. For instance, if 2 048 bytes
> are transferred from each CD-ROM sector (specified by the CD-ROM
> density code value), and the logical block length is 512 bytes, then
> each CD-ROM sector shall map to exactly four logical blocks. This
> International Standard does not define the mapping of logical block
> lengths which do not evenly divide or are not exact multiples of the
> selected number of bytes per CD-ROM sector.

(SCSI maps the physical sectors to logical blocks.  Some drive firmware
hard-codes the logical block length; some allow it to be set by the
driver; some allow saving the setting in the drive; and some have a
jumper - most commonly to select 512 or 2048 bytes.  I've seen drives
expose logical blocks ranging from 128 bytes (to emulate floppies) to
4096 bytes.  In addition, device drivers may map the drive's logical
block length to some other block size that the file system or OS API
prefers, e.g. 2048-byte user data LBNs to 512-byte filesystem blocks,
or 512-byte user data LBNs to 4096-byte file system blocks.  "You are
in a maze of twisty passages.")

>
> The physical format of CD-ROM and CD-DA media uses a smaller unit of
> synchronization than the more familiar magnetic or optical recording
> systems. The basic data stream synchronization unit is a small frame.
> This is not the same large frame (sector) as referred to in the MSF
> unit. Each small frame consists of 588 bits. A sector on CD-ROM media
> consists of 98 small frames.
>
> Data, sub-channel and CIRC bytes are encoded with an eight-to-fourteen
> bit code; then three merging bits are added. The merging bits are
> chosen to provide minimum low-frequency signal content and optimize
> phase lock loop performance.

(This is the next level of abstraction (or lies).  A "sector" isn't the
smallest addressable unit when audio is involved.)
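The small-frame arithmetic quoted above (a 24+3-bit sync pattern plus 33 EFM-encoded bytes of 14+3 channel bits each) can be checked at compile time. This is only a sketch; the macro names are invented for illustration.

```c
#include <assert.h>

#define EFM_BITS_PER_BYTE    (14 + 3)  /* 14-bit EFM symbol + 3 merging bits */
#define SYNC_PATTERN_BITS    (24 + 3)
#define SUBCHANNEL_BYTES     1
#define DATA_BYTES_PER_FRAME 24
#define CIRC_BYTES           8
#define FRAMES_PER_SECTOR    98

/* Channel bits in one small frame. */
#define SMALL_FRAME_BITS (SYNC_PATTERN_BITS + \
    (SUBCHANNEL_BYTES + DATA_BYTES_PER_FRAME + CIRC_BYTES) * EFM_BITS_PER_BYTE)

/* Data bytes carried by one sector (large frame). */
#define SECTOR_DATA_BYTES (FRAMES_PER_SECTOR * DATA_BYTES_PER_FRAME)

static_assert(SMALL_FRAME_BITS == 588, "small frame should be 588 channel bits");
static_assert(SECTOR_DATA_BYTES == 2352, "sector should carry 2352 data bytes");
```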

(MSF is from the audio heritage of CD-ROMs - it's "Minute, Second,
Frame", where Frame is 1/75 sec.  But whether those have anything to do
with wall-clock times varies.  And MSF pointers for audio are inexact;
they have a tolerance of +/- one or more seconds.)
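Converting between MSF and logical block addresses is simple arithmetic at 75 frames per second. This sketch assumes the usual SCSI CD-ROM convention that logical block 0 sits at MSF 00:02:00, i.e. a 150-frame offset; check the spec for your drive. The names are hypothetical, and the values are plain binary, not the BCD used in the sector header.

```c
#include <assert.h>
#include <stdint.h>

#define CD_FRAMES_PER_SECOND  75
#define CD_SECONDS_PER_MINUTE 60
/* Assumption: LBA 0 is at MSF 00:02:00, a 150-frame offset. */
#define CD_MSF_LBA_OFFSET     (2 * CD_FRAMES_PER_SECOND)

struct msf {
    unsigned minute, second, frame;   /* binary, not BCD */
};

/* MSF -> LBA; addresses before 00:02:00 give negative LBAs. */
static int32_t msf_to_lba(struct msf a)
{
    assert(a.second < CD_SECONDS_PER_MINUTE && a.frame < CD_FRAMES_PER_SECOND);
    return (int32_t)((a.minute * CD_SECONDS_PER_MINUTE + a.second)
                     * CD_FRAMES_PER_SECOND + a.frame) - CD_MSF_LBA_OFFSET;
}

/* LBA -> MSF, the inverse of msf_to_lba. */
static struct msf lba_to_msf(int32_t lba)
{
    struct msf a;
    uint32_t frames;

    assert(lba >= -CD_MSF_LBA_OFFSET);
    frames   = (uint32_t)(lba + CD_MSF_LBA_OFFSET);
    a.frame  = frames % CD_FRAMES_PER_SECOND;
    frames  /= CD_FRAMES_PER_SECOND;
    a.second = frames % CD_SECONDS_PER_MINUTE;
    a.minute = frames / CD_SECONDS_PER_MINUTE;
    return a;
}
```

For example, LBA 16 is 00:02:16 and LBA 333000 is 74:02:00. For audio, remember the tolerance mentioned above: an exact conversion does not make the position exact.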

There is additional information, including a "table of contents" (TOC)
on the CD; this is not included in the summary above.  The TOC describes
the track structure of the CD - where each track starts, whether it's
data or audio, etc.  It can be read with a special SCSI command (READ
TOC).  With multisession disks, it gets more complicated.
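As an illustration of what the TOC looks like once read, here is a sketch of a parser for the response data, assuming the SCSI-2 READ TOC layout with LBA addressing: a 4-byte header (2-byte data length, first track, last track) followed by 8-byte track descriptors whose byte 1 holds the ADR and control nibbles. Treating control bit 2 as "data track" follows the Q sub-channel control field; the struct and function names are hypothetical. The lead-out area shows up as one more descriptor (track 0xAA).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toc_track {
    uint8_t  number;     /* track number; 0xAA is the lead-out */
    uint8_t  control;    /* low nibble of descriptor byte 1 */
    uint32_t start_lba;  /* big-endian address in bytes 4..7 */
    int      is_data;    /* control bit 2 set => data track */
};

/* Parse a READ TOC response held in buf[0..len). Returns the number of
 * descriptors stored in out, at most max. */
static size_t parse_toc(const uint8_t *buf, size_t len,
                        struct toc_track *out, size_t max)
{
    size_t n = 0, data_len, pos;

    assert(buf != NULL && out != NULL);
    if (len < 4)
        return 0;
    /* The data length field does not count its own 2 bytes. */
    data_len = ((size_t)buf[0] << 8) | buf[1];
    if (data_len + 2 < len)
        len = data_len + 2;
    for (pos = 4; pos + 8 <= len && n < max; pos += 8, n++) {
        const uint8_t *d = buf + pos;

        out[n].number    = d[2];
        out[n].control   = d[1] & 0x0F;
        out[n].start_lba = ((uint32_t)d[4] << 24) | ((uint32_t)d[5] << 16)
                         | ((uint32_t)d[6] << 8) | d[7];
        out[n].is_data   = (out[n].control & 0x04) != 0;
    }
    return n;
}
```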

Most of this applies, with some modifications, to writable media and
DVD-ROM.  In general, writing requires direct access to SCSI commands
(via a class driver or using IO that bypasses the OS).  The various
levels of abstraction/alternative realities come apart when trying to
write; also, since rewriting is impossible (or expensive), one generally
needs to build the complete filesystem/session before writing - this
requires more complexity (and memory) than most people are willing to
put in a driver.

Anyone interested in more detail - or nit-picking - should read the
specifications, including ECMA-130 (Data interchange on CD), ECMA-119
(aka ISO 9660 - the filesystem) + the Joliet & Rock Ridge specs that
extend it, the OSTA UDF spec, IEEE P1281 (System Use Sharing Protocol),
and the SCSI 1 & 2 specs.  Still more information is contained in
ECMA-394 (CD-R) and the CDI specs.  And, of course, your favorite OS
manuals.

Emulating this stack is a non-trivial project.


