While @wargreen was finishing the mastering of an audio CD (with libre tools of course, except for one, as we'll see), he discovered that cue2ddp, one of the few tools that generate DDP files (Disc Description Protocol, a proprietary format used to send the premaster of a disc to the manufacturer), did not allow setting the language of the CD's metadata (CD-Text).
Unfortunately, the DDP specification is proprietary and not freely available, so writing our own program to generate DDP files would be quite hard, and it would be difficult to ensure correctness. And patching cue2ddp is not possible because its source code is not available due to the licensing agreement with DCA, the DDP specification authors.
Not possible? Who said you need source code to patch? :)
Long story short: to set the CD-Text language in cue2ddp 1.1, write the bytes [0xc6, 0x40, 0x16, 0x??] at file offset 0x76c0, replacing ?? with the language code from the CD-Text language code table (0x0f for French, for instance).
Note that this experimental binary patch is only valid for version 1.1 on the x86_64 architecture (ddptools-1.1-x86_64-elf.tar.gz). Replicating the process for other variants should be easy.
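For convenience, here is a minimal bash sketch of that patch. The helper name patch_cue2ddp_lang is ours (hypothetical). The safety check is also our assumption: the original instruction at offset 0x76c0 is "mov byte [rax + 0x16], r8b", whose standard x86_64 encoding is 44 88 40 16. If your copy has other bytes there, it is probably not the binary we patched, and the helper refuses to touch it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy cue2ddp 1.1 (x86_64) and hard-code the CD-Text
# language code, given in hex (e.g. 0f for French).
# Usage: patch_cue2ddp_lang <input> <output> <hex language code>
patch_cue2ddp_lang() {
  local src=$1 out=$2 lang=$3
  local offset=$((0x76c0))   # file offset of the instruction (30400 decimal)
  local current

  if [[ ! $lang =~ ^[0-9a-fA-F]{1,2}$ ]]; then
    echo "language code must be 1 or 2 hex digits: $lang" >&2
    return 1
  fi

  # Assumption: the unpatched bytes encode "mov byte [rax + 0x16], r8b"
  # (44 88 40 16); anything else means this is not the binary we analysed.
  current=$(od -An -tx1 -j "$offset" -N 4 -- "$src" | tr -d ' \n')
  if [ "$current" != "44884016" ]; then
    echo "unexpected bytes at 0x76c0: ${current:-<none>}" >&2
    return 1
  fi

  cp -- "$src" "$out" || return 1
  # c6 40 16 XX encodes "mov byte [rax + 0x16], 0xXX", same length (4 bytes).
  printf "\\xc6\\x40\\x16\\x$lang" |
    dd of="$out" bs=1 seek="$offset" conv=notrunc 2>/dev/null || return 1
  chmod +x -- "$out"
}
```

Usage: `patch_cue2ddp_lang cue2ddp cue2ddp-fr 0f`. The new instruction has the same length as the old one, so nothing else in the binary moves.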
That's the gist of it, but it's also a good opportunity to show how we got there, with the following tools:
- hexedit (or any other binary editor)
- radare2 (used for quick binary executable analysis, and to re-assemble binaries)
- Ghidra (used to decompile the binary)
When attacking this kind of problem, it's always good to do a bit of exploratory work and to find the appropriate documentation. Reading it is optional, but it can help.
The problem we want to solve is that we cannot set the language of the CD-Text (metadata) in the DDP file, which defaults to "English".
The CD-Text format is documented by GNU's libcdio project. This documentation shows that CD-Text is structured in 18-byte "packs" (a 4-byte header, a 12-byte payload and a 2-byte CRC). The language is defined in the last 8 bytes of the "block size information", a 36-byte structure made of the payloads of three packs of type 0x8f. The language is a single byte (0x09 for "English") for each of the 8 possible blocks, and the documentation includes the table of language codes. So far so good.
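As a quick reference, here is a hypothetical bash lookup for a few entries of that language code table. It is only an excerpt. Check the libcdio documentation for the complete list.

```bash
#!/usr/bin/env bash
# Hypothetical helper: a few entries of the CD-Text language code table
# (see the libcdio documentation for the complete list).
cdtext_lang_code() {
  case "${1,,}" in          # lower-case the argument (bash 4+)
    unknown)  echo 00 ;;
    german)   echo 08 ;;
    english)  echo 09 ;;
    spanish)  echo 0a ;;
    french)   echo 0f ;;
    italian)  echo 15 ;;
    *) echo "language not in this excerpt: $1" >&2; return 1 ;;
  esac
}

cdtext_lang_code French   # prints "0f"
```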
From this, we can quickly look at the CD-Text of a DDP generated by cue2ddp. The last pack ends with our sequence of language codes (0x09 for the first block and 0x00 for all others, in our case), followed by the pack's 2-byte CRC checksum.
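To check those language bytes without a hex editor, here is a small bash sketch with a hypothetical function name. It rests on two assumptions. First, the CD-Text data is a raw sequence of 18-byte packs. Some .cdt files start with an extra 4-byte size header, which would have to be skipped first. Second, following the libcdio description, byte 1 of each 0x8f pack holds its index (0, 1 or 2) within the group.

```bash
#!/usr/bin/env bash
# Sketch: print the 8 language codes (blocks 0..7) stored in a raw CD-Text
# pack file. Assumption: the file is a plain sequence of 18-byte packs
# (4-byte header, 12-byte payload, 2-byte CRC) with no leading size header.
cdtext_languages() {
  local file=$1 size offset hdr
  size=$(wc -c < "$file") || return 1
  for ((offset = 0; offset + 18 <= size; offset += 18)); do
    # Header bytes 0 and 1: pack type, and (for 0x8f packs) index 0..2.
    hdr=$(od -An -tx1 -j "$offset" -N 2 -- "$file" | tr -d ' \n')
    if [ "$hdr" = "8f02" ]; then
      # Payload starts at pack byte 4; payload bytes 4..11 (pack bytes 8..15)
      # are bytes 28..35 of the 36-byte block size information.
      od -An -tx1 -j $((offset + 8)) -N 8 -- "$file" | xargs
      return 0
    fi
  done
  echo "no third 0x8f pack found" >&2
  return 1
}
```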
If we manually change the language code (to 0x0f, in our case), tools that read DDP and CD-Text files properly read the new language. However, since the checksum is not recomputed, the resulting CD-Text data is invalid. We could of course use other tools to modify the CD-Text file itself and then regenerate a DDP file from this new CD-Text, but that's not a... satisfying solution. We want satisfaction.
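For reference, here is a bash sketch of the pack CRC that such a manual edit would have to recompute. It follows the libcdio CD-Text description as we understand it: CRC-16/CCITT (polynomial 0x1021, initial value 0) over the first 16 bytes of the pack, with the result inverted and stored most significant byte first. The function name is hypothetical, and the parameters are our reading of the spec, not something extracted from cue2ddp.

```bash
#!/usr/bin/env bash
# Hypothetical helper: CRC of a CD-Text pack, assuming CRC-16/CCITT
# (polynomial 0x1021, initial value 0, MSB first) with the result inverted.
# Pass the first 16 pack bytes as numbers (hex like 0x8f, or decimal).
# Example: cdtext_crc $(od -An -tx1 -N 16 pack.bin | sed 's/[0-9a-f]\{2\}/0x&/g')
cdtext_crc() {
  local crc=0 byte i
  for byte in "$@"; do
    crc=$(( crc ^ ((byte & 0xff) << 8) ))
    for ((i = 0; i < 8; i++)); do
      if (( crc & 0x8000 )); then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xffff ))
      else
        crc=$(( (crc << 1) & 0xffff ))
      fi
    done
  done
  # Inverted result, printed MSB first as it would be stored in bytes 16 and 17.
  printf '%02x %02x\n' $(( (crc ^ 0xffff) >> 8 )) $(( (crc ^ 0xffff) & 0xff ))
}
```

These parameters match the standard CRC-16/GSM variant, whose check value for the ASCII string "123456789" is 0xce3c. That makes a quick sanity check before trying the function on real packs, where the printed pair should match the pack's last two bytes.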
We tried changing the locale for cue2ddp, but whatever we did, the CD-Text was still set to English. At that point, Luke would of course read the source, but it's hidden by the dark side of the Force. So Luke reads the only source he's got: cue2ddp's machine code. He opens the binary in Cutter (the graphical interface of radare2) to navigate the call graph and try to understand where the cool things happen.
Fortunately, the program was released with its symbols not stripped, so at least we have labels for our functions, and we can show a pretty graph of the main() function. The big box at the top is the beginning of main(), and it ends in the small box at the bottom right, where all the lines converge.
We can see a dense path of functions on the right: that's probably where the important work is done. The "shortcuts" to the end are usually related to error handling. There are also some boxes on the left, and if we zoom in, we can see that they loop back to the beginning (there's a blue line that goes back up). Such a loop at the beginning of a main() function is usually related to argument parsing. Both of these intuitions can be quickly confirmed by looking at the symbols referenced in these blocks.
If we navigate a bit along the right side of the graph, we can quickly spot some interesting functions:
We can see a call to a ddp_write() function, and then a branch that calls either of ddp_write_cdtext_from_file() or ddp_write_cdtext(). We're not interested in writing the CD-Text from an external file, so we look at ddp_write_cdtext(), see that it calls pq_write_cdtext(), which calls cdtw_open(), cdtw_put() and cdtw_close() (we've only noted the functions that "look interesting").
At this point, the disassembly can become a bit cumbersome, so it can be useful to use tools that can attempt decompiling, like Ghidra :
Ghidra has a powerful decompilation backend, but because the program is optimized by the compiler, what we get isn't always easy to understand. It also allows us to rename variables and redefine types, so that we can progressively make the program easier to understand.
In this case, we quickly observe that pq_write_cdtext():
- takes a pointer to some data, and a file to write to
- reads some booleans from pq_has_title(), pq_has_performer(), pq_has_songwriter()
- calls cdtw_open()
- does the same dance for the title, performer and songwriter:
  - reads some data from "data_ptr"
  - calls cdtw_put() with the data read from "data_ptr"
  - while some pointer's value is not equal to 0, calls cdtw_put() with no data and increments the pointer
- calls cdtw_close()
Failing to decompile?
The calls to cdtw_put() seem related to the title/performer/songwriter. Correlated with our quick analysis of the CD-Text binary output, they correspond to the beginning of the file. The block that contains the language is written after this kind of metadata, at the end of the CD-Text. So our best candidate at this point is cdtw_close(). Let's open it in Ghidra.
Whoops. Something went wrong here: that code is absolutely unreadable. Why do we have undefined types? And why cast "stream" to "long", only to cast it to "undefined4" after incrementing it? What is that supposed to mean?
It means that the compiler has properly done its job of being smart and annoying, and that it has noticed that the other arguments this function takes always have the same position relative to the "stream" argument. Because of this, and to save some space on the stack, it decided to always refer to the other arguments relative to "stream".
A satisfying solution
Even if this code is much less readable, we can still make out some things if we remember that "stream + some_number" refers to some other value that may not be related to the stream. We can recognize some values from the CD-Text spec, especially the pack type we're interested in: 0x8f. We can also see calls to cdtw_write() and cdtw_read(). cdtw_write() is called with arguments obtained from "stream + 1" and "stream", so we can suppose that "stream + 1" points to a buffer that we want to encode and write to the stream.
We can also observe a pattern repeated three times:
- Call cdtw_clear()
- Set the value at "stream + 1" to 0x8f
- Call cdtw_write()
0x8f is the pack type, the first byte of a pack header, so it makes sense for it to be set at the first byte of the buffer sent to cdtw_write().
This reminds us of something: according to the CD-Text format documentation, the block size information, where the language is defined, is made of 3 packs that all have the type 0x8f, each carrying a 12-byte payload.
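The arithmetic behind this is simple. Each 0x8f pack carries 12 bytes of the 36-byte structure after its 4-byte header, so byte n of the structure goes to pack n / 12, at offset 4 + n % 12 inside that pack. A tiny bash sketch to check it (hypothetical helper name, derived from the format description rather than from cue2ddp):

```bash
#!/usr/bin/env bash
# Hypothetical helper: locate byte n (0..35) of the 36-byte "block size
# information" inside the three 0x8f packs (4-byte header + 12-byte payload).
locate_blocksize_byte() {
  local n=$1
  if (( n < 0 || n > 35 )); then
    echo "byte index out of range: $n" >&2
    return 1
  fi
  # Prints: <pack index 0..2> <byte offset inside the 18-byte pack>
  echo "$(( n / 12 )) $(( 4 + n % 12 ))"
}

# Language codes of blocks 0..7 are bytes 28..35 of the structure:
locate_blocksize_byte 28   # prints "2 8"  (third pack, byte 8)
locate_blocksize_byte 35   # prints "2 15" (third pack, byte 15)
```

So all 8 language codes sit in the third pack, in the bytes just before its 2-byte CRC.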
As the language is defined in the last 8 bytes, we know that it will be set in the last of these 3 packs, so we can now focus on the code that writes that pack:
So we have 4 lines that write to the buffer (lines 46 to 49) before the last call to cdtw_write(), which writes out a new pack to the CD-Text:
- line 46 is the pack header
- lines 47 and 48 are static values that do not match the default language code; also, "stream + 0xd" and "stream + 0x11" are set for all packs, so they are probably not the language code
- line 49 writes the value obtained from "stream + 0xf"
So the only thing we can see here that is specific to this pack is writing the character at "stream + 0xf" to "stream + 0x16". I have no idea of the meaning of this operation, but we can suppose that it's related to the language code, as it's the only interesting thing we wrote in the last pack.
At this point, we can set a breakpoint on the assembly instruction corresponding to this line (at virtual address 0x4076c0). We then observe the value retrieved from "stream + 0xf", which the assembly listing shows is stored in R8B (the lowest byte of the R8 register).
And... we get 0x09, so it really starts to look like this is the line that defines the language, which means we now know how to "fix" the problem.
Indeed, if we replace (in either Cutter or Ghidra) "mov byte [rax + 0x16], R8B" with "mov byte [rax + 0x16], 0xf" (replace 0xf with your language code), we now get a cue2ddp that sets the language of the CD-Text to French!
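To patch the file itself rather than the running process, we need the file offset behind virtual address 0x4076c0. For an ELF executable, an address inside a PT_LOAD segment maps to offset = vaddr - p_vaddr + p_offset, using that segment's values as listed by `readelf -lW`. Here is a minimal sketch with a hypothetical helper name. It assumes, as is typical for non-PIE x86_64 executables, that the code segment is loaded at 0x400000 from file offset 0.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a virtual address to an ELF file offset, given
# the p_vaddr and p_offset of the PT_LOAD segment that contains it
# (both can be read from `readelf -lW <binary>`).
vaddr_to_offset() {
  local vaddr=$(( $1 )) seg_vaddr=$(( $2 )) seg_offset=$(( $3 ))
  if (( vaddr < seg_vaddr )); then
    echo "address below segment start" >&2
    return 1
  fi
  local off=$(( vaddr - seg_vaddr + seg_offset ))
  # Print both forms: hex for hex editors, decimal for dd's seek=.
  printf '0x%x %d\n' "$off" "$off"
}

# Assumed segment values for cue2ddp 1.1 x86_64: p_vaddr 0x400000, p_offset 0.
vaddr_to_offset 0x4076c0 0x400000 0   # prints "0x76c0 30400"
```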
As we now know the address of this instruction in cue2ddp 1.1 x86_64, we can also edit it directly with simple tools:
cp cue2ddp cue2ddp-fr
echo -ne '\xc6\x40\x16\x0f' | dd of=cue2ddp-fr seek=30400 bs=1 conv=notrunc
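- in a word-oriented hex dump (such as hexdump's default output), each pair of bytes appears reversed: the platform is little-endian, so every 16-bit word is displayed with its second byte first. The instruction itself is simply the byte sequence c6 40 16 0f.
- the offset in the file is 0x76c0 (30400 in decimal, hence seek=30400), which is mapped to 0x4076c0 in the process address space.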
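To double-check the result, compare a byte-oriented dump with a word-oriented one. On a little-endian machine, a word-oriented dump (`od -tx2`, or hexdump's default format) groups bytes into 16-bit values and prints each pair swapped. A byte-oriented dump shows the instruction bytes in file order. Here is a small sketch with a hypothetical helper name, assuming GNU od (for `--endian`):

```bash
#!/usr/bin/env bash
# Show the same 4 bytes as single bytes and as little-endian 16-bit words.
dump_both() {
  local file=$1 offset=$(( $2 ))
  echo "bytes: $(od -An -tx1 -j "$offset" -N 4 -- "$file" | xargs)"
  echo "words: $(od -An -tx2 --endian=little -j "$offset" -N 4 -- "$file" | xargs)"
}

# On the patched binary, this should print "bytes: c6 40 16 0f"
# followed by "words: 40c6 0f16":
# dump_both cue2ddp-fr 0x76c0
```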
Another possible improvement would be to add some assembly that requests the language at runtime. We would need to find some free space for our instructions and jump to it. But let's not overengineer things, shall we? ;)
This is why we need open standards, and libre software.
And this is how we can still react to not having the right to improve our tools.
NOTE: Even if the checksums look correct, we cannot guarantee that DDP files produced by this patched version of cue2ddp will be correct. Obviously, this is a fun experiment with reverse engineering closed source software, and not a very good solution to this problem. It works for us(TM).
Thanks to:
- rada.re, for radare2 and its interface Cutter
- the NSA, for Ghidra
- cue2ddp's developer
- @wargreen for agreeing to test my dirty hacks ;)