Palm Bible Plus PDB Format

Yohanes Nugroho, December 2010

This document describes the Palm Bible+ PDB format. A description of the PDB format itself can be found at the following URLs:

A PDB file is just a sequence of blocks called records, and a record can contain anything. This document describes the format that PalmBible+ uses to store compressed Bible data. It assumes that you can already read the PDB format and can read any record at random.

Version Info

The first record (record number 0) contains version info and information about books:

size	type	content
16 bytes	array of char	version name
128 bytes	array of char	version info (usually contains copyright info)
1 byte	char	separator character (or '\0' if there is no separator character)
1 byte	byte	version attribute. The first bit indicates copy protection (1 means copy protected), the second bit indicates a byte-shifted version (1 = not byte shifted, used for Shift JIS), and the third bit indicates alignment (1 is right aligned, 0 is left aligned).
2 bytes	word	record location of word index
2 bytes	word	number of records for word list
2 bytes	word	total number of books
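The fixed part of record 0 could be unpacked roughly as in the sketch below. It assumes big-endian integers (the usual byte order on Palm OS, but not stated in this document), NUL-padded strings decoded as Latin-1, and that the "first bit" is the least significant bit. The names `VersionInfo` and `parse_version_info` are illustrative only.

```python
import struct
from typing import NamedTuple

# 16s name, 128s info, 1 separator char, 1 attribute byte, three 16-bit words.
# ">" selects big-endian without padding (byte order is an assumption).
HEADER = struct.Struct(">16s128scBHHH")


class VersionInfo(NamedTuple):
    name: str
    info: str
    separator: bytes
    copy_protected: bool
    byte_shift_flag: bool
    right_aligned: bool
    word_index_record: int
    word_record_count: int
    book_count: int


def _text(raw: bytes) -> str:
    # Assumed: strings are NUL-padded, so cut at the first NUL byte.
    return raw.split(b"\0", 1)[0].decode("latin-1")


def parse_version_info(record0: bytes) -> VersionInfo:
    name, info, sep, attr, widx, wcount, books = HEADER.unpack_from(record0, 0)
    return VersionInfo(
        _text(name),
        _text(info),
        sep,
        bool(attr & 0x01),  # assumed: "first bit" is the least significant bit
        bool(attr & 0x02),  # byte-shift flag (1 = not byte shifted)
        bool(attr & 0x04),  # 1 = right aligned
        widx,
        wcount,
        books,
    )
```

The header occupies 152 bytes in total (16 + 128 + 1 + 1 + 2 + 2 + 2), so whatever follows in record 0 starts at offset 152.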

The remaining bytes in record 0 contain information about books (book info), one entry per book:

size	type	content
2 bytes	word	book number
2 bytes	word	location of book record
2 bytes	word	number of records for this book
8 bytes	array of char	short name of this book (e.g. GEN for Genesis)
32 bytes	array of char	full name of this book (e.g. Genesis)
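A minimal sketch of reading the book info entries follows. It assumes that the entries start right after the 152-byte fixed header, that there is one 46-byte entry per book, and that integers are big-endian; none of this is spelled out above, and `parse_book_infos` is a hypothetical helper name.

```python
import struct

# One book info entry: three 16-bit words, 8-byte short name, 32-byte full name.
BOOK_INFO = struct.Struct(">HHH8s32s")  # 46 bytes, big-endian assumed
BOOK_INFO_OFFSET = 152  # 16 + 128 + 1 + 1 + 2 + 2 + 2 bytes of fixed header


def _text(raw: bytes) -> str:
    # Assumed: names are NUL-padded, so cut at the first NUL byte.
    return raw.split(b"\0", 1)[0].decode("latin-1")


def parse_book_infos(record0: bytes, book_count: int) -> list:
    """Return one dict per book, in the order the entries are stored."""
    books = []
    for i in range(book_count):
        number, location, record_count, short, full = BOOK_INFO.unpack_from(
            record0, BOOK_INFO_OFFSET + i * BOOK_INFO.size
        )
        books.append(
            {
                "number": number,
                "location": location,          # location of book record
                "record_count": record_count,  # number of records for this book
                "short_name": _text(short),
                "full_name": _text(full),
            }
        )
    return books
```

`book_count` is the "total number of books" field from the version info.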

Word Index

Word index records are located by reading record location of word index in the version info. The first of the word index records contains information about the records that follow it.

Record 0

The first 2 bytes contain the total number of indices (we will call this totalIndicesCount). After that there is an array of index info (we will call this the index info array) containing totalIndicesCount entries. Each entry contains:

size	type	content
2 bytes	short	word length
2 bytes	short	total number of words that have length of `word length`
1 byte	boolean	whether the entries are compressed values or normal words. A normal word is an array of characters; a compressed value is an array of numbers (each number in the array points to a word)
1 byte	char	ignored value
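The index info array could be read as in this sketch, which assumes big-endian integers and treats any non-zero boolean byte as true. `parse_index_info` is an illustrative name, not part of any existing reader.

```python
import struct

INDEX_ENTRY = struct.Struct(">HHBB")  # word length, word count, compressed, ignored


def parse_index_info(index_record0: bytes) -> list:
    """Return a list of (word_length, word_count, compressed) tuples."""
    (total_indices_count,) = struct.unpack_from(">H", index_record0, 0)
    entries = []
    for i in range(total_indices_count):
        word_length, word_count, compressed, _ignored = INDEX_ENTRY.unpack_from(
            index_record0, 2 + i * INDEX_ENTRY.size
        )
        entries.append((word_length, word_count, bool(compressed)))
    return entries
```

Each entry is 6 bytes long, so entry i starts at offset 2 + 6 * i within the record.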

Records 1 to totalIndicesCount

Each of these records contains the data for the corresponding element of the index info array. So record 1 of the word index (i.e. record location of word index + 1) contains the data for element 0 of the index info array.

The compressor that creates the Bible+ PDB builds a list of words in order of increasing word length. This list is then divided into several collections of records. Each record collection contains words of the same length. The words are numbered sequentially from the first record to the last (the numbering is implicit; it is not stored).

example:

Word length: 1
Word count: 3
Compressed: false
words:
# Word
0 a
1 ?
2 !
Word length: 2
Word count: 4
Compressed: false
words:
# Word
3 an
4 as
5 by
6 us

A word list record can also be compressed: such a record does not contain strings, but numbers that refer to other words. Example:

Word length: 2
Word count: 2
Compressed: true
words:
# Word
7 <4 0>  ---> as a
8 <6 1>  ---> us?
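The implicit numbering and the expansion of compressed entries can be sketched on in-memory data as follows. The lists are the ones from the examples above, already decoded from their records; `number_words` and `expand` are hypothetical helpers. The sketch returns the individual words instead of joining them, because how words are joined is not described here, and it assumes a compressed entry may itself refer to another compressed entry.

```python
# Word lists in file order: (compressed flag, entries), taken from the example.
GROUPS = [
    (False, ["a", "?", "!"]),
    (False, ["an", "as", "by", "us"]),
    (True, [[4, 0], [6, 1]]),
]


def number_words(groups):
    """Flatten the lists so that the list index is the implicit word number."""
    table = []
    for compressed, entries in groups:
        for entry in entries:
            table.append((compressed, entry))
    return table


def expand(table, number):
    """Return the plain words that word `number` stands for."""
    compressed, entry = table[number]
    if not compressed:
        return [entry]
    words = []
    for ref in entry:
        # Assumption: references may point at compressed entries too.
        words.extend(expand(table, ref))
    return words
```

With this data, `expand(number_words(GROUPS), 8)` returns `['us', '?']`.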

Book Record

To open a book, open the record at the location given by location of book record and read all the records for that book (see number of records for this book in book info). The first record contains information about the book.

size	type	content
2 bytes	short	total number of chapters for this book (`totalChapter`)
2 bytes * totalChapter	array of short	total accumulated verses per chapter, starting from chapter 2 (`totalVersesAcc`)
4 bytes * totalChapter	array of int	total accumulated characters per chapter, starting from chapter 2 (`totalChapterCharsAcc`)
the rest of the record	array of short	total accumulated characters per verse (`totalVerseCharsAcc`)

Explanation: for every book, all of the verse data (the words in the verses) is stored as an array of words, each word represented by a number. There is no delimiter to indicate where a chapter or a verse starts.

For example, the record will contain numbers like these:

 words           50 2 21 12 15 21 32  75  32  21 33  32  45 64

The numbers may represent chapter/verse like this:

 chapter         0                    1
 verse           0        1        2  0     1     2     3
 words           50 2  21 12 15 21 32 75 32 21 33 32 45 64
 Word position   0  1  2  3  4  5  6  7  8  9  10 11 12 13

This means that chapter 0 verse 0 contains the numbers 50 2 21, and chapter 1 verse 0 contains the numbers 75 32. There are three arrays that help us locate a verse:

 chapter              0                    1
 totalChapterCharsAcc 0                    7
 totalVersesAcc       0                    3
 totalVerseCharsAcc   0        3        6  0     2     4     6
 verse                0        1        2  0     1     2     3
 words                50 2  21 12 15 21 32 75 32 21 33 32 45 64
 Word position        0  1  2  3  4  5  6  7  8  9  10 11 12 13

The array totalChapterCharsAcc gives the word position at which each chapter starts. Since chapter 0 (the first chapter) always starts at 0, its entry is not stored in the real PDB file (the array starts from the second chapter), but it is shown here to keep the example simple. To go to a particular verse, we look it up in totalVerseCharsAcc and add the verse location to the location from totalChapterCharsAcc. However, this array is linear over all verses, so we do not know where chapter N starts in it unless we know the total number of verses in chapters 0 to N - 1. That is where the array totalVersesAcc is used: it gives the total number of verses in all previous chapters.

So how do we know where to start reading if we want to read a particular chapter/verse?

For chapter 1 verse 1 (the first verse in the book, chapter 0 verse 0 in the example's numbering), it is clear that it must start at offset 0.
For other chapters, we need to look up the array totalChapterCharsAcc to find the start of that chapter.
After we know where that chapter starts, we still need to skip some verses. To skip them we need to look up the array totalVerseCharsAcc:
- for the first chapter, we can look up totalVerseCharsAcc directly,
- for other chapters, we need to look up totalVersesAcc first and add the verse number to it. Using the example: to find the start of chapter 1 verse 2, we first look up totalChapterCharsAcc, which gives 7. Then we look up totalVersesAcc at index 1, which gives 3, so we look at totalVerseCharsAcc[3 + 2] (2 is the verse number) and get 4. The start of the verse is therefore 7 (from totalChapterCharsAcc) + 4 = 11.

To find the number of words in the verse, we subtract the current entry of totalVerseCharsAcc from the next one (in this case: 6 - 4 = 2 word numbers). In the example layout, the last verse of a chapter ends where the next chapter starts.
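The lookup can be sketched as below. Note the assumptions: the arrays are in the illustrative layout of the example (0-based chapters and verses, with the leading 0 for chapter 0 included), not the on-disk layout, and the end of the last verse in a chapter is taken to be the start of the next chapter or the end of the word data. `verse_span` and its parameter names are made up for this sketch.

```python
def verse_span(chapter, verse, chapter_chars_acc, verses_acc, verse_chars_acc,
               total_words):
    """Return (start, end) word positions of a verse; end is exclusive."""
    chapter_start = chapter_chars_acc[chapter]
    first = verses_acc[chapter]  # number of verses in all previous chapters
    start = chapter_start + verse_chars_acc[first + verse]

    next_index = first + verse + 1
    if chapter + 1 < len(verses_acc):
        last_in_chapter = next_index == verses_acc[chapter + 1]
    else:
        last_in_chapter = next_index == len(verse_chars_acc)

    if not last_in_chapter:
        end = chapter_start + verse_chars_acc[next_index]
    elif chapter + 1 < len(chapter_chars_acc):
        end = chapter_chars_acc[chapter + 1]  # next chapter starts here
    else:
        end = total_words  # last verse of the book
    return start, end


# Arrays from the example above.
CHAPTER_CHARS_ACC = [0, 7]
VERSES_ACC = [0, 3]
VERSE_CHARS_ACC = [0, 3, 6, 0, 2, 4, 6]
WORDS = [50, 2, 21, 12, 15, 21, 32, 75, 32, 21, 33, 32, 45, 64]

start, end = verse_span(1, 2, CHAPTER_CHARS_ACC, VERSES_ACC, VERSE_CHARS_ACC,
                        len(WORDS))
print(start, end, WORDS[start:end])  # 11 13 [32, 45]
```

A reader working directly on the stored arrays would have to shift the chapter index by one, since the entry for the first chapter is not stored.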

Decompressing words

Once we know the location of the words, we can read them as numbers and then translate the numbers to strings. First we need to know which record holds the word. We can check the word counts in the word index to find which word list to use, and then find the word within that list. Using the previous word list as an example:

Word length: 1
Word count: 3
Compressed: false
words:
# Word
0 a
1 ?
2 !
Word length: 2
Word count: 4
Compressed: false
words:
# Word
3 an
4 as
5 by
6 us

Word length: 2
Word count: 2
Compressed: true
words:
# Word
7 <4 0>  ---> as a
8 <6 1>  ---> us?

If we have word number 5, we can see that the first collection of words contains only 3 words, so we go to the next word list, which contains 4 words; word number 5 must therefore be inside it. We subtract the word count of the previous list, 5 - 3 = 2, so the word is at position 2 (counting from 0, i.e. the third entry) in the second word list, which is the word by.

We also check whether the word list is in fact a compressed one. If compressed is true, the entry is an array of word numbers, and we need to look up each of those numbers in turn.
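The subtraction walk can be sketched like this. It takes the index info array as a list of (word length, word count, compressed) tuples, which is an in-memory representation chosen for this sketch; `locate_word` is an illustrative name.

```python
def locate_word(word_number, index_info):
    """Return (list_index, position, compressed) for a word number.

    `position` counts from 0 within the word list `list_index`.
    """
    remaining = word_number
    for list_index, (_length, count, compressed) in enumerate(index_info):
        if remaining < count:
            return list_index, remaining, compressed
        remaining -= count  # skip this whole list
    raise IndexError("word number %d is out of range" % word_number)


# Index info for the example word lists above.
INDEX_INFO = [(1, 3, False), (2, 4, False), (2, 2, True)]

print(locate_word(5, INDEX_INFO))  # (1, 2, False)
print(locate_word(7, INDEX_INFO))  # (2, 0, True)
```

When the returned flag is true, the entry at that position holds word numbers, and each of them would be passed through the same lookup again.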
