Palm Bible Plus PDB Format

Yohanes Nugroho, December 2010

This document describes the Palm Bible+ PDB format. The description of the PDB format itself can be found at the following URLs:

A PDB file is just a sequence of blocks called records, and a record can contain anything. This document describes the format that PalmBible+ uses to store the compressed Bible data. It assumes that you can already read the PDB format and can access any record at random.

Version Info

The first record (record number 0) contains version info and information about books:

size       type           content
16 bytes   array of char  version name
128 bytes  array of char  version info (usually contains copyright info)
1 byte     char           separator character (or '\0' if there is no separator character)
1 byte     byte           version attribute. The first bit indicates copy protection (1 means copy protected), the second bit indicates a byte-shifted version (1 = not byte shifted, used for Shift JIS), and the third bit indicates alignment (1 is right aligned, 0 is left aligned).
2 bytes    word           record location of word index
2 bytes    word           number of records for word list
2 bytes    word           total number of books

The remaining bytes in record 0 contain information about the books (book info), with one entry per book:

size      type           content
2 bytes   word           book number
2 bytes   word           location of book record
2 bytes   word           number of records for this book
8 bytes   array of char  short name of this book (e.g. GEN for Genesis)
32 bytes  array of char  full name of this book (e.g. Genesis)
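
As a rough illustration, record 0 could be read with Python's struct module as sketched below. The field sizes come from the two tables above. The big-endian byte order, the Latin-1 decoding, and the names parse_record0 and c_string are assumptions made for this sketch; they are not part of the format description. The attribute byte is returned as-is, without decoding its bits.

```python
import struct

# Field layout from the tables above. ">" (big-endian, no padding) is an
# assumption; the document does not state the byte order.
HEADER = struct.Struct(">16s128scBHHH")   # version info, 152 bytes
BOOK_INFO = struct.Struct(">HHH8s32s")    # one book info entry, 46 bytes


def c_string(raw: bytes) -> str:
    """Cut a fixed-size char array at the first NUL byte and decode it.

    Latin-1 is an assumption; Shift JIS versions would need another codec.
    """
    return raw.split(b"\0", 1)[0].decode("latin-1")


def parse_record0(record: bytes) -> dict:
    """Parse the version info and the book info entries of record 0."""
    (name, info, separator, attributes,
     word_index_record, word_record_count, book_count) = HEADER.unpack_from(record, 0)

    books = []
    for i in range(book_count):
        offset = HEADER.size + i * BOOK_INFO.size
        number, location, record_count, short_name, full_name = \
            BOOK_INFO.unpack_from(record, offset)
        books.append({
            "number": number,
            "record": location,            # location of book record
            "record_count": record_count,  # number of records for this book
            "short_name": c_string(short_name),
            "full_name": c_string(full_name),
        })

    return {
        "name": c_string(name),
        "info": c_string(info),
        "separator": separator,            # a single byte, b"\0" if none
        "attributes": attributes,          # raw attribute byte, not decoded
        "word_index_record": word_index_record,
        "word_record_count": word_record_count,
        "books": books,
    }
```

The book entries start right after the 152-byte version info, so entry i sits at offset 152 + 46 * i.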

Word Index

The word index records can be located through the record location of word index field in the version info. The first word index record describes the records that follow it.

Record 0

The first 2 bytes contain the total number of indices (we will call this totalIndicesCount). After that there is an array of index info (we will call this the index info array) containing totalIndicesCount entries. Each entry contains:

size     type     content
2 bytes  short    word length
2 bytes  short    total number of words that have this word length
1 byte   boolean  whether the entries are compressed values or normal words. A normal word is an array of characters; a compressed value is an array of numbers (each number in the array points to a word)
1 byte   char     ignored value

Records 1 to totalIndicesCount

Each of these records contains the data that corresponds to one element of the index info array. So record 1 of the word index (i.e. record location of word index + 1) contains the words described by element 0 of the index info array.
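
The index info array could be read as in the following sketch. The 6-byte entry layout is taken from the table above; the big-endian byte order and the names parse_word_index, record and first_word are assumptions made for this illustration. For each entry the sketch also records which PDB record should hold its words (record location of word index + 1 + element number, as described above) and the implicit number of its first word.

```python
import struct

# word length, word count, compressed flag, one ignored byte (6 bytes).
# Big-endian is an assumption; the document does not state the byte order.
INDEX_ENTRY = struct.Struct(">HH?x")


def parse_word_index(record: bytes, word_index_location: int) -> list:
    """Parse the first word index record into a list of group descriptions."""
    (total_indices_count,) = struct.unpack_from(">H", record, 0)

    groups = []
    first_word = 0  # words are numbered sequentially across all groups
    for i in range(total_indices_count):
        offset = 2 + i * INDEX_ENTRY.size
        word_length, word_count, compressed = INDEX_ENTRY.unpack_from(record, offset)
        groups.append({
            "word_length": word_length,
            "word_count": word_count,
            "compressed": compressed,
            # element i of the index info array belongs to word index record i + 1
            "record": word_index_location + 1 + i,
            "first_word": first_word,
        })
        first_word += word_count
    return groups
```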

The compressor that creates the Bible+ PDB builds a list of words in order of increasing word length. This list is then divided into several collections of records, each of which contains words of the same length. The words are numbered sequentially from the first record to the last (the numbering is implicit; it is not stored).

Example:

Word length: 1
Word count: 3
Compressed: false
words:
# Word
0 a
1 ?
2 !
Word length: 2
Word count: 4
Compressed: false
words:
# Word
3 an
4 as
5 by
6 us

A word list record can also be compressed. Such a record does not contain strings, but numbers that refer to other words. Example:

Word length: 2
Word count: 2
Compressed: true
words:
# Word
7 <4 0>  ---> as a
8 <6 1>  ---> us?

Book Record

To open a book, go to the record pointed to by location of book record and read all the records for that book (see number of records for this book in the book info). The first record contains information about the book.

size                    type            content
2 bytes                 short           total number of chapters for this book (totalChapter)
2 bytes * totalChapter  array of short  total accumulated verses per chapter, starting from chapter 2 (totalVersesAcc)
4 bytes * totalChapter  array of int    total accumulated characters per chapter, starting from chapter 2 (totalChapterCharsAcc)
the rest of the record  array of short  total accumulated characters per verse (totalVerseCharsAcc)
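
A sketch of how the first book record could be split into these fields is shown below. It reads the fields strictly in the order and with the sizes listed in the table. The big-endian byte order and the name parse_book_header are assumptions for this illustration, and the sketch does not interpret which chapter each stored entry belongs to.

```python
import struct


def parse_book_header(record: bytes) -> dict:
    """Split the first record of a book into the fields of the table above.

    Big-endian byte order is an assumption; the document does not state it.
    """
    (total_chapter,) = struct.unpack_from(">H", record, 0)
    offset = 2

    # 2 bytes * totalChapter
    total_verses_acc = list(struct.unpack_from(f">{total_chapter}H", record, offset))
    offset += 2 * total_chapter

    # 4 bytes * totalChapter
    total_chapter_chars_acc = list(struct.unpack_from(f">{total_chapter}I", record, offset))
    offset += 4 * total_chapter

    # the rest of the record is an array of shorts
    remaining = (len(record) - offset) // 2
    total_verse_chars_acc = list(struct.unpack_from(f">{remaining}H", record, offset))

    return {
        "totalChapter": total_chapter,
        "totalVersesAcc": total_verses_acc,
        "totalChapterCharsAcc": total_chapter_chars_acc,
        "totalVerseCharsAcc": total_verse_chars_acc,
    }
```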

Explanation: for every book, all of the verse data (the words in the verses) is stored as one array of words, with each word represented by a number. There is no delimiter to mark where a chapter or a verse starts.

For example, the record will contain numbers like these:

 words           50 2 21 12 15 21 32  75  32  21 33  32  45 64

The numbers may represent chapter/verse like this:

 chapter         0                     1 
 verse           0       1        2    0      1      2     3
 words           50 2 21 12 15 21 32  75  32  21 33  32 45 64 
 Word position   0  1 2  3  4  5  6   7   8   9  10  11 12 13

This means that chapter 0 verse 0 contains the numbers 50 2 21, and chapter 1 verse 0 contains the numbers 75 32. There are three arrays that help us locate a verse:

 chapter              0                     1 
 totalChapterCharsAcc 0                     7
 totalVersesAcc       0                     3
 totalVerseCharsAcc   0       3        6    0       2      4     6
 verse                0       1        2    0       1      2     3
 words                50 2 21 12 15 21 32   75  32  21 33  32 45 64
 Word position        0  1 2  3  4  5  6    7   8   9  10  11 12 13

The array totalChapterCharsAcc gives the word position at which each chapter starts. Since chapter 0 (the first chapter) always starts at 0, that entry is not stored in the real PDB file (the stored array starts from the second chapter; it is shown here to keep the example simple). To go to a particular verse, we look it up in totalVerseCharsAcc and add the verse location to the location that we got from totalChapterCharsAcc. But as you can see, this array is linear across all verses, so we don't know where chapter N starts in it unless we know the total number of verses from chapter 0 to chapter N - 1. That is where we use the array totalVersesAcc, which gives the total number of verses in all previous chapters.

So how do we know where to start reading if we want to read a particular chapter/verse?

  • For the first verse of the first chapter (the first verse in the book), it is clear that it must start at offset 0.
  • For other chapters, we need to look up the start of the chapter in the array totalChapterCharsAcc.
  • Once we know where that chapter starts, we still need to skip some verses. To do that, we look in the array totalVerseCharsAcc:
    • for the first chapter, we can look up totalVerseCharsAcc directly using the verse number,
    • for other chapters, we first look up totalVersesAcc and add the verse number to it. Using the example: to find the start of chapter 1 verse 2, we look up totalChapterCharsAcc for chapter 1, which is 7. Then we look up totalVersesAcc at index 1, which is 3, so we read totalVerseCharsAcc[3 + 2] (2 is the verse number) and get 4. The verse therefore starts at 7 (from totalChapterCharsAcc) + 4 = 11.

To find the number of words in the verse, we subtract the current totalVerseCharsAcc entry from the next one (in this case: 6 - 4 = 2 word numbers).
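
The lookup steps above can be sketched in Python using the example arrays, with the leading 0 entries included as in the illustration (the real file does not store them). The function name locate_verse and the dictionary keys are made up for this sketch. The handling of the last verse of a chapter is an assumption: the text only covers subtracting the next totalVerseCharsAcc entry, so here the last verse of a chapter is taken to end where the next chapter starts, and the last verse of the book to end with the word data.

```python
# Example data from the illustration above, with the leading 0 entries included.
BOOK = {
    "words": [50, 2, 21, 12, 15, 21, 32, 75, 32, 21, 33, 32, 45, 64],
    "totalChapterCharsAcc": [0, 7],
    "totalVersesAcc": [0, 3],
    "totalVerseCharsAcc": [0, 3, 6, 0, 2, 4, 6],
}


def locate_verse(book: dict, chapter: int, verse: int) -> tuple:
    """Return (start position, number of words) for a 0-based chapter and verse."""
    chapter_starts = book["totalChapterCharsAcc"]
    verses_before = book["totalVersesAcc"]
    verse_offsets = book["totalVerseCharsAcc"]

    chapter_start = chapter_starts[chapter]
    index = verses_before[chapter] + verse
    start = chapter_start + verse_offsets[index]

    # Assumption: where the chapter ends when there is no "next" verse entry.
    if chapter + 1 < len(chapter_starts):
        chapter_end_index = verses_before[chapter + 1]
        chapter_end = chapter_starts[chapter + 1]
    else:
        chapter_end_index = len(verse_offsets)
        chapter_end = len(book["words"])

    if index + 1 < chapter_end_index:
        end = chapter_start + verse_offsets[index + 1]
    else:
        end = chapter_end
    return start, end - start


start, count = locate_verse(BOOK, 1, 2)
print(start, count)                       # 11 2
print(BOOK["words"][start:start + count])  # [32, 45]
```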

Decompressing words

Once we know the location of the words, we can read them as numbers and then translate the numbers into strings. First we need to know which record holds the word. We use the word counts in the word index to find the word list to use, and then find the word inside that list. Using the previous word list as an example:

Word length: 1
Word count: 3
Compressed: false
words:
# Word
0 a
1 ?
2 !
Word length: 2
Word count: 4
Compressed: false
words:
# Word
3 an
4 as
5 by
6 us

Word length: 2
Word count: 2
Compressed: true
words:
# Word
7 <4 0>  ---> as a
8 <6 1>  ---> us?

If we have word number 5, we can see that the first collection of words only contains 3 words, so we go to the next word list, which contains 4 words. This means that word number 5 must be inside it. We subtract the word count of the previous list, 5 - 3 = 2, so the word is at index 2 (the third entry) of the second word list, which is the word by.

We also need to check whether the word list is in fact a compressed one. If compressed is true, the entry is an array of word numbers, and we need to look up each of those numbers in turn.
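
The lookup can be sketched as follows, working on word lists that have already been read into memory (the on-disk layout of a word list record is not covered here). The names GROUPS, build_starts and decode_word are made up for this illustration. The sketch returns the expanded words as a list and leaves open how they are joined, since that is not described above.

```python
from bisect import bisect_right
from itertools import accumulate

# The example word lists from above, already read into memory.
GROUPS = [
    {"compressed": False, "words": ["a", "?", "!"]},
    {"compressed": False, "words": ["an", "as", "by", "us"]},
    {"compressed": True, "words": [[4, 0], [6, 1]]},
]


def build_starts(groups: list) -> list:
    """Return the implicit number of the first word of every list: 0, 3, 7."""
    counts = [len(group["words"]) for group in groups]
    return [0] + list(accumulate(counts))[:-1]


def decode_word(number: int, groups: list, starts: list) -> list:
    """Translate a word number into a list of strings."""
    # Find the last list whose first word number is <= number.
    g = bisect_right(starts, number) - 1
    group = groups[g]
    entry = group["words"][number - starts[g]]
    if not group["compressed"]:
        return [entry]
    # A compressed entry is a list of word numbers; look up each of them.
    parts = []
    for n in entry:
        parts.extend(decode_word(n, groups, starts))
    return parts


starts = build_starts(GROUPS)
print(decode_word(5, GROUPS, starts))  # ['by']
print(decode_word(8, GROUPS, starts))  # ['us', '?']
```

Searching the list of first word numbers with bisect is just a compact way of doing the repeated subtraction described above; a simple loop over the word counts would work equally well.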

Copyright © 2009-2010 Yohanes Nugroho