Palm Bible Plus PDB Format
Yohanes Nugroho, December 2010
This document describes the Palm Bible+ PDB format. A description of the PDB format itself can be found at the following URLs:
- http://www.hotpaw.com/rhn/palm/pdb.txt
- http://membres.lycos.fr/microfirst/palm/pdb.html
- http://wiki.mobileread.com/wiki/PDB#Palm_Database_Format
A PDB file is just a sequence of blocks called records, and a record can contain anything. This document describes the format that PalmBible+ uses to store compressed Bible data. It assumes that you can already read the PDB format and can access any record at random.
Version Info
The first record (record number 0) contains version info and information about books:
size | type | content |
---|---|---|
16 bytes | array of char | version name |
128 bytes | array of char | version info (usually contains copyright info) |
1 byte | char | contains separator character (or '\0' if there is no separator character) |
1 byte | byte | contains version attribute. First bit contains information about copy protection (1 means copy protected), second bit contains information about byteshifted version (1 = not byte shifted, used for Shift JIS), third bit indicates alignment (1 is right aligned, 0 is left aligned). |
2 bytes | word | record location of word index |
2 bytes | word | number of records for word list |
2 bytes | word | total number of books |
The following bytes in record 0 contain information about books (`book info`):
size | type | content |
---|---|---|
2 bytes | word | book number |
2 bytes | word | location of book record |
2 bytes | word | number of records for this book |
8 bytes | array of char | short name of this book (e.g. GEN for Genesis) |
32 bytes | array of char | full name of this book (e.g. Genesis) |
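As a rough illustration, the two tables above could be decoded with Python's `struct` module as sketched below. The function name `parse_version_record`, the dictionary keys, the big-endian byte order, and the latin-1 text decoding are assumptions made for this sketch; the description above does not state them. The attribute byte is returned raw because the text does not say which bit counts as the "first" one.

```python
import struct

# Assumption: multi-byte fields are big-endian and fields are packed without padding.
VERSION_HEADER = struct.Struct(">16s128scBHHH")  # 152 bytes
BOOK_INFO = struct.Struct(">HHH8s32s")           # 46 bytes


def _text(raw):
    """Decode a fixed-size, NUL-padded char array (latin-1 is an assumption)."""
    return raw.split(b"\0", 1)[0].decode("latin-1")


def parse_version_record(record0):
    """Split record 0 into the version header and the list of book infos."""
    (name, info, separator, attributes, word_index_location,
     word_record_count, book_count) = VERSION_HEADER.unpack_from(record0, 0)
    header = {
        "name": _text(name),
        "info": _text(info),
        "separator": separator,          # b"\0" means no separator character
        "attributes": attributes,        # raw flag byte, left uninterpreted
        "word_index_location": word_index_location,
        "word_record_count": word_record_count,
        "book_count": book_count,
    }
    books = []
    offset = VERSION_HEADER.size
    for _ in range(book_count):
        number, location, records, short_name, full_name = BOOK_INFO.unpack_from(
            record0, offset)
        books.append({
            "number": number,
            "location": location,        # record location of the book
            "record_count": records,     # number of records for the book
            "short_name": _text(short_name),
            "full_name": _text(full_name),
        })
        offset += BOOK_INFO.size
    return header, books
```

Reading the record bytes themselves is left to whatever PDB reader is already in use; the sketch only needs the raw content of record 0.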
Word Index
Word index records can be located by reading `record location of word index` in the version info. The first record of the word index records contains information about the records that follow.
Record 0
The first 2 bytes contain the total number of indices (we will call this `totalIndicesCount`). After that there is an array of index info (we will call this the `index info array`) containing `totalIndicesCount` entries. Each entry contains:
size | type | content |
---|---|---|
2 bytes | short | word length |
2 bytes | short | total number of words that have length of word length |
1 byte | boolean | whether this entry holds compressed values or normal words. A normal word points to an array of characters; a compressed value points to an array of numbers (each number in the array points to a word) |
1 byte | char | ignored value |
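The layout of the first word index record can be sketched in Python as follows. The name `parse_word_index` and the big-endian byte order are assumptions for this sketch, not something the description above specifies.

```python
import struct

# word length, word count, compressed flag, then one ignored byte (6 bytes total).
# Assumption: big-endian, no padding between fields.
INDEX_ENTRY = struct.Struct(">HH?x")


def parse_word_index(record):
    """Return a list of (word_length, word_count, compressed) tuples."""
    (total_indices_count,) = struct.unpack_from(">H", record, 0)
    entries = []
    offset = 2
    for _ in range(total_indices_count):
        entries.append(INDEX_ENTRY.unpack_from(record, offset))
        offset += INDEX_ENTRY.size
    return entries
```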
Record 1-totalIndicesCount
Each of these records contains the data corresponding to one element of the `index info array`. So record 1 of the word index (i.e. `record location of word index` + 1) contains the data for element 0 of the `index info array`.
The compressor that creates the Bible+ PDB builds a list of words in order of increasing word length. This list is then divided into several collections of records. Each record collection contains words of the same length. The words are numbered sequentially from the first to the last record (the numbering is implicit; it is not stored).
Example:
Word length: 1
Word count: 3
Compressed: false
words:
# Word
0 a
1 ?
2 !
Word length: 2
Word count: 4
Compressed: false
words:
# Word
3 an
4 as
5 by
6 us
A word list record can also be compressed: such a record doesn't contain strings, but numbers that refer to other words. Example:
Word length: 2
Word count: 2
Compressed: true
words:
# Word
7 <4 0> ---> as a
8 <6 1> ---> us?
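The sketch below shows one way the raw bytes of a word list could be cut into entries. It rests on assumptions that the text does not confirm: that normal words are stored back to back using exactly `word length` bytes each, that a compressed entry stores `word length` word numbers, and that each number is a 16-bit big-endian value. The name `split_word_record` is hypothetical.

```python
import struct


def split_word_record(data, word_length, word_count, compressed):
    """Cut one word list into entries.

    Normal lists yield byte strings; compressed lists yield lists of
    word numbers. Fixed-width packing is an assumption of this sketch.
    """
    entries = []
    if compressed:
        size = 2 * word_length  # assumed: 2 bytes per word number
        for i in range(word_count):
            numbers = struct.unpack_from(">%dH" % word_length, data, i * size)
            entries.append(list(numbers))
    else:
        for i in range(word_count):
            entries.append(data[i * word_length:(i + 1) * word_length])
    return entries
```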
Book Record
To open a book, open the record at the location given by `location of book record` and read all the records for that book (see `number of records for this book` in book info). The first record contains information about the book.
size | type | content |
---|---|---|
2 bytes | short | total number of chapters for this book (`totalChapter`) |
2 bytes * totalChapter | array of short | total accumulated verses per chapter, starting from chapter 2 (`totalVersesAcc`) |
4 bytes * totalChapter | array of int | total accumulated characters per chapter, starting from chapter 2 (`totalChapterCharsAcc`) |
the rest of the record | array of short | total accumulated characters per verse (`totalVerseCharsAcc`) |
Explanation: for every book, all of the verse data (the words in the verses) is stored as an array of words, each word represented by a number. There is no delimiter to indicate where a chapter or a verse starts.
For example, the record will contain numbers like these:
words 50 2 21 12 15 21 32 75 32 21 33 32 45 64
The numbers may represent chapter/verse like this:
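Word position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
words | 50 | 2 | 21 | 12 | 15 | 21 | 32 | 75 | 32 | 21 | 33 | 32 | 45 | 64 |
chapter | 0 | | | | | | | 1 | | | | | | |
verse | 0 | | | 1 | | | 2 | 0 | | 1 | | 2 | | 3 |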
It means that chapter 0 verse 0 contains the numbers `50 2 21`, and chapter 1 verse 0 contains the numbers `75 32`. There are three arrays that will help us locate a verse:
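Word position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
words | 50 | 2 | 21 | 12 | 15 | 21 | 32 | 75 | 32 | 21 | 33 | 32 | 45 | 64 |
chapter | 0 | | | | | | | 1 | | | | | | |
verse | 0 | | | 1 | | | 2 | 0 | | 1 | | 2 | | 3 |
totalChapterCharsAcc | 0 | | | | | | | 7 | | | | | | |
totalVersesAcc | 0 | | | | | | | 3 | | | | | | |
totalVerseCharsAcc | 0 | | | 3 | | | 6 | 0 | | 2 | | 4 | | 6 |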
The array `totalChapterCharsAcc` gives the word position at which each chapter starts. Since chapter 0 (the first chapter) always starts at 0, that entry is not stored in the real PDB file (the stored array starts from the second chapter; it is shown here only to keep the example simple). To go to a particular verse, we look up `totalVerseCharsAcc` and add the verse location to the location we got from `totalChapterCharsAcc`. But as you can see, this array is linear per verse: we don't know where chapter N starts in it unless we know the total number of verses from chapter 0 to chapter N - 1. That is where we use the array `totalVersesAcc`, which gives the total number of verses in all previous chapters.
So how do we know where to start reading if we want to read a particular chapter/verse?
- For chapter 1 verse 1 (the first verse in the book), it is clear that it must start at offset 0.
- For other chapters, we need to look up the array `totalChapterCharsAcc` to find the start of that chapter.
- Once we know where that chapter starts, we still need to skip some verses. To skip them we need to look up the array `totalVerseCharsAcc`:
  - for the first chapter we can just look up `totalVerseCharsAcc` directly,
  - for other chapters, we need to look up `totalVersesAcc` first and add the verse number to it.

Using the example: to find the start of chapter 1 verse 2, we look up `totalChapterCharsAcc`, which gives 7. Then we look up index 1 in `totalVersesAcc`, which gives 3, so we look at `totalVerseCharsAcc[3 + 2]` (2 is the verse number) and get the number 4. This means the verse starts at 7 (from `totalChapterCharsAcc`) + 4, which is 11.
To find the number of words in the verse, we can subtract the current `totalVerseCharsAcc` entry from the next one (in this case: 6 - 4 = 2 word numbers).
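The lookup described above can be sketched in Python as follows. The arrays are passed in the same form as in the example, that is, with the leading 0 for the first chapter already added back. The function name `verse_span` and the `total_words` parameter are assumptions of this sketch, as is the handling of the last verse of a chapter: there the next `totalVerseCharsAcc` entry belongs to the following chapter, so the sketch falls back to the start of the next chapter (or the end of the data) instead.

```python
def verse_span(chapter, verse, chapter_chars_acc, verses_acc,
               verse_chars_acc, total_words):
    """Return (start, length) in word positions for a 0-based chapter and verse."""
    first = verses_acc[chapter]                 # verses in all previous chapters
    chapter_start = chapter_chars_acc[chapter]
    start = chapter_start + verse_chars_acc[first + verse]

    if chapter + 1 < len(verses_acc):
        last_verse = verses_acc[chapter + 1] - first - 1
        chapter_end = chapter_chars_acc[chapter + 1]
    else:
        last_verse = len(verse_chars_acc) - first - 1
        chapter_end = total_words               # assumed to be known by the caller

    if verse < last_verse:
        end = chapter_start + verse_chars_acc[first + verse + 1]
    else:
        end = chapter_end                       # last verse runs to the chapter end
    return start, end - start


# Values taken from the example tables.
words = [50, 2, 21, 12, 15, 21, 32, 75, 32, 21, 33, 32, 45, 64]
start, length = verse_span(1, 2, [0, 7], [0, 3], [0, 3, 6, 0, 2, 4, 6], len(words))
print(start, length, words[start:start + length])  # 11 2 [32, 45]
```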
Decompressing words
Once we know the location of the words, we can read the words as numbers and then translate the numbers to strings. First we need to know in which record to look for the word. We can check the word counts in the word index to find which word list to use. After that we can find the word that we want. Using the previous word list as an example:
Word length: 1
Word count: 3
Compressed: false
words:
# Word
0 a
1 ?
2 !
Word length: 2
Word count: 4
Compressed: false
words:
# Word
3 an
4 as
5 by
6 us
Word length: 2
Word count: 2
Compressed: true
words:
# Word
7 <4 0> ---> as a
8 <6 1> ---> us?
If we have word number 5, we can see that the first collection of words contains only 3 words, so we go to the next word list, which contains 4 words; word number 5 must therefore be in there. We subtract the word count of the previous list, 5 - 3 = 2, so the word is at index 2 (the third entry) of the second word list, which is the word `by`.
We also check whether the table is in fact a compressed table. If `compressed` is true, then we need to do a lookup on each word number stored in the entry.
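A minimal sketch of this lookup in Python is shown below. It works on word lists that are already in memory, each given as a `(compressed, entries)` pair in file order; that representation and the names `find_entry` and `expand_word` are assumptions of the sketch. It returns the list of plain words rather than a joined string, because how the pieces are joined is not described here, and it resolves compressed entries recursively in case a number points at another compressed entry.

```python
def find_entry(number, word_lists):
    """Locate a word number by walking the lists and subtracting their sizes."""
    remaining = number
    for compressed, entries in word_lists:
        if remaining < len(entries):
            return compressed, entries[remaining]
        remaining -= len(entries)
    raise IndexError("word number %d is out of range" % number)


def expand_word(number, word_lists):
    """Return the plain words that a word number stands for."""
    compressed, entry = find_entry(number, word_lists)
    if not compressed:
        return [entry]
    words = []
    for part in entry:                  # each part is itself a word number
        words.extend(expand_word(part, word_lists))
    return words
```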
Copyright © 2009-2018 Yohanes Nugroho