Palm Bible Plus PDB Format
Yohanes Nugroho, December 2010
This document contains the description of the Palm Bible+ PDB Format. The description for PDB format itself can be found at the following URLs:
A PDB file is just a sequence of blocks called records, and a record can contain anything. This document describes the format that PalmBible+ uses to store the compressed bible data. It assumes that you can already read the PDB format and can access any record at random.
The first record (record number 0) contains version info and information about books:
|16 bytes||array of char||version name|
|128 bytes||array of char||version info (usually contains copyright info)|
|1 byte||char||contains separator character (or '\0' if there is no separator character)|
|1 byte||byte||contains version attribute. First bit contains information about copy protection (1 means copy protected), second bit contains information about byteshifted version (1 = not byte shifted, used for Shift JIS), third bit indicates alignment (1 is right aligned, 0 is left aligned).|
|2 bytes||word||record location of word index|
|2 bytes||word||number of records for word list|
|2 bytes||word||total number of books|
The following bytes in record 0 contain information about the books, one entry per book:
|2 bytes||word||book number|
|2 bytes||word||location of book record|
|2 bytes||word||number of records for this book|
|8 bytes||array of char||short name of this book (e.g. GEN for Genesis)|
|32 bytes||array of char||full name of this book (e.g. Genesis)|
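The two tables above can be read with a few lines of code. The following is only a minimal sketch, not the PalmBible+ implementation: it assumes that multi-byte fields are big-endian (the usual Palm OS convention, which this document does not state), it decodes text as Latin-1 purely for illustration, and the function and key names are hypothetical.

```python
import struct

# Assumption: big-endian fields, as is usual on Palm OS.
HEADER = struct.Struct(">16s128sBBHHH")  # version info, 152 bytes
BOOK = struct.Struct(">HHH8s32s")        # one book entry, 46 bytes


def _text(raw):
    """Decode a NUL-padded char array (the encoding is an assumption)."""
    return raw.split(b"\0", 1)[0].decode("latin-1")


def parse_version_record(record0):
    """Split record 0 into the version info and the list of book entries."""
    (name, info, sep, attr, word_index_rec,
     word_rec_count, total_books) = HEADER.unpack_from(record0, 0)
    header = {
        "name": _text(name),
        "info": _text(info),
        "separator": chr(sep) if sep else None,  # '\0' means no separator
        "attributes": attr,                       # raw attribute byte, bits not decoded
        "word_index_record": word_index_rec,
        "word_record_count": word_rec_count,
        "total_books": total_books,
    }
    books = []
    for i in range(total_books):
        number, location, count, short, full = BOOK.unpack_from(
            record0, HEADER.size + i * BOOK.size)
        books.append({
            "number": number,
            "record_location": location,
            "record_count": count,
            "short_name": _text(short),
            "full_name": _text(full),
        })
    return header, books
```

The attribute byte is returned unchanged because the description above does not say whether the "first bit" is the least or the most significant one.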
Word index records can be located by reading the record location of word index field in the version info. The first record of the word index records contains information about the records that follow it. The first 2 bytes contain the total number of indices (we will call this totalIndicesCount). After that there is an array of index info (I will call this the index info array) containing totalIndicesCount entries. Each entry contains:
|2 bytes||short||word length|
|2 bytes||short||total number of words that have this word length|
|1 byte||boolean||whether the entries are compressed values or normal words. A normal word points to an array of characters; a compressed value points to an array of numbers (each number in the array points to a word)|
|1 byte||char||ignored value|
Each of the following records contains the data corresponding to one element of the index info array. So, record 1 of the word index (i.e. record location of word index + 1) contains the data described by element 0 of the index info array.
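As an illustration, the first word index record could be decoded like this. This is a sketch under the assumption that the fields are big-endian (the usual Palm OS convention, not stated in this document); the function name and the dictionary keys are my own.

```python
import struct

# Assumption: big-endian fields. Each index info entry is 6 bytes:
# word length, word count, compressed flag, ignored byte.
ENTRY = struct.Struct(">HHBB")


def parse_word_index(first_record):
    """Return the index info array stored in the first word index record."""
    (total_indices_count,) = struct.unpack_from(">H", first_record, 0)
    entries = []
    for i in range(total_indices_count):
        length, count, compressed, _ignored = ENTRY.unpack_from(
            first_record, 2 + i * ENTRY.size)
        entries.append({
            "word_length": length,
            "word_count": count,
            "compressed": bool(compressed),
        })
    return entries
```

Element i of the returned list then describes the record at record location of word index + 1 + i.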
The compressor that creates the Bible+ PDB builds a list of words in order of increasing word length. This list is then divided into several collections of records. Each record collection contains words of the same length. The words are numbered sequentially from the first to the last record (the numbering is implicit; it is not stored).
Word length: 1, Word count: 3, Compressed: false
words:
|#||Word|
|0||a|
|1||?|
|2||!|

Word length: 2, Word count: 4, Compressed: false
words:
|#||Word|
|3||an|
|4||as|
|5||by|
|6||us|
A word list record can also be compressed. Such a record doesn't contain strings, but numbers that refer to other words. Example:

Word length: 2, Word count: 2, Compressed: true
words:
|#||Word|
|7||<4 0> ---> as a|
|8||<6 1> ---> us?|
To open a book, open the record at the location given by location of book record and read all the records for that book (see number of records for this book in the book info). The first record contains information about the book.
|2 bytes||short||total number of chapters for this book (totalChapter)|
|2 bytes * totalChapter||array of short||total accumulated verses per chapter, starting from chapter 2 (totalVersesAcc)|
|4 bytes * totalChapter||array of int||total accumulated characters per chapter, starting from chapter 2 (totalChapterCharsAcc)|
|the rest of the record||array of short||total accumulated characters per verse (totalVerseCharsAcc)|
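A sketch of how this first book record could be split into its parts follows. It assumes big-endian fields (the usual Palm OS convention, not stated here) and that the three arrays are, in order, the ones this document refers to as totalVersesAcc, totalChapterCharsAcc and totalVerseCharsAcc. The function name is hypothetical.

```python
import struct


def parse_book_index(record):
    """Split the first record of a book into the chapter count and three arrays."""
    (total_chapter,) = struct.unpack_from(">H", record, 0)
    offset = 2

    # 2 bytes per chapter: accumulated verse counts.
    total_verses_acc = list(
        struct.unpack_from(f">{total_chapter}H", record, offset))
    offset += 2 * total_chapter

    # 4 bytes per chapter: accumulated word positions.
    total_chapter_chars_acc = list(
        struct.unpack_from(f">{total_chapter}I", record, offset))
    offset += 4 * total_chapter

    # Everything that is left: one 2-byte value per verse.
    remaining = (len(record) - offset) // 2
    total_verse_chars_acc = list(
        struct.unpack_from(f">{remaining}H", record, offset))

    return (total_chapter, total_verses_acc,
            total_chapter_chars_acc, total_verse_chars_acc)
```

The values are returned exactly as stored; no entry for the first chapter is added.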
Explanation: for every book, all of the verse data (the words in the verses) is stored as an array of words, each word represented by a number. There is no delimiter to indicate where a chapter or a verse starts.
For example, the record will contain numbers like these:
words 50 2 21 12 15 21 32 75 32 21 33 32 45 64
The numbers may represent chapter/verse like this:
|chapter||verse||words||word positions|
|0||0||50 2 21||0 1 2|
|0||1||12 15 21||3 4 5|
|0||2||32||6|
|1||0||75 32||7 8|
|1||1||21 33||9 10|
|1||2||32 45||11 12|
|1||3||64||13|
It means that chapter 0 verse 0 contains the numbers 50 2 21, and chapter 1 verse 0 contains the numbers 75 32. There are three arrays that will help us locate a verse:
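Per chapter:
|chapter||totalChapterCharsAcc||totalVersesAcc|
|0||0||0|
|1||7||3|

Per verse:
|chapter||verse||totalVerseCharsAcc||words||word positions|
|0||0||0||50 2 21||0 1 2|
|0||1||3||12 15 21||3 4 5|
|0||2||6||32||6|
|1||0||0||75 32||7 8|
|1||1||2||21 33||9 10|
|1||2||4||32 45||11 12|
|1||3||6||64||13|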
totalChapterCharsAcc gives the word position where each chapter starts. Since chapter 0 (the first chapter) always starts at 0, that entry is not stored in the real PDB file (the array starts from the second chapter); I am displaying it here to keep the example simple. To go to a particular verse, we look up totalVerseCharsAcc and add the verse location to the location that we got from totalChapterCharsAcc. But as you can see, this array is linear per verse: we don't know where chapter N starts in it unless we know the total number of verses from chapter 0 to chapter N - 1. That is where we use the array totalVersesAcc. This array gives the total number of verses in all previous chapters.
So how do we know where to start reading if we want to read a particular chapter/verse?
- For the first verse of the first chapter (chapter 0 verse 0 in the example) it is clear that it must start at offset 0.
- For other chapters, we need to look up the array totalChapterCharsAcc to find the start of that chapter.
- After we know where that chapter starts, we still need to skip some verses. To skip verses we need to look up the array totalVerseCharsAcc:
  - for the first chapter we can just look up totalVerseCharsAcc at the verse number
  - for other chapters, we need to look up totalVersesAcc first and add the verse number to it. Using the example: to find the start of chapter 1 verse 2, we look up totalChapterCharsAcc, which gives 7. Then we look up totalVersesAcc at index 1, which gives 3, so we look up totalVerseCharsAcc[3 + 2] (2 is the verse number) and get 4, which means that the verse starts at 7 (from totalChapterCharsAcc) + 4, which is 11.
To find the number of words in the verse, we subtract the current totalVerseCharsAcc entry from the next one (in this case: 6 - 4 = 2 word numbers).
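The lookup described above can be sketched as follows, using the arrays exactly as they are displayed in the example (that is, including the entries for chapter 0 that the real file omits). The function name and the total_words parameter are hypothetical. The handling of the last verse of a chapter, where there is no next entry to subtract from, is my own assumption: the sketch uses the start of the next chapter, or the end of the data, instead.

```python
def locate_verse(chapter, verse, chapter_chars_acc, verses_acc,
                 verse_chars_acc, total_words):
    """Return (start position, word count) of a verse, both 0-based."""
    chapter_start = chapter_chars_acc[chapter]
    first = verses_acc[chapter]  # index of this chapter's first verse

    if chapter + 1 < len(verses_acc):
        verses_in_chapter = verses_acc[chapter + 1] - first
        chapter_end = chapter_chars_acc[chapter + 1]
    else:
        # Last chapter: it runs to the end of both arrays.
        verses_in_chapter = len(verse_chars_acc) - first
        chapter_end = total_words

    if not 0 <= verse < verses_in_chapter:
        raise IndexError("verse out of range")

    start = chapter_start + verse_chars_acc[first + verse]
    if verse + 1 < verses_in_chapter:
        end = chapter_start + verse_chars_acc[first + verse + 1]
    else:
        end = chapter_end  # assumption: last verse ends where the chapter ends
    return start, end - start


# The example data from this document.
words = [50, 2, 21, 12, 15, 21, 32, 75, 32, 21, 33, 32, 45, 64]
total_chapter_chars_acc = [0, 7]
total_verses_acc = [0, 3]
total_verse_chars_acc = [0, 3, 6, 0, 2, 4, 6]

start, count = locate_verse(1, 2, total_chapter_chars_acc, total_verses_acc,
                            total_verse_chars_acc, len(words))
verse_words = words[start:start + count]  # start is 11, count is 2
```

For chapter 1 verse 2 this gives position 11 and 2 word numbers, matching the calculation in the text.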
Once we know the location of the words, we can read the words as numbers, then we can translate the numbers to strings. First we will need to know in which record we should read the word. We can check the number of words in the word index to find which word list we should use. After that we can find the word that we want. Using the previous word list as example:
Word length: 1, Word count: 3, Compressed: false
words:
|#||Word|
|0||a|
|1||?|
|2||!|

Word length: 2, Word count: 4, Compressed: false
words:
|#||Word|
|3||an|
|4||as|
|5||by|
|6||us|

Word length: 2, Word count: 2, Compressed: true
words:
|#||Word|
|7||<4 0> ---> as a|
|8||<6 1> ---> us?|
If we have word number 5, we can see that the first collection of words contains only 3 words, so we go to the next word list, which contains 4 words; word number 5 must therefore be in there. We subtract the word count of the previous list, 5 - 3 = 2, so the word is at index 2 (the third entry) of the second word list, which is the word by.
We also check whether the table is in fact a compressed table. If compressed is true, then we need to do a lookup on each number in the entry, since each of them refers to another word.
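The word lookup can be sketched like this. It is only an illustration: the word lists are assumed to be already decoded into Python lists (how the bytes of a word record are laid out is not covered by the sketch), the function name is hypothetical, and the result is returned as a list of words because the rule for joining them (for example, where the separator character goes) is not described here.

```python
def lookup_word(number, tables):
    """Resolve a word number to a list of plain words.

    tables is a list of (compressed, entries) pairs in file order.
    Entries of a normal table are strings; entries of a compressed
    table are tuples of word numbers.
    """
    remaining = number
    for compressed, entries in tables:
        if remaining < len(entries):
            entry = entries[remaining]
            if not compressed:
                return [entry]
            # Compressed entry: resolve every number it refers to.
            words = []
            for part in entry:
                words.extend(lookup_word(part, tables))
            return words
        # Not in this table: skip its words and try the next one.
        remaining -= len(entries)
    raise IndexError("word number out of range")


# The example word lists from this document.
tables = [
    (False, ["a", "?", "!"]),
    (False, ["an", "as", "by", "us"]),
    (True, [(4, 0), (6, 1)]),
]

word_5 = lookup_word(5, tables)  # ["by"]
word_8 = lookup_word(8, tables)  # ["us", "?"]
```

Walking the tables in order and subtracting each word count is the same calculation as the 5 - 3 = 2 step above.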
Copyright © 2009-2010 Yohanes Nugroho