| Neural Networks |
|
NEURAL NETWORKS - SPECIFICATIONS

GENERAL

I use a back-propagation network, primarily for simplicity and trackability of the results. A single hidden layer is used, with a standard width of one-third of the combined inputs and outputs (based on experience). Learning is run to a 20% error rate, which for most Bible books takes about 50-100 learning trials. The 20% figure was chosen by trial and error, based on how many non-conforming blocks it identified (I was looking for a manageable 5-10% range). The network's goal is descriptive, not predictive (the latter being the normal goal). The critical requirement is therefore to standardize the runs as much as possible, so that results are comparable.

INPUTS / OUTPUTS AND COLUMN SPECIFICATION

The input and output files are always identical, since the goal is descriptive: to see how the internal elements inter-relate. I tested low- and high-detail files with the Greek and Hebrew, and found that about 50-60 elements is optimal. Too many elements over-learn quickly; too few make learning much harder on some books. Single-chapter books were not run, since they over-learn quickly.

Whether for vocabulary or syntax, the first 50 elements of the input array are used for 'raw' monitoring of high-use values. For vocabulary, these are the top Strong's numbers for the whole testament. For syntax, they are the high-frequency morph-tag attributes. The remaining inputs are five frequency levels for the raw elements (Strong's numbers or literal morph-tags).

ROW SPECIFICATION

Even though the Bible is in the form of 'verses', the original-language text is 'run-on', basically one long listing of words. To avoid the influence of arbitrary verse-breaking, rows are blocks of 25 words (columnized). So, for example, 25 words are read in, classified and then columnized to create the learning row.
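
As a rough illustration of the column and row layout described above, here is a minimal Python sketch of how one 25-word block might be columnized. It is only one reading of the specification, not the code actually used. The names `columnize`, `top_values`, `level_of` and `hidden_width` are hypothetical, and so are the assumptions that the 50 raw columns hold per-block proportions and that the five frequency levels are numbered 0 to 4.

```python
from collections import Counter

N_RAW = 50     # 'raw' columns for high-use values (from the column specification)
N_LEVELS = 5   # frequency-level columns (from the column specification)


def columnize(block, top_values, level_of):
    """Turn one block of classified words into a single learning row.

    block      -- list of element codes, one per word (e.g. Strong's numbers)
    top_values -- the N_RAW high-use codes in a fixed column order (hypothetical)
    level_of   -- dict mapping every code to a frequency level 0..4 (hypothetical)
    """
    counts = Counter(block)
    n = len(block)
    # First 50 columns: share of the block taken by each high-use value.
    raw = [counts[v] / n for v in top_values]
    # Remaining 5 columns: share of the block's words in each frequency level.
    per_level = [0] * N_LEVELS
    for code in block:
        per_level[level_of[code]] += 1
    return raw + [c / n for c in per_level]


def hidden_width(n_inputs, n_outputs):
    """One-third of the combined inputs and outputs, rounded down (assumed)."""
    return (n_inputs + n_outputs) // 3
```

Under these assumptions each row has 55 columns, and because the output file is the same as the input file, the same row serves as both input and target. `hidden_width(55, 55)` then gives a hidden layer of 36 nodes.
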
To avoid issues with arbitrary block-breaks, each new row begins half-way through the previous block (instead of at its end). As a result, any given word is included in two rows. In the later re-versification, the two row occurrences are averaged. I used 25 as the word-block size, since a block of that size tends to overlap about two 'traditional' verses.

RUN SEQUENCE DESCRIPTION

(1) Versified data is converted into word-blocks and columnized into rows.
(2) Each book is fed into a neural network as above, with vocabulary and syntax run separately. This includes 'real' books and 'pseudo' books (e.g. Isaiah1). This creates two basic network files per book, including weights, connections, etc. Also included are the basic conformity values (e.g. whether a data element is consistent with its environment).
(3) For each real book, all other books (real and pseudo) are cycled through the saved network file for that book, to create 'what-if' results for that book. The combinatorial growth here is quite steep: for each data element, there is the actual value, plus up to 60 what-if values representing the alternative writers.
(4) To use the information in Bible software, re-versification from the blocks is necessary. Some data is lost in the re-versification, since averaging occurs between overlapping row-blocks.
(5) Data is summarized to word, verse, chapter and book levels for data queries.

OTHER DATA RETAINED

(1) Neural dither-data is kept in raw files for later analysis of inter-data drivers and identities. This is important in comparing the base-level files (e.g. Byzantine vs NA27), especially by book.
(2) Histogram data for data elements. This is sometimes helpful for comparing books and tracking down oddities.
(3) Summarized input-data files, rolled up by word, block, chapter and book, for quick attributing summarization for the neural runs. |
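
The half-overlapping blocks from the row specification, and the averaging in run-sequence step (4), can be sketched roughly as follows. This is an assumption-laden illustration, not the original code. The function names are hypothetical, 'half-way' is taken to mean a step of 12 words for a 25-word block, a short final block is kept, and block-level conformity values are averaged down to words before being rolled up to verses.

```python
def overlapping_blocks(words, size=25, step=None):
    """Return (start, block) pairs; each block starts half-way through the last."""
    if step is None:
        step = size // 2  # assumption: 'half-way' rounds down (12 for size 25)
    blocks = []
    for start in range(0, len(words), step):
        blocks.append((start, words[start:start + size]))
        if start + size >= len(words):
            break  # this block already reaches the end of the book
    return blocks


def word_scores(n_words, blocks, block_scores):
    """Average each word's score over every block that contains it."""
    totals = [0.0] * n_words
    hits = [0] * n_words
    for (start, block), score in zip(blocks, block_scores):
        for i in range(start, start + len(block)):
            totals[i] += score
            hits[i] += 1
    return [t / h for t, h in zip(totals, hits)]


def verse_scores(per_word, verse_of_word):
    """Roll word-level scores up to verses with a plain mean (assumed)."""
    sums, counts = {}, {}
    for score, verse in zip(per_word, verse_of_word):
        sums[verse] = sums.get(verse, 0.0) + score
        counts[verse] = counts.get(verse, 0) + 1
    return {v: sums[v] / counts[v] for v in sums}
```

With an odd block size the overlap cannot be exactly one half, so in this sketch most words fall in two rows, a few at block boundaries fall in three, and words at the very start or end of a book fall in only one. Averaging over however many rows contain a word handles all three cases, and it is also where the data loss mentioned in step (4) comes from.
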
| Copyright ©, 2007, dmbarnhart |