Brief instructions for using Panin's Panic.

Brendan McKay, bdm@cs.anu.edu.au
13 September 1997

panin is a program for finding "numerical patterns" in English text.
It is named in honour of Ivan Nikolayovitch Panin, who claimed to have
discovered amazing patterns in the text of the Bible.

Compilation.

panin uses 64-bit arithmetic. panin.c supports three setups
automatically:

  (1) Dec Alpha machines, which have native 64-bit instructions. Use
         cc -o panin panin.c
  (2) The GNU compiler gcc, which has a "long long" type. Use
         gcc -o panin panin.c
  (3) Microsoft Visual C++ compilers, version 2.0 or later.

Before you compile, you need to decide which letter values to use.
panin can use two kinds of values for letters:

  numerical values:  A=1 B=2 ... I=9 J=10 K=20 ... R=90 S=100 T=200 ... Z=800
  place values:      A=1 B=2 C=3 ... Z=26

Near the start of panin.c are these two lines:

   #define USEPLACEVALS 0
   #define USENUMVALS 1

Edit them as you wish (0 = don't use, 1 = use). It is fine to enable
either, neither, or both.

Input data.

panin reads most English text correctly. The only important exception
is that it assumes "-" continues the current word. For example,
"first-year" is correctly regarded as one word, but "then---but" is
also regarded as one word. Break such strings with a space, for
example "then--- but".

panin can handle five types of English word, but it is not smart
enough to recognise them. If you want to use them, mark the input text
by inserting one of these characters at the start of a word:

   +  a common noun
   ^  a proper noun
   #  a pronoun
   @  an article (a, an, the)
   ~  a verb

For example:

   For +example, these +sentences ~are marked correctly. Don't ~forget
   that #you ~need to ~remove other +occurences of @the special
   +characters. ~Note how @the +concept of +verb ~is @a +bit difficult
   to ~make precise.

Lexical items.

panin knows about "letters", "words" and "sentences".
   letter   = a b c ... z A B C ... Z
   word     = a sequence of letters, "-" or "'", including at least one
              letter (so a number like "37" is not counted as a word)
   sentence = a sequence of at least one word followed by one of these
              characters: . ? !

A "feature" is a conjunction of zero or more "atomic features". The
atomic features known to panin are listed below. Note that "c" usually
means "letter".

   ec      = a letter in even position in the whole passage
   oc      = a letter in odd position in the whole passage
   ecw     = a letter in even position in a word
   ocw     = a letter in odd position in a word
   ecs     = a letter in even position in a sentence
   ocs     = a letter in odd position in a sentence
   ew      = a word in even position in the whole passage
   ow      = a word in odd position in the whole passage
   ews     = a word in even position in a sentence
   ows     = a word in odd position in a sentence
   es      = a sentence in even position in the whole passage
   os      = a sentence in odd position in the whole passage
   fw      = the first word in the whole passage
   lw      = the last word in the whole passage
   flw     = the first or last word in the whole passage
   fws     = the first word in a sentence
   lws     = the last word in a sentence
   flws    = the first or last word in a sentence
   fs      = the first sentence
   ls      = the last sentence
   fls     = the first or last sentence
   fcw     = the first letter in a word
   lcw     = the last letter in a word
   flcw    = the first or last letter in a word
   fcs     = the first letter in a sentence
   lcs     = the last letter in a sentence
   flcs    = the first or last letter in a sentence
   elsc    = a sentence with an even number of letters
   olsc    = a sentence with an odd number of letters
   elsw    = a sentence with an even number of words
   olsw    = a sentence with an odd number of words
   elw     = a word with an even number of letters
   olw     = a word with an odd number of letters
   vowel   = a vowel (a, e, i, o, u)
   cons    = a consonant (any letter except a vowel)
   wsv     = a word starting with a vowel
   wsc     = a word starting with a consonant
   wev     = a word ending with a vowel
   wec     = a word ending with a consonant
   ssv     = a sentence starting with a vowel
   ssc     = a sentence starting with a consonant
   sev     = a sentence ending with a vowel
   sec     = a sentence ending with a consonant
   cnoun   = a common noun
   pnoun   = a proper noun
   noun    = a noun (either cnoun or pnoun)
   art     = an article
   verb    = a verb
   pronoun = a pronoun
   -cnoun  = a word which is not a common noun
   -pnoun  = a word which is not a proper noun
   -noun   = a word which is not a noun
   -art    = a word which is not an article
   -verb   = a word which is not a verb
   -pronoun = a word which is not a pronoun
   sscn    = a sentence starting with a common noun
   sspn    = a sentence starting with a proper noun
   ssn     = a sentence starting with a noun
   ssart   = a sentence starting with an article
   sspr    = a sentence starting with a pronoun
   ssvb    = a sentence starting with a verb

General features are conjunctions (intersection, logical AND) of atomic
features. For example, "flcw ssv" means "the first and last letters of
the words in sentences that start with a vowel".

Running the program.

Suppose the following lines are in the file example.txt.

   For +example, these +sentences ~are marked correctly. Don't ~forget
   that #you ~need to ~remove other +occurences of @the special
   +characters. ~Note how @the +concept of +verb ~is @a +bit difficult
   to ~make precise.

Now you can run panin like this:

   panin example.txt

It responds with the prompt "p maxok maxext forced = ". Enter "7 1 1",
and it will reply:

   ALL V11781=7*1683
   cons #98=7*7*2
   ew #91=7*13
   lw #7
   ows #84=7*12
   lcs V805=7*115
   fs #42=7*6
   art #7 V427=7*61
   -art V11354=7*1622

Each value is preceded by a character indicating what it means:

   #  = the number of letters
   V  = the total numerical value of the letters
   P  = the total place value of the letters

Note that there is no "number of words" or "number of sentences" count.
You have to infer those from the number of first letters of words or
sentences (features fcw or fcs).

The above output indicates that the total numerical value is
11781=7*1683 ("ALL" is not a feature name; it stands for the empty
feature ""). Also, the number of consonants is 98=7*7*2, the number of
letters in words having even position in the passage is 91=7*13, the
last word has 7 letters, and so on.

Say that the "size" of a feature is the number of atomic features it
contains. The four values entered at the prompt mean:
   p      = the number whose multiples you are interested in
   maxok  = the greatest size of features to test exhaustively
   maxext = the greatest size of features to test recursively
   forced = a feature that must be included

Features with maxok < size <= maxext are tested only if they extend, by
one extra atomic feature, some feature that is already included in the
output. For example, enter "7 1 3 cons". It will reply:

   cons #98=7*7*2
   cons ew V5229=7*747
   cons fws #7
   ecw cons ew V1785=7*255
   ocw cons ew V3444=7*492
   lcw cons ew V1610=7*230
   flcw cons fws V350=7*50
   cons ew lws #7
   cons ew fs #14=7*2
   cons fws wec V350=7*50
   cons fws sec V350=7*50
   cons ew olw V2534=7*362
   cons ew elw V2695=7*7*55
   cons fws elw V504=7*72
   cons fws elsc V350=7*50
   cons ew -art #56=7*8
   cons ew verb #7 V644=7*92
   cons ew -verb V4585=7*655
   cons fws -verb V350=7*50

The input "cons" says that only features including "cons" are to be
considered. The value 1 for maxok says to test exhaustively only
features of size up to 1 (here the only such feature is "cons" itself).
The value 3 for maxext says that features of size 2 are tested if they
extend "cons", and features of size 3 are tested if they extend a
feature of size 2 that exhibits 7.

By comparison, "7 3 3 cons" would report every feature of size up to 3
that includes "cons" and exhibits 7. There are 193 of them, so we won't
list them here.

Some effort is made not to present redundant features. For example,
panin would not present "cons fws wec" if all consonants in the first
words of sentences were necessarily in words ending with consonants.
However, this pruning is not perfect, and duplicates sometimes appear
(especially for very short passages).

Hints.

The most important rule in presenting the data is to leave out almost
everything. panin can easily produce tens of thousands of features, but
you need to look around to find the few that form a regular pattern.
This can take some time, but it is usually successful.
Especially look for recursive splits of one feature into two features
with the same type of result (#, V or P). In the output above we see

   V(cons ew) = V(cons ew ecw)  + V(cons ew ocw)
              = V(cons ew elw)  + V(cons ew olw)
              = V(cons ew verb) + V(cons ew -verb)

that is, 5229 = 1785 + 3444 = 2695 + 2534 = 644 + 4585.

To look for features involving the interesting number more than once,
just use a power of that number as p. For example, 7*7*7*7 = 2401, and
entering "2401 3 3" gives us

   oc ews wsc V2401
   ews ocs wsc V2401
   wsc elw -verb V2401

Check that the first two are not really the same feature before
presenting both. (Here they are the same. The first two sentences have
42 and 66 letters, both even, so every letter's position has the same
parity in its sentence as in the whole passage, and oc coincides with
ocs.)

The program gets very slow if you ask for features of large size. That
is not the fault of the program: there are 61 atomic features, and so
61, 1830, 35990, 521855, 5949147 and 55525372 possible features of size
1, 2, ..., 6.