Why are medial syllables in poetry less predictable than final/initial syllables?

1. Hypothesis 1: because lexical stress is less predictable on medial syllables.
This is an analysis of poems in terms of stressed and unstressed syllables.
Each graph shows how often stress falls on a particular position in the line for the given language.
The stresses are computed from the orthographic transcription using bin/profile.py.
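
A minimal sketch of how such a stress profile could be computed. It is not the code of bin/profile.py. It assumes each line has already been reduced to a string with "1" for a stressed and "0" for an unstressed syllable, and that the profile is the fraction of lines stressed at each syllable position. The function name stress_profile and the toy input are hypothetical.

```python
from collections import Counter

def stress_profile(lines):
    """Return, per syllable position, the fraction of lines stressed there."""
    stressed = Counter()  # position -> number of lines stressed at that position
    total = Counter()     # position -> number of lines long enough to reach it
    for pattern in lines:
        for pos, mark in enumerate(pattern):
            total[pos] += 1
            if mark == "1":
                stressed[pos] += 1
    return [stressed[pos] / total[pos] for pos in sorted(total)]

# Hypothetical toy input: three four-syllable lines (not corpus data).
profile = stress_profile(["0101", "0101", "1101"])
```

Positions where the value is close to 0 or 1 carry predictable stress; values near 0.5 mean stress at that position is hard to predict.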

The profiles look language dependent: see <lang>_profile.png

Does the same hold for the acoustics?

2. To create the histogram for prose/poetry:

/predict_next/bin/hist_lang.py

There is a difference which requires some further thought. 

3. To create the histograms for poetry across langs:
bash/hist_lang.sh

The pauses are not between the lines!

bin/count_pauses.py - counts the pauses and returns a plot

Mean (silences-1)/line:
recVy_HTKmfcc
Russian 0.673660714286 - no consistent pauses between lines
Greek 0.944444444444 - silences=lines
English 1.33116319444 - silences within lines

Looks like in English final sylls are less predictable, because they are more likely to be in the middle of the line.

recVapeV_HTKmfcp:
Russian 0.601530612245
Greek 0.813131313131
English 0.908914728682

RESULT: In all three langs there are generally fewer pauses than lines, so some medial syllables may in fact be line-final.
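
A minimal sketch of the (silences-1)/line ratio reported above. It is not the code of bin/count_pauses.py. It assumes the ratio is computed per recording and then averaged, and that the input is a list of (number of silences, number of lines) pairs. The function name and the counts are hypothetical.

```python
def mean_pauses_per_line(recordings):
    """Average (silences - 1) / lines over recordings.

    `recordings` is a list of (n_silences, n_lines) pairs, one per poem.
    Averaging per recording (rather than pooling all counts) is an assumption.
    """
    ratios = [(sil - 1) / lines for sil, lines in recordings if lines > 0]
    return sum(ratios) / len(ratios)

# Hypothetical counts, not taken from the corpus.
value = mean_pauses_per_line([(9, 8), (5, 8)])
```

A value near 1 corresponds to one silence per line, a value below 1 to lines that run together without a pause, and a value above 1 to silences inside lines.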

How many syllables in 1 'V' label?

recVy_HTKmfcc
Russian 1.58919210999
Greek 1.53259359585
English 1.41758613738

recVapeV_HTKmfcp
Russian 1.32003915666
Greek 1.38180643744
English 1.29576044342
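
A minimal sketch of how the number of syllables per 'V' label could be counted. It assumes, hypothetically, that the 'V' labels are available as (start, end) time intervals and the syllables as time points (e.g. syllable midpoints) on the same time axis. The actual data layout and counting rule may differ.

```python
from bisect import bisect_left, bisect_right

def syllables_per_v(v_intervals, syllable_times):
    """Mean number of syllable time points falling inside each V interval."""
    times = sorted(syllable_times)
    counts = [
        # number of time points t with start <= t <= end
        bisect_right(times, end) - bisect_left(times, start)
        for start, end in v_intervals
    ]
    return sum(counts) / len(counts)

# Hypothetical toy alignment (seconds), not corpus data.
ratio = syllables_per_v(
    [(0.0, 0.3), (0.5, 0.6), (0.8, 1.2)],
    [0.1, 0.25, 0.55, 0.9, 1.0, 1.1],
)
```

A ratio above 1 means that one 'V' label often spans more than one syllable.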


4. To create histograms showing the effect of number of segments:
text_analysis/bash/nn_lang.sh

In Russian and Greek there is a steady increase up to 8 segments. In Greek, prose also becomes more predictable.
In English the peak is at 6 segments and for 7 and 8 the numbers decline. Also, in Greek and Russian final and initial are equally predictable. In English only final are predictable.
French: same as Greek and Russian
Chinese: final predictable for 6-8 segments, initial for 2-5 segments

python bin/count_V.py <segm>

recVy_HTKmfcc 
Syllables/V Russian 1.58919210999
Syllables/V Greek 1.53259359585
Syllables/V English 1.41758613738
V between S Russian 9.86250408497
V between S Greek 5.59442323601
V between S English 4.792287613

recVapeV_HTKmfcp
Syllables/V Russian 1.32003915666
Syllables/V Greek 1.38180643744
Syllables/V English 1.29576044342
V between S Russian 13.645934544
V between S Greek 7.21052524916
V between S English 6.76726293732

In Russian in most cases several lines are combined. In English generally one line = one ips. Greek is in the middle. 
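
A minimal sketch of the "V between S" statistic. It is not the code of bin/count_V.py. It assumes the recognizer output is a flat sequence of labels in which "V" marks a vowel-like segment and "S" a silence, and it also counts a run of "V" that is not closed by a final "S". The function name and the toy sequence are hypothetical.

```python
def mean_v_between_s(labels):
    """Mean length of the runs of 'V' labels delimited by 'S' labels."""
    runs = []
    current = 0
    for label in labels:
        if label == "V":
            current += 1
        elif label == "S":
            if current:           # close the run that this silence ends
                runs.append(current)
            current = 0
        # any other label is ignored and does not break a run
    if current:                   # run not followed by a final silence
        runs.append(current)
    return sum(runs) / len(runs) if runs else 0.0

# Hypothetical toy sequence: runs of 3, 2 and 1 'V' labels.
mean_run = mean_v_between_s(list("SVVVSVVSVS"))
```

Multiplying this value by the syllables/V ratio gives a rough number of syllables per stretch between silences, which can then be compared with the line length.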

5. poetry_prop.sh = properties in poetry and prose

Poetry:

English: all properties more predictable in final position, followed by initial, followed by medial.
Within medial position, loudness, frication and duration are the most predictable.

Russian: all properties most predictable in final and initial position.
In medial position: again loudness, frication and speechrate (0.28)

Greek, French same pattern as Russian.

Chinese: more bimodal distribution for F and I. 
Medial - only loudness more predictable. 

Prose: 

English:
medial - less normal distribution. The same properties are most predictable. 
final: speechrate, loudness and duration, but not frication are most predictable. 
initial: multimodal distributions, loudness seems most predictable. 

Russian:
initial: loudness and speechrate. 
medial: same as poetry.
final: speechrate and loudness, partially duration

Greek, French, Mandarin - medial same as poetry, final - similar to Russian and English. Speechrate and loudness are more predictable than the rest.
Initial - mainly loudness-related properties. 

Overall: In poetry all properties are more predictable in F and I; in medial position loudness, frication and duration seem to be most predictable. The same properties are also predictable in prose in medial position, but in initial and final position the most predictable properties seem to be loudness and speechrate.

6. By number in prose (updated nn_lang.sh):

The center of the distribution for medial seems to be higher than for final or initial, but there are more final 'outliers' towards the higher end.

