Work-in-Progress by Brad Lyon
A few notes are here and here
Text Analysis Bigram Explorer (Draft)
Pick a space-delimited file from your system, or choose one of the available files in the drop-down, and the bigrams in the file will be sorted by their calculated Log-Likelihood Ratio.
Implementation based on Apache Mahout's LogLikelihood.java. See Ted Dunning's blog post for background info.
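For readers who want to see roughly what the ranking involves, here is a minimal Python sketch. It is not the code behind this page. It assumes the common entropy-style formulation of the log-likelihood ratio over a 2x2 contingency table of bigram counts, and it assumes tokens are produced by splitting on whitespace only. The function names (`x_log_x`, `entropy`, `log_likelihood_ratio`, `rank_bigrams`) are chosen for illustration.

```python
import math
from collections import Counter


def x_log_x(x):
    """Return x * ln(x), treating 0 * ln(0) as 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR for a 2x2 table.

    k11: times A is followed by B
    k12: times A is followed by something other than B
    k21: times something other than A is followed by B
    k22: bigrams involving neither A first nor B second
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def rank_bigrams(text):
    """Return [((first, second), llr), ...] sorted by descending LLR."""
    tokens = text.split()  # assumption: whitespace-delimited tokens
    bigrams = Counter(zip(tokens, tokens[1:]))
    first = Counter()   # how often each word appears in first position
    second = Counter()  # how often each word appears in second position
    for (a, b), n in bigrams.items():
        first[a] += n
        second[b] += n
    total = sum(bigrams.values())

    scored = []
    for (a, b), k11 in bigrams.items():
        k12 = first[a] - k11
        k21 = second[b] - k11
        k22 = total - k11 - k12 - k21
        scored.append(((a, b), log_likelihood_ratio(k11, k12, k21, k22)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for bigram, score in rank_bigrams("a b a b c d"):
        print(bigram, round(score, 3))
```

In this toy input the pair ("a", "b") ranks first because it occurs twice and neither word appears in that position with any other partner. A real text would usually also need case folding and punctuation handling, which this sketch leaves out.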
Preset Data Files (most from Project Gutenberg)
Alice in Wonderland
Art of War
The Brothers Karamazov
The Canterbury Tales
Cornell MED from Classic3 Dataset
Divine Comedy
Dracula
Frankenstein
Grimm's Fairy Tales
Hamlet
A.E. Housman's Last Poems
Leaves of Grass
Les Miserables
Magna Carta
Longfellow's Works
Moby Dick
Peter Pan
Edgar Allan Poe's Works
Pride and Prejudice
Shakespeare (big)
Sherlock Holmes
Tom Sawyer
Ulysses
The Hundred Best English Poems (1904)