The PCFG parser can be trained, tested and evaluated using "make tests"
in the main directory. It uses a cleaned-up version of the tiny Hindi
Treebank (in the tests/ directory). Additional documentation is in the
text file included with this mail.

Anoop

________________________________________________________________________
HOWTO.training.txt
________________________________________________________________________

First install cbnlp-work.tar.gz
================================

- Install the programs to the bin directory:

  cd cbnlp-work
  make install

- The tests are run in the tests/ directory:

  cd tests

Optional Step (not recommended, instead use the files created in tests/):
=========================================================================

- Convert the original Treebank annotation into full dependency trees
  (stored in a Penn Treebank style notation)

  ../bin/tbc ../data/full_final.anncorra > tbc.out

For full words (no stemming, also not recommended):
====================================================

- Convert the trees into CFG rules

  ../bin/convtrees -r anncorra_full.trees > convtrees.out

- Use ML estimates to create PCFG

  ../bin/makepcfg convtrees.out > makepcfg.out

- Parse using PCFG (enter sentences at STDIN)

  ../bin/pcfg_parser makepcfg.out

Using ad-hoc Stemming (last two chars of word used)
===================================================

cd to cbnlp-work, and then:

  bin/convtrees -n -r -h=@ tests/anncorra_stems.trees > tests/anncorra.cfg
  bin/convtrees -n -s tests/anncorra_stems.trees > tests/anncorra.sents
  bin/makepcfg tests/anncorra.cfg > tests/anncorra_grammar.pl
  time bin/pcfg_parser tests/anncorra_grammar.pl < tests/anncorra.sents \
      > tests/anncorra_baseline.output 2> tests/anncorra_baseline.stderr
  bin/evalb -p tests/hindi.prm tests/anncorra_stems.trees \
      tests/anncorra_baseline.output

Or do "make tests" in the main directory.
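
Note on the "Use ML estimates to create PCFG" step: the input and output formats of makepcfg are not described in this HOWTO, so the following is only a rough sketch of the usual relative-frequency (maximum-likelihood) estimate, P(A -> beta) = count(A -> beta) / count(A). It is not the makepcfg implementation. The function name ml_estimate, the (lhs, rhs) tuple representation and the toy rules are all assumptions made for illustration.

```python
from collections import Counter


def ml_estimate(rules):
    """Relative-frequency estimate of PCFG rule probabilities.

    `rules` is an iterable of (lhs, rhs) pairs, one pair per rule
    occurrence read off the treebank trees; `rhs` is a tuple of symbols.
    This representation is hypothetical, not the convtrees output format.
    """
    rule_counts = Counter(rules)

    # Total number of times each left-hand side was expanded.
    lhs_counts = Counter()
    for (lhs, _rhs), n in rule_counts.items():
        lhs_counts[lhs] += n

    # P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)
    return {
        (lhs, rhs): n / lhs_counts[lhs]
        for (lhs, rhs), n in rule_counts.items()
    }


# Toy rule occurrences (invented, not taken from the Hindi Treebank).
observed = [
    ("S", ("NP", "VP")),
    ("S", ("NP", "VP")),
    ("S", ("VP",)),
    ("NP", ("N",)),
]

probs = ml_estimate(observed)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}\t{p:.3f}")
```

With this estimate the probabilities of all rules sharing a left-hand side sum to one, which is the property a PCFG parser relies on when scoring competing parses.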