These practicals cover part of the Durbin section on HMM modelling. Through these practicals we will show you how to generate your own reference data, measure its properties, estimate a model from data with known labels, decode data with a provided HMM, and train your own HMM using Viterbi training. The code is written in plain Python, with no special complications.
All these exercises work on the same principle. A problem file is provided as x.y.foo.pb.py (for example 1.1.odc2stat.pb.py). The purpose of each exercise is to modify this initial file following the instructions of the exercise. There are many ways to implement a solution. To help you, I have implemented my own solution and I am providing its output on the reference datasets. This sample output comes as a file named foo_
To make sure everything is under control, you can regenerate the different output files with the solution scripts.
1.1 - Estimate the parameters of the occasionally dishonest casino (ODC) series provided in odc.run: 1.1.odc2stat.pb.py
## python 1.1.odc2stat.sol.pyc odc.run > 1.1.odc2stat.sol.output
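Because odc.run carries the true state of every roll, the parameters can be estimated by simple counting (maximum likelihood with known paths). A minimal sketch, assuming the rolls and their labels have already been read into two parallel sequences (faces 1 to 6, states 'F' and 'L'); the format of odc.run is not described here, so parsing is left out, and `estimate_odc` is a hypothetical name:

```python
from collections import Counter


def estimate_odc(rolls, labels):
    """Count-based (maximum-likelihood) estimates from a labelled casino series.

    rolls:  die faces, ints 1..6
    labels: state of each roll, 'F' (fair) or 'L' (loaded)
    """
    if len(rolls) != len(labels):
        raise ValueError("rolls and labels must have the same length")
    # Count transitions between consecutive positions, e.g. ('F', 'L').
    trans = Counter(zip(labels, labels[1:]))
    out_f = trans['F', 'F'] + trans['F', 'L']
    out_l = trans['L', 'L'] + trans['L', 'F']
    # Count how often each face is emitted in each state.
    counts = {s: Counter(r for r, lab in zip(rolls, labels) if lab == s) for s in 'FL'}
    emissions = {
        s: {face: counts[s][face] / max(1, sum(counts[s].values())) for face in range(1, 7)}
        for s in 'FL'
    }
    return {
        # 0.0 when a state is never left is an arbitrary choice for this sketch.
        'pFL': trans['F', 'L'] / out_f if out_f else 0.0,
        'pLF': trans['L', 'F'] / out_l if out_l else 0.0,
        'p6': emissions['L'][6],
        'emissions': emissions,
    }
```

For instance, `estimate_odc([1, 6, 6, 2], "FLLF")` sees one F to L switch out of one transition leaving F, and one L to F switch out of two transitions leaving L.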
1.2 - Create a generator that produces a series with the same properties (i.e. similar statistics) as those measured on odc.run: 1.1.odc2stat.pb.py
Your generator takes 4 numbers as input: pFL, the fair-to-loaded transition probability; pLF, the loaded-to-fair transition probability; p6, the probability that the loaded die emits a 6; and N, the run size.
Use 1.1.odc2stat.pb.py to check that your generator is correct.
## python 1.2.odc.sol.pyc 0.1 0.2 0.5 1000 > 1.2.odc.sol.output
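A minimal generator sketch. The hypothetical function `generate_odc` assumes that the series starts in the fair state and that the loaded die spreads the remaining 1 - p6 evenly over faces 1 to 5 (as in Durbin's casino, where p6 = 0.5 and every other face has 0.1). The output format (one line of rolls, one line of states) is also an assumption:

```python
import random
import sys


def generate_odc(pFL, pLF, p6, n, seed=None):
    """Simulate n rolls of the occasionally dishonest casino.

    Assumptions: the series starts in the fair state, and the loaded die
    shares the remaining 1 - p6 equally between faces 1 to 5.
    """
    rng = random.Random(seed)
    state = 'F'
    rolls, labels = [], []
    for _ in range(n):
        if state == 'F':
            roll = rng.randint(1, 6)                              # fair die
        else:
            roll = 6 if rng.random() < p6 else rng.randint(1, 5)  # loaded die
        rolls.append(roll)
        labels.append(state)
        # Switch state with probability pFL (from F) or pLF (from L).
        if rng.random() < (pFL if state == 'F' else pLF):
            state = 'L' if state == 'F' else 'F'
    return rolls, labels


if __name__ == '__main__' and len(sys.argv) == 5:
    # Usage: python generator.py pFL pLF p6 N  (hypothetical script name)
    pFL, pLF, p6 = (float(arg) for arg in sys.argv[1:4])
    rolls, labels = generate_odc(pFL, pLF, p6, int(sys.argv[4]))
    # Hypothetical output format: rolls on one line, matching states on the next.
    print(''.join(map(str, rolls)))
    print(''.join(labels))
```

Passing a `seed` makes a run reproducible, which helps when comparing the statistics of the generated series with those of odc.run.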
2.1 - Implement Viterbi decoding to decode the ODC series you generated in the previous practical. Use the Durbin formulation (p56): 2.1.viterbi.pb.py
## python 2.1.viterbi.sol.pyc data.txt model.txt > 2.1.viterbi.sol.output
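A minimal log-space Viterbi sketch following the recursion v_l(i+1) = e_l(x_{i+1}) max_k(v_k(i) a_kl), with logarithms to avoid underflow on long runs. Reading data.txt and model.txt is left out because their formats are not given here; the dictionary layout, the uniform start probabilities and the example casino values (a_FL = 0.05, a_LF = 0.1, p6 = 0.5) are assumptions, and the end state is ignored:

```python
import math


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(obs, states, start, trans, emit):
    """Most probable state path and its log-probability.

    start[k]:    a_0k, probability of starting in state k
    trans[k][l]: a_kl, transition probability from k to l
    emit[k][x]:  e_k(x), probability that state k emits symbol x
    The transition to the end state (a_k0) is ignored in this sketch.
    """
    # Initialisation: v_k(1) = a_0k * e_k(x_1), in log space.
    v = [{k: log(start[k]) + log(emit[k][obs[0]]) for k in states}]
    pointers = []
    for x in obs[1:]:
        row, back = {}, {}
        for l in states:
            # Best predecessor: argmax_k of v_k(i-1) + log a_kl.
            best = max(states, key=lambda k: v[-1][k] + log(trans[k][l]))
            row[l] = log(emit[l][x]) + v[-1][best] + log(trans[best][l])
            back[l] = best
        v.append(row)
        pointers.append(back)
    # Traceback from the best final state.
    state = max(states, key=lambda k: v[-1][k])
    path = [state]
    for back in reversed(pointers):
        state = back[state]
        path.append(state)
    path.reverse()
    return path, v[-1][path[-1]]


# Example model with Durbin's casino values; the uniform start is an assumption.
STATES = ('F', 'L')
START = {'F': 0.5, 'L': 0.5}
TRANS = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.1, 'L': 0.9}}
EMIT = {'F': {x: 1 / 6 for x in range(1, 7)},
        'L': {**{x: 0.1 for x in range(1, 6)}, 6: 0.5}}

if __name__ == '__main__':
    path, score = viterbi([1, 2, 6, 6, 6, 6, 6, 6, 3, 1], STATES, START, TRANS, EMIT)
    print(''.join(path), score)
```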
2.2 - Measure the accuracy of the decoding using the sensitivity, the specificity and the Sen2 as defined in burset1996.pdf. The main difficulty is defining the true and false positives and negatives (tp, fp, tn, fn): 2.2.viterbi.pb.py
## python 2.2.viterbi.sol.pyc data.txt model.txt > 2.2.viterbi.sol.output
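One possible way to count tp, fp, tn and fn is position by position, treating the loaded state as the positive class. This is a sketch under that assumption, not the reference answer: counting at the level of whole loaded segments is another defensible choice. `Sp` below uses the convention common in gene-finding evaluations such as Burset and Guigó, TP / (TP + FP); the classical TN / (TN + FP) is returned alongside it, and Sen2 is left out because its exact definition should be taken from burset1996.pdf:

```python
def confusion(truth, predicted, positive='L'):
    """Position-level tp, fp, tn, fn, with the loaded state as the positive class."""
    if len(truth) != len(predicted):
        raise ValueError("sequences must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ratio(a, b):
    """a / b, or NaN when the denominator is zero."""
    return a / b if b else float('nan')


def accuracy(truth, predicted, positive='L'):
    tp, fp, tn, fn = confusion(truth, predicted, positive)
    return {
        'Sn': ratio(tp, tp + fn),          # sensitivity
        'Sp': ratio(tp, tp + fp),          # specificity, gene-finding convention
        'Sp_classic': ratio(tn, tn + fp),  # classical specificity
    }
```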
2.3 - Generate series in which you vary the bias towards 6 from 1.01 up to 5, and measure the accuracy of the decoding on each of these series. What is the individual effect of each parameter on the accuracy with which you predict the loaded state? Can this effect be mitigated (i.e. increased or decreased) by increasing the length of the run? What do you conclude about the suitability of HMM decoding when dealing with biological signals?
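A sketch of the 2.3 experiment, under two loudly stated assumptions: the "bias towards 6" is read as the ratio between the loaded and fair probabilities of a 6 (so p6 = bias / 6, from about 0.168 at 1.01 to about 0.833 at 5), and the decoder is given the true generating parameters. The other defaults (pFL = 0.05, pLF = 0.1, N = 10000) are illustrative; change them one at a time to study the effect of each parameter and of the run length. It repeats a compact generator and Viterbi decoder so that it runs on its own:

```python
import math
import random


def generate(pFL, pLF, p6, n, rng):
    """Casino series starting fair; the loaded die shares 1 - p6 equally over faces 1-5."""
    state, rolls, labels = 'F', [], []
    for _ in range(n):
        if state == 'F':
            rolls.append(rng.randint(1, 6))
        else:
            rolls.append(6 if rng.random() < p6 else rng.randint(1, 5))
        labels.append(state)
        if rng.random() < (pFL if state == 'F' else pLF):
            state = 'L' if state == 'F' else 'F'
    return rolls, labels


def emission(state, x, p6):
    if state == 'F':
        return 1 / 6
    return p6 if x == 6 else (1 - p6) / 5


def lg(p):
    return math.log(p) if p > 0 else -math.inf


def decode(rolls, pFL, pLF, p6):
    """Log-space Viterbi for the two-state casino, with a uniform start (assumption)."""
    a = {('F', 'F'): 1 - pFL, ('F', 'L'): pFL, ('L', 'F'): pLF, ('L', 'L'): 1 - pLF}
    v = {s: lg(0.5) + lg(emission(s, rolls[0], p6)) for s in 'FL'}
    back = []
    for x in rolls[1:]:
        ptr = {l: max('FL', key=lambda k: v[k] + lg(a[k, l])) for l in 'FL'}
        v = {l: v[ptr[l]] + lg(a[ptr[l], l]) + lg(emission(l, x, p6)) for l in 'FL'}
        back.append(ptr)
    state = max('FL', key=v.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def sn_sp(truth, pred, positive='L'):
    """Position-level sensitivity TP/(TP+FN) and specificity TP/(TP+FP)."""
    pairs = list(zip(truth, pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sn = tp / (tp + fn) if tp + fn else float('nan')
    sp = tp / (tp + fp) if tp + fp else float('nan')
    return sn, sp


def sweep(biases, pFL=0.05, pLF=0.1, n=10000, seed=0):
    rng = random.Random(seed)
    results = []
    for bias in biases:
        p6 = bias / 6  # hypothetical reading of "bias towards 6"
        rolls, labels = generate(pFL, pLF, p6, n, rng)
        path = decode(rolls, pFL, pLF, p6)
        results.append((bias, *sn_sp(labels, path)))
    return results


if __name__ == '__main__':
    for bias, sn, sp in sweep([1.01, 1.5, 2, 3, 4, 5]):
        print(f'bias={bias:<5} Sn={sn:.3f} Sp={sp:.3f}')
```

A NaN means the decoder never predicted (or the series never contained) the loaded state, which is itself informative at weak biases.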
3.1 - Adapt the Viterbi algorithm into a training algorithm. Follow the Durbin formulation (p65): 3.1.viterbi.pb.py
## python 3.1.viterbi.sol.pyc data.txt model.txt > 3.1.viterbi.sol.output
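Viterbi training alternates two steps: decode the most probable path with the current parameters, then re-estimate a_kl and e_k(b) from the transitions and emissions counted along that path (with pseudocounts so that no probability collapses to zero), stopping when the path no longer changes. A minimal sketch, assuming the two-state casino over faces 1 to 6 with fixed start probabilities; the pseudocount of 1 and the function names are hypothetical choices:

```python
import math

STATES = ('F', 'L')
SYMBOLS = range(1, 7)


def log(p):
    return math.log(p) if p > 0 else -math.inf


def viterbi(obs, start, trans, emit):
    """Log-space Viterbi; returns the most probable state path."""
    v = {k: log(start[k]) + log(emit[k][obs[0]]) for k in STATES}
    back = []
    for x in obs[1:]:
        ptr = {l: max(STATES, key=lambda k: v[k] + log(trans[k][l])) for l in STATES}
        v = {l: v[ptr[l]] + log(trans[ptr[l]][l]) + log(emit[l][x]) for l in STATES}
        back.append(ptr)
    state = max(STATES, key=v.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def reestimate(obs, path, pseudo=1.0):
    """Re-estimate a_kl and e_k(b) from counts along the decoded path, plus pseudocounts."""
    A = {k: {l: pseudo for l in STATES} for k in STATES}
    E = {k: {b: pseudo for b in SYMBOLS} for k in STATES}
    for k, l in zip(path, path[1:]):
        A[k][l] += 1
    for k, x in zip(path, obs):
        E[k][x] += 1
    trans = {k: {l: A[k][l] / sum(A[k].values()) for l in STATES} for k in STATES}
    emit = {k: {b: E[k][b] / sum(E[k].values()) for b in SYMBOLS} for k in STATES}
    return trans, emit


def viterbi_train(obs, start, trans, emit, max_iter=100):
    """Decode, re-estimate, and repeat until the decoded path stops changing."""
    path = None
    for _ in range(max_iter):
        new_path = viterbi(obs, start, trans, emit)
        if new_path == path:
            break  # same path means the same counts, so the parameters are stable
        path = new_path
        trans, emit = reestimate(obs, path)
    return trans, emit, path
```

Viterbi training converges to a local optimum that depends on the initial model, so it is worth comparing runs started from several initial parameter sets.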
Follow the Tutorial