- Use seq_reformat to reduce the redundancy of the following multiple sequence alignment (*), keeping the 10 most informative sequences.
- Redo the extraction, this time making sure that the two structures 1qq4A and 1svpA remain in the dataset.
- Build a dataset in which every pair of sequences shares less than 50% identity.
- Check that the resulting dataset is what you expect.
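To make the ideas behind these exercises concrete, here is a small Python sketch. It is **not** how seq_reformat works internally. It is a hypothetical greedy illustration that rests on a few assumptions:

- Percent identity is computed only over aligned columns where neither sequence has a gap (`-`).
- "Most informative" is approximated by a farthest-point heuristic, which is a stand-in technique of our choosing.

All function names below (`parse_fasta`, `pairwise_identity`, `pick_diverse`, `filter_by_identity`, `max_pairwise_identity`) are hypothetical.

```python
from itertools import combinations


def parse_fasta(text: str) -> dict[str, str]:
    """Read an aligned FASTA string into {name: gapped sequence}."""
    aln: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the first word as the name
            aln[name] = ""
        elif line and name is not None:
            aln[name] += line
    return aln


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither sequence has a gap.
    Assumption: this is one common definition, not necessarily seq_reformat's."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x.upper() == y.upper())
    return 100.0 * same / len(pairs)


def pick_diverse(aln: dict[str, str], n: int, keep: tuple[str, ...] = ()) -> list[str]:
    """Hypothetical stand-in for 'keep the n most informative sequences':
    start from the forced names (or the first sequence), then repeatedly add
    the sequence whose highest identity to the selected set is lowest."""
    names = list(aln)
    chosen = [k for k in keep if k in aln] or names[:1]
    while len(chosen) < min(n, len(names)):
        candidates = [s for s in names if s not in chosen]
        best = min(
            candidates,
            key=lambda s: max(pairwise_identity(aln[s], aln[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


def filter_by_identity(aln: dict[str, str], max_id: float, keep: tuple[str, ...] = ()) -> list[str]:
    """Greedy identity filter: visit forced names first, then the rest in input
    order, and keep a sequence only if it is below max_id percent identical to
    every sequence kept so far. Note that forced names can still be dropped if
    they are too similar to each other."""
    order = [k for k in keep if k in aln] + [s for s in aln if s not in keep]
    kept: list[str] = []
    for s in order:
        if all(pairwise_identity(aln[s], aln[k]) < max_id for k in kept):
            kept.append(s)
    return kept


def max_pairwise_identity(aln: dict[str, str], names: list[str]) -> float:
    """The 'check' step: highest identity between any two selected sequences."""
    return max(
        (pairwise_identity(aln[a], aln[b]) for a, b in combinations(names, 2)),
        default=0.0,
    )
```

You can map these functions onto the exercises as follows:

- **Second exercise:** passing `keep=("1qq4A", "1svpA")` to `pick_diverse` or `filter_by_identity` mirrors forcing those two structures into the dataset.
- **Final check:** calling `max_pairwise_identity` on the output of `filter_by_identity(aln, 50)` is one way to verify the result. The value it returns should be below 50.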