I have a tree built from an MSA (made with MAFFT) of around 400 protein sequences. I computed the tree with ETE3 using the following pipeline: none-trial01-pmodeltest_soft_ultrafast-phyml_default.
The resulting tree looks good and makes sense biologically (from what we know). However, for a lot of protein pairs the distance is bigger than 1. From what I understand, the scale is “the expected residue substitution per site”, and this is the part I don’t understand.
In the cropped tree (the original is much bigger), the top branch and the bottom one have distances bigger than one. Would this mean that, on average, one expects a difference at every amino acid position? This does not make sense, as these proteins are supposed to be related and align fine in the MSA.
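To make my confusion concrete, here is a minimal sketch of how I tried to relate the scale to visible differences. It assumes a simple equal-rates (Poisson, Jukes-Cantor-like) model with 20 equally frequent amino acids, which is my own simplification and not the model that pmodeltest selected; the function name is made up for illustration. Under that assumption, repeated substitutions at the same site are counted in the distance but are only seen once (or not at all) in the alignment, so a distance of 1 would correspond to roughly 60% of sites differing rather than all of them. I am not sure this is the right way to read the ETE3/PhyML scale.

```python
import math


def expected_fraction_differing(d, n_states=20):
    """Expected fraction of aligned sites that differ between two sequences
    separated by d expected substitutions per site.

    Assumes an equal-rates model with n_states equally frequent residues
    (an illustrative simplification, not the model fitted by the pipeline).
    """
    b = (n_states - 1) / n_states  # maximum fraction of differing sites
    return b * (1.0 - math.exp(-d / b))


if __name__ == "__main__":
    # Compare tree distances with the fraction of sites expected to differ.
    for d in (0.1, 0.5, 1.0, 2.0):
        p = expected_fraction_differing(d)
        print(f"distance {d:.1f} -> expected fraction of differing sites {p:.2f}")
```

If this reading is correct, the fraction of differing sites can never exceed 95% under this toy model, however large the distance gets.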
Thank you very much for your help!
PS: see the Biostars post