Frequency of K-mers with mismatches

Posted On: 02.28.2017
Posted by admin

I have been taking an amazing bioinformatics course on Stepik for the past couple of weeks, and I wanted to share another program I wrote and a little about how it works.

The idea is to find where a desired k-mer (a segment of length k) appears inside a series of nucleotides, allowing up to d mismatches. The program's eventual goal is to help identify possible entry points where duplication can begin near a genomic origin of replication.
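To make "appears with up to d mismatches" concrete, here is a minimal sketch, not the code from my program. The function names `hamming_distance` and `approximate_count` are made up for this illustration. It slides a window of length k along the genome and counts the windows that differ from the pattern in at most d positions.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approximate_count(genome, pattern, d):
    """Count the windows of `genome` that match `pattern` with at most d mismatches."""
    k = len(pattern)
    return sum(
        hamming_distance(genome[i:i + k], pattern) <= d
        for i in range(len(genome) - k + 1)
    )


# With d=1, "ACG" matches the windows ACG, ACT and AAG in this toy sequence.
print(approximate_count("ACGTACTAAG", "ACG", 1))  # 3
```

With d set to 0 this reduces to ordinary exact matching, so the same toy sequence gives a count of 1.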

I designed the program in three parts. The first generates all possible k-mers of the chosen length. The second compares the entire genome, segment by segment, against every possible k-mer, allowing a set number of differences. The final part figures out the most common k-mers within the genome.
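The three parts above can be sketched roughly like this. It is a simplified brute-force illustration under my own assumptions, not the program from the GitHub link: the function name `frequent_words_with_mismatches` and the fixed alphabet "ACGT" are choices made for this example.

```python
from collections import Counter
from itertools import product


def frequent_words_with_mismatches(genome, k, d):
    """Return the k-mers that occur most often in `genome` with at most d mismatches."""
    # Part 1: generate every possible k-mer over the nucleotide alphabet.
    candidates = ("".join(p) for p in product("ACGT", repeat=k))

    # Part 2: compare each candidate with every length-k window of the genome.
    windows = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter()
    for kmer in candidates:
        for window in windows:
            mismatches = sum(a != b for a, b in zip(kmer, window))
            if mismatches <= d:
                counts[kmer] += 1

    # Part 3: keep the k-mers that reach the highest count.
    if not counts:
        return []
    best = max(counts.values())
    return sorted(kmer for kmer, count in counts.items() if count == best)


print(frequent_words_with_mismatches("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4, 1))
# ['ATGC', 'ATGT', 'GATG']
```

Because this tries all 4^k candidates against every window, it is only practical for small values of k. It is meant to show the idea rather than to be fast.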

This was a difficult problem for me to solve, as I had to learn how to use itertools, a Python module of functions made specifically for complicated iterations.

Take a look at the GitHub link to see the whole program!

Github Link: here