I have been working through an amazing bioinformatics course on Stepik for the past couple of weeks, and I wanted to share another program I made and a little bit about how it works.
The idea is to find the k-mers (segments of length k) that show up inside a series of nucleotides while allowing up to d mismatches. The program's eventual goal is to help identify possible sites where replication can begin, near a genomic origin of replication.
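To make "with d mismatches" concrete, here is a minimal sketch of how a single k-mer could be counted in a sequence while tolerating mismatches. The function names `hamming_distance` and `count_approximate` are my own placeholders for illustration, not necessarily what the program on GitHub uses.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def count_approximate(text: str, pattern: str, d: int) -> int:
    """Count windows of text that match pattern with at most d mismatches."""
    k = len(pattern)
    return sum(
        hamming_distance(text[i:i + k], pattern) <= d
        for i in range(len(text) - k + 1)
    )


# The windows of length 3 in "AAAAT" are AAA, AAA and AAT.
print(count_approximate("AAAAT", "AAA", 0))  # 2
print(count_approximate("AAAAT", "AAA", 1))  # 3
```

With d = 0 this is ordinary exact matching; raising d lets near-copies of the pattern count too.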
I designed the program in three parts. The first generates all possible k-mers of the chosen length. The second slides along the genome and compares each window against every possible k-mer, allowing a set number of mismatches. The final part figures out the most common k-mers within the genome.
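The three parts described above can be sketched roughly like this. It is a simplified brute-force version under my own assumptions (the names `all_kmers`, `count_with_mismatches` and `most_frequent_kmers` are hypothetical, and the alphabet is assumed to be A, C, G, T), so the real program may be organised differently.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Part 1: lazily generate every possible k-mer (4**k of them for DNA)."""
    return ("".join(letters) for letters in product(alphabet, repeat=k))


def count_with_mismatches(genome, pattern, d):
    """Part 2: count windows of genome within d mismatches of pattern."""
    k = len(pattern)
    count = 0
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        mismatches = sum(x != y for x, y in zip(window, pattern))
        if mismatches <= d:
            count += 1
    return count


def most_frequent_kmers(genome, k, d):
    """Part 3: return the k-mers with the highest mismatch-tolerant count."""
    counts = {
        kmer: count_with_mismatches(genome, kmer, d) for kmer in all_kmers(k)
    }
    best = max(counts.values())
    return sorted(kmer for kmer, c in counts.items() if c == best)


print(most_frequent_kmers("AAAAT", 2, 0))  # ['AA']
print(most_frequent_kmers("AAAAT", 2, 1))  # ['AA', 'AC', 'AG', 'AT']
```

Because every one of the 4**k candidates is checked against every window, this approach gets slow quickly as k grows, but it is easy to follow and fine for small k.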
This was a difficult problem for me to solve, as I had to learn how to use itertools, a module in Python's standard library with functions for building iterators, including combinatoric ones like products, permutations, and combinations.
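As a small taste of what itertools makes possible, here is a sketch that uses `itertools.combinations` and `itertools.product` to list every k-mer within d mismatches of a given pattern. This is only an illustration of the module (the helper name `neighbors` is hypothetical), not a claim about which itertools functions the program itself relies on.

```python
from itertools import combinations, product


def neighbors(pattern, d, alphabet="ACGT"):
    """Return the set of strings that differ from pattern in at most d positions."""
    result = set()
    for m in range(min(d, len(pattern)) + 1):
        # Choose which m positions to overwrite...
        for positions in combinations(range(len(pattern)), m):
            # ...and try every combination of letters at those positions.
            for letters in product(alphabet, repeat=m):
                candidate = list(pattern)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                result.add("".join(candidate))
    return result


# The pattern itself plus 3 positions x 3 alternative letters = 10 strings.
print(len(neighbors("ACG", 1)))  # 10
```

Overwriting a position with its original letter just recreates a string that is already in the set, so the duplicates are harmless.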
Take a look at the GitHub link to see the whole program!
GitHub link: here