Publish your project for free and start receiving offers from freelance contractors in serveral minutes after publication!
500 ₴

Sequence Variation and Alignment (python)

project expired

There are several ways that variation gets incorporated into the genome. One way is for mutations (point mutations, frameshift insertions, and frameshift deletions) to alter the genome, as done previously. In this lab, you will be expanding your mutation generator and detector to detect other types of sequence variation. You will add the ability to detect cross-overs/recombinations. Recombination occurs during meiosis when two parental homologous chromosomes switch whole portions of their DNA. You will incorporate the ability to detect variable copy numbers. This is when portions of the DNA that have repeat patterns get copied or deleted in the genome. You will also include the ability to detect transposon insertion. This is when a transposon sequence, gets inserted into host dna. A signature of a transposon insertion are the inverted repeats on either end of the sequence. Lastly, you will use multiple sequence alignment to align sequences with different variation types.  The steps for the assignment are as follows:  

  • Create a SequenceVariation class (or different name) which takes as an input a String of DNA and generates different sequence variations to the inputted sequence. Check it only contains the correct nucleotides (and N). Create a series of methods cause specific types of sequence variation to occur to the inputted sequence. Start with an input sequence seq1 = ‘GCACGTATTGATTGGCCTGTACCTA’.
  • Use your previously written mutation code to create a method that takes in a sequence and an integer n as inputs and outputs a new sequence (seq2) with n random point mutations. Check n is not greater than the sequence length. Test the method using seq1 and n = 10 as inputs.
  • Create a method that takes in two sequences and outputs two new sequences where a cross-over/recombination has occurred between the two sequences at a random location. Test the method using seq1 and seq2 as inputs and outputs seq3 and seq4.
  • Create a method for copy number variation that takes in a sequence, finds a repeated sequence of nucleotides (of length 3 or greater) next to one another and inserts a third copy into that sequence. Test the function by inputting seq1 and outputting seq5. Create a similar method that removes one of the copies and test it using seq1 and outputting seq6.
  • Create a transposon method that takes in a sequence of DNA and a transposon sequence (one that has inverted repeats at the end of it) and inserts the transposon into the original sequence. The transposon should insert itself at the location of its inverted repeat in the original sequence. Test the method using seq1 and tseq = ‘ACGTGGTTGCACGT’ and outputting seq7.
  • Create a SequenceVariationDetection class (or different name) which takes in two inputs as Strings of DNA and detects whether the different sequence variations discussed has occurred (point mutations, frameshifts, cross-overs, copy variations, and transposon insertions).
  • Calculate the similarity score between the original sequence seq1 and all other sequences (seq2-seq7).
  • Use the multiple sequence alignment algorithm to align all seven sequences.

Максимально простой код без регулярных выражений, декораторов и т.д.

  1. 4 daysconcealed
    Алексей Романко

    C задачей рботы с генными последовательностями сталкивался, все эти операции делал на джаве. Но на коленках задача не делается, нужно написать все фунцеции, а также тесты под них, дабы уверится что все работает. Суммарное количество кода 300-400 строк. Качественная работа займет 2-4 дня

    Ukraine Kyiv | 15 March at 10:01 |

Ксения Вожова
Russia Saint-Petersburg
Project published
15 March at 03:22