3. Tutorial

[Figure: _images/diagram.png]

3.1. Prepare your prerequisites

  1. Install dependencies.
  2. Prepare outputs from a previous MAKER run.
  3. Run bin/1_detect_fused_gene.py.
  4. Run bin/2_extract_to_maker.py.
  5. Review the results or troubleshoot.

3.2. Example - rice chromosome 9

  1. Detect fused gene candidates:

    python bin/1_detect_fused_gene.py -i resources/rice_chr9_transcript.fa -g resources/rice_chr9.gff -n 30 -p chr9_S1R1
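
Before moving on, it can help to confirm that step 1 wrote the files step 2 reads. The sketch below is a hypothetical helper, not part of the pipeline; the two file names are taken from the step 2 command line (gff.db and break_coordinates.brk under the chr9_S1R1 prefix), and the check only tests that the files exist.

```python
from pathlib import Path

# File names as they appear in the step 2 command; the helper is hypothetical.
STEP1_OUTPUTS = ("gff.db", "break_coordinates.brk")


def missing_step1_outputs(prefix, names=STEP1_OUTPUTS):
    """Return the expected step 1 output files that do not exist under prefix."""
    out_dir = Path(prefix)
    return [str(out_dir / name) for name in names if not (out_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_step1_outputs("chr9_S1R1")
    if missing:
        print("Missing step 1 outputs:", ", ".join(missing))
    else:
        print("Step 1 outputs found; ready for step 2.")
```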
    
  2. Split the fused genes and re-annotate them locally:

    python bin/2_extract_to_maker.py -i resources/rice_chr9.fa -d chr9_S1R1/gff.db -c chr9_S1R1/break_coordinates.brk \
    -t resources/rice_chr9_ctl/ -n 28 -p chr9_S2R1
    
  3. Drop fused gene entries from the original transcript and protein FASTA files:

    mkdir chr9_post_process
    
    python bin/3_process_fasta_file.py -i resources/rice_chr9_transcript.fa -g chr9_S2R1/gene_features_wo_fused.gff  \
    -o chr9_post_process/rice_chr9_transcript_drop_fused.fa
    
    python bin/3_process_fasta_file.py -i resources/rice_chr9_protein.fa -g chr9_S2R1/gene_features_wo_fused.gff  \
    -o chr9_post_process/rice_chr9_protein_drop_fused.fa
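
One way to check this step is to compare record IDs before and after filtering: every ID in the filtered file should also be in the original, and the difference is the set of dropped entries. The sketch below is a hypothetical check, not part of the pipeline. It assumes the record ID is the first whitespace-delimited token of each FASTA header.

```python
def fasta_ids(lines):
    """Collect record IDs: the first whitespace-delimited token after '>'."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def compare_fasta(original_lines, filtered_lines):
    """Return (dropped, unexpected) ID lists.

    dropped: IDs present in the original but absent after filtering.
    unexpected: IDs present only in the filtered file (should be empty).
    """
    original = set(fasta_ids(original_lines))
    filtered = set(fasta_ids(filtered_lines))
    return sorted(original - filtered), sorted(filtered - original)


if __name__ == "__main__":
    # Paths are the ones used in the commands above.
    with open("resources/rice_chr9_transcript.fa") as before, \
         open("chr9_post_process/rice_chr9_transcript_drop_fused.fa") as after:
        dropped, unexpected = compare_fasta(before, after)
    print(len(dropped), "entries dropped;", len(unexpected), "unexpected entries")
```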
    
  4. Run MAKER standard on the defused gene set. MAKER standard is a filtering procedure that removes low-quality gene models:

    python bin/4_run_maker_standard.py -t chr9_S2R1/merged_defused_transcripts.fa -p chr9_S2R1/merged_defused_protein.fa \
    -g chr9_S2R1/merged_defused.all.mod.gff -a Pfam/Pfam-A.hmm -o chr9_post_process/
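
As a rough illustration of the filtering idea: the MAKER standard gene set is usually described as keeping models that have evidence support (AED below 1) or that encode a Pfam domain. The sketch below applies that rule to a list of records. It is an assumption about the general criterion, not a description of what bin/4_run_maker_standard.py does internally, and the record layout (id, aed, has_pfam_domain) is hypothetical.

```python
def passes_maker_standard(aed, has_pfam_domain, aed_cutoff=1.0):
    """Keep a model with evidence support (AED < cutoff) or a Pfam domain hit."""
    return aed < aed_cutoff or has_pfam_domain


def filter_models(models, aed_cutoff=1.0):
    """Return the IDs of models that pass the rule above.

    Each model is a dict with the hypothetical keys
    'id', 'aed' (float in [0, 1]) and 'has_pfam_domain' (bool).
    """
    return [
        m["id"]
        for m in models
        if passes_maker_standard(m["aed"], m["has_pfam_domain"], aed_cutoff)
    ]


if __name__ == "__main__":
    example = [
        {"id": "model_a", "aed": 0.10, "has_pfam_domain": False},  # evidence only
        {"id": "model_b", "aed": 1.00, "has_pfam_domain": True},   # domain only
        {"id": "model_c", "aed": 1.00, "has_pfam_domain": False},  # neither
    ]
    print(filter_models(example))
```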
    
  5. Generate the AED score improvement plot:

    python bin/5_generate_report.py -b chr9_S1R1/break_coordinates.brk \
    -i resources/rice_chr9.gff \
    -g chr9_post_process/merged_defused.all.mod.std.gff \
    -o chr9_post_process/chr9
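
To inspect the AED scores directly, the values can be read from the GFF itself. MAKER writes an `_AED=` attribute on mRNA features, and lower values indicate better agreement with the evidence. The sketch below is a hypothetical helper, separate from bin/5_generate_report.py: it collects those values and reports the fraction at or below a cutoff. The 0.5 cutoff in the usage lines is an arbitrary choice for illustration.

```python
import re

# Match "_AED=<number>" at the start of the attribute column or after ';',
# so that "_eAED=" is not picked up by mistake.
AED_PATTERN = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def aed_scores(gff_lines):
    """Return the AED score of every mRNA feature that carries an _AED attribute."""
    scores = []
    for line in gff_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "mRNA":
            continue
        match = AED_PATTERN.search(fields[8])
        if match:
            scores.append(float(match.group(1)))
    return scores


def fraction_at_or_below(scores, cutoff):
    """Fraction of scores that are <= cutoff (0.0 for an empty list)."""
    if not scores:
        return 0.0
    return sum(s <= cutoff for s in scores) / len(scores)


if __name__ == "__main__":
    # Paths are the ones used in the command above.
    for path in ("resources/rice_chr9.gff",
                 "chr9_post_process/merged_defused.all.mod.std.gff"):
        with open(path) as handle:
            scores = aed_scores(handle)
        print(path, len(scores), fraction_at_or_below(scores, 0.5))
```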
    
  6. Assemble the final annotation and sequence files:

    mv chr9_S2R1/gene_features_wo_fused.gff chr9_post_process/
    
    cd chr9_post_process
    
    cat gene_features_wo_fused.gff merged_defused.all.mod.std.gff > chr9_V1.gff
    cat rice_chr9_transcript_drop_fused.fa merged_defused_transcripts.std.fa > chr9_V1_transcripts.fa
    cat rice_chr9_protein_drop_fused.fa merged_defused_protein.std.fa > chr9_V1_protein.fa
    
  7. Done. Final outputs:

    chr9_V1.gff
    chr9_V1_transcripts.fa
    chr9_V1_protein.fa
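
Because the final FASTA files are built by concatenation, a quick sanity check is to confirm that no record ID appears twice. A duplicate would suggest that a fused gene entry was not dropped before merging. The sketch below is a hypothetical check, not part of the pipeline. It assumes the record ID is the first whitespace-delimited token of each header.

```python
from collections import Counter


def duplicate_fasta_ids(lines):
    """Return the sorted record IDs that occur more than once in a FASTA stream."""
    counts = Counter(
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    )
    return sorted(record_id for record_id, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Run from inside chr9_post_process, where the final files were written.
    for path in ("chr9_V1_transcripts.fa", "chr9_V1_protein.fa"):
        with open(path) as handle:
            duplicates = duplicate_fasta_ids(handle)
        print(path, "duplicate IDs:", duplicates if duplicates else "none")
```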