[QUESTION] Sequences being deleted when collapsing multi-line FASTA into single-line FASTA?


Hello, I am new to bioinformatics and I am trying to edit a file from NCBI to collapse multi-line sequences into single lines. I found quite a few one-liners to do this, but for some reason, whenever I run the command, I end up with many sequences removed and replaced with blank lines starting with “@”. The command I used was:

awk 'BEGIN{RS=">";FS="\n"}NR>1{seq="";for (i=2;i<=NF;i++) seq=seq""$i; print ">"$1"\n"seq}' file.fa > collapsed.fa

This produces output like the following: https://imgur.com/a/ap3Ue13

Does anyone know why this might be happening or have any recommendations as to how to fix/prevent this?

Thanks in advance!
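For comparison, here is a minimal Python sketch of the same collapsing step. It is an illustration, not the poster's code and not a diagnosis of the screenshot; the function name `collapse_fasta` and the script name are hypothetical. It treats any line starting with `>` as a header, joins the lines that follow into one sequence line, and deletes `\r` characters. Stray carriage returns from Windows-style line endings are a common reason FASTA one-liners misbehave, but whether that applies to this NCBI file is an assumption.

```python
import sys


def collapse_fasta(text: str) -> str:
    """Join each FASTA record's sequence lines into a single line.

    Assumptions: '>' marks a header only at the start of a line, and any
    sequence lines appearing before the first header are dropped.
    """
    out = []
    header = None
    seq_parts = []
    # Remove '\r' so CRLF (Windows) line endings don't leak into headers or sequences.
    for line in text.replace("\r", "").split("\n"):
        if line.startswith(">"):
            if header is not None:
                # Emit the previous record: header line, then the joined sequence.
                out.append(header)
                out.append("".join(seq_parts))
            header = line
            seq_parts = []
        elif line.strip():
            # Non-blank sequence line: keep it without surrounding whitespace.
            seq_parts.append(line.strip())
    if header is not None:
        # Emit the final record.
        out.append(header)
        out.append("".join(seq_parts))
    return "\n".join(out) + ("\n" if out else "")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python collapse_fasta.py file.fa > collapsed.fa
    with open(sys.argv[1]) as fh:
        sys.stdout.write(collapse_fasta(fh.read()))
```

Working line by line on `>` at the start of a line, rather than splitting the whole file on every `>` as `RS=">"` does, avoids breaking a record if a header ever contains a `>` character. The sketch reads the whole file into memory, which is fine for typical NCBI downloads but not for very large files.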
