[QUESTION] Sequences being deleted when collapsing multi-line fasta into single line fasta?


Hello, I am new to bioinformatics and I am trying to edit a file from NCBI to collapse multi-line sequences into single lines. I found quite a few one-liners to do this, but whenever I run the command, many sequences are removed and replaced with blank lines starting with “@”. The command I used was:

awk 'BEGIN{RS=">";FS="\n"}NR>1{seq="";for (i=2;i<=NF;i++) seq=seq""$i; print ">"$1"\n"seq}' file.fa > collapsed.fa

This gives output like this: https://imgur.com/a/ap3Ue13

Does anyone know why this might be happening or have any recommendations as to how to fix/prevent this?

Thanks in advance!
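For comparison, here is a minimal sketch of a line-based alternative (the function name `collapse_fasta` is hypothetical, not from any tool). Instead of splitting the file into records with `RS=">"`, which treats every ">" anywhere in the file as a record boundary, it treats a line as a header only when ">" is its first character and appends every other line to the current sequence. It also strips a trailing carriage return in case the file has Windows line endings. It assumes the input really is FASTA: FASTQ records start with "@", and Phred+33 quality strings can contain ">", so an `RS=">"` approach would cut FASTQ records in odd places. Checking whether the file is actually FASTQ may be worth doing first.

```shell
# Minimal sketch (hypothetical helper): collapse multi-line FASTA records so
# each sequence sits on one line. Reads the files given as arguments, or stdin
# if there are none, and writes to stdout.
# Assumes every header starts with ">" at the beginning of a line.
collapse_fasta() {
  awk '
    { sub(/\r$/, "") }                 # drop a Windows carriage return, if any
    /^>/ {                             # header line: flush the previous sequence
      if (seq != "") print seq
      print
      seq = ""
      next
    }
    { seq = seq $0 }                   # sequence line: append to current record
    END { if (seq != "") print seq }   # flush the final record
  ' "$@"
}
```

Usage would be `collapse_fasta file.fa > collapsed.fa`. Only one record's sequence is held in memory at a time, so this should also cope with large files.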
