Nov 16 2012
 

I’m working with publicly available RNA-seq datasets generated by ABI SOLiD 3.0 sequencing. It’s been a real nightmare working with this data. Why that is so is better left for another time.

I just got access to ABI’s LifeScope alignment software. It requires data to be in their new(ish) XSQ format. The data I have from the SRA is in the csfasta format, courtesy of the SRA Toolkit’s abi-dump tool.

I thought conversion would be straightforward: ABI provides a script to do it, as well as a converter (probably the same script) in the LifeScope GUI.

But, of course, it wasn’t straightforward. I kept getting this error:

[mcgaugheyd@gryphon raw_data]$ ~/XSQ_Tools/./convertToXSQ.sh --mode Fragment --libraryName na --laneNumber 1 --runStartTime '2012-11-15 10:10:10' --c1 test.csfasta --q1 test.qual -xsqfile test.xsq
[16Nov12 07:59:15,343][ERROR]- Invalid bead id: >mbt_solid3.0.1 1_19_223_F3
 com.lifetechnologies.exception.ConverterException: Invalid bead id: >mbt_solid3.0.1 1_19_223_F3
	at com.lifetechnologies.util.AbstractXSQMapper.validateBeadId(AbstractXSQMapper.java:602)
	at com.lifetechnologies.util.AbstractXSQMapper.getNextBeadId(AbstractXSQMapper.java:135)
	at com.lifetechnologies.util.CsfastaXSQMapper.getMaxReadLength(CsfastaXSQMapper.java:311)
	at com.lifetechnologies.converter.SolidFragmentConverter.convertFile(SolidFragmentConverter.java:73)
	at com.lifetechnologies.XSQConverter.runXSQConverter(XSQConverter.java:48)
	at com.lifetechnologies.XSQConverter.main(XSQConverter.java:144)

My csfasta file looks something like this:

#
# Title: Solid_ZebrafishWTA_Zebrafish_WTA
#
>mbt_solid3.0.1 1_19_223_F3
T32223.10221000030202232102222210123302221222030222
>mbt_solid3.0.2 1_19_963_F3
T22132.12310012033102112223111333111221102011133131
>mbt_solid3.0.3 1_19_971_F3
T01221.30021010132130031131311211010102221022213121
>mbt_solid3.0.4 1_19_1057_F3
T10221.10031313310021320301032000012010333121330113
>mbt_solid3.0.5 1_19_1217_F3
T12000.13210313200001011313110002232022200010000000
>mbt_solid3.0.6 1_19_1360_F3
T31013.21202220213111212222011110112302110011103122
>mbt_solid3.0.7 1_19_1407_F3
T10010.10113211310110112213222231210312223211133223
>mbt_solid3.0.8 1_20_584_F3
T20210.31212311131112121120111200111101130211111111
>mbt_solid3.0.9 1_20_717_F3
T32121.01221020231223122301202111123131103010113313

I first thought the issue was with the space in the tag. So I wrote a script to remove it. Didn’t help. Then I tried simply replacing the tag with numbers counting up from 1. Didn’t help. Then I started changing all sorts of random options. Didn’t work. Tried using the LifeScope GUI. I started up the conversion 16 hours ago and it’s still at 0% complete. Great.

After some Googling I found this.

“The Tag_ID is a unique identifier for every tag, which consists of four components: panel_xpixel_ypixel_tagtype. For example, 1_567_321_F3 describes a bead in panel 1 at coordinates 567, 321 (X,Y) with the F3 tag (first tag in a mate pair, only tag in a fragment run).”

Aha! The issue is that the abi-dump script I used to convert the SRA file to csfasta prepends the run name and a read counter (here, “mbt_solid3.0.1”) to the tag. The XSQ converter doesn’t account for that possibility, so it rejects the header as an invalid bead id. This little Python script removes the offending information.

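#!/usr/bin/env python
# Strip the prefix that abi-dump adds to csfasta headers, leaving only
# the bead id (panel_x_y_tag). The cleaned file is written to stdout.

import sys

with open(sys.argv[1]) as infile:
    for line in infile:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line.split()
            # ">mbt_solid3.0.1 1_19_223_F3" -> ">1_19_223_F3"
            # Headers with no second field are passed through unchanged.
            print((">" + fields[1]) if len(fields) > 1 else line)
        else:
            # Comment and color-space sequence lines are left as they are.
            print(line)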
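To check whether a header matches the panel_xpixel_ypixel_tagtype layout quoted above before handing a file to the converter, something like the sketch below might help. The regular expression and the function name `parse_tag_id` are my own assumptions. I don't know the exact rule the converter's validateBeadId applies, so treat this as a rough sanity check rather than a reproduction of ABI's validation.

```python
import re

# Assumed layout: panel_x_y_tagtype, i.e. three integers and a tag label
# such as F3. This is a guess at the format, not the converter's actual rule.
TAG_ID = re.compile(r"^(\d+)_(\d+)_(\d+)_([A-Za-z0-9-]+)$")


def parse_tag_id(header):
    """Return (panel, x, y, tag_type) for a csfasta header, or None."""
    match = TAG_ID.match(header.lstrip(">").strip())
    if match is None:
        return None
    panel, x, y, tag_type = match.groups()
    return int(panel), int(x), int(y), tag_type


if __name__ == "__main__":
    # A cleaned header parses; the raw abi-dump header does not.
    print(parse_tag_id(">1_19_223_F3"))
    print(parse_tag_id(">mbt_solid3.0.1 1_19_223_F3"))
```

Running it over the header lines of a cleaned file and counting the `None` results would show whether any tags still carry the extra prefix.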

 

 

Now it works!

 

Update: 2013-01-28.

Never mind! These files fail when you try to import them into LifeScope (they do not seem to be recognized as XSQ files). The only way I could get these files to work properly was to convert them to the XSQ format using the LifeScope GUI.
