[Gramene] Re: TAIR9 vs TAIR10 - mixup on naming, was: RE: Old versions?

Spooner, Will wspooner at cshl.edu
Wed Jun 29 12:26:00 EDT 2011


Hi Ann, Jonathan,

Genome assembly names cause us endless problems, and we only look after a few of them!  

Ann, would you like us to use 'A_thaliana_Jun_2009' as the assembly name? If so, I will patch the databases, and we will also need Dan Staines to do the same over at Ensembl Genomes.

Jonathan, as an 'independent' who has an interest in such things, what would your preferred assembly naming convention be? Are any standards groups already considering this problem?

Best,

Will

On 28 Jun 2011, at 19:10, Ann Loraine wrote:

> Okay, it sounds like gramene needs to correct this. 
> 
> Probably the person at gramene who downloaded the Arabidopsis data wasn’t aware that TAIR10 is really the same assembly as TAIR9. A lot of people get confused about this. Or, they might have checked the actual data and discovered a difference, despite what the documentation says.
> 
> Jonathan, since you are already providing validation services, could you also provide some additional sanity checking of the data?
> 
> In this case, you could write something that determines whether two reference assemblies with different names are in fact the same.
> 
> In addition, you could write something else that checks that two DAS sources that serve the same genome sequence assemblies are indeed delivering the same data. Confusion could easily arise if one source is delivering masked sequence data but another one isn’t. 
> 
> Best,
> 
> Ann
> 
> On 6/28/11 12:13 PM, "Jonathan Warren" <jw12 at sanger.ac.uk> wrote:
> 
>> Hi
>> 
>> The DAS Registry automatically takes its information from the Gramene DAS sources document http://dev.gramene.org/gramenedas/das/sources. If the reference coordinates are the same, then the coordinate system should remain TAIR 9. However, if new sequences have been added, then new entry_points need to be added to the DAS reference sources. If the Gramene sources document reverts to TAIR 9, the registry will automatically reflect this. As all the data sources using the TAIR 10 coordinate system are Gramene's, no other problems should arise.
>> 
>> On 28 Jun 2011, at 15:26, Loraine, Ann wrote:
>> 
>>> Greetings all,
>>>  
>>>  There is some confusion about the meaning of TAIR9 versus TAIR10.
>>>  
>>>  TAIR9 is both a genome assembly release and a genome annotation release, meaning: it includes both new sequences for Arabidopsis chromosomes and some revised gene models.
>>>  
>>>  TAIR10 is a genome annotation release only. The chromosomes did not change from TAIR9 to TAIR10 according to this README file:
>>>  
>>>  ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/README_whole_chromosomes.txt
>>>  
>>>  Quote: "Please note that the chromosome files have NOT CHANGED FROM TAIR9 to TAIR10"
>>>  
>>>  Thus, the gene structure annotations released in TAIR10 are using the same reference sequence as the gene structure annotations released with TAIR9.
>>>  
>>>  I've noticed that the DAS registry contains both TAIR9 and TAIR10 as reference assemblies, and that some data sets (they look like alignments) reference TAIR10 chromosomes. This is incorrect, as there is no TAIR10 genome assembly.
>>>  
>>>  Also, I would like to suggest using a different term for the TAIR9 assembly in order to avoid future confusion. Please use the term: A_thaliana_Jun_2009. This is what we are using to designate this genome assembly in the Integrated Genome Browser QuickLoad and DAS systems. It would be great if the DAS registry could either recognize this as a synonym for TAIR9 or use this term instead so that people will not continue to be confused about the meaning of TAIR*.
>>>  
>>>  Best wishes,
>>>  
>>>  Ann Loraine
>>>  ____________________
>>>  Ann Loraine
>>>  Associate Professor
>>>  Dept. of Bioinformatics and Genomics, UNCC
>>>  North Carolina Research Campus
>>>  600 Laureate Way
>>>  Kannapolis, NC 28081
>>>  704-250-5750
>>>  www.transvar.org
>>>  
>>>  
>>>  
>>>  -----Original Message-----
>>>  From: Jonathan Warren [mailto:jw12 at sanger.ac.uk]
>>>  Sent: Tue 6/28/2011 8:46 AM
>>>  To: gramene at gramene.org
>>>  Subject: Old versions?
>>>  
>>>  Hi
>>>  
>>>  Do you still host old versions of Gramene? If so, where are they?
>>>  More specifically, are there DAS sources for, say, TAIR 9, 8, 7, etc.,
>>>  rather than TAIR 10? If they exist, I can register them with the DAS
>>>  Registry, and they may be useful to the DAS community and researchers.
>>>  
>>>  Thanks in advance
>>>  
>>>  Jonathan Warren
>>>  Senior Developer and DAS coordinator
>>>  blog: http://biodasman.wordpress.com/
>>>  jw12 at sanger.ac.uk
>>>  Ext: 2314
>>>  Telephone: 01223 492314
>>>  
>>>  --
>>>   The Wellcome Trust Sanger Institute is operated by Genome Research
>>>   Limited, a charity registered in England with number 1021457 and a
>>>   company registered in England with number 2742969, whose registered
>>>   office is 215 Euston Road, London, NW1 2BE.
>>>  
>> 
>>  
>> Jonathan Warren
>> Senior Developer and DAS coordinator
>> blog: http://biodasman.wordpress.com/
>> jw12 at sanger.ac.uk
>> Ext: 2314
>> Telephone: 01223 492314
>> 
>> 
> 
> -- 
> Ann Loraine
> Associate Professor
> Dept. of Bioinformatics and Genomics, UNCC
> North Carolina Research Campus
> 600 Laureate Way
> Kannapolis, NC 28081
> 704-250-5750 (office)
> http://www.transvar.org
> 
> _______________________________________________
> Gramene mailing list
> Gramene at brie4.cshl.edu
> http://mail.gramene.org/mailman/listinfo/gramene

William Spooner
wspooner at cshl.edu
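Ann's two proposed sanity checks (whether two reference assemblies with different names are in fact the same, and whether two DAS sources serving the same assembly deliver the same sequence, masked or not) could be sketched roughly as below. This is a minimal, hypothetical Python illustration, not anything Gramene, Ensembl Genomes or the DAS Registry actually runs. It assumes both sequence sets are already available as FASTA text keyed by chromosome name, that soft-masking means lowercase bases and hard-masking means bases replaced by N, and all function names are made up for the example.

```python
import hashlib


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {sequence name: sequence}.

    The name is the first word of each '>' header line, e.g. 'Chr1'.
    """
    parts: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def signature(seqs: dict[str, str]) -> dict[str, tuple[int, str]]:
    """Map each sequence name to (length, MD5 of the upper-cased sequence).

    Upper-casing makes the checksum ignore soft-masking (lowercase bases),
    so two differently named assemblies with equal signatures very likely
    contain the same reference sequence.
    """
    return {
        name: (len(seq), hashlib.md5(seq.upper().encode("ascii")).hexdigest())
        for name, seq in seqs.items()
    }


def compare_sequence(a: str, b: str) -> str:
    """Classify how two versions of the same chromosome differ."""
    if a == b:
        return "identical"
    if len(a) != len(b):
        return "different"
    ua, ub = a.upper(), b.upper()
    if ua == ub:
        return "differs only in soft-masking"
    # Hard-masking replaces bases with N, so tolerate an N on either side.
    if all(x == y or x == "N" or y == "N" for x, y in zip(ua, ub)):
        return "differs only in hard-masking (N runs)"
    return "different"


def compare_assemblies(a: dict[str, str], b: dict[str, str]) -> dict[str, str]:
    """Compare two assemblies chromosome by chromosome."""
    report = {}
    for name in sorted(a.keys() | b.keys()):
        if name in a and name in b:
            report[name] = compare_sequence(a[name], b[name])
        else:
            report[name] = "present in only one assembly"
    return report
```

For example, with hypothetical variables `tair9_text` and `tair10_text` holding the two chromosome downloads, `signature(parse_fasta(tair9_text)) == signature(parse_fasta(tair10_text))` would confirm the README's statement that the chromosome files did not change. Note that an N on one side can also be a gap that was filled in a newer assembly, so the "hard-masking" label is only a hint.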






