Takashi Gojobori speaking at the GenBank 25th Anniversary

Uploaded by NCBINLM on 30.04.2010

Thanks so much, Francis, for a wonderful introduction, and it is a great pleasure for me to be here,
thanks to Don Lindberg, Dave Lipman, Jim Ostell, David Landsman, and the other GenBank people.
Here, actually, we have to congratulate all of you on your 25 years of service, and thank you very much
for the continuous collaboration with DDBJ as well as EMBL. So this is from all the staff of DDBJ.
So from the previous speakers we know that EMBL started in 1980, then GenBank started in 1982.
Then the DDBJ center was established in 1986. Of course, the trial input of nucleotide sequence data
had started a little bit earlier than that, responding to requests from Europe and the US, but we
regard 1986 as the founding year of DDBJ. And there was a tremendous discussion about where we should
establish DDBJ in Japan, and they told us the National Institute of Genetics. I think partly because
Motoo Kimura, who proposed the neutral theory of molecular evolution, was there; he was one
of my mentors, actually. Here is Mishima, which is between Tokyo and Osaka. If you take the bullet train,
it takes only one hour to the west. So you're welcome to visit DDBJ.
As for the international collaboration that everyone is talking about, we had a tremendous privilege
in our name, because it starts with D. For many years we had a kind of consensus that the order follows
alphabetical order, but at some point in time GenBank and EMBL started to complain,
so we agreed to make it a much more random order. Now it's a random order. So, actually,
these are the recent years of the international collaborators meeting. This year we are preparing to
welcome all collaborators to Mishima in the coming May. So in addition to the collaborators meeting,
we are also going to have the Advisory Board meeting by videoconference this year. So as you see
this is a kind of history of nucleotide accumulation over the years. As you see, the first DDBJ
release was in July 1987. The number of entries was only 66, and the number of nucleotides was 100,000.
It occupied only about 0.25 percent. Right now DDBJ's share in terms of entries would be about
12 to 15 percent. In terms of nucleotides it's 10 percent. But we are very grateful to GenBank
and EMBL because they treated us as an equal partner from the beginning, even though we had only a 0.25 percent
share. And this is the recent activity in annual submissions. This one is entries. The
lighter one is nucleotides. The peak was in 2005, and then it reduced a little bit. This is
from the Asia-Pacific countries. This is DDBJ, particularly the Japanese contribution.
Then this is GenBank from Asia. This is EMBL. So certainly we are making a certain contribution,
particularly from the Asia-Pacific region, and we have certain secondary databases such as Genome Search.
In particular, recently we installed a new search system, which is called - sorry - ARSA - you should
remove A - sorry. This is the very first algorithm. It has been called Shunsaku. And as you see,
thanks to the American and European initiative, the Human Genome Sequencing Project was
completed, and Japan also contributed to this project. And this is the previous Prime Minister Koizumi,
the previous, previous minister. So here is - Sugawara was attending the ceremony.
Okay, so here is the human genome, and we also really played an important role in the Rice Genome Project.
This is only chromosome one, but there is also the whole genome from the Rice Sequencing Project. In collaboration
with RIKEN we also conducted the so-called FANTOM, the functional
annotation of mouse, and actually, when we were asked by Hayashizaki from RIKEN to
make a collaboration, we trained four annotators at the beginning, then we sent them to RIKEN.
It was the starting point of FANTOM. And these are some recent publications. We also conducted
the human full-length cDNA annotation. That was because more than 60 percent of human
full-length cDNA clones had been accumulated in Japan, so with the help of GenBank and
EMBL, particularly David Lipman and Graham Cameron, we held an annotation jamboree.
That was a very big one, gathering a total of 120 experts in Tokyo. Then we made an annotation
of human genes, and this was published in PLoS Biology, the second issue. In particular
one of the [indiscernible] databases from the H-Invitational database was H-ANGEL. This is an anatomical
gene expression database, and it contains about 19,000 loci, over 60 different human tissues
and [indiscernible] lines. If you make ten greater categories, from neural to endocrine, then we can
tentatively identify the genes specifically expressed in a particular greater tissue category, when more
than 50 percent of a given clone's expression over all categories is in that one category. So like red for neural
and yellow for [indiscernible], so this is a kind of distribution over the human genome.
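
[Editorial sketch] One possible reading of the 50 percent criterion described above is: call a locus
specific to a tissue category when that single category accounts for more than half of its total
expression across all categories. The function name specific_category, the example counts, and the
exact form of the rule are assumptions made for illustration; the talk does not give the actual
H-ANGEL procedure.

```python
from typing import Dict, Optional


def specific_category(counts: Dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Return the category holding more than `threshold` of total expression, else None.

    `counts` maps a tissue category name to an expression measure for one locus.
    This is an illustrative rule, not the published H-ANGEL method.
    """
    total = sum(counts.values())
    if total <= 0:
        return None  # no expression observed, so nothing to classify
    top = max(counts, key=counts.get)  # category with the largest expression
    return top if counts[top] / total > threshold else None


# Hypothetical loci with made-up counts, for illustration only.
locus_a = {"neural": 8, "endocrine": 1, "muscle": 1}
locus_b = {"neural": 5, "endocrine": 5}
print(specific_category(locus_a))  # neural holds 80 percent of the total
print(specific_category(locus_b))  # no category exceeds 50 percent
```

With a strict "more than" comparison, a locus split evenly between two categories is not assigned to either.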
And when you look at the tissue-specific genes, of which we discovered 1,479 at this time,
in particular we identified the neural system-specific genes to be about 400. So by comparing with species
whose complete genome has been sequenced, we can map each of 389 genes when those genes
are matched. So utilizing those databases, amazingly, a kind of 1.13 gene [indiscernible] has emerged
just before vertebrates, so a tremendous number of genes might have emerged just before
vertebrates, particularly when you focus on neural system-specific genes. And quite recently
the Japanese government started the so-called Target Protein Project, which is the post-Protein 3000 Project,
and DDBJ has become responsible for making the platform of this project. In the meanwhile,
I would like to point out, as you know, that the next-generation sequencing machines are tremendous.
This slide was kindly given by George Weinstock from Baylor College of Medicine in Houston,
one of the major sequencing centers in this country. So this is the 454 machine. Then it's almost
comparable to AT conventional ABI machine, so it's a tremendous advancement. And as you know,
apart from ethics program, Jim Watson's personal genome was sequenced last year, as of May.
Also, Craig Venter, I think one of the speakers in this symposium, had his personal genome
sequenced last September. And the 1,000 Human Genome Sequencing Project has changed to the
2,000 Human Genome Project. I don't know the latest number, but it's certainly increasing.
What amazed me so much was this press release that was made on February 11th of this year.
Pacific Biosciences claims that they are going to sequence a human genome in about four minutes,
only four minutes, and this is based on nanobiotechnology. The original paper was published
in PNAS in January of this year. So if one single human genome is sequenced completely in only
four minutes, then GenBank, EMBL, and DDBJ really have to think about the output of this kind of
technology. Moreover, what I'd like to emphasize on this occasion is that a paradigm shift is taking
place, from the genome revolution to a sequence revolution. The current sequencing technology is
utilized, of course, not only for genome sequencing but also for gene expression, such as transcription,
ESTs, SAGE and CAGE. CAGE is a very similar technology to SAGE. Only 20 nucleotides from
the transcription start site can be sequenced by [indiscernible] together. Like mRNA, functional non-coding RNAs
can also be obtained, particularly from sequencing. Also, previously ChIP-chip, ChIP-PET, right now ChIP-seq:
by sequencing the binding region of DNA with a particular protein, the DNA and protein interaction
can be illustrated by sequencing. Also PPI - protein-protein interaction - the two-hybrid system,
whatever it is, to a mammalian one, and also in order to know the product, the sequencing is crucial.
Then epigenomics: when you'd like to see the [indiscernible] site, then you can add the additional
nucleotide, then you sequence it, then the complementary nucleotide would be a different one from
the original genomic sequence. From here you can identify the epigenetic system. So clearly, I think
the usage of sequencing technology has been changing to cover a variety of biological phenomena,
which is so very important. Taking an example of a Japanese project, that is the Genome Network Project,
it started just four years ago. Then it will be completed within this year. Again DDBJ has become
responsible for constructing the Genome Network information platform. When you are interested in
the transcriptional regulation network, this is a kind of elementary system: coding regions, then they get transcribed.
When the transcriptional product is made into a protein, then the protein may have interactions,
then finally it goes to a regulatory system, and again coding starts, transcription starts. So this protein might go
to different genes; this is a transcription regulatory network. To understand this system
is the goal of the Genome Network Project. So in particular, those are a kind of overview of the
present data accumulation. As you see, CAGE data has been accumulated, more than 30
million sequences, and this is as of December 2007, so currently almost 100 million sequences
have been accumulated, and those data will be open from DDBJ. Those data are currently open already.
So with those 20-nucleotide sequence data, which show the transcription start site, TSS, this is the
TSS distribution over the human genome. Interestingly, RIKEN produced mouse CAGE data, too.
Therefore, even though we have to take into account the chromosomal rearrangements,
we can still make a comparison of the TSS distributions. And in particular, this is the X chromosome.
We conducted a similar analysis on the other chromosomes. Then the CAGE data obtained
show the highest peak at the [indiscernible] TSS. Then it has just like a Gaussian distribution.
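
[Editorial sketch] The kind of positional analysis described here can be illustrated by computing
each CAGE tag's offset from a reference TSS and asking what fraction falls inside a window around it.
The function names, the coordinates, and the strand handling below are assumptions made for
illustration; the 100 base pair window is the figure quoted in the talk, and nothing here reproduces
the actual DDBJ or RIKEN pipeline.

```python
from typing import List, Sequence


def offsets_from_tss(tag_positions: Sequence[int], tss: int, strand: str = "+") -> List[int]:
    """Signed distance of each tag from the TSS; negative means upstream.

    On the minus strand, larger genomic coordinates are upstream, so the sign flips.
    """
    sign = 1 if strand == "+" else -1
    return [sign * (pos - tss) for pos in tag_positions]


def fraction_within(offsets: Sequence[int], window: int = 100) -> float:
    """Fraction of offsets lying in the closed interval [-window, +window]."""
    if not offsets:
        return 0.0
    inside = sum(1 for d in offsets if -window <= d <= window)
    return inside / len(offsets)


# Made-up tag coordinates around a hypothetical TSS at position 1000.
tags = [950, 1000, 1050, 1300]
offs = offsets_from_tss(tags, tss=1000)
print(offs)                   # signed offsets relative to the TSS
print(fraction_within(offs))  # share of tags within 100 bp of the TSS
```

A histogram of these offsets over many genes would give the peaked, roughly symmetric shape the speaker describes.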
However, as you see, that Gaussian distribution runs from minus 100 base pairs to plus 100 base pairs,
centering on the [indiscernible] transcription start site. So that means beyond minus 100
and plus 100 it's very, very scarce. That means we know a lot of transcripts are
made over the human genome. In the case of mouse, 78 percent of the non-coding region may be
transcribed, but still we speculate that the very important transcripts may not be so many, even though we
know the functional importance of microRNAs and non-coding functional RNAs. But there's still,
I think, a lot of transcription taking place over the coding region. So this is David Lipman,
and this is Yoshio Tateno, who retired quite recently, but he still continues to work with DDBJ.
And here this is [indiscernible] of Japanese collaborators and that was a meeting, and those
previous director Dr. [indiscernible] and this is I think Graham [inaudible], and I think
this is Jim and Graham. Maybe this is at EMBL. [Indiscernible]. There Graham and [indiscernible].
And here, I thank you so very much for the continuous collaboration, and we hope GenBank
and EMBL will continue the collaboration, particularly from the DDBJ side. In addition to
GenBank and EMBL, I would like to thank Christian Burks and the Los Alamos people
when - you know, they helped very much [indiscernible]. So personally, I would like to invite David
to the stage - could you - I'd really like to give this [indiscernible] to you, showing our thanks
from DDBJ. Thanks so much. [Applause] Okay, so really, I would like to end my talk by
stating that DNA should be for world peace. Thank you so much for your attention.