Trypanosoma cruzi - Graduate Program


Universidade Federal de Minas Gerais
Instituto de Ciências Biológicas
Departamento de Bioquímica e Imunologia
Doctoral Thesis
Comparative Genomics for the Identification of
Virulence Factors in Trypanosoma cruzi
Rondon Pessoa de Mendonça Neto
Advisor: Prof. Dr. Santuza Maria Ribeiro Teixeira
Co-advisor: Prof. Dr. Daniella Castanheira Bartholomeu
November 2013
Rondon Pessoa de Mendonça Neto
Comparative Genomics for the Identification of
Virulence Factors in Trypanosoma cruzi
Thesis submitted to the Graduate Program in Bioinformatics of the
Universidade Federal de Minas Gerais in partial fulfillment of the
requirements for the degree of Doctor in Bioinformatics
Advisor: Prof. Dr. Santuza Maria Ribeiro Teixeira
Co-advisor: Prof. Dr. Daniella Castanheira Bartholomeu
Universidade Federal de Minas Gerais
Instituto de Ciências Biológicas
Departamento de Bioquímica e Imunologia
Belo Horizonte/MG - Brazil
November 2013
“No matter how long the journey,
the most important thing is to take the first step.”
Vinícius de Moraes
To Lucas
Table of Contents
Acknowledgments
List of Abbreviations
List of Figures, Tables and Annexes
Resumo (Summary)
Abstract
1. Introduction
1.1 Trypanosoma cruzi and Chagas Disease
1.2 Population Variability of Trypanosoma cruzi
1.3 The Trypanosoma cruzi Genome
1.4 Comparative Genomics of Trypanosomatids
1.5 The CL-14 Clone
1.6 Gene Expression in Trypanosomatids
2. Objectives
2.1 General Objective
2.2 Specific Objectives
3. Materials and Methods
3.1 Sequencing of Nuclear and Mitochondrial DNA
3.2 Pre-processing and Preliminary Analyses of the Sequences
3.3 PCR Amplification of Nuclear and Mitochondrial DNA from Trypanosoma cruzi Strains
3.4 Phylogenetic Analyses
3.5 Assembly of the Mitochondrial Genome
3.6 Determination of the Copy Number of Multigene Families
3.7 Determination of Identity between CDSs
3.8 Analyses of Trans-sialidase Genes with SAPA Repeats
3.9 Transcriptome Sequencing and Mapping
4. Results
4.1 Sequencing and Assembly of the CL-14 Clone Genome
4.2 Phylogenetic Analyses
4.3 Assembly and Analysis of the CL-14 Mitochondrial Genome
4.4 Comparative Analysis of Multigene Families
4.5 Analyses of the Differences in Genes Encoding Trans-sialidases with SAPA Repeats in CL Brener and CL-14
4.6 Sequencing and Mapping of the CL-14 Transcriptome
5. Discussion
6. References
7. Annexes
Acknowledgments
I thank everyone who supported me during this stage. My special gratitude goes
to:
Profs. Santuza Teixeira and Daniella Bartholomeu. Thank you very much; what I
have learned, I owe to you. You always showed me the best path and how to plan. Beyond
the science you passed on to me, the professionalism you displayed is admirable.
My colleagues from the LGMT, LIGP and HPGL laboratories;
Dr. Najib El-Sayed;
Dr. Ricardo Gazzinelli;
Dr. Caroline Junqueira;
All the collaborators at the institutes and universities I passed through during this
work;
My teachers;
My friends and relatives, who were understanding about my work, especially
Aunt Júlia, who showed me this path;
I especially thank Lucas and Nádia for the sacrifices they made, without
complaint, during this demanding stage;
Finally, CAPES.
List of Abbreviations
BAC – Bacterial artificial chromosome
cDNA – Complementary DNA
DGF – Dispersed gene family
DNA – Deoxyribonucleic acid
DTU – Discrete typing unit
GPI8 – Glycosylphosphatidylinositol-anchor transamidase subunit 8
gRNA – Guide RNA
kb – Kilobases (10³ nucleotide bases)
kDNA – Kinetoplast DNA
MASP – Mucin-associated surface protein
mRNA – Messenger RNA
MURF1 – Maxicircle unidentified reading frame 1
MURF2 – Maxicircle unidentified reading frame 2
ND4 – NADH dehydrogenase 4
ND5 – NADH dehydrogenase 5
ng – Nanograms
nt – Nucleotides
ORF – Open reading frame
PFGE – Pulsed-field gel electrophoresis
RNA – Ribonucleic acid
RNAseq – RNA sequencing
rRNA – Ribosomal RNA
SAPA – Shed acute phase antigen
SNP – Single nucleotide polymorphism
snRNA – Small nuclear RNA
snoRNA – Small nucleolar RNA
T. cruzi – Trypanosoma cruzi
TcTS-SAPA – Trypanosoma cruzi trans-sialidase with SAPA repeats
tRNA – Transfer RNA
UTR – Untranslated region
WGS – Whole-genome shotgun (sequencing strategy based on fragmentation of the
entire genome)
List of Figures, Tables and Annexes
Figure 1 – Schematic representation of the Trypanosoma cruzi life cycle
Table 1 – PCR for differentiation of T. cruzi groups
Table 2 – PCR for differentiation of SAPA repeat cluster sizes
Table 3 – Comparative data on the sequencing and assembly of the genomes of clones CL-14 and CL Brener
Figure 2 – Number of CL-14 reads by length in base pairs
Figure 3 – Pulsed-field gel electrophoresis of CL-14 and CL Brener total DNA
Figure 4 – Synteny between CL-14 contigs and their homologous chromosomes in CL Brener
Figure 5 – Genomic Southern blots
Table 4 – In silico PCR of markers used in T. cruzi genotyping
Figure 6 – Electrophoresis of the amplicons of markers for DTU differentiation
Figure 7 – Phylogenetic trees
Figure 8 – Part of the alignment between the two different haplotypes
Figure 9 – Alignment of CL-14 reads with homologous genes of CL Brener
Figure 10 – Coverage of CL-14 reads on the CL Brener and Esmeraldo maxicircles
Table 5 – Polymorphisms found between the kDNAs of clones CL Brener and CL-14
Figure 11 – Comparison between the mitochondrial genomes of CL-14 and CL Brener
Table 6 – Counts of gene families and ortholog groups
Table 7 – Mean identities of protein-coding sequences
Figure 12 – Results of the algorithm
Figure 13 – Comparison of mapping between the algorithm and BWA
Figure 14 – Coverage of the trans-sialidase by genomic reads of CL Brener and CL-14
Figure 15 – Nucleotide-by-nucleotide coverage of trans-sialidase Tc00.1047053509495.30 by genomic sequencing reads of CL Brener and CL-14 and by transcriptome sequencing reads of CL-14
Figure 16 – Electrophoresis of TcTS-SAPA analyses
Figure 17 – Organization of SAPA repeats in clones CL Brener and CL-14
Figure 18 – Western blot of TcTS and TcTS-SAPA
Figure 19 – Examples of total RNA and cDNA library profiles
Figure 20 – Mapping of CL-14 mRNA sequencing reads
Annex 1 – Publication: Predicting the proteins of Angomonas deanei, Strigomonas culicis and their respective endosymbionts reveals new aspects of the trypanosomatidae family
Annex 2 – Publication: Distinct genomic organization, mRNA expression and cellular localization of members of two amastin sub-families present in Trypanosoma cruzi
Annex 3 – Manuscript in preparation: Genome sequence of a highly attenuate clone of Trypanosoma cruzi identifies SAPA repeats as a major virulence factor in this human parasite
Resumo (Summary)
Trypanosoma cruzi, the etiological agent of Chagas disease, belongs to a
group of organisms with a peculiar genome, in which massive expansions of surface
protein gene families are present and a large part of which is devoted to repetitive
sequences. The completion of the sequencing of the reference genome, that of clone
CL Brener, revealed several features related to parasite virulence. CL-14 is an
avirulent clone derived from the same T. cruzi CL strain; however, in contrast to CL
Brener, clone CL-14 is neither infective nor pathogenic in vivo. To investigate the
molecular determinants of T. cruzi virulence, we performed a direct comparison
between the genomes of clones CL Brener and CL-14, based on the available CL
Brener sequences and on CL-14 genome sequences that we generated using the 454
FLX platform. Although neither genome has been fully assembled, we found that they
have a highly similar organization of both the nuclear genome and the mitochondrial
genome (kDNA), similar numbers of predicted coding sequences and similar copy
numbers of members of multigene families. PCR analyses and phylogenetic
inferences showed that CL-14 is also a hybrid clone belonging to the same DTU as
clone CL Brener (TcVI). Similarity and Southern blot analyses indicate that the two
clones have similar karyotypes and sequence identity greater than 99%. The only
major difference detected between these two genomes concerns a subgroup of the
large gene family encoding trans-sialidases (TcTS), known for having a C-terminal
domain made of 12-amino-acid repeats called 'shed acute phase antigen' or SAPA
repeats. SAPA repeats are highly immunogenic and increase the half-life of TcTS
proteins shed into the host bloodstream; the CL Brener genome contains at least
three copies of TcTS with repetitive domains ranging from 19 to 41 repeats. In clone
CL-14, only one copy of TcTS, containing three SAPA repeats, was identified. This
reduced number of SAPA repeats in CL-14 TcTS genes, confirmed experimentally by
PCR, Southern blot, western blot and transcriptome data, may be one of the factors
responsible for the differences in virulence between the two lineages.
Abstract
Trypanosoma cruzi, the etiologic agent of Chagas disease, belongs to a group
of organisms with a peculiar genome in which a massive expansion of surface
protein gene families is present and a large proportion of it is devoted to repetitive
sequences. The completion of the CL Brener reference strain genome revealed
several new features related to parasite virulence. CL-14 is an avirulent clone
derived from the same T. cruzi CL strain; however, in contrast to CL Brener, CL-14 is
neither infective nor pathogenic in vivo. To investigate the molecular determinants of
T. cruzi virulence, we performed a direct comparison of the CL Brener and CL-14
genomes, based on the available CL Brener sequences and sequences we
generated from CL-14 using the 454 FLX platform. Although neither genome has
been fully assembled, we found that they have highly similar nuclear genome
organization, almost 100% identical mitochondrial maxicircle kDNA, and similar
numbers of predicted coding sequences as well as of copies of members of multigene
families. PCR analyses as well as phylogenetic inferences showed that CL-14
is also a hybrid that belongs to the same DTU as CL Brener (TcVI). Southern blot
analyses indicate a similar karyotype and, for most multigene families, sequence
identity between the two clones is higher than 99%. The only major difference
detected between these two genomes is related to a sub-group of the large trans-sialidase
gene family (TcTS), known to have a C-terminal domain with 12-amino-acid
repeats named 'shed acute phase antigen' or SAPA repeats. At least three copies
of TcTS containing a repetitive domain of 19 to 41 repeats are present in the CL
Brener genome; these repeats are highly immunogenic and increase the half-life of
TcTS protein shed into the host bloodstream. In CL-14, by contrast, only one copy
containing 3 SAPA repeats was identified. This reduced number of SAPA repeats in
CL-14 TcTS, confirmed by PCR, Southern blot, western blot and transcriptome
analyses, may constitute one of the factors responsible for the differences in
virulence between these two strains.
1. Introduction
1.1 Trypanosoma cruzi and Chagas Disease
Chagas disease, or American trypanosomiasis, is caused by Trypanosoma
cruzi, a protozoan parasite discovered by Carlos Chagas in the early twentieth
century. American trypanosomiasis has been designated a neglected tropical
disease by the World Health Organization (WHO, 2013).
The disease affects between 7 and 8 million people and causes 12000
deaths per year (Rassi et al., 2010). Endemic areas are found in 21 Latin American
countries. However, in recent decades the disease has been increasingly detected
in the United States of America and Canada as a result of migration between
countries (WHO, 2013). As a consequence of the widespread use of insecticide
spraying, Uruguay, Chile and Brazil have declared themselves free of transmission
by Triatoma infestans, the main vector of T. cruzi (Schofield et al., 2006).
Transmission of the parasite to humans most commonly occurs through contact
of the infected feces of the blood-sucking vector with mucous membranes or with
wounds opened by the insect while it feeds on the host's blood (Fig. 1). Triatomine
insects are exclusively hematophagous and become infected with T. cruzi when they
feed on the blood of mammals carrying trypomastigote forms of the parasite. Once in
the insect's gut, the parasite transforms into epimastigotes, which are replicative
forms. In the final portion of the kissing bug's digestive tract, epimastigotes
differentiate into metacyclic trypomastigotes, the form of T. cruzi capable of infecting
mammals through vectorial transmission. When infected insects defecate during the
blood meal, they deposit parasites, which can result in transmission through contact
with the conjunctiva, mucous membranes or the bite wound. Metacyclic
trypomastigotes penetrate host cells and transform into amastigotes, the replicative
forms in the vertebrate host. After several cycles of multiplication, amastigotes
transform into trypomastigotes and the host cell ruptures, releasing parasites into the
blood. The released trypomastigotes can infect adjacent cells or be distributed
through the body via lymphatic or blood vessels, infecting distant organs and tissues.
The life cycle and transmission continue when vectors feed on the blood of infected
hosts (Brener et al., 2000). Less frequently, transmission can occur through blood
transfusion, organ transplantation, congenitally (WHO, 2013) or orally, through
ingestion of contaminated food. Mortality is mostly associated with the chronic stage
of the disease, which can take several years to develop. There is no vaccine for
Chagas disease and only 2 drugs are available for treatment, both with limited
efficacy and serious side effects (Brener et al., 2000; WHO, 2013).
Figure 1: Schematic representation of the Trypanosoma cruzi life cycle.
a: metacyclic trypomastigote forms in the vector's feces; b: entry of metacyclic
trypomastigotes into the vertebrate host through a lesion or break in the skin or
mucous membranes; c: intracellular multiplication of amastigote forms; d:
differentiation of amastigotes into trypomastigotes; e: release of trypomastigotes and
infection of new host cells; f: release of trypomastigotes into the host bloodstream; g:
infection of muscle and/or nervous tissue by trypomastigotes; h: ingestion of
bloodstream trypomastigotes by the vector; i: differentiation of epimastigotes into
metacyclic trypomastigotes in the vector's hindgut, restarting the parasite's life
cycle. Figure adapted and translated from Expert Reviews in Molecular Medicine,
Cambridge University Press, 2002.
1.2 Variability in the Trypanosoma cruzi Population
Biological (Andrade, 1974), biochemical and molecular characteristics (Miles
et al., 1981; Morel et al., 1980; Tibayrenc and Ayala, 1991; Freitas et al., 2006;
Herrera et al., 2007) allowed the various T. cruzi strains to be classified into two
groups, named TcI and TcII. As these authors showed, the two lineages are highly
divergent and occupy predominantly distinct environments: TcI, found in central
South America and associated with a sylvatic life cycle, shows a low rate of
parasitism in humans, whereas TcII, with domestic transmission, causes human
infections with high parasitemia in endemic areas (Zingales et al., 1999). In Brazil,
TcII strains are apparently exclusively responsible for Chagas disease lesions
(Freitas et al., 2005).
In 2006, Freitas et al. resolved 144 different haplotypes through phylogenetic
analysis with molecular markers and showed that some strains could be classified
neither as TcI nor as TcII, proposing a new group for these strains, TcIII. Other
strains could not be classified by the parameters studied, so the phylogenetic
classification was still incomplete. A further group would have to be created for
these strains, because they have characteristics of two distinct groups, TcII and
TcIII, indicating the existence of hybrid strains.
With a better understanding of the parasite's population structure and the
inclusion of new molecular markers, the various strains of this protozoan have more
recently been classified into six groups, T. cruzi I-VI (Zingales et al., 2009). In this
new classification, one hybrid group, TcVI, derives from a TcII recipient parent and a
TcIII donor parent (Freitas et al., 2006). The mitochondrial genome of TcIV is derived
from a parental strain belonging to group TcIII.
1.3 The Trypanosoma cruzi Genome
Analyses of the complete genome sequence of T. cruzi, published in 2005
(El-Sayed et al., 2005), showed that its 55-million-base-pair (Mb) genome is diploid,
that 50% of it is coding, and that a large part corresponds to repetitive sequences
such as retrotransposons and genes of large surface protein families. The reference
clone chosen for the T. cruzi genome project was CL Brener (Brener and Chiari,
1963), a hybrid clone belonging to group TcVI (Zingales et al., 2009). CL Brener was
chosen for the genome project on the basis of five characteristics: its pattern of
infection in mice is well known; it was isolated from the vector Triatoma infestans; it
has a preferential tropism for heart and muscle cells; it shows a clear acute phase in
infected humans; and it is sensitive to the drugs used clinically for Chagas disease
(Zingales et al., 1997). Other important genomic analyses of this clone had been
published previously, including karyotype analyses (Branche et al., 2006; Porcile et
al., 2003), physical maps and the generation of ESTs from all stages of the parasite
life cycle (Brandão et al., 1997; Cano et al., 1995; Cerqueira et al., 2005; Henriksson
et al., 1995; Porcel et al., 2000; Verdun et al., 1998).
Sequencing of the T. cruzi genome was based on the whole-genome shotgun
(WGS) technique with 14-fold coverage, and the final assembly comprised 5486
scaffolds (El-Sayed et al., 2005). During sequencing it became clear that this is a
hybrid genome resulting from the fusion of two genotypes from strains of T. cruzi II
and T. cruzi III; that is, it carries two different haplotypes. The genome of T. cruzi
clone Esmeraldo, belonging to group TcII, was therefore sequenced with the same
technique at low coverage (2.5x). Comparing CL Brener contigs with Esmeraldo
reads made it possible to discriminate the two CL Brener haplotypes.
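To give a sense of what 14x and 2.5x coverage mean for a 55 Mb genome, the sketch below applies the idealized Lander-Waterman (Poisson) model. Under this model, sequencing N reads of length L over a genome of size G gives a mean coverage c = N·L/G, and the expected fraction of the genome hit by at least one read is 1 - e^(-c). This is an illustrative assumption only. Coverage of a repeat-rich genome such as that of T. cruzi is far from uniform, and these figures are not results reported by El-Sayed et al. (2005).

```python
import math

# Genome size taken from the text (~55 Mb); the coverages are the 14x (CL Brener)
# and 2.5x (Esmeraldo) values cited for the WGS projects.
GENOME_SIZE_BP = 55_000_000


def total_sequenced_bases(genome_size_bp: int, coverage: float) -> float:
    """Bases that must be sequenced to reach mean coverage c, from c = (N * L) / G."""
    return genome_size_bp * coverage


def expected_fraction_covered(coverage: float) -> float:
    """Idealized Lander-Waterman (Poisson) estimate: P(depth >= 1) = 1 - e^(-c)."""
    return 1.0 - math.exp(-coverage)


for label, cov in [("CL Brener", 14.0), ("Esmeraldo", 2.5)]:
    mb = total_sequenced_bases(GENOME_SIZE_BP, cov) / 1e6
    frac = expected_fraction_covered(cov)
    print(f"{label}: {cov}x -> ~{mb:.1f} Mb sequenced, ~{frac:.1%} covered")
```

Under this simplified model, 14x leaves essentially no base unsequenced, whereas 2.5x is expected to cover only roughly 92% of the Esmeraldo genome.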
CL Brener sequences most similar to the Esmeraldo reads were named
Esmeraldo-like, and the other haplotype was annotated as non-Esmeraldo-like. Of the
22570 predicted genes, 6159 represent Esmeraldo-like alleles, 6043 represent
non-Esmeraldo-like alleles and 10368 represent sequences that could not be
assigned to a particular haplotype. In addition to describing the genome and its
organization, the authors presented a new gene family with more than 1300 copies,
the MASP family, which encodes mucin-associated surface proteins. Through
phylogenetic analyses, Freitas et al. (2006) confirmed the hybrid nature of clone CL
Brener as the result of the fusion of parental strains belonging to groups T. cruzi II
and T. cruzi III.
To produce a higher-resolution assembly that would better represent T. cruzi
chromosomes, Weatherly et al. (2009) generated consensus sequences of each pair
of homologous chromosomes for both haplotypes. The authors first assembled 11
chromosomes based on synteny with T. brucei chromosomes. Further chromosomes
were assembled after mapping both ends of bacterial artificial chromosome (BAC)
clones whose end sequences fell in different contigs or scaffolds in the correct
orientation. In total, 41 chromosomes were assembled, a count consistent with the
number of T. cruzi chromosomes predicted from pulsed-field gel electrophoresis
(PFGE) studies (Branche et al., 2006). The assembly proposed by Weatherly et al.
(2009) includes 90% of the genes annotated in the genome. The genomic
organization of T. cruzi was found to be highly syntenic with the genomes of T. brucei
and L. major (which, together with T. cruzi, are known as the Tri-Tryps). This synteny
is well conserved in regions containing housekeeping genes but is broken in regions
containing gene families that encode surface proteins, which occur in non-syntenic
internal chromosomal regions and at subtelomeric positions. Retroelements and
structural RNAs also occur in these low-synteny regions (El-Sayed et al., 2005b).
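The contrast between conserved and broken synteny can be made concrete with a small sketch. It scans one-to-one orthologs ordered along a T. cruzi chromosome and reports maximal runs whose positions on the other genome's chromosome (for example, in T. brucei) are consecutive, in either orientation. This is a deliberately simplified, hypothetical illustration: it allows no gaps or duplications and assumes ortholog positions are already given as integer ranks. It is not the method used by El-Sayed et al. (2005b) or Weatherly et al. (2009).

```python
from typing import List, Tuple


def synteny_blocks(other_ranks: List[int], min_size: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges of collinear runs of orthologs.

    other_ranks[i] is the rank, on the other genome's chromosome, of the
    ortholog of the i-th gene along the query chromosome. A run continues while
    consecutive ranks differ by +1 (same orientation) or -1 (inverted), and it
    is reported only if it contains at least `min_size` genes.
    """
    blocks: List[Tuple[int, int]] = []
    n = len(other_ranks)
    if n == 0:
        return blocks
    start, step = 0, None
    for i in range(1, n):
        diff = other_ranks[i] - other_ranks[i - 1]
        if diff in (1, -1) and step in (None, diff):
            step = diff  # extend the current collinear run
            continue
        if i - start >= min_size:  # the run covered genes start..i-1
            blocks.append((start, i - 1))
        if diff in (1, -1):  # orientation flip: the new run shares gene i-1
            start, step = i - 1, diff
        else:  # synteny break: the new run starts at gene i
            start, step = i, None
    if n - start >= min_size:
        blocks.append((start, n - 1))
    return blocks


# Hypothetical ranks: genes 0-3 collinear, 4-5 a short fragment, 6-8 an inverted block.
print(synteny_blocks([1, 2, 3, 4, 20, 21, 9, 8, 7]))  # [(0, 3), (6, 8)]
```

In real data, regions rich in surface-protein families would be expected to appear as short or missing blocks, whereas housekeeping-gene regions would form long ones.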
The mitochondrial genome of these organisms, called kDNA, consists of 25-50
maxicircle copies (22 kb each) and 5000-10000 minicircle copies (7.5 kb each)
(Shapiro, 1993). This unique kDNA network is located within the mitochondrion and is
a defining structure of the flagellated eukaryotes of the class Kinetoplastida, to which
the trypanosomatids belong. The maxicircle, approximately 22 kb long, differs from
other mitochondrial genomes in its large size, complexity and content (Westenberger
et al., 2006), and the kDNA as a whole comprises approximately 20-25% of the total
DNA of this organism (Souza, 2003). This DNA is an important taxonomic marker and
was used to define the phylogenetic relationships of 45 T. cruzi strains, grouped into
three clades, A, B and C (Machado and Ayala, 2001). The maxicircle of clone CL
Brener derives from TcIII, whereas the maxicircle of clone Esmeraldo belongs to TcII.
In 2011, Franzén et al. presented the genome sequence of T. cruzi clone
Sylvio X10/1, a representative of group TcI. The data revealed that the genomes of
the two clones, Sylvio X10/1 and CL Brener, are highly syntenic and have a very
similar gene set, but differ greatly in the number of genes belonging to multigene
families. Sylvio X10/1 alleles share 97% and 96% identity with the non-Esmeraldo-like
and Esmeraldo-like haplotypes, respectively, suggesting that clone Sylvio X10/1 is
more similar to the non-Esmeraldo-like haplotype, that is, to the T. cruzi III
genotype. The amount of non-coding DNA is also extremely similar between the
genomes.
1.4 Comparative Genomics of Trypanosomatids
Comparative genomics studies reveal important aspects related to differences
in life cycle, host type, virulence and pathogenicity of disease-causing organisms.
Despite their large phylogenetic distance, analyses of the genomes of T. cruzi and of
two other trypanosomatids pathogenic to humans, Leishmania major and
Trypanosoma brucei, revealed a common proteome of 6200 proteins and high gene
synteny (El-Sayed et al., 2005b). The frequent correlation between conserved
syntenic blocks and the large directional gene clusters (DGCs), the polycistrons
characteristic of the three trypanosomatids, also reflects the coupling of transcription
with subsequent RNA processing by trans-splicing and polyadenylation.
Nevertheless, there are substantial differences even among genes in the same
genomic context, which indicate specific adaptations to selective pressures, to the
survival strategies of each organism and to differences in life cycle.
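Because trypanosomatid genes are arranged in long directional gene clusters, a first approximation of the DGCs on a chromosome can be obtained by grouping consecutive genes that lie on the same strand. The boundaries between clusters then correspond to strand-switch regions. In trypanosomatids, divergent strand-switch regions are where polycistronic transcription is thought to initiate. The sketch below assumes that gene strands are already ordered by chromosomal position. It ignores intervening tRNA, rRNA and snRNA loci, which may also mark cluster boundaries, and both function names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def directional_clusters(strands: List[str]) -> List[Tuple[int, int, str]]:
    """Group consecutive genes on the same strand ('+' or '-').

    Returns (first_index, last_index, strand) for each cluster.
    """
    clusters: List[Tuple[int, int, str]] = []
    index = 0
    for strand, run in groupby(strands):
        length = sum(1 for _ in run)
        clusters.append((index, index + length - 1, strand))
        index += length
    return clusters


def strand_switches(clusters: List[Tuple[int, int, str]]) -> List[Tuple[int, int, str]]:
    """Label each boundary between adjacent clusters:
    '-' followed by '+' is divergent, '+' followed by '-' is convergent."""
    labels = []
    for left, right in zip(clusters, clusters[1:]):
        kind = "divergent" if (left[2], right[2]) == ("-", "+") else "convergent"
        labels.append((left[1], right[0], kind))
    return labels


genes = ["+", "+", "+", "-", "-", "+"]
clusters = directional_clusters(genes)
print(clusters)  # [(0, 2, '+'), (3, 4, '-'), (5, 5, '+')]
print(strand_switches(clusters))  # [(2, 3, 'convergent'), (4, 5, 'divergent')]
```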
As mentioned above, Trypanosoma cruzi has a life cycle alternating between the
invertebrate triatomine vector and the vertebrate host, where it infects the blood and
invades cells. Trypanosoma brucei divides its life cycle between an invertebrate
vector (flies of the genus Glossina), in the epimastigote and promastigote forms, and
the vertebrate host, in the trypomastigote form; in the host it lives only in the blood
and does not invade cells. Leishmania major also divides its life cycle between an
invertebrate vector and a vertebrate host. The vectors are female sand flies of the
genera Lutzomyia or Phlebotomus, in which the amastigote and procyclic
promastigote forms occur. Leishmania enters the vertebrate host in its promastigote
form through the insect bite and, after invading cells, transforms into amastigotes,
the replicative form. The three trypanosomatids also employ different strategies to
evade the host immune system: L. major alters the function of infected macrophages,
T. cruzi expresses a complex variety of surface antigens from within the cells it
infects, and T. brucei remains extracellular but evades the host immune response by
periodically switching its main surface protein, the VSG (El-Sayed et al., 2005b; Pays
et al., 2004).
The location of large arrays of surface protein genes near or within
telomeres, and the presence of transposable elements in these arrays, may increase
the frequency of recombination and result in variation of coding sequences. This is
observed in T. cruzi for the MASP, DGF-1 and RHS families, the last of which is also
present in T. brucei; in these organisms such genes may be related to immune
evasion and survival in different hosts. Frequent recombination in these regions
results in extensive polymorphism between homologous chromosomes.
Similar analyses of the genome data of three Leishmania species, Leishmania
infantum, Leishmania braziliensis (Peacock et al., 2007) and Leishmania major (Ivens
et al., 2005), also showed that pseudogene formation and gene loss are events that
could determine some of the differences observed in host-parasite interaction
processes (Peacock et al., 2007).
A comparative study of the genome of Trypanosoma brucei gambiense, the
subspecies that causes sleeping sickness in humans, with that of T. brucei brucei, a
subspecies that does not infect humans, failed to reveal sequences specific to either
trypanosomatid that could explain the differences in their ability to infect human
hosts. However, different copy numbers of genes encoding surface proteins were
identified between the infective and non-infective clones (Jackson et al., 2010).
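Copy-number differences such as those reported between infective and non-infective clones are commonly estimated from sequencing read depth. If reads from all copies of a gene family are mapped to one representative sequence, the depth on that sequence is roughly the number of copies times the depth of a single-copy gene. The sketch below applies that ratio. Using presumed single-copy genes as the baseline, taking their median, and the function and variable names are all illustrative assumptions, not the procedure of Jackson et al. (2010). The input depths are invented for the example.

```python
from statistics import median
from typing import Sequence


def estimate_copy_number(family_depth: float, single_copy_depths: Sequence[float]) -> float:
    """Estimate the copies of a gene family as the mean read depth on its
    representative sequence divided by the median depth of presumed
    single-copy genes (the per-copy baseline)."""
    baseline = median(single_copy_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    return family_depth / baseline


# Hypothetical depths, for illustration only.
single_copy = [18.0, 20.0, 22.0, 19.5, 21.0]
print(round(estimate_copy_number(600.0, single_copy)))  # 30
```

The median is used here so that a few mis-classified or collapsed genes in the baseline set do not distort the per-copy depth.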
Among other trypanosomatids whose genomes have been sequenced, the
following deserve mention: Leishmania tarentolae, a non-virulent species lacking
genes associated with the intracellular stage in the mammalian host (Raymond et al.,
2012); Leishmania donovani, for which the genomes of clinical isolates indicated
co-infection with Leptomonas (Singh et al., 2013); Trypanosoma cruzi marinkellei B7,
a bat-associated parasite with variation in gene copy number in multigene families
and many unique sequences, including potential subspecies-specific genes (Franzén
et al., 2012); Leishmania amazonensis, the etiological agent of human cutaneous
leishmaniasis, whose genome revealed genus-specific surface genes that may be
related to disease development and interaction with host cells, and for which a hybrid
interactome was proposed between parasite-secreted proteins and factors that mimic
the host system (Real et al., 2013); and Angomonas deanei and Strigomonas culicis,
whose sequences revealed data on the interaction and adaptation of these
trypanosomatids with their endosymbionts, providing information on the evolution of
eukaryotic cells (Motta et al., 2013). In 2013, Goodhead et al. sequenced the genomes
of two Trypanosoma brucei subspecies, T. b. gambiense and T. b. rhodesiense, which,
besides being genetically and geographically distant, are associated with different
phases of African sleeping sickness. Using markers developed specifically for each
of these genotypes, the authors observed that T. b. rhodesiense isolated from a single
focus has the genotype and phenotype of both reference genomes. Their results
suggest that there has been genetic introgression between the infective T. brucei
subspecies, which are therefore not genetically isolated. Data from other genomes,
such as T. congolense IL3000, T. vivax Y486, L. mexicana U1103, C. fasciculata
Cf-Cl, T. brucei Lister 427, T. cruzi Esmeraldo, T. cruzi JR cl. 4, E. monterogeii LV88
and L. panamensis L13, are available at http://tritrypdb.org.
12
1.5 The CL-14 Clone
Clone CL Brener, used as the reference for the T. cruzi genome project, was derived from the CL strain, which was isolated from Triatoma infestans (reviewed by Zingales et al., 1997). A second clone derived from the same strain, T. cruzi clone CL-14, is notable for being completely avirulent. In vitro infection assays showed that clone CL-14 is four times less invasive than the parental CL strain (Atayde et al., 2004).
Inoculation of CL-14 trypomastigotes does not produce parasitemia, even in immunodeficient animals. It also induces efficient protective immunity against subsequent challenge with the CL strain, preventing mortality, parasitemia and disease symptoms in mice (Lima et al., 1990, 1995). In 1999, Paiva et al. showed that CL-14 induces immunity involving a CD8+ response. After challenge with CL Brener, immunized animals, including neonates, show neither parasitemia nor tissue parasitism. Immunized animals produce IFN-γ, IgG1, IgG2a and IgG2b (Pyrrho et al., 1998; Soares et al., 2003; Atayde et al., 2004).
Because of its non-virulent nature, clone CL-14 was used as a vaccine vector against melanoma (Junqueira et al., 2012). The gene for the NY-ESO-1 antigen was cloned into a CL-14 expression vector. This antigen has low expression in normal tissues but high expression in neoplasms such as tumors of the lung, esophagus, liver, stomach, prostate, ovary and bladder, and in melanoma. Transgenic T. cruzi CL-14 expressing NY-ESO-1 was tested in animal models. It induced high levels of humoral and Th1-type cellular immune responses against NY-ESO-1. It also completely inhibited melanoma growth when human cells expressing NY-ESO-1 were injected into animals infected with CL-14.
1.6 Gene Expression in Trypanosomatids
Gene expression in T. cruzi, as in other members of the family Trypanosomatidae, is unusual. The parasite transcribes its genes constitutively as long polycistronic transcripts, which are then processed post-transcriptionally. Transcription initiates bidirectionally between two divergent polycistrons (Martínez-Calvillo et al., 2003, 2004). Protein-coding genes are transcribed by RNA polymerase II into polycistronic pre-mRNAs (Martínez-Calvillo et al., 2003). Mature monocistronic mRNAs are then generated by two reactions (Teixeira and Da Rocha, 2003). The first is trans-splicing, which joins the capped spliced leader (SL) sequence to the 5' end of each transcript (Liang et al., 2003). The second is polyadenylation.

Hartmann et al. (1998) and López-Estraño et al. (1998) showed that polypyrimidine-rich intergenic regions guide SL addition and polyadenylation. Campos et al. (2008) showed that in T. cruzi the mean distance between SL addition sites and polypyrimidine motifs is 18 nucleotides. The mean distance between poly(A) addition sites and the nearest upstream polypyrimidine-rich sequence is 40 nucleotides. The same authors showed that the mean lengths of T. cruzi 5'-UTRs and 3'-UTRs are 35 and 264 nucleotides, respectively, shorter than those observed in T. brucei. This agrees with the comparative analyses of the TriTryp genomes by Berriman et al. (2005), El-Sayed et al. (2005a) and Ivens et al. (2005), which showed that the T. cruzi genome is more compact than the T. brucei genome.
Few studies of global gene expression in T. cruzi have been published. Microarray studies (Minning et al., 2003; Minning et al., 2009) found that 4,992 transcripts (about 41% of genes) appear to be up- or down-regulated in at least one life-cycle stage. Some of these results were validated against quantitative RT-PCR data. Members of paralogous clusters in T. cruzi can also differ significantly in mRNA abundance across the life cycle. The amastins are an example: they are proteins expressed in amastigotes (Teixeira et al., 1994), but some members are also expressed in epimastigotes (Kangussu-Marcolino et al., 2013).

Microarray analyses have several limitations, however:
- The sequences to be analyzed must be known in advance, since they must be present on the chips (oligonucleotide or cDNA).
- Changes in the levels of rare mRNAs are very hard to detect. This is especially true when little mRNA is available for hybridization with the chip probes, because some forms, such as intracellular amastigotes, are difficult to obtain.
- It is difficult to represent all genes in the space available on a chip.
- Microarrays lack the sensitivity to detect low-abundance RNAs and can produce cross-hybridization artifacts.
- Data generation is costly and throughput is low (Wang et al., 2009).
In contrast to microarrays, approaches based on cDNA sequencing measure the levels of the various mRNAs in cells directly. A large volume of expressed sequence tag (EST) data is now available in genomic databases. However, ESTs do not clearly reflect gene expression: they do not cover the whole transcriptome, and the transcripts corresponding to each EST cannot be quantified precisely. RNA-seq, the newer cDNA sequencing technology, can determine transcript structure and also quantify transcript levels in cells. It therefore makes it possible to obtain a complete data set on the expression of a genome.

RNA-seq has already been used to study the T. brucei transcriptome (Siegel et al., 2011; Kolev et al., 2010; Archer et al., 2011). These studies sequenced the RNAs present in two forms of the parasite. They showed that transcription initiation is not restricted to the start of gene clusters but can also occur bidirectionally at internal sites. The same pattern was described for Leishmania major (Martínez-Calvillo et al., 2003, 2004) and for Trypanosoma cruzi (reviewed by Araújo et al., 2011). Kolev et al. (2010) mapped the reads from their T. brucei transcriptome sequencing. Reads derived from 5'-triphosphate RNA fragments mapped in opposite directions from the same origin, showing that transcription is bidirectional. In 2011, Archer et al. identified sequence motifs in co-regulated transcripts, suggesting that these motifs may be signals that regulate expression.
RNA motifs involved in cell-cycle regulation were described for T. brucei and are conserved among other kinetoplastids. These studies also mapped the positions of spliced leader addition sites and poly(A) addition sites. Previously undescribed transcripts were annotated. Equally important, transcript abundances were measured for each life-cycle stage, which defined the expression level of each gene (Siegel et al., 2005).
In 2011, Franzén and colleagues used RNA-seq to publish a large-scale analysis of the small non-coding RNA transcriptome of T. cruzi. Their aim was to describe RNA metabolism in this organism, which lacks the classical RNA interference pathways (da Rocha et al., 2004). The authors found sequences related to rRNAs, snRNAs and snoRNAs, a large number of tRNA-derived small RNAs, and 92 new loci. Most of these loci show no homology to known RNA classes.
2. Objectives
2.1 General Objective
To investigate the genomic basis of the difference in infectivity between two Trypanosoma cruzi clones: CL Brener, which is virulent, and CL-14, which is avirulent in the animal infection model.
2.2 Specific Objectives
1. Generation and analysis of the complete genome sequence of clone CL-14;
2. Comparison of nucleotide sequences between CL-14 and CL Brener;
3. Phylogenetic analyses of CL-14 and CL Brener;
4. Analysis of gene content and of the copy number of multigene families in CL Brener and CL-14;
5. Assembly and annotation of the CL-14 maxicircle genome;
6. RNA-seq generation and analysis of cDNA sequences from CL-14 epimastigotes, intracellular amastigotes and trypomastigotes, and comparison of the RNA sequences expressed in CL Brener and CL-14.
3. Materials and Methods
3.1 Sequencing of nuclear and mitochondrial DNA
Epimastigote forms of Trypanosoma cruzi clone CL-14 were cultured in LIT medium as described by Teixeira et al. (1994). Total DNA was extracted and used to construct two genomic libraries, each from 5 µg of DNA.

One library was prepared by the shotgun method, in which DNA is randomly fragmented. The other was prepared by the paired-end tag (PET) method, in which DNA is cut into larger fragments and "tags" at the ends of these fragments are sequenced. The tags are later mapped onto the genome as it is assembled (Fullwood et al., 2009).

Each library was sequenced separately by high-throughput pyrosequencing on a Roche 454 FLX-Titanium instrument at the Laboratório Nacional de Computação Científica (LNCC) in Petrópolis/RJ. De novo assembly of the T. cruzi genome was also carried out at LNCC, using Mira (Chevreux et al., 2004) and Newbler (www.454.com).
3.2 Preprocessing and preliminary analyses of the sequences
The LNCC group sent us three sets of sequences. The first was the reads, that is, the sequences produced by the sequencer. The other two were contig assemblies, one produced by Mira and the other by Newbler.
The raw sequencing output, the reads, was preprocessed in silico with the Seqclean software and the UniVec database. The first step was to use Seqclean with UniVec to search for adapters and low-quality sequence.

The adapters come from two distinct steps of the sequencing procedure:
- Adaptation and amplification of the DNA fragments on the beads. Primers used for sample amplification: Primer A: CCTCCCTCGCGCCATCAG; Primer B: GCCTTGCCAGCCCGCTCAG.
- Amplification for sequencing itself. Primers used for sequencing: Primer A: GCCTCCCTCGCGCCA; Primer B: GCCTTGCCAGCCCGC.

Adapter and low-quality segments were removed from every read in which they were found. The assembled sequences were not preprocessed.
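The adapter-removal step can be illustrated with a minimal Python sketch. It is not Seqclean's algorithm and does not use UniVec. It only looks for exact matches of the four primer sequences listed above and their reverse complements. The read is cut at each hit, and the longest adapter-free segment is kept. The function names and the 50-nt minimum length are hypothetical choices for illustration, and quality trimming is not shown.

```python
# Primer sequences listed in the text (454 amplification and sequencing primers).
ADAPTERS = [
    "CCTCCCTCGCGCCATCAG",
    "GCCTTGCCAGCCCGCTCAG",
    "GCCTCCCTCGCGCCA",
    "GCCTTGCCAGCCCGC",
]

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_adapters(read: str, adapters=ADAPTERS, min_len: int = 50):
    """Mask every exact adapter hit on either strand and return the longest
    unmasked segment, or None if it is shorter than min_len (hypothetical cutoff)."""
    read = read.upper()
    patterns = set(adapters) | {revcomp(a) for a in adapters}
    masked = [False] * len(read)
    for pattern in patterns:
        start = read.find(pattern)
        while start != -1:
            for i in range(start, start + len(pattern)):
                masked[i] = True
            start = read.find(pattern, start + 1)

    best, run_start = "", None
    # The trailing True is a sentinel that closes the last unmasked run.
    for i, is_masked in enumerate(masked + [True]):
        if not is_masked and run_start is None:
            run_start = i
        elif is_masked and run_start is not None:
            segment = read[run_start:i]
            if len(segment) > len(best):
                best = segment
            run_start = None
    return best if len(best) >= min_len else None
```

Keeping the longest clean segment, rather than always trimming from one end, is a simple way to handle adapters that appear at either end of a read.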
As a first analysis, we used a pipeline written in Perl with the BioPerl library (Stajich et al., 2002) to determine the following values:
- the number of sequences generated by sequencing and present in the assemblies;
- the total number of nucleotides sequenced;
- the genome coverage (the number of times the genome was sequenced);
- the GC percentage;
- the N50.
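The first three statistics above can be sketched in Python. This is not the author's Perl/BioPerl pipeline. The function names are hypothetical, and the genome size used for coverage must be supplied by the user. GC is computed over all characters, so ambiguous bases (N) count in the denominator.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sequence_stats(seqs, genome_size):
    """Sequence count, total length, fold coverage and GC% for a collection of sequences."""
    count = total = gc = 0
    for seq in seqs:
        s = seq.upper()
        count += 1
        total += len(s)
        gc += s.count("G") + s.count("C")
    return {
        "count": count,
        "total_bases": total,
        "coverage": total / genome_size,  # total sequenced bases / genome size
        "gc_percent": 100.0 * gc / total if total else 0.0,
    }
```

For example, `sequence_stats((s for _, s in read_fasta("reads.fasta")), 55_000_000)` would process a hypothetical read file. For the values reported later for CL-14, 1,506,882,872 bases over a 55 Mb haploid genome gives about 27.4, which matches the 27-fold coverage.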
N50 is a statistic that summarizes the contig lengths of an assembly. It is not a mean length. Here it was computed against a reference genome, that of Trypanosoma cruzi clone CL Brener. It is the length of the contig at which the contigs of that length or longer add up to half the size of the reference genome. To compute it, the contigs were sorted from longest to shortest and summed one by one. When the running sum reached half the size of the diploid CL Brener genome, the N50 was taken as the length of the last contig added.
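The procedure just described can be sketched in Python as follows. Because the threshold is half of a reference genome size rather than half of the assembly's own total length, this variant is usually called NG50 in the assembly literature. Passing the assembly's total length instead gives the conventional N50. The function name and example lengths are illustrative.

```python
def reference_n50(contig_lengths, reference_size):
    """Length of the contig at which the running sum of contigs, sorted from
    longest to shortest, first reaches half of reference_size.
    Returns None if all contigs together never reach that threshold."""
    half = reference_size / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None
```

With contigs of 100, 80, 50, 30 and 20 bp and a 400 bp reference, the running sum first reaches 200 at the 50 bp contig, so the value is 50. Using the assembly total (280 bp) instead gives 80.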
CL-14 contigs longer than the N50 were aligned with MEGABLAST against the assembled chromosomes of clone CL Brener (TriTrypDB release 4.3). For each contig, the chromosome with the highest alignment score was selected as its homolog. The contigs were then aligned to their homologous chromosomes with CONTIGuator (Galardini et al., 2011) for synteny analysis.
3.3 PCR amplification of nuclear and mitochondrial DNA from Trypanosoma cruzi strains
Polymerase chain reactions (PCR) were performed in silico and then confirmed in vitro for three markers, as described by De Freitas et al. (2006). Two were nuclear: the mini-exon SL (Burgos et al., 2007) and the 24Sα rDNA (Souto et al., 1996). The third was mitochondrial, in the cytochrome oxidase II (COII) gene.
Using a Perl pipeline and the algorithms of the e-PCR software (Schuler, 1997), we searched for the sequences lying between the following primer pairs:
- 24Sα rDNA (nuclear gene encoding the ribosomal subunit): F- AAGGTGCGTCGACAGTGTGG and R- TTTTCAGAATGGCCGAACAGT.
- Mini-exon SL (nuclear): F- CGTACCAATATAGTACAGAAACTG and R- CTCCCCAGTGTGGCCTGGG.

Primer alignments with up to 2 gaps and 2 mismatches were allowed. This tolerance let the primers pair with their targets and also let us check that they would not anneal to other regions.

The mitochondrial COII marker required an additional step. The electronic-PCR amplicons obtained with primers F- CCATATATTGTTGCATTATT and R- TTGTAATAGGAGTCATGTTT were retrieved from the assembled genome with a Perl script. They were then digested in silico with the ReMap program from the EMBOSS package (Rice et al., 2010), configured to find the AluI restriction site.
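The electronic PCR and AluI digestion can be illustrated with a minimal Python sketch. It is not e-PCR or EMBOSS ReMap. The primer search below allows only gapless mismatches, whereas the search described above also allowed up to 2 gaps. The `max_product` length cap and all function names are assumptions. AluI is modelled as cutting AG^CT, and because AGCT is palindromic, scanning one strand finds every site.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_sites(template: str, primer: str, max_mismatches: int):
    """Start positions where primer aligns to template with at most
    max_mismatches substitutions (no gaps)."""
    n = len(primer)
    return [
        i for i in range(len(template) - n + 1)
        if sum(a != b for a, b in zip(template[i:i + n], primer)) <= max_mismatches
    ]


def in_silico_pcr(template, forward, reverse, max_mismatches=2, max_product=3000):
    """Predicted amplicons, written 5'->3' on the strand that carries the forward primer."""
    products = []
    fwd, rev_rc = forward.upper(), revcomp(reverse.upper())
    for strand in (template.upper(), revcomp(template.upper())):
        f_sites = primer_sites(strand, fwd, max_mismatches)
        r_sites = primer_sites(strand, rev_rc, max_mismatches)
        for f in f_sites:
            for r in r_sites:
                end = r + len(rev_rc)
                if r >= f and end - f <= max_product:
                    products.append(strand[f:end])
    return products


def digest_lengths(seq: str, site: str = "AGCT", cut_offset: int = 2):
    """Fragment lengths after cutting at every occurrence of a palindromic site.
    The defaults model AluI (AG^CT, blunt ends)."""
    seq = seq.upper()
    cuts, i = [], seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```

For the COII marker, each product returned by `in_silico_pcr` would be passed to `digest_lengths` to obtain the expected restriction-fragment pattern.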
The same analyses were carried out in vitro with DNA extracted from culture-derived trypomastigotes of representatives of groups TcI-III, V and VI, and from two biological samples of CL-14.

Each reaction contained:
- 1X GoTaq buffer;
- 1.5 mM MgSO4;
- 40 µM dNTPs;
- 0.75 U Taq;
- 10 pM of each primer.

The primer sequences were the same as in the in silico analyses, and the cycling programs are summarized in Table 1. The COII PCR product was digested with AluI (1 U per 20 µL) overnight at 36 °C. The mini-exon SL and 24Sα rDNA PCR products and the COII digestion products were run on 6% polyacrylamide gels and silver-stained.
3.4 Phylogenetic analyses
We used a Perl pipeline together with other programs, including the BLAST suite, ClustalW (Larkin et al., 2007) and MEGA 5 (Tamura et al., 2011). With these we determined the phylogenetic distances of three single-copy genes among clone CL-14, the two haplotypes of clone CL Brener, and clone Sylvio X10/1. The three genes are:
- the DNA mismatch repair protein MSH2;
- trypanothione reductase, an oxidative-stress response protein;
- cytochrome oxidase II, a gene encoded in the mitochondrial genome.
                        COII               Mini-exon               rDNA 24Sα
Initial denaturation    94 °C / 5 min      94 °C / 3 min           94 °C / 10 min
Denaturation            94 °C / 45 s       94 °C / 1 min           94 °C / 30 s
Annealing               45 °C / 45 s       68 °C / 1 min (2*)      60 °C / 30 s
                                           66 °C / 1 min (2*)
                                           64 °C / 1 min (2*)
                                           62 °C / 1 min (2*)
                                           60 °C / 1 min (35*)
Extension               72 °C / 1 min      72 °C / 1 min           72 °C / 30 s
Cycles                  40                 43**                    30
Final extension         72 °C / 5 min      72 °C / 10 min          72 °C / 10 min
Table 1 – PCR programs used to differentiate T. cruzi groups.
* Number of cycles at each annealing temperature. ** Total number of cycles.
To do this, CL-14 contigs were aligned with MEGABLAST against the CL Brener CDSs (coding sequences), with the low-complexity filter turned off. The CL-14 contig subsequences that aligned best with the CL Brener CDSs were extracted and annotated as CL-14 CDSs.

These CDSs and their CL Brener reference CDSs were then aligned globally as multiple alignments with ClustalW. The settings allowed gaps of up to 10 nucleotides, used 100 bootstrap replicates, and used the IUB scoring matrix, which scores and aligns sequences containing undefined nucleotides such as "N" (Larkin et al., 2007).

Perl scripts were written to trim overhangs from the alignments. Overhangs are the sequence at the ends of an alignment that does not pair with all the other sequences. Trimming them produced compact alignment blocks. The processed alignments were clustered with the neighbour-joining algorithm in MEGA5 to determine the phylogenetic distances for each gene among the different clones.
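The overhang-trimming step can be sketched in Python, followed by a simple pairwise distance that could feed a neighbour-joining tree. This is not the author's Perl script, and MEGA5 may have used a different substitution model. The p-distance shown here (the proportion of differing sites) is only one possible choice. Alignments are represented as a dictionary of equal-length gapped strings, and columns containing a gap or "N" in either sequence are skipped. All names are hypothetical.

```python
def trim_overhangs(alignment):
    """Keep only the columns between the first and last column in which
    every sequence has a nucleotide (no '-')."""
    length = len(next(iter(alignment.values())))
    full = [i for i in range(length)
            if all(seq[i] != "-" for seq in alignment.values())]
    if not full:
        return {name: "" for name in alignment}
    start, end = full[0], full[-1] + 1
    return {name: seq[start:end] for name, seq in alignment.items()}


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    skipping columns with a gap or N in either sequence."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else float("nan")
```

For example, trimming `{"a": "--ACGTAC-", "b": "GGACGTTCA", "c": "-TACCTAC-"}` keeps columns 3 to 8 ("ACGTAC", "ACGTTC", "ACCTAC"), and the p-distance between a and b is 1/6.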
3.5 Assembly of the mitochondrial genome
To determine which haplotype CL-14 belongs to, we examined how CL-14 reads covered the maxicircles of clones CL Brener and Esmeraldo, which are TcIII and TcII maxicircles, respectively. The reads were aligned against these maxicircles with MEGABLAST, from the BLAST package. The low-complexity filter was turned off because this genome contains many repetitive regions. Perl scripts were used to keep only the best hits and to generate a figure of maxicircle coverage for each alignment. Further Perl scripts then checked whether each read aligned to only one of the two mitochondrial genomes or to both.
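The two maxicircle checks can be sketched in Python. This is not the author's Perl scripts. Per-base coverage is built from alignment intervals with a difference array, and read IDs are split into those hitting only one maxicircle or both. The inputs are assumed to be already filtered to best hits, for example parsed from MEGABLAST tabular output. The labels A and B are hypothetical stand-ins for the CL Brener and Esmeraldo maxicircles.

```python
def per_base_coverage(intervals, ref_len):
    """Depth at each reference position from (start, end) alignment coordinates,
    1-based and inclusive as in BLAST tabular output (sstart may exceed send
    for minus-strand hits)."""
    diff = [0] * (ref_len + 1)
    for start, end in intervals:
        lo, hi = min(start, end), max(start, end)
        diff[lo - 1] += 1  # coverage starts at position lo
        diff[hi] -= 1      # and stops after position hi
    coverage, depth = [], 0
    for i in range(ref_len):
        depth += diff[i]
        coverage.append(depth)
    return coverage


def classify_reads(hits_a, hits_b):
    """Split read IDs into those with a best hit only on maxicircle A,
    only on maxicircle B, or on both."""
    hits_a, hits_b = set(hits_a), set(hits_b)
    return {
        "only_a": hits_a - hits_b,
        "only_b": hits_b - hits_a,
        "both": hits_a & hits_b,
    }
```

The difference array makes the coverage computation linear in the number of alignments plus the reference length, rather than touching every covered base of every alignment.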
3.6 Determination of the copy number of multigene families
The assembled and annotated ORFs of clone CL Brener were selected and grouped by gene family. Using our own pipeline written in Perl, CL-14 reads were mapped against each group of ORFs. The number of copies of each ORF in the CL-14 genome was then estimated from the coverage, in the following steps:
1. The entire read set is aligned against each ORF.
2. Reciprocal best alignments are selected, that is, those with the highest score in both directions (read -> ORF and ORF -> read).
3. The selected alignments are used to produce a file with the number of reads covering each nucleotide of each ORF.
4. The standard deviation of the per-nucleotide coverage is subtracted from its mean, and the result is divided by the sequencing coverage of each haplotype.
5. The result is rounded to an integer and reported as the predicted copy number for each sequence analyzed.
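The copy-number formula described above can be written as a short Python sketch. Whether the original pipeline used the population or sample standard deviation is not stated, so using `pstdev` is an assumption. Clamping negative results to zero is also an addition of this sketch.

```python
from statistics import mean, pstdev


def predicted_copy_number(per_base_coverage, haplotype_coverage):
    """round((mean coverage - standard deviation) / per-haplotype coverage).
    Population SD and the clamp at zero are assumptions of this sketch."""
    if not per_base_coverage:
        return 0
    adjusted = mean(per_base_coverage) - pstdev(per_base_coverage)
    return max(0, round(adjusted / haplotype_coverage))
```

As a purely hypothetical example, take the per-haplotype coverage as half of the overall 27x, about 13.5x (a value not stated in the text). An ORF with a uniform depth of 30 reads would then be predicted to have 2 copies.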
The pipeline can count copies of a single sequence. It can also count a group of similar sequences, such as a multigene family, starting from a single representative:
- To estimate the copy number of a single sequence, counting only that sequence, the mapping is repeated with progressively higher identity thresholds until the count stabilizes.
- To count an entire multigene family from a single sequence, the opposite is done: the identity threshold is progressively lowered until the final count stabilizes.
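The threshold sweep can be sketched in Python. The text does not define when a count "stabilizes". This sketch assumes it means two consecutive thresholds giving the same count. `count_at` is a hypothetical callable that would wrap the mapping and copy-number steps. In the example, dictionary lookups stand in for it and the counts are invented.

```python
def sweep_until_stable(count_at, thresholds):
    """Evaluate count_at(threshold) along `thresholds` (ascending for a single
    sequence, descending for a whole family) and return the first
    (threshold, count) whose count equals the count at the previous threshold.
    Returns None if the count never repeats."""
    previous = None
    for threshold in thresholds:
        count = count_at(threshold)
        if previous is not None and count == previous:
            return threshold, count
        previous = count
    return None


# Invented counts for illustration only.
single = {90: 8, 95: 4, 98: 2, 99: 2, 100: 1}
print(sweep_until_stable(single.get, [90, 95, 98, 99, 100]))
```

Here the count first repeats at 99% identity, so the sweep returns (99, 2).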
The pipeline produces four outputs:
- the total count for each query sequence;
- a plot of the coverage at each nucleotide;
- a text file with the coverage nucleotide by nucleotide;
- a file with the reads orthologous to each reference ORF.
CL-14 reads were also aligned against the CL Brener genome with BWA (Li & Durbin, 2010) using the "bwasw" option, to compare our pipeline's mapping with BWA's. Mapping was configured to align each read only once. Alignments were visualized with IGV (Thorvaldsdóttir et al., 2012).
3.7 Determination of identity between CDSs
CL-14 reads were mapped against the CDSs of clone CL Brener (TriTrypDB release 4.3) with BWA, using the "bwasw" option for long reads. The resulting alignment file was processed with SAMtools (Li et al., 2009), which converted it from .sam to .bam format and sorted the data. The SAMtools "mpileup" command was then used to identify polymorphisms between the clones from the mapping. BCFtools, developed by the same authors, was used to write these polymorphisms to a text file. Finally, the vcfutils script from the SAMtools package used the polymorphisms between the reads and the reference CDSs to generate the CL-14 CDSs.
The predicted CL-14 CDSs were aligned against the CL Brener CDSs with MEGABLAST, and reciprocal best hits were selected. The mean identity of these pairs was recorded as the mean identity between the CDSs. The mean identities of selected multigene families were also determined.
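The reciprocal-best-hit selection and the mean identity can be sketched in Python from BLAST tabular output. This sketch assumes the default 12-column `-outfmt 6` layout, where column 1 is the query, column 2 the subject, column 3 the percent identity and column 12 the bit score. The best hit per query is chosen by bit score, and ties keep the first line seen. This is not the author's script, and the sequence IDs in the example are invented.

```python
def best_hits(lines):
    """Best subject per query by bit score from tabular BLAST lines
    (qseqid sseqid pident ... bitscore)."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        query, subject, pident, bits = f[0], f[1], float(f[2]), float(f[11])
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits, pident)
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a -> (b, pident)) where b is a's best hit and a is b's best hit."""
    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return {
        query: (subject, pident)
        for query, (subject, _, pident) in ab.items()
        if subject in ba and ba[subject][0] == query
    }


def mean_identity(pairs):
    """Mean percent identity over reciprocal best-hit pairs."""
    return sum(p for _, p in pairs.values()) / len(pairs) if pairs else float("nan")
```

A query whose best subject prefers a different query is dropped, which removes one-sided matches among paralogs.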
3.8 Analyses of trans-sialidase genes with SAPA repeats
We combined a pipeline for designing primers that flank repeats, developed in the laboratory of Dr. Daniella C. Bartholomeu, with the e-PCR software. Together they identified size differences in the trans-sialidase genes that contain SAPA repeats (TcTS-SAPA). The analysis then proceeded in four steps:
1. The coverage of these CL Brener genes by CL-14 reads was estimated with the copy-number pipeline described above.
2. The reads orthologous to each gene were assembled with CAP3.
3. The resulting contigs were analyzed with ORFfinder to select the ORF orthologous to the CL Brener ORF.
4. The ORFs were aligned globally with ClustalW, and the number of SAPA repeats in each clone was counted manually.
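The manual repeat count could be automated with a small Python sketch. It scans a protein sequence for the longest run of consecutive copies of a repeat unit and allows a few substitutions per copy. The SAPA repeat unit is not given in this section, so `unit` must be supplied. The test uses placeholder strings, and the mismatch tolerance is a hypothetical parameter.

```python
def count_tandem_units(protein: str, unit: str, max_mismatches: int = 1) -> int:
    """Number of copies in the longest run of consecutive (tandem) copies of
    `unit`, tolerating up to `max_mismatches` substitutions per copy."""
    size = len(unit)

    def matches(i: int) -> bool:
        window = protein[i:i + size]
        return (len(window) == size and
                sum(a != b for a, b in zip(window, unit)) <= max_mismatches)

    best, i = 0, 0
    while i <= len(protein) - size:
        if matches(i):
            run, j = 0, i
            while matches(j):  # extend the run one unit at a time
                run += 1
                j += size
            best = max(best, run)
            i = j
        else:
            i += 1
    return best
```

Applied to the ORFs of both clones with the same unit and tolerance, the difference between the two counts gives the change in the number of SAPA repeats.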
A primer pair was designed to verify the difference in SAPA cluster size between the clones:
- Primer F, 5'-CGGGATCGTGGGAGACGGGT-3', anneals within the coding region of trans-sialidase Tc00.1047053509495.30.
- Primer R, 5'-ACCGTTGCCAGCGGGAGTTG-3', anneals in the 3'-UTR of the same gene.

The amplification program is shown in Table 2. Each reaction received 30 ng of template DNA.
Initial denaturation    94 °C / 10 min
Denaturation            94 °C / 30 s
Annealing               55 °C / 30 s
Extension               72 °C / 30 s
Cycles                  30
Final extension         72 °C / 10 min
Table 2 – PCR program used to distinguish the sizes of the SAPA repeat clusters.
Our group ran electrophoresis of endonuclease digests of total DNA from clones CL Brener and CL-14. The enzymes used, AluI, PvuII and HpaII, cut nucleotide sequences within the SAPA repeats. The digested DNA was hybridized with SAPA probes.

Western blots were also performed, as follows:
1. Epimastigote and trypomastigote forms were grown in LIT medium at 28 °C, collected during exponential growth and washed in phosphate buffer (pH 7.4).
2. Cells were counted and adjusted to 3 × 10⁸ cells/mL in sample buffer (0.5 M Tris-HCl, 0.01 M EDTA, 5% SDS, 5% 2-mercaptoethanol), then boiled for 5 min.
3. Electrophoresis was run on a 0.1% SDS/12% polyacrylamide gel at 100 V for 2 hours at room temperature.
4. Polypeptides were transferred to nitrocellulose membranes (0.45 µm pore size) at 100 V for 1.5 hours.
5. Membranes were blocked with 20 mM Tris and 0.13 mM NaCl, pH 7.6, overnight at 4 °C.
6. Membranes were then incubated with anti-trans-sialidase and anti-SAPA antibodies for 1 hour at room temperature, and the reaction was stopped after 30 minutes.
3.9 Sequencing and mapping of the CL-14 transcriptome
Epimastigotes, intracellular amastigotes and tissue-culture-derived trypomastigotes of clones CL Brener and CL-14 were cultured according to Chiari (1981) and Teixeira et al. (1994). Total RNA was purified from CL-14 epimastigote cultures according to Teixeira et al. (1994).

mRNA was extracted by column chromatography with the RNeasy Extraction Kit (Qiagen), following the manufacturer's instructions. The RNA samples were then cleaned up with the RNeasy MinElute Cleanup Kit (Qiagen), again following the manufacturer's instructions. The cleaned samples were quantified on a NanoDrop ND-1000 UV/Vis spectrophotometer (NanoDrop Technologies, USA). They were also run on a 1.2% denaturing agarose gel to check total RNA quality, following standard procedures described in Ausubel et al. (1995).
cDNA libraries from culture-derived T. cruzi were prepared with the TruSeq RNA Sample Preparation Kit v2 (Illumina), following the manufacturer's instructions. Index primers were used to distinguish the different samples and time points. The libraries were sequenced on an Illumina HiSeq 1500 system at the facility of Dr. Najib M. El-Sayed. Read quality was assessed with FastQC (Andrews, 2010), and reads were filtered accordingly. The reads were mapped with TopHat2 (Kim et al., 2013) against the reference genome, the assembled and annotated genome of T. cruzi CL Brener. Previously sequenced samples had already been mapped the same way.
4. Results
4.1 Sequencing and analysis of the CL-14 clone genome
A shotgun (WGS) library was prepared from total DNA extracted from epimastigote forms of T. cruzi clone CL-14. Sequencing this library generated approximately 3.5 million reads, totaling more than 1.5 billion nucleotides (Table 3). Most reads are approximately 400 bp long (Figure 2), as expected for the Roche 454 FLX-Titanium platform. The haploid nuclear genome is estimated at 55 Mb (Souza et al., 2011), so the total sequenced for CL-14 corresponds to 27-fold coverage. A similar genome size was estimated for clone CL Brener (El-Sayed et al., 2005). Comparison of chromosomal bands separated by pulsed-field gel electrophoresis (PFGE) also shows a similar banding pattern for CL-14 and CL Brener (Figure 3).
The diploid nuclear genome of CL Brener is predicted from sequencing data to be 110 Mb. This falls within the range determined experimentally by PFGE densitometry (106.4 to 110.7 Mb; Cano et al., 1995) and is similar to the genome size estimated for CL-14. The GC content estimated from the total CL-14 reads, about 51%, is also similar to that of the CL Brener genome. It is higher than the GC content of clone Sylvio X10/1, which is 49.21% of the sequenced nucleotides.

                          CL-14            CL Brener
Reads
  Methodology             454 FLX          Sanger
  Count                   3,457,102        1,192,680
  Bases sequenced         1,506,882,872    768,436,632
  Coverage                27x              14x
Contigs
  Count                   43,906           4,008
  Base pairs              54,782,655       60,372,297
  N50                     1,629            25,950
  GC content              50.62%           51.00%
Table 3 – Comparison of the sequencing and assembly data for the genomes of clones CL-14 and CL Brener.

Figure 2 – Number of CL-14 reads by length in base pairs.

Figure 3 – Molecular karyotype of CL-14 and CL Brener showing the chromosomes of the two clones. Red arrows indicate some differences between the genomes. Despite these differences, the chromosomes are very similar in size and number.

Some results differ from those of the CL Brener genome project. That genome was sequenced by the Sanger method (El-Sayed et al., 2005a), whereas the CL-14 genome was sequenced by pyrosequencing. Sanger reads are longer than pyrosequencing reads, averaging 800 and 400 bases, respectively. In addition, the CL Brener genome was sequenced with mate pairs of 3, 10, 45 and 100 kb, whereas the CL-14 genome was sequenced with paired ends of approximately 3 kb. These differences made the CL Brener assembly far more efficient: it produced 4,008 contigs, compared with more than 43,000 contigs for CL-14.

The main difficulty in assembling these genomes is that about 50% of the T. cruzi genome consists of repeats. Many of these repeats are longer than the reads. This prevents correct contig assembly, and it prevents an accurate picture of the repetitive regions, in terms of both their size and their occurrence across the genome.
The assembled genome also differs from the CL Brener genome for two further reasons. First, different assembly software was used: Celera Assembler for CL Brener and Newbler for CL-14. Second, pyrosequencing has an inherent limitation: it cannot efficiently sequence homopolymer runs longer than 6 nucleotides (Ronaghi, 2000).
The haploid CL Brener genome has an estimated 12,000 genes, organized in long clusters that are transcribed polycistronically (El-Sayed et al., 2005a). Analyses of the CL-14 contigs indicate a similar genomic organization. Figure 4 shows two synteny arrangements between CL Brener chromosomes and the assembled CL-14 contigs. Because the assembly is incomplete and the contigs are small, the figure shows only segments of selected CL Brener chromosomes. The CL-14 sequences are syntenic with their CL Brener orthologs along their entire length, with no inversion of coding-strand polarity.

The total number of genes could not be predicted accurately from the assembly. The large number of contigs means that many open reading frames (ORFs) are truncated. Moreover, as shown below, CL-14 has a hybrid genome made up of two distinct haplotypes, like CL Brener, which makes assembly even more difficult.

To look for changes in the karyotype or large chromosomal rearrangements, we instead hybridized PFGE-separated chromosomal bands with different probes. Atayde et al. (2004) had described some differences in the location of gp82 genes. They found two chromosomal bands in clone CL-14 that hybridize with the gp82 probe and are absent from the CL strain. However, because the CL strain is a mixed population of different clones, we decided to compare the molecular karyotypes of clones CL-14 and CL Brener directly.

The results are shown in Figure 5. Two hybridizations were performed on the complete PFGE karyotype: one with a probe for the MASP multigene family and one with a probe for the single-copy gene GPI8. Two further hybridizations were performed on gels of restriction digests, with probes for the amastin and DGF-1 multigene families. The two clones were very similar in both band size and intensity. The only difference was seen with the probe for the large MASP family, which appears to be distributed across most chromosomes of both genomes: two bands present in CL Brener are weaker in CL-14. These results suggest that there are no large rearrangements between the two genomes.
40
Figure 4 – Synteny between CL-14 contigs and their homologous chromosomes in CL Brener. Horizontal lines represent stretches of the assembled chromosomes of clone CL Brener and contigs of clone CL-14. Syntenic blocks between the sequences of the two clones are shown in red. Figure generated with CONTIGuator.
Figure 5 – The upper panels show chromosomal bands separated by pulsed-field gel electrophoresis (PFGE) and stained with ethidium bromide. DNA probes for two different genes were hybridized to membranes blotted from the PFGE gels (Southern blot), showing the same number and position of bands in CL Brener and CL-14. The lower panels show Southern blot hybridizations with DNA probes for the amastin and DGF-1 genes on membranes blotted from electrophoresis gels of restriction-enzyme digests.
4.2 Phylogenetic Analyses
To determine which group CL-14 belongs to, we analyzed the nuclear markers corresponding to the 24S rDNA ribosomal subunit and the spliced leader (SL) gene, as well as a marker for a gene of the mitochondrial genome, cytochrome oxidase II (COII) (De Freitas et al., 2006). In silico polymerase chain reactions (PCR) were performed with primers specific for these sequences, and the sizes of the amplicons generated with the CL-14 reads as template were compared with the expected sizes of the corresponding amplicons from the genomic sequences of clone CL Brener. For the cytochrome oxidase II amplicon, we compared the sizes of the products of AluI digestion, also performed in silico.
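The in silico PCR and AluI digestion described above can be sketched as follows. This is a minimal illustration, not the pipeline used in this work: it assumes exact primer matches on a single strand, and the function names and the template and primer sequences in the example are hypothetical. AluI recognizes AGCT and cuts between AG and CT, leaving blunt ends.

```python
"""Minimal in silico PCR and AluI digestion sketch (hypothetical inputs)."""

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, fwd: str, rev: str):
    """Return the amplicon between two primers, or None if either is absent.

    Only exact matches are considered: the forward primer must occur in the
    template, and the reverse complement of the reverse primer must occur
    downstream of it.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start < 0:
        return None
    rc_rev = revcomp(rev)
    end = template.find(rc_rev, start + len(fwd))
    if end < 0:
        return None
    return template[start:end + len(rc_rev)]


def alui_fragments(seq: str):
    """Fragment lengths after AluI digestion (site AGCT, blunt cut AG^CT)."""
    cuts = []
    pos = seq.find("AGCT")
    while pos >= 0:
        cuts.append(pos + 2)  # cut between AG and CT
        pos = seq.find("AGCT", pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    template = "CCCC" + "GATTACA" + "AAAAGCTTTT" + "GCATGCA" + "GGGG"
    amplicon = in_silico_pcr(template, "GATTACA", "TGCATGC")
    print(len(amplicon), alui_fragments(amplicon))
```

Comparing the amplicon lengths, and for COII the fragment lengths, against the expected values per lineage is then a table lookup.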
As shown in Table 4, the fragments obtained by amplification of the 24S rDNA and SL markers, taken alone, would place clone CL-14 in TcII, since amplicons of 150 bp for the SL marker and 125 bp for the 24S rDNA marker are present. The same pattern is found for clones Esmeraldo (TcII) and CL Brener (TcVI). In addition, PCR products corresponding to the mitochondrial COII gene yield two fragments, of 81 and 294 bp, after AluI digestion, which is characteristic of T. cruzi strains of types III, IV, V and VI.
Lineage   SL    24S rDNA      COII
Tc I      150   110           30, 81 and 264
Tc II     150   125           81, 82 and 212
Tc III    200   110           81 and 294
Tc IV     200   125           81 and 294
Tc V      150   110 and 125   81, 264 and 294
Tc VI     150   125           81 and 294
CL-14     150   125           81 and 294
Table 4 – In silico PCR of markers used in T. cruzi genotyping. Amplicon sizes in base pairs for each molecular marker (for COII, sizes of the AluI digestion fragments). The mini-exon (SL) and 24S rDNA markers represent the nuclear genome and the COII marker represents the mitochondrial genome.
Together, these results, as well as those described below, indicate that CL-14, like CL Brener, is a hybrid clone and should be classified as TcVI. Since the two clones were isolated from the same strain, and given that the mitochondrial marker corresponds to TcIII, we hypothesize that clone CL-14 derives from the same hybridization event between ancestral strains belonging to TcII and TcIII which, as in clone CL Brener, retained a mitochondrion from the TcIII parent. The in silico results were confirmed in vitro by amplifying DNA purified from cultured CL-14 and CL Brener epimastigotes with primers for the SL, 24S rDNA and COII markers (Figure 6).
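The way the three markers in Table 4 are combined can be expressed as a lookup. The sketch below transcribes the profiles from Table 4 and, as an assumption made for this illustration, treats a DTU as compatible only when the observed fragment sizes match the expected set exactly; the names `DTU_PROFILES` and `compatible_dtus` are hypothetical.

```python
"""Match an observed marker profile against the Table 4 DTU profiles."""

# Amplicon or AluI fragment sizes (bp) transcribed from Table 4.
DTU_PROFILES = {
    "TcI":   {"SL": {150}, "24S": {110},      "COII": {30, 81, 264}},
    "TcII":  {"SL": {150}, "24S": {125},      "COII": {81, 82, 212}},
    "TcIII": {"SL": {200}, "24S": {110},      "COII": {81, 294}},
    "TcIV":  {"SL": {200}, "24S": {125},      "COII": {81, 294}},
    "TcV":   {"SL": {150}, "24S": {110, 125}, "COII": {81, 264, 294}},
    "TcVI":  {"SL": {150}, "24S": {125},      "COII": {81, 294}},
}

# Observed profile of clone CL-14 (last row of Table 4).
CL14 = {"SL": {150}, "24S": {125}, "COII": {81, 294}}


def compatible_dtus(observed, markers=("SL", "24S", "COII")):
    """Return the DTUs whose expected sizes equal the observed ones for all markers."""
    return sorted(
        dtu for dtu, expected in DTU_PROFILES.items()
        if all(expected[m] == observed[m] for m in markers)
    )


if __name__ == "__main__":
    print("nuclear only:", compatible_dtus(CL14, ("SL", "24S")))
    print("COII only:   ", compatible_dtus(CL14, ("COII",)))
    print("all markers: ", compatible_dtus(CL14))
```

Under exact matching, the nuclear markers alone leave TcII and TcVI, COII alone leaves TcIII, TcIV and TcVI, and only TcVI satisfies all three. A looser rule that accepts the observed fragments as a subset of the expected ones would also admit TcV, which is one reason the additional haplotype analyses are needed.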
In addition to these markers, the amino acid sequences of two nuclear genes, MSH2 and trypanothione reductase (TR), whose nucleotide sequences differ between the haplotypes, and of one mitochondrial gene, COII, were clustered by similarity. The results, shown in Figure 7, confirm our prediction that CL-14 is phylogenetically very close to CL Brener and that sequences belonging to two distinct haplotypes (Esmeraldo-like and non-Esmeraldo-like) are present in the CL-14 genome.
Alignments between 392,310 CL-14 genome reads corresponding to coding regions and the coding sequences of the two CL Brener haplotypes show that 175,612 reads (44.8%) are most similar to the Esmeraldo-like haplotype and 185,497 (47.3%) to the non-Esmeraldo-like haplotype. For a total of 31,201 reads (8.0%), it was not possible to distinguish between the two haplotypes.
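The haplotype assignment of reads can be summarized with a simple rule, sketched here under the assumption that each read has one identity score against the Esmeraldo-like allele and one against the non-Esmeraldo-like allele, with ties counted as indistinguishable. The function names are hypothetical; the percentages in the text follow directly from the reported counts.

```python
"""Assign reads to CL Brener haplotypes and summarize the proportions."""
from collections import Counter


def assign_haplotype(identity_esmo: float, identity_non_esmo: float) -> str:
    """Label a read by the allele it matches best; ties are ambiguous."""
    if identity_esmo > identity_non_esmo:
        return "Esmeraldo-like"
    if identity_non_esmo > identity_esmo:
        return "non-Esmeraldo-like"
    return "ambiguous"


def summarize(counts):
    """Map each label to (count, percentage rounded to one decimal)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: (n, round(100.0 * n / total, 1)) for label, n in counts.items()}


def haplotype_tally(identities):
    """identities: iterable of (identity_esmo, identity_non_esmo) per read."""
    return summarize(Counter(assign_haplotype(e, n) for e, n in identities))


if __name__ == "__main__":
    # Counts reported in the text for the 392,310 coding-region reads.
    print(summarize({"Esmeraldo-like": 175612,
                     "non-Esmeraldo-like": 185497,
                     "ambiguous": 31201}))
```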
Figure 6 - Electrophoresis of the marker amplicons: A - mini-exon SL, B - 24S rDNA, C - COII. The Control lane is the PCR without DNA template. Sample Colombiana is a representative TcI strain, Esmeraldo is TcII, 231 is a TcIII strain, 115 is TcV, and CL Brener and CL-14 are TcVI.
Figure 7 - Phylogenetic trees built with the neighbor-joining algorithm from the peptide sequences of the MSH2 and trypanothione reductase genes (nuclear genes) and of cytochrome oxidase II (mitochondrial gene) of CL-14 and CL Brener. Sequences from clone Sylvio X10/1, a TcI T. cruzi, were added to show the distance to this DTU of T. cruzi.
The CL-14 reads were aligned to the CDSs corresponding to the Esmeraldo-like and non-Esmeraldo-like alleles to check where they differ. Homologous CL Brener alleles differ by approximately 2.2% in their coding regions (El-Sayed et al., 2005a); these differences are single nucleotide polymorphisms (SNPs) between the sequences. Figure 8 shows a fragment of a local alignment and the differences between the two haplotypes of an RNA-binding protein gene, with identifiers Tc00.1047053506211.70 and Tc00.1047053508895.50 (Esmeraldo-like and non-Esmeraldo-like, respectively). When we aligned the CL-14 reads with their CL Brener homologs and analyzed the polymorphic regions, each read aligned perfectly with only one haplotype. Whenever the coding region of a CL Brener allele was fully covered by reads, its homologous allele was also fully covered. Moreover, no read showed features of both haplotypes at once, only of one, suggesting that clone CL-14 has two distinct alleles for each gene (Figure 9). Thus, like clone CL Brener, CL-14 can carry polymorphic Esmeraldo-like and non-Esmeraldo-like alleles.
In contrast to the results obtained by aligning the CL-14 reads against the CL Brener CDSs, many CL-14 contigs showed features of both haplotypes, indicating that the haplotypes were not correctly separated during contig assembly.
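The SNP-based check behind Figure 9 can be sketched as follows. It assumes ungapped alignments and a dictionary of known SNP positions with their Esmeraldo-like and non-Esmeraldo-like bases; both inputs and the function name are hypothetical. A read that supports only one allele set is assigned to that haplotype, while a sequence that supports both, as observed for many CL-14 contigs, is reported as mixed, that is, a candidate chimera.

```python
"""Classify an aligned read (or contig) by the alleles it carries at known SNPs."""


def classify_read(read_start, read_seq, snps):
    """Return 'esmo-like', 'non-esmo-like', 'mixed' or 'uninformative'.

    read_start: 0-based reference position of the first read base
                (ungapped alignment assumed).
    snps: dict mapping reference position -> (esmo_base, non_esmo_base).
    """
    votes = set()
    for pos, (esmo_base, non_esmo_base) in snps.items():
        offset = pos - read_start
        if 0 <= offset < len(read_seq):
            base = read_seq[offset]
            if base == esmo_base:
                votes.add("esmo")
            elif base == non_esmo_base:
                votes.add("non-esmo")
            else:
                votes.add("other")  # neither known allele
    if not votes:
        return "uninformative"  # read does not overlap any SNP
    if votes == {"esmo"}:
        return "esmo-like"
    if votes == {"non-esmo"}:
        return "non-esmo-like"
    return "mixed"
```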
Figure 8 - Part of the alignment between the two CL Brener haplotypes (rows labeled Esmo like and Non-esmo like) of genes Tc00.1047053506211.70 and Tc00.1047053508895.50, which encode RNA-binding proteins. Red arrows indicate the polymorphisms between them.
Figure 9 - Schematic alignment of CL-14 reads with homologous CL Brener genes in their different haplotypes. Diamonds represent SNPs between the CL Brener sequences. Gene sequences are represented by the long, delimited lines and CL-14 reads by the shorter lines. Sequences assigned to the Esmeraldo-like haplotype (esmo like) are in red and those assigned to the non-Esmeraldo-like haplotype (non-esmo like) in blue. No CL-14 read aligned with both haplotypes at the same time in the polymorphic regions.
4.3 Assembly and Analysis of the CL-14 Mitochondrial Genome
To define the haplotype of the CL-14 maxicircle, we examined the coverage of its reads on the maxicircles of clones Esmeraldo and CL Brener, TcII and TcIII respectively.
As shown in Figure 10, in local alignments 13,907 CL-14 reads aligned with the CL Brener maxicircle and only 94 with fragments of the Esmeraldo mitochondrial genome. The mapped reads, selected by their best hits and without redundant alignments, were most similar to the TcIII maxicircle. All reads similar to the Esmeraldo sequences also aligned, via MEGABLAST, to the CL Brener maxicircle, indicating that they represent the more conserved regions of these genomes.
Only 10 polymorphisms between CL Brener and CL-14 were identified along the entire maxicircle, all of them insertions or deletions, as shown in Table 5. All genes of the CL Brener maxicircle are represented in the CL-14 maxicircle, with a high degree of synteny.
The assembled and annotated CL-14 maxicircle is approximately 20.6 kb long and contains, downstream of the genes encoding the 12S and 9S ribosomal subunits, all 18 protein-coding genes previously identified in the CL Brener maxicircle.
Figure 10 - Coverage of CL-14 reads on the maxicircles of CL Brener and Esmeraldo, TcIII and TcII respectively. The rulers represent the linearized genomes and the blue rectangles represent the reads. Inset: full coverage of the CL Brener maxicircle.
Position   CL Brener    CL-14       Event
5943       GTTTTT       GTTTT       Deletion
6271       TAAAA        TAAA        Deletion
10789      TAAAAAAAAA   TAAAAAAAA   Deletion
14149      ATTTT        ATTT        Deletion
14638      AACA         AA          Deletion
16564      GTTTTT       GTTTT       Deletion
16989      GA           GTA         Insertion
17287      TAA          TA          Deletion
19829      CTTTTTTTT    CTTTTTTT    Deletion
20006      GTTT         GTT         Deletion
Table 5 – Polymorphisms found between the kDNAs of clones CL Brener and CL-14.
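Because pyrosequencing has difficulty with homopolymers, it can be useful to check which of the Table 5 differences fall inside homopolymer runs. The sketch below transcribes Table 5, derives the inserted or deleted bases by comparing the left-aligned alleles, and flags changes made of a single base type adjacent to the same base. It is an illustrative re-tabulation with hypothetical function names, not the method used to call these variants.

```python
"""Re-tabulate the Table 5 maxicircle indels and flag homopolymer context."""
from collections import Counter

# (position, CL Brener allele, CL-14 allele), transcribed from Table 5.
TABLE_5 = [
    (5943, "GTTTTT", "GTTTT"),
    (6271, "TAAAA", "TAAA"),
    (10789, "TAAAAAAAAA", "TAAAAAAAA"),
    (14149, "ATTTT", "ATTT"),
    (14638, "AACA", "AA"),
    (16564, "GTTTTT", "GTTTT"),
    (16989, "GA", "GTA"),
    (17287, "TAA", "TA"),
    (19829, "CTTTTTTTT", "CTTTTTTT"),
    (20006, "GTTT", "GTT"),
]


def describe(ref: str, alt: str):
    """Return (event, changed_bases, in_homopolymer) for a ref -> alt change."""
    if len(ref) == len(alt):
        return "substitution", "", False
    kind = "deletion" if len(alt) < len(ref) else "insertion"
    longer, shorter = (ref, alt) if kind == "deletion" else (alt, ref)
    i = 0  # length of the shared prefix
    while i < len(shorter) and longer[i] == shorter[i]:
        i += 1
    size = len(longer) - len(shorter)
    changed = longer[i:i + size]
    homopolymer = len(set(changed)) == 1 and (
        (i > 0 and longer[i - 1] == changed[0])
        or (i + size < len(longer) and longer[i + size] == changed[0])
    )
    return kind, changed, homopolymer


if __name__ == "__main__":
    for pos, ref, alt in TABLE_5:
        print(pos, *describe(ref, alt))
```

Run over the ten rows of Table 5, this counts nine deletions and one insertion, eight of which lie in homopolymer runs.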
The reads belonging to the maxicircle, identified by MEGABLAST alignments against the CL Brener maxicircle, were assembled with CAP3. The resulting contigs were joined manually into a single scaffold, which represents the mitochondrial genome of clone CL-14 (Figure 11).
Figure 11 - Comparison between the mitochondrial genomes of CL-14 and CL Brener and the polymorphisms between them.
4.4 Comparative Analysis of Multigene Families
Because assembly of the CL-14 reads yielded a highly fragmented genome, we decided to analyze genes belonging to the large multigene families known to be involved in host-parasite interactions directly from the CL-14 reads. To this end, a Perl script was developed in which the CL-14 reads were aligned by local pairwise alignment with MEGABLAST against all CDSs of the reference clone CL Brener. The extent of alignment coverage was assessed to check whether any CL Brener gene is not represented in the CL-14 genome. After searching a total of 23,216 predicted protein-coding genes in the CL Brener genome, we concluded that all of them are present in the CL-14 genome, which indicates that the gene content of the two clones is highly similar. Comparative analyses based on the read sequences show more than 99.5% identity with the sequences of multigene families described in CL Brener.
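Checking whether each CL Brener CDS is represented among the CL-14 reads can be sketched as merging the aligned intervals on the CDS and measuring the fraction of bases covered. The function names are hypothetical, and the 0.9 threshold in `is_represented` is an assumed value; the text does not state the cutoff used.

```python
"""Fraction of a CDS covered by read alignments (overlapping intervals merged)."""

MIN_FRACTION = 0.9  # hypothetical presence cutoff, not taken from the text


def covered_fraction(cds_length, intervals):
    """intervals: (start, end) half-open, 0-based coordinates on the CDS."""
    merged = []
    for start, end in sorted(intervals):
        start, end = max(start, 0), min(end, cds_length)  # clip to the CDS
        if start >= end:
            continue
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend overlapping block
        else:
            merged.append([start, end])
    return sum(end - start for start, end in merged) / cds_length


def is_represented(cds_length, intervals, min_fraction=MIN_FRACTION):
    """True if the reads cover at least min_fraction of the CDS."""
    return covered_fraction(cds_length, intervals) >= min_fraction
```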
To select the orthologous reads for each CL Brener CDS, the alignments were filtered by choosing the best hit both from read to CDS and from CDS to read. The reads of the best hits for each CDS were selected and used to build an alignment coverage, with each read selected for only one alignment. From these alignments it was possible to determine how many reads cover each base of the CDSs, producing nucleotide-by-nucleotide coverage information. Coverage values were normalized by z-score: the z-score of a coverage value is its difference from the mean coverage, divided by the standard deviation. Thus, based on the genome sequencing coverage and on the coverage of the assembled CDSs, it was possible to estimate the copy number of each gene or multigene family analyzed.
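The read selection, per-base coverage and z-score normalization described above can be sketched as follows. As a simplification, only the read-to-CDS direction is implemented (each read is kept once, at its best-scoring CDS); the CDS-to-read filter is omitted because its exact criterion is not specified. The tuple layout and function names are hypothetical.

```python
"""Best hit per read, per-base CDS coverage and z-score normalization."""
from statistics import mean, pstdev


def best_hit_per_read(alignments):
    """Keep, for each read, only its highest-scoring alignment.

    alignments: iterable of (read_id, cds_id, score, start, end) with
    half-open, 0-based [start, end) coordinates on the CDS.
    """
    best = {}
    for read_id, cds_id, score, start, end in alignments:
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (cds_id, score, start, end)
    return best


def per_base_coverage(best, cds_lengths):
    """Count how many selected reads cover each base of each CDS."""
    coverage = {cds_id: [0] * length for cds_id, length in cds_lengths.items()}
    for cds_id, _score, start, end in best.values():
        length = cds_lengths[cds_id]
        for i in range(max(start, 0), min(end, length)):
            coverage[cds_id][i] += 1
    return coverage


def z_scores(values):
    """Standardize values as (x - mean) / population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]
```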
Besides having identical sets of genes, the two genomes show no large differences in the copy number of multigene family members (Table 6). Two single-copy genes, MSH2 and PGP, were used as references to calibrate the copy-number counting software. Table 7 shows the identities of protein-coding sequences between CL Brener and CL-14. The results for all protein-coding sequences assembled and annotated in CL Brener are given in the CDS row, and other multigene families and groups of orthologs are described in their own rows. The results show high similarity between the sequences analyzed: the CDSs have a minimum identity of 99.73% (Table 7), higher than the identity between the coding sequences of the two CL Brener haplotypes, which is 97.8% (El-Sayed et al., 2005a).
Gene family              CL-14   CL Brener
Trans-sialidase          1463    1481
MASP                     1399    1465
Mucins                   999     992
RHS                      773     777
DGF-1                    565     569
GP63                     491     449
RNA helicases            156     157
Kinesins                 102     102
Tuzins                   83      83
Cruzains (calpains)      67      66
Dynein heavy chain       45      45
Amastins                 27      27
GAPDH                    21      20
KMP-11                   18      11
MSH2                     2       2
PGP                      2       2
Table 6 – Counts of gene family members in clones CL-14 and CL Brener, obtained with the script developed in this work.
Sequence set       Identity (%)
CDS                99.79
MASP               99.87
Trans-sialidase    99.80
RHS                99.74
DGF                99.84
GP63               99.73
RNA-binding        99.83
Amastin            99.69
Table 7 – Mean identities of protein-coding sequences between CL Brener and CL-14. CDS denotes all coding sequences assembled and annotated in CL Brener.
The algorithm developed by our group to count genes and members of multigene families from the coverage of reference sequences by reads was also able to estimate the actual gene count of clone CL Brener, since at the time the genome was published (El-Sayed et al., 2005a) the authors counted only the number of assembled ORFs. That earlier strategy fails to represent sequences that were not assembled, whether because of difficulties in the genome assembly process, long repeats, or identical copies that were not correctly separated. Counting by read coverage avoids these problems. "Chimeric" sequences, in which two different haplotypes were assembled together into a single sequence, are also found in the assembled CL Brener genome. Because our algorithm reports coverage nucleotide by nucleotide, such sequences can be analyzed and any SNPs between the haplotypes are detected. With the SNP information, haplotypes that may have been assembled together can be separated.
By setting the identity threshold between CL-14 reads and CL Brener ORFs at 99.5%, it is possible not only to count the CL-14 genes but also to identify differences between alleles, because at this stringency the SNPs between alleles are only partially covered: SNP coverage is half that of the conserved sequences. The algorithm can count members of large families even when their similarity is high. By lowering the identity cutoff until the gene count stops converging, the size of multigene families represented by only a few assembled members can be estimated. This is desirable when genes with very similar sequences were assembled together into one or a few representatives. Coverage also stops converging because of other factors built into the algorithm, mainly the selection of orthologs by choosing the best reciprocal alignment.
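The observation that, at 99.5% identity, SNP positions receive about half the coverage of conserved positions suggests a simple way to flag candidate allelic differences: look for positions whose coverage falls to roughly half of the median. The window of 0.35 to 0.65 of the median and the function name are assumptions for this illustration, not parameters of the algorithm described here.

```python
"""Flag positions whose coverage drops to about half of the median coverage."""
from statistics import median


def half_coverage_positions(coverage, low=0.35, high=0.65):
    """Return 0-based positions with low*median <= coverage <= high*median."""
    if not coverage:
        return []
    med = median(coverage)
    if med == 0:
        return []
    return [i for i, c in enumerate(coverage) if low * med <= c <= high * med]
```

Positions with no coverage at all are not reported, since a drop to zero points to a gap rather than to an allelic difference.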
The algorithm produces five output files: the predicted copy numbers, a figure of nucleotide-by-nucleotide coverage, a text file with the coverages, the read alignments in XML format, and a list of reads with their respective reference sequences. Figure 12 shows part of a text file with nucleotide-by-nucleotide coverages (Figure 12-A) and plots of the coverage of the two alleles of the GP72 gene by CL Brener reads, together with histograms of the coverage frequencies (Figure 12-B). The topography of the coverage can be seen, and in the histograms the tallest bar lies close to the sequencing coverage of each allele, which for the CL Brener genome was sevenfold.
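The relation between the coverage histogram and the predicted copy number can be sketched as dividing the most frequent (modal) per-base coverage by the depth expected for a single copy, about sevenfold per allele for CL Brener according to the text. Using the mode, and the name `estimate_copies`, are assumptions made for this illustration.

```python
"""Estimate copy number from the modal per-base coverage."""
from collections import Counter


def estimate_copies(coverage, depth_per_copy):
    """Round (most frequent non-zero coverage) / (expected single-copy depth)."""
    nonzero = [c for c in coverage if c > 0]
    if not nonzero:
        return 0
    modal_coverage, _count = Counter(nonzero).most_common(1)[0]
    return round(modal_coverage / depth_per_copy)
```

For a single-copy allele sequenced at about sevenfold, the tallest histogram bar sits near 7 and the estimate is 1; a family whose reads collapse onto one reference at about 21-fold would be estimated at 3 copies.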
To check the accuracy of the algorithm and whether its results agree with available software, a trans-sialidase sequence containing repetitive sequences was also mapped with BWA using reads from clone CL-14. Figure 13 shows the mapping by BWA and by our algorithm: not only are the coverages the same, but the positions of the repetitive motifs, which in this case are degenerate, were also predicted similarly.
Figure 12. Output of the algorithm. A - Partial text of the coverage of each nucleotide of the mapped ORF. B - Coverage plots of the genes, in which the x-axis represents the full length of the ORF. The histograms show the frequencies of the coverage values. The value above each histogram is the predicted copy number.
The gene copy-number counting algorithm was also used, in collaboration with researchers of the Brazilian Genome Network (Rede Genoma Brasileira), in studies of the multigene families present in the genomes of the trypanosomatids Angomonas deanei and Strigomonas culicis (Motta et al., 2013) and Trypanosoma rangeli (in preparation). It was also used to characterize the genes encoding the two subfamilies of amastin proteins present in the CL Brener genome (Kangussu-Marcolino et al., 2013). Correcting the estimated copy number of amastin genes within specific groups, previously made by El-Sayed et al. (2005a), made it possible to draw inferences about gene clusters of this family. Amastin genes of the same group were found to be organized on the same chromosome and in tandem, separated or not by genes encoding tuzin proteins.
Figure 13 – Comparison of mapping on the TcTSSAPA gene Tc00.1047053509495.30 by BWA, visualized in IGV, with the algorithm developed here. Both were done with reads from clone CL-14. The upper panel, generated with BWA and IGV, shows a histogram of gene coverage by reads, and each bar below it is an aligned read. The lower panel, generated by the algorithm developed here, shows the coverage of each nucleotide of the reference sequence by the mapped reads.
4.5 Analysis of Differences in Genes Encoding Trans-sialidases with SAPA Repeats in CL Brener and CL-14
Several repetitive motifs have been reported as associated with the virulence haplotype in parasites (Mendes et al., 2013). Using marker-design algorithms and in silico PCR, we observed differences in the sizes of the genes encoding the trans-sialidases Tc00.1047053507085.30, Tc00.1047053509495.30 and Tc00.1047053510787.10. These differences are due to a smaller number of SAPA repeat motifs, of sequence 5'-GACAGCAGTGCCCACGGTACGCCCTCGACTCCCGTTGACAGCAGTGCCCACGGTACACCCTCGACTCCCGTT-3'. In CL Brener, trans-sialidase Tc00.1047053509495.30 has 19 SAPA repeats, whereas its homolog in clone CL-14 has only 3.
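Counting SAPA units directly in a sequence is one way to compare repeat numbers between homologs. The sketch below splits the 72-nt motif quoted above into its two 36-nt units, which differ at a single position, and counts non-overlapping windows within a mismatch tolerance so that degenerate copies are still counted. The tolerance of 2 mismatches and the function names are hypothetical; this is not the coverage-based method used in this work.

```python
"""Count (possibly degenerate) SAPA repeat units in a nucleotide sequence."""

# The two 36-nt halves of the 72-nt SAPA motif quoted in the text.
SAPA_UNIT_1 = "GACAGCAGTGCCCACGGTACGCCCTCGACTCCCGTT"
SAPA_UNIT_2 = "GACAGCAGTGCCCACGGTACACCCTCGACTCCCGTT"


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_sapa_units(seq: str, unit: str = SAPA_UNIT_1, max_mismatches: int = 2) -> int:
    """Count non-overlapping windows within max_mismatches of the unit."""
    k = len(unit)
    count, i = 0, 0
    while i <= len(seq) - k:
        if hamming(seq[i:i + k], unit) <= max_mismatches:
            count += 1
            i += k  # jump past the matched unit
        else:
            i += 1
    return count
```

A tolerance is needed because the SAPA units are degenerate: an exact-match count would miss the second unit of the quoted motif.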
Figure 14 shows the coverage of CL Brener and CL-14 reads mapped to the coding sequence of this gene. The x-axis represents the full nucleotide sequence and the y-axis the read coverage of each nucleotide. Coverage by CL Brener reads is shown in green, and coverage by CL-14 reads as a red line.
Coverage by CL Brener reads in the SAPA region is typical of single-copy sequences in a genome sequenced at approximately 14-fold. The same is not observed for clone CL-14. Its genome sequencing reads cover the gene sequence but clearly do not cover the full extent of the 19 SAPA copies, covering only three repetitive SAPA motifs. Using reads from sequencing of the CL-14 transcriptome (see results in section 4.6), we obtained the same result: the CDS is covered, but the SAPA repeats are not covered along their entire length (Figure 15). The SAPA repeat stretch contains degenerate positions along its length, and the correspondence of each repeat in clone CL-14 to CL Brener can be seen in the coverage from genome sequencing reads. This is because the average pyrosequencing read is long enough to span more than one SAPA repeat. The transcriptome reads, however, are shorter and do not span enough repeats to determine, from the transcriptome, the exact position of the SAPA repeats with their degenerate positions.
Figure 14 - Coverage of trans-sialidase Tc00.1047053509495.30 by genomic reads of CL Brener and CL-14. Green: nucleotide-by-nucleotide coverage of the sequence in CL Brener. Red line: coverage in clone CL-14. Below the plot, a ruler shows the positions of the domains (N- and C-terminal domains, sialidase family domain, lectin-like domain and SAPA domain).
To check whether the small number of SAPA repeats found in silico in clone CL-14, compared with clone CL Brener, can also be observed experimentally, two strategies were developed, based on amplification of SAPA-containing trans-sialidases and on Southern blot analyses. Figure 16-A shows agarose gel electrophoresis of PCR products obtained with a primer pair flanking the SAPA repeats, the forward primer lying upstream of the motifs, inside the coding region, and the reverse primer downstream of the motifs, outside the coding region. Amplification of CL-14 yields a single band of approximately 500 bp. The CL Brener sample produced a smear, without well-defined bands, owing to the formation of many fragments of varying sizes. This happens because the abundant SAPA repeats anneal to one another, forming nonspecific fragments. A Southern blot with SAPA repeat probes was also performed on CL-14 and CL Brener DNA digested with the enzymes AluI, PvuII and HpaII (Figure 16-B). These enzymes have their cleavage sites within the SAPA motif and can therefore cut the repetitive motifs, so that differences in repeat number can be quantified by hybridization. With AluI, more digestion bands, and stronger bands, are seen in the CL Brener sample than in CL-14. Hybridizations with the SAPA probe on the PvuII and HpaII digests give a stronger signal in CL Brener, indicating that SAPA motifs are more abundant in this clone. This confirms the in silico data, showing that SAPA-containing trans-sialidases (TcTS-SAPA) are fewer in clone CL-14 and contain fewer SAPA repeats in their sequences (Figure 17).
Figure 15 – Nucleotide-by-nucleotide coverage of trans-sialidase Tc00.1047053509495.30 by genome sequencing reads of CL Brener and CL-14 and by transcriptome sequencing reads of CL-14 (tracks: CL-14 transcriptome Rep1, CL-14 transcriptome Rep2, CL-14 genome, CL Brener genome). The pink shading marks the position of the SAPA repeats.
Figure 16 – A - Electrophoresis of PCR products obtained with primers annealing to the sequences flanking the SAPA repeats. B - Left gel: electrophoresis of AluI, PvuII and HpaII digestion fragments of genomic DNA from clones CL Brener and CL-14. Right gel: Southern blot of the digests hybridized with SAPA probes.
Panels: CL Brener (98.6% iden.) and CL-14 (100% iden.) repeat arrays; domains shown: sialidase and lectin domains, SAPA, transmembrane domain.
Figure 17 – Schematic of the organization of SAPA repeats in clones CL Brener and CL-14, in gene Tc00.1047053509495.30.
To check whether the differences observed in the genomic analyses are reflected in protein expression, a Western blot was performed with anti-SAPA and anti-trans-sialidase antibodies on total protein from CL Brener and CL-14 cultures at the epimastigote and trypomastigote stages (Figure 18). Trans-sialidases were detected in the trypomastigote samples of both CL Brener and CL-14, indicating that both clones express trans-sialidases. With anti-SAPA antibodies, however, only the CL Brener trypomastigote samples gave a strong signal, and only a faint band was observed in the CL-14 trypomastigote sample.
Panels (left to right): Coomassie, Anti-SAPA, Anti-TS; lanes in each panel: CL Br Try, CL Br Epi, CL-14 Try, CL-14 Epi; molecular weight markers (kDa): 177, 118, 75, 51, 39, 26, 18.
Figure 18 – Left: electrophoresis profile of total proteins of clones CL Brener and CL-14 at the epimastigote and trypomastigote stages. Middle: Western blot with anti-SAPA antibody. Right: Western blot with anti-trans-sialidase antibodies.
4.6 Sequencing and Mapping of the CL-14 Transcriptome
To investigate whether there were differences in the global pattern of gene expression between the two clones, cDNA libraries were produced from mRNA extracted from T. cruzi clone CL-14 epimastigotes grown in LIT medium and from trypomastigotes and amastigotes obtained 48 and 60 hours after infection of human foreskin fibroblasts (HFF). For the amastigote cultures, parasite RNA was extracted together with total RNA from the host cells. Figure 19 A shows the total RNA profile of a sample of cells infected for 48 hours containing amastigotes, and Figure 19 B shows the total RNA profile of epimastigotes. In Figure 19 A, two peaks with high fluorescence correspond to the large and small ribosomal subunits of the host cell, and lower-intensity peaks at the same position as the host small-subunit rRNA correspond to parasite rRNA molecules. In 19 B, peaks of approximately 2000 and 2151 nucleotides, from the large-subunit rRNA of T. cruzi, and of 2221 nucleotides, corresponding to the small-subunit rRNA of the parasite, can be seen. Lower molecular weight peaks (< 200 nt) correspond to tRNA, some small RNAs, or RNA degradation products.
Figure 19 – Examples of total RNA and cDNA library profiles. A - Total RNA extracted from HFF cells infected with amastigotes, 48 hours after infection. B - Total RNA from epimastigotes. C - cDNA libraries produced from mRNA of HFF cells and amastigotes. D - cDNA libraries produced from epimastigote mRNA. In C and D, the peaks at the ends are molecular weight markers.
The mRNAs were purified with magnetic beads carrying bound oligo-dT and used as templates to produce the cDNA libraries to be sequenced on the Illumina HiSeq 1500 platform. The transcriptome of each life stage was sequenced from biological duplicates, with 300-nucleotide paired-end fragments (Figure 19 C and D). From each paired-end fragment, 100 nt were sequenced at each end (100-nt reads), at the sequencing facility of the Department of Plant Biology, University of Maryland, where I carried out my sandwich PhD internship.
The reads were mapped with TopHat2 onto the chromosomes of the Trypanosoma cruzi CL Brener genome, TriTrypDB version 4.3, using two different strategies. One strategy mapped the reads to each haplotype of the reference separately; the other mapped the reads to the complete reference genome at once. As an example, we describe the results of mapping the reads from the library generated with RNA extracted from cells infected for 48 h. Mapping against the complete genome, configured so that no read was mapped more than once, yielded approximately 19 and 11 million aligned pairs (one value per biological replicate); of these, 17 and 10.5 million pairs had both reads mapped, and 16 and 9.5 million were mapped in the correct orientation, with R1 reads aligned in the downstream direction of the reference DNA and R2 reads aligned in the upstream direction. Approximately 1.5 million reads from replicate 1 and 847 thousand reads from replicate 2 were aligned as singletons, without their mates.
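The "correct orientation" criterion can be sketched as follows: R1 on the forward strand, R2 on the reverse strand downstream of it, and an implied fragment length within a plausible range. The 600-nt upper bound, the argument names and the function name are hypothetical; TopHat2 applies its own pairing criteria.

```python
"""Check whether a read pair has the expected forward/reverse orientation."""


def is_proper_pair(r1_start, r1_reverse, r2_start, r2_reverse,
                   read_length=100, max_fragment=600):
    """True if R1 is forward, R2 is reverse and downstream, and the span fits.

    Starts are 0-based leftmost reference coordinates of each alignment;
    ungapped alignments of length read_length are assumed.
    """
    if r1_reverse or not r2_reverse:
        return False
    fragment = (r2_start + read_length) - r1_start
    return read_length <= fragment <= max_fragment
```

By the counts above, about 16 of 19 million (84%) and 9.5 of 11 million (86%) aligned pairs passed the orientation criterion in the two replicates.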
The results of mapping against the complete genome and against each haplotype separately, visualized in IGV, again demonstrated the hybrid nature of the CL-14 genome (Figure 20).
Because sequencing and analysis of the transcriptome of T. cruzi clone CL Brener (the reference clone of the genome project) had not yet been completed, it was not possible to use the data generated from the CL-14 libraries to carry out the comparative analyses that are the aim of this part of the work. These analyses are under way in collaboration with Dr. Najib El-Sayed's group at the University of Maryland. Once completed, they will allow us to determine whether there are differences in the set of genes expressed in the various life-cycle stages of these two T. cruzi clones that could be related to the differences in virulence observed between them.
Figure 20 – Mapping of CL-14 mRNA sequencing reads. The reference gene is MSH2, a single-copy gene. The two upper panels show mapping on the complete genome. The lower panels show mapping on each haplotype separately, highlighting the polymorphisms between haplotypes. Arrows indicate polymorphisms found in reads mapped to the wrong haplotype. Each gray bar represents a read correctly mapped with its mate; bars of other colors are single mappings, without mates.
5. Discussion
Whole-genome sequencing of Trypanosoma cruzi clone CL Brener (El-Sayed et al., 2005a) confirmed the diploid nature of this organism and also showed that its genome is hybrid, containing 22,570 protein-coding genes. According to Arner et al. (2007), this prediction of gene content is very conservative: T. cruzi has at least twice as many genes if all copies of genes belonging to multigene families are counted. This is because the genes of these multicopy families were assembled incompletely and imprecisely, without distinguishing the different haplotypes.
Sequencing was done by WGS (whole-genome shotgun) with the Sanger method, generating approximately 768 million nucleotides, a 14-fold coverage of the genome. Clone CL-14 was also sequenced by WGS, but by pyrosequencing, generating 1.5 billion nucleotides, a 27-fold coverage of the genome, in contrast with the more modest sequencing of clone CL Brener.
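The coverage figures above can be cross-checked with the relation C = N / G, where N is the number of sequenced bases, G the genome size and C the fold coverage. The short calculation below (function name hypothetical) shows that both sequencing projects imply a genome of roughly 55 Mb, so the 14-fold and 27-fold values refer to the same genome size.

```python
"""Genome size implied by total sequenced bases and fold coverage (G = N / C)."""


def implied_genome_size(total_bases: float, fold_coverage: float) -> float:
    """Return G = N / C in bases."""
    return total_bases / fold_coverage


if __name__ == "__main__":
    for clone, bases, fold in [("CL Brener", 768e6, 14), ("CL-14", 1.5e9, 27)]:
        size_mb = implied_genome_size(bases, fold) / 1e6
        print(f"{clone}: {size_mb:.1f} Mb")
```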
Even so, neither sequencing effort produced large contigs, owing to the repetitive nature of these genomes. With mean read lengths of 650 and 400 nucleotides for CL Brener and CL-14, respectively, and with repetitive sequences making up a large part of the genome (more than 50%), it was not possible to assemble these genomes completely.
These repeats include genes encoding surface proteins, many other gene families, and repeats outside coding sequences. With longer reads, El-Sayed et al. (2005a) generated 4,008 contigs for clone CL Brener, reaching an N50 of 25,950 nucleotides, a much better result than ours, an N50 of 1,629 nucleotides for clone CL-14. In both cases these numbers are low, given that T. cruzi chromosomes range in size from 0.51 to 3.27 million base pairs (Souza et al., 2013). The N50 is the contig length at which the cumulative length of the contigs, summed from the largest to the smallest, first reaches half of the total number of assembled nucleotides; in other words, half of the assembly lies in contigs of that length or longer. The N50 is therefore a measure of how long the larger assembled contigs are.
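The N50 definition above can be made concrete with a minimal Python sketch. The contig lengths in the usage lines are made up for illustration. By default the half-way point is measured against the total assembly length; the optional `genome_size` argument measures it against an estimated genome size instead, a variant usually called NG50.

```python
from typing import Iterable, Optional


def n50(contig_lengths: Iterable[int], genome_size: Optional[int] = None) -> Optional[int]:
    """Length of the contig at which the cumulative length (largest first) reaches half the target.

    The target is the total assembly length by default (N50) or an estimated
    genome size when genome_size is given (NG50).
    """
    lengths = sorted(contig_lengths, reverse=True)
    target = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= target:  # half of the target has been reached
            return length
    return None  # assembly covers less than half of genome_size (NG50 undefined)


print(n50([100, 80, 50, 40, 30]))                   # 80
print(n50([100, 80, 50, 40, 30], genome_size=400))  # 50
```

Sorting from the largest contig down is what makes the result describe the larger contigs: a fragmented assembly such as that of CL-14 needs many short contigs to reach the half-way point, which drives the N50 down.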
For the study of individual genes, the CL Brener genome assembly is satisfactory, because the assembled contigs are in most cases longer than the average protein-coding sequence, whose mean length is 1,513 bp (El-Sayed et al., 2013). Of the more than 40,000 contigs assembled for the CL-14 genome, only 10,772 are longer than the mean T. cruzi CDS length, and many of these may nonetheless be chimeras built from sequences of both haplotypes. The CL Brener genome assembly yielded 838 scaffolds, which were later joined into 41 predicted chromosome pairs ranging from 78 kb to 2.4 Mb (Weatherly et al., 2009). Note that the assembled chromosomes are much smaller than the sizes estimated by PFGE. Note also that in that work 9.3 million nucleotides from contigs or singlets were not incorporated into the assembly; that is, even this “improved assembly” published by Weatherly et al. (2009) is still quite incomplete. Using synteny maps with Trypanosoma brucei chromosomes and BAC libraries from the original genome project, these authors proposed that clone CL Brener has 41 chromosome pairs. PFGE data indicate 20 chromosomal bands. These data are not contradictory, since chromosomal bands co-migrate in T. cruzi (reviewed by Zingales et al., 1997). For CL-14, an attempt to generate scaffolds with the Mauve Multiple Genome Alignment program (gel.ahabs.wisc.edu/mauve/), based on the CL Brener contigs, did not improve our assembly.
As future work on the T. cruzi genome assembly, we plan to sequence the genomes of clones CL Brener and CL-14 anew on the Illumina platform, which, although it produces shorter reads, generates a larger amount of information because of its high coverage. In the case of CL Brener, this higher coverage, combined with the original Sanger sequencing, will make it easier to separate the haplotypes and join contigs so as to represent the chromosomes faithfully. Another strategy we envisage is to sequence some individual chromosomes on the new Single Molecule Real-Time (SMRT) platform developed by Pacific Biosciences, which produces longer reads of 10-20 kb (English et al., 2012). This would make it possible to assemble whole chromosomes from approximately 1,000 SMRT sequences, in an assembly in which the clusters containing multigene families would be properly represented. We hope that these combined sequencing strategies will provide enough data for a faithful genome assembly and, ultimately, promising results for further genomic studies of Trypanosoma cruzi.
The 110 Mb diploid nuclear genome estimated for clone CL Brener is similar in size to the genome observed for CL-14, 112 Mb (Souza et al., 2011). In 2005, El-Sayed et al. estimated from sequence data that the nuclear genome of clone CL Brener is 110 Mb. Small discrepancies between estimates of the CL Brener genome size reflect how difficult it is to assemble a genome with such a heavy load of repeats: the repeats may not be represented along their full length, which leads to underestimates. Both the CL Brener and CL-14 genomes are substantially larger than the nuclear genome of clone Sylvio X10/1, a TcI T. cruzi, which has 5.9 Mb less haploid sequence (Franzén et al., 2011) than predicted by El-Sayed et al. (2005). Many of the differences that account for the smaller genome of clone Sylvio X10/1 relate to the number of members of large multigene families (Franzén et al., 2011).
The CL-14 and CL Brener genomes are more similar to each other than to the genomes of other T. cruzi strains with respect to genome size and chromosomal band composition (Souza et al., 2011). Hybridization of DNA probes to chromosomal bands of the two clones (Southern blots) confirms this result. The probe for the MASP multigene family hybridizes to the same bands in both clones, although the intensity of some bands differs. These signal differences are probably due to sequence differences within the gene family rather than to differences in gene content, since the estimated copy number is the same. The same result is observed with a probe for the single-copy gene GPI8 and in two other Southern blots hybridized with probes for the amastin and DGF-1 genes, in these cases using genomic DNA fragments generated by restriction digestion.
Members of the Kinetoplastida have a mitochondrial genome known as kDNA (kinetoplast DNA). T. cruzi kDNA consists of thousands of variable minicircles and dozens of maxicircles. The minicircles, approximately 1.4 kb long, are composed of four conserved repeated sequences of 100 to 200 base pairs and of hypervariable sequences that encode the guide RNAs, or gRNAs (Ray, 1989). The maxicircles, in turn, are approximately 22 kb long, 15 kb of which correspond to sequences encoding mitochondrial proteins (Ruvalcaba-Trejo and Sturm, 2011; Junqueira et al., 2005).
Sequencing of the CL Brener maxicircle showed that it belongs to group TcIII; that is, the hybrid CL Brener inherited its mitochondrion from the TcIII parental strain. Esmeraldo is a non-hybrid strain of group TcII and therefore has a TcII-type maxicircle sequence (Westenberger et al., 2006). The maxicircle genome of clone CL-14 was assembled using the CL Brener maxicircle as a reference, after selecting reads by alignment to the CL Brener mitochondrial genome. The very high read coverage at certain points of the maxicircles indicates highly repetitive regions which, being longer than the reads generated by pyrosequencing, prevent these sequences from being assembled correctly.
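The observation that coverage peaks point to repeats longer than the reads can be turned into a simple screen. The Python sketch below is an illustration under stated assumptions, not the procedure used in this work: it takes a per-base depth profile (which could, for example, be exported with `samtools depth`), averages it over fixed windows, and flags windows whose mean depth exceeds a multiple of the median window depth. The window size of 100 bp and the 3-fold threshold are arbitrary choices.

```python
from statistics import median
from typing import List, Sequence, Tuple


def high_coverage_windows(
    depth: Sequence[int], window: int = 100, fold: float = 3.0
) -> List[Tuple[int, int, float]]:
    """Return (start, end, mean_depth) for windows whose mean depth exceeds fold x median window depth."""
    if not depth:
        return []
    windows = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        windows.append((start, start + len(chunk), sum(chunk) / len(chunk)))
    baseline = median(mean for _, _, mean in windows)
    return [w for w in windows if w[2] > fold * baseline]


# Toy profile: uniform depth with one collapsed-repeat peak
toy_depth = [10] * 300 + [50] * 100 + [10] * 100
print(high_coverage_windows(toy_depth))  # [(300, 400, 50.0)]
```

Using the median rather than the mean as the baseline keeps a few very deep repeat windows from inflating the threshold they are compared against.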
The mitochondrial DNA of CL-14 confirms the hybrid nature of this clone, which was isolated from the same strain as clone CL Brener: the two clones are more similar to each other than either is to the maxicircle genome of the Esmeraldo strain, which is TcII. The mitochondrial genomes of clones CL-14 and CL Brener show a high degree of synteny. Only nine polymorphisms were identified between them, all of them nucleotide insertions or deletions (Figure 9). Noteworthy among these differences are the nucleotide deletions in the cytochrome oxidase I, MURF1, MURF2 and ND5 sequences and an insertion in the ND4 coding region. Because these mRNAs undergo extensive post-transcriptional modification by uridine addition (RNA editing), we cannot yet say whether these insertions and deletions affect the expression of these mitochondrial enzymes. However, most of the polymorphisms lie within homopolymers, which may have been misrepresented in the sequencing of the CL-14 genome, performed on the 454 FLX platform. Our RNA-seq data may help to resolve this question and perhaps to identify other relevant differences that could account for the differences in virulence between the clones.
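Because pyrosequencing is known to miscall homopolymer lengths, a first filter for the indels described above is to ask whether each one falls inside a homopolymer of the reference. The Python sketch below shows that check; the toy sequence is invented, and the minimum run length of 4 is an assumed threshold, not one taken from this work.

```python
def homopolymer_run(seq: str, pos: int) -> int:
    """Length of the run of identical bases that contains 0-based position pos."""
    if not 0 <= pos < len(seq):
        raise IndexError("position outside sequence")
    base = seq[pos]
    left = pos
    while left > 0 and seq[left - 1] == base:
        left -= 1
    right = pos
    while right + 1 < len(seq) and seq[right + 1] == base:
        right += 1
    return right - left + 1


def in_homopolymer(seq: str, pos: int, min_len: int = 4) -> bool:
    """True if pos lies in a homopolymer of at least min_len bases (threshold assumed)."""
    return homopolymer_run(seq, pos) >= min_len


# Toy reference fragment: an indel called at position 5 falls in a run of five T's
reference = "ACGTTTTTAC"
print(homopolymer_run(reference, 5), in_homopolymer(reference, 5))  # 5 True
```

Indels flagged this way are candidates for sequencing artifacts; confirming them would still require independent data, such as the RNA-seq reads mentioned above.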
The nuclear molecular markers developed by Souto et al. (1996), the spliced leader (SL) mini-exon and the 24Sα rDNA, and a mitochondrial marker based on the cytochrome oxidase II (COII) gene sequence (De Freitas et al., 2006) were used in in vitro and in silico analyses to determine to which T. cruzi group clone CL-14 belongs. All results indicate that this clone, like clone CL Brener, belongs to T. cruzi group VI (TcVI), a group comprising hybrid lineages (Broutin et al., 2006). Other analyses indicate that the CL-14 haplotypes have the same phylogenetic origin as the CL Brener haplotypes, namely Esmeraldo-like and non-Esmeraldo-like, that is, one TcII haplotype and one TcIII haplotype, as described by El-Sayed et al. (2005a). This conclusion, that clone CL-14 belongs to the same DTU as clone CL Brener and shares its phylogenetic origin, ensures that genomic analyses based on the coverage of CL-14 reads over the CL Brener reference genome can be conducted with confidence.
Two single-copy nuclear genes and one single-copy mitochondrial gene were selected; the CL-14 reads corresponding to these genes were assembled and used to build phylogenetic trees. Assembling the reads independently of the reference genome sequences yielded two sequences for each nuclear gene and a single sequence for the mitochondrial gene. Multiple and global alignments were performed between the sequences of each gene, using data from clones CL Brener and CL-14, and phylogenetic trees were built from these alignments. Notably, each allele of the nuclear genes clusters with its corresponding allele from the other clone, confirming that the genome is hybrid and that the CL-14 haplotypes derive from TcII and TcIII parents. Type I T. cruzi outgroups, genes from clone Sylvio X10/1, were added to the trees to check whether any CL-14 allele was phylogenetically closer to TcI, which was not observed. The same result was obtained with the tree built from the mitochondrial gene, which shows that the CL-14 gene is closer to that of CL Brener than to that of Sylvio X10/1, indicating that the CL-14 mitochondrial genome haplotype is TcIII, like the CL Brener kDNA (Westenberger et al., 2006).
Because it was not possible to assemble the CL-14 genome, all genomic analyses and comparisons were based on the reads, that is, on their coverage of the sequences of the reference genome, clone CL Brener. To this end, the CL-14 reads were aligned against all CL Brener sequences that are properly assembled, without haplotype mixing. None of the reads that carry at least one feature specific to one haplotype carries any feature of the other haplotype. At every position where the two reference haplotypes differ and the CL-14 reads are polymorphic relative to the reference, the CL-14 reads correspond to only one of the haplotypes. This again confirms that the CL-14 genome, besides being hybrid, contains no chimeric genes or mixed sequences between its alleles.
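The haplotype-consistency check described above can be sketched in Python as follows. The sketch assumes that each aligned read has already been reduced to the bases it shows at positions where the two CL Brener haplotypes differ; the dictionary input format and the default labels are hypothetical. A read is called chimeric if it carries alleles from both haplotypes, the class in which no CL-14 reads were found.

```python
from typing import Dict, Tuple


def classify_read(
    read_alleles: Dict[int, str],
    site_alleles: Dict[int, Tuple[str, str]],
    labels: Tuple[str, str] = ("hapA", "hapB"),
) -> str:
    """Assign a read to one haplotype from its bases at haplotype-distinguishing sites.

    read_alleles: reference position -> base observed in the read
    site_alleles: reference position -> (base in haplotype A, base in haplotype B)
    Returns one of the labels, "chimeric" or "uninformative".
    """
    seen = set()
    for pos, base in read_alleles.items():
        if pos not in site_alleles:
            continue  # not a site where the reference haplotypes differ
        allele_a, allele_b = site_alleles[pos]
        if base == allele_a:
            seen.add(labels[0])
        elif base == allele_b:
            seen.add(labels[1])
        # bases matching neither allele (e.g. sequencing errors) are ignored
    if not seen:
        return "uninformative"
    if len(seen) > 1:
        return "chimeric"
    return seen.pop()


sites = {10: ("A", "G"), 25: ("C", "T")}
print(classify_read({10: "G", 25: "T"}, sites, labels=("Esmeraldo-like", "non-Esmeraldo-like")))
```

Counting the reads in each class over all distinguishing sites gives the summary reported in the text: reads in the two haplotype classes, and none in the chimeric class.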
The same analysis was performed for the amastin family, but with more members of this family, which has 12 copies assembled in the genome version published by El-Sayed et al. (2005a). Although the function of the amastins has not been elucidated, the expression of these genes on the surface of amastigotes of T. cruzi and Leishmania spp. suggests that they take part in important interactions with mammalian host cells (Rochette et al., 2005). The sequence diversity of this family, which results from factors such as mutation rates and gene conversion mechanisms, could be related to the role that amastins may play in these interactions with distinct host proteins (Cerqueira et al., 2008).
T. cruzi clones CL-14 and CL Brener have the same gene content, including housekeeping genes, genes encoding multigene families of surface proteins, and genes of unknown function. No ORF or gene family described in CL Brener lacks full or partial coverage in CL-14. We could not identify ORFs specific to clone CL-14, that is, absent from CL Brener. In contrast, Franzén et al. (2011a) found differences in 6 ORFs present in CL Brener but absent from Sylvio X10/1. These authors also found that, for most multigene families, family sizes differ between these clones. They identified differences in the nucleotide sequences of many homologs, mainly in multigene families, related to SNPs, the addition or deletion of motifs characteristic of these sequences, and the extent of repeats. Such divergences have been observed between the genomes of CL Brener and Sylvio X10/1, between subspecies of T. brucei (Jackson et al., 2010), and between Leishmania species (Peacock et al., 2007). Unlike what was found for Sylvio X10/1, we did not observe large differences in the copy numbers of multigene family genes.
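Because reads from all copies of a multigene family pile up on the copies present in the reference, read depth over a family, relative to depth over single-copy genes such as MSH2 or GPI8, gives a rough copy-number estimate. Normalizing within each clone also cancels the different sequencing yields of CL Brener and CL-14. The Python sketch below illustrates this ratio; the depth values are hypothetical, and the simple ratio ignores mapping ambiguity and coverage bias, so it is an assumption-laden approximation rather than the method used here.

```python
from statistics import median
from typing import Sequence


def estimated_copies(gene_depth: float, single_copy_depths: Sequence[float]) -> float:
    """Rough copy-number estimate: depth over a gene or family divided by the
    median depth over single-copy genes of the same clone."""
    baseline = median(single_copy_depths)
    if baseline <= 0:
        raise ValueError("single-copy baseline depth must be positive")
    return gene_depth / baseline


# Hypothetical depths for one multigene family in two clones sequenced to different depths
clone_1 = estimated_copies(540.0, [52.0, 54.0, 56.0])
clone_2 = estimated_copies(270.0, [26.0, 27.0, 28.0])
print(clone_1, clone_2)
```

In this toy case the raw depths differ twofold, but both clones come out at the same relative copy number once each is scaled by its own single-copy baseline.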
In our search for genomic features that might correlate with the differences in virulence between the clones studied, we identified a smaller number of amino acid repeats in a subgroup of genes of the trans-sialidase multigene family. Trans-sialidases are membrane proteins and important subjects of study because they take part in the interaction between parasite and host. They transfer sialic acid from host cells to the parasite cell surface, modulating the action of the host immune system (Schauer et al., 1983), and participate in other aspects of the parasite-host interaction (Dc-Rubin and Schenkman, 2012).
Some subgroup I trans-sialidases have SAPA (shed acute phase antigen) repeats in their C-terminal region (Pollevick et al., 1991), which appear to increase the half-life of the protein shed into the host bloodstream (Buscaglia et al., 1999). These repeats are targets of the host adaptive immune system. Non-inhibitory antibodies are raised against the region near the catalytic site and against the lectin domain of the trans-sialidases (Pitcovsky et al., 2002) and, together with the highly immunogenic SAPA repeats, delay the formation of inhibitory or neutralizing antibodies, which control parasite levels (Risso et al., 2007).
Clone CL-14 has subgroup I trans-sialidases with SAPA repeats (TcTS-SAPA), but with fewer repeats than the subgroup I TcTS of clone CL Brener. For the CL Brener TcTS-SAPA Tc00.1047053509495.30, we found 19 SAPA repeats, whereas its homolog in clone CL-14 has only 3. Each SAPA repeat has 12 amino acids, and some specific, conserved degeneracies are observed in both CL Brener and CL-14. From the coverage of this TcTS-SAPA by CL-14 reads (Figure 13), these degeneracies show that the SAPA repeats covered in CL-14 do not occur in the same order as in CL Brener.
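Counting degenerate tandem repeats such as the 12-residue SAPA units can be sketched with a greedy scan that accepts a window as one copy if it differs from a consensus unit by at most a few substitutions. In the Python sketch below the unit string is an assumed consensus used only for illustration (it should be replaced with the published SAPA consensus before real use), and the tolerance of 2 substitutions is likewise an assumption.

```python
def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_tandem_units(protein: str, unit: str, max_mismatches: int = 2) -> int:
    """Greedy left-to-right count of non-overlapping copies of `unit`,
    tolerating up to `max_mismatches` substitutions per copy."""
    k = len(unit)
    count = 0
    i = 0
    while i + k <= len(protein):
        if mismatches(protein[i:i + k], unit) <= max_mismatches:
            count += 1
            i += k  # jump to the next candidate copy
        else:
            i += 1
    return count


# Illustrative 12-residue unit (assumed; replace with the published SAPA consensus)
UNIT = "DSSAHSTPSTPA"
toy = "MKT" + UNIT * 2 + "DSSAHGTPSTPV" + "QQQ"  # third copy carries two substitutions
print(count_tandem_units(toy, UNIT))                    # 3
print(count_tandem_units(toy, UNIT, max_mismatches=1))  # 2
```

The tolerance matters: too strict a threshold misses degenerate copies, as the second call shows, while too loose a threshold starts matching unrelated sequence.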
These data were confirmed by amplifying TcTS-SAPA fragments from the two clones (Figure 16-A). Many amplicons were generated for clone CL Brener because of its large number of SAPA repeats, which can anneal to one another, forming new DNA strands that serve as substrates for the DNA polymerase, which then produces fragments of many sizes. As a result, the gel shows a smear where well-defined bands would be expected, as seen for clone CL-14, which, having few SAPA repeats, does not produce this effect.
Southern blots with probes that hybridize to the SAPA repeats were performed on membranes transferred from electrophoresis gels of total DNA of the clones digested with enzymes that cleave within the SAPA repeats. For all three digestions, more bands hybridized in the CL Brener samples than in the CL-14 samples. In addition, the signal intensities in the hybridizations of the PvuII and HpaII digests are higher in CL Brener, and the band sizes also differ, confirming the in silico data, which predicted that clone CL-14 has fewer copies of the SAPA repeats.
To examine TcTS-SAPA protein expression, Western blots were performed with anti-SAPA and anti-trans-sialidase antibodies on the epimastigote and trypomastigote stages of both clones (Figure 18). The anti-trans-sialidase antibodies recognize CL-14 trypomastigote proteins with an intensity similar to that seen in the CL Brener trypomastigote protein samples. This similar intensity shows that trans-sialidase genes are present and expressed in both clones. No signal is observed in the epimastigote protein samples, as expected, since in T. cruzi the trans-sialidases are surface proteins that interact with the vertebrate host and not with the invertebrate vector. With the anti-SAPA antibodies, strong signals and several bands are observed for clone CL Brener in the trypomastigote stage, but only a weak signal and far fewer bands for clone CL-14 in the same stage. This result also confirms our in silico observation that clone CL-14 has functional TcTS-SAPA, but with fewer SAPA repeats and probably also fewer TcTS-SAPA genes.
The subgroup I trans-sialidases are located on chromosome 33 of T. cruzi. As in the assembly of the other chromosomes, one can identify concatenated sequences from both haplotypes, large gaps, and clusters of repetitive sequences that may or may not be correctly represented. Besides the possibility that parts of the chromosome were misassembled, there is also the bias introduced by incorrect representation of the SAPA repeats. One of our goals is to resequence these chromosomes using long-read sequencing (>10 kb) (English et al., 2012) and to assemble them correctly. This will make it possible to determine the exact sequence of these molecules and to compare the TcTS-SAPA of clones CL Brener and CL-14 with greater confidence. We will also be able to determine whether entire clusters were deleted from the CL-14 genome relative to the CL Brener genome, whether only the SAPA repeats of some subgroup I trans-sialidase genes were deleted, or even whether these repeats were added to the CL Brener genes.
Since few genomic differences that could be related to virulence were found, we also aim to carry out a comparative transcriptomic study. To this end, gene expression levels in the clones studied here will be evaluated at the different life stages, particularly during differentiation between the amastigote and trypomastigote stages, in which the parasite is closely associated and in contact with the vertebrate host. So far, we have sequenced cDNAs derived from CL-14 mRNAs at the epimastigote, amastigote and trypomastigote stages using RNA-seq. This sequencing was performed on the Illumina platform, which generates a large number of reads in proportion to the abundance of each sequence in the parasite transcriptome, which makes it well suited to measuring transcript expression levels.
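Because RNA-seq read counts scale with both transcript abundance and transcript length, counts are usually normalized before expression levels are compared across stages or clones. The Python sketch below computes transcripts per million (TPM), one common normalization; the normalization actually adopted in the ongoing analysis is not specified here, the counts are hypothetical, and testing for differential expression would additionally require biological replicates and a statistical model.

```python
from typing import Dict


def tpm(read_counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million: length-normalised counts rescaled to sum to 1e6."""
    per_kb = {gene: read_counts[gene] / (lengths_bp[gene] / 1000.0) for gene in read_counts}
    total = sum(per_kb.values())
    if total == 0:
        raise ValueError("no reads assigned to any transcript")
    return {gene: rate * 1e6 / total for gene, rate in per_kb.items()}


# Hypothetical counts: equal read numbers, but gene_b is twice as long as gene_a
values = tpm({"gene_a": 100, "gene_b": 100}, {"gene_a": 1000, "gene_b": 2000})
print(values)
```

Dividing by length first and by library size second means that TPM values of each sample always sum to one million, which makes relative abundances comparable between libraries of different depths.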
In collaboration with Dr. Najib M. El-Sayed's group, which is responsible for the study of the clone CL Brener transcriptome, we will carry out comparative analyses of these two transcriptomes to identify significant differences in gene expression levels between the clones that bear on their capacity to infect and develop in the vertebrate host. These analyses are under way.
6. References
Andrade, S.G. Caracterização de cepas do Trypanosoma cruzi isoladas no
Recôncavo Baiano. Rev Patol Trop 3: 65-121, 1974.
Andrews, S. FastQC. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/, 2010.
Araújo, P. R.; Teixeira, S. M. Regulatory elements involved in the post-transcriptional control of stage-specific gene expression in Trypanosoma cruzi: a review. Mem. Inst. Oswaldo Cruz 106(3), 2011.
Archer, S.K.; Inchaustegui, D.; Queiroz, R.; Clayton, C. The Cell Cycle
Regulated Transcriptome of Trypanosoma brucei. PLoS ONE 6(3), 2011.
Arner, E.; Kindlund, E.; Nilsson, D.; Farzana, F.; Ferella, M.; Tammi, M. T.; Andersson, B. Database of Trypanosoma cruzi repeated genes: 20,000 additional gene variants. BMC Genomics 8: 391, 2007.
Atayde, V. D.; Neira, I.; Cortez, M.; Ferreira, D.; Freymuller, E.; Yoshida, N. Molecular basis of non-virulence of Trypanosoma cruzi clone CL-14. Int. J. Parasitol. 34: 851-860, 2004.
Ausubel, F. M.; Brent, R.; Kingston, R. E. Current Protocols in Molecular Biology. New York: Greene Publishing Associates and Wiley-Interscience, 1995.
Berriman, M.; Ghedin, E.; Hertz-Fowler, C.; Blandin, G.; Renauld, H.; Bartholomeu, D. C.; Lennard, N. J.; Caler, E. et al. The genome of the African trypanosome Trypanosoma brucei. Science 309: 416-422, 2005.
Branche, C.; Ochaya, S.; Aslund, L.; Andersson, B. Comparative karyotyping as a tool for genome structure analysis of Trypanosoma cruzi. Mol. Biochem. Parasitol. 147: 30-38, 2006.
Brandao, A.; Urmenyi, T.; Rondinelli, E.; Gonzalez, A.; de Miranda, A. B.; Degrave, W. Identification of transcribed sequences (ESTs) in the Trypanosoma cruzi genome project. Mem. Inst. Oswaldo Cruz 92: 863-866, 1997.
Brener, Z.; Chiari, E. Variações morfológicas observadas em diferentes amostras de Trypanosoma cruzi. Rev. Inst. Med. Trop. São Paulo 5: 220-224, 1963.
Brener, Z.; Andrade, Z.; Barral-Netto, M. Trypanosoma cruzi e doença de Chagas. 2nd ed. Guanabara Koogan, 2000.
Burgos, J. M. et al. Direct molecular profiling of minicircle signatures and lineages of Trypanosoma cruzi bloodstream populations causing congenital Chagas disease. International Journal for Parasitology 37(12): 1319-1327, 2007.
Buscaglia, C. A.; Alfonso, J.; Campetella, O.; Frasch, A. C. Tandem amino acid repeats from Trypanosoma cruzi shed antigens increase the half-life of proteins in blood. Blood 93: 2025-2032, 1999.
Broutin, H.; Tarrieu, F.; Tibayrenc, M.; Oury, B.; Barnabé, C. Phylogenetic analysis of the glucose-6-phosphate isomerase gene in Trypanosoma cruzi. Experimental Parasitology 113: 1-7, 2006.
Campos, P. C.; Bartholomeu, D. C.; Da Rocha, W. D.; Cerqueira, G. C.; Teixeira, S. M. R. Sequences involved in mRNA processing in Trypanosoma cruzi. International Journal for Parasitology 38(12): 1383-1389, 2008.
Cano, M. I.; Gruber, A.; Vazquez, M.; Cortes, A.; Levin, M. J.; Gonzalez, A. et al. Molecular karyotype of clone CL Brener chosen for the Trypanosoma cruzi genome project. Mol. Biochem. Parasitol. 71: 273-278, 1995.
Cerqueira, G. C.; Bartholomeu, D. C.; Da Rocha, W. D.; Hou, L.; Freitas-Silva, D. M.; Machado, C. R.; El-Sayed, N. M.; Teixeira, S. M. R. Sequence diversity and evolution of multigene families in Trypanosoma cruzi. Molecular and Biochemical Parasitology 157(1): 65-72, 2008.
Cerqueira, G. C.; Da Rocha, W. D.; Campos, P. C.; Zouain, C. S.; Teixeira, S. M. Analysis of expressed sequence tags from Trypanosoma cruzi amastigotes. Mem. Inst. Oswaldo Cruz 100: 385-389, 2005.
Chervitz, S. S.; Dagdigian, C.; Fuellen, G.; Gilbert, J. G.; Korf, I.; Lapp, H. et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 12(10): 1611-1618, 2002.
Chevreux, B.; Pfisterer, T.; Drescher, B.; Driesel, A. J.; Müller, W. E.; Wetter, T.; Suhai, S. Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res. 14(6): 1147-1159, 2004.
Chiari, E. Diferenciação do Trypanosoma cruzi em cultura. PhD thesis, Universidade Federal de Minas Gerais, Belo Horizonte, 1981.
Cribb, P.; Serra, E. One and two-hybrid analysis of the interactions between
components of the Trypanosoma cruzi spliced leader RNA gene promoter
binding complex. Int J Parasitol 39: 525-532, 2008.
Da Rocha, W. D.; Otsu, K.; Teixeira, S. M.; Donelson, J. E. Tests of cytoplasmic RNA interference (RNAi) and construction of a tetracycline-inducible T7 promoter system in Trypanosoma cruzi. Mol. Biochem. Parasitol. 133: 175-186, 2004.
Dc-Rubin, S. S.; Schenkman, S. Trypanosoma cruzi trans-sialidase as a multifunctional enzyme in Chagas' disease. Cellular Microbiology 14, 2012.
De Freitas, J. M.; Augusto-Pinto, L.; Pimenta, J. R.; Bastos-Rodrigues, L.; Gonçalves, V. F. et al. Ancestral genomes, sex, and the population structure of Trypanosoma cruzi. PLoS Pathog. 2: e24, 2006.
El-Sayed, N. M.; Myler, P. J.; Bartholomeu, D. C.; Nilsson, D.; Aggarwal, G. et al. The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science 309: 409-415, 2005a.
El-Sayed, N. M.; Myler, P. J.; Blandin, G.; Berriman, M.; Crabtree, J.; Aggarwal, G.; Caler, E.; Renauld, H.; Worthey, E. A.; Hertz-Fowler, C.; Ghedin, E.; Peacock, C.; Bartholomeu, D. C. et al. Comparative genomics of trypanosomatid parasitic protozoa. Science 309: 404-409, 2005b.
English, A. C.; Richards, S.; Gibbs, R. A. Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology. PLoS One 7(11), 2012.
Franzén, O.; Arner, E.; Ferella, M.; Nilsson, D.; Respuela, P.; Carninci, P.; Hayashizaki, Y.; Aslund, L.; Andersson, B.; Daub, C. O. The Short Non-Coding Transcriptome of the Protozoan Parasite Trypanosoma cruzi. PLoS Negl Trop Dis 5(8): e1283, 2011.
Franzén, O.; Ochaya, S.; Sherwood, E.; Lewis, M. D.; Llewellyn, M. S. et
al. Shotgun Sequencing Analysis of Trypanosoma cruzi I Sylvio X10/1 and
Comparison with T. cruzi VI CL Brener. PLoS Negl Trop Dis 5(3): e984, 2011.
Franzén, O.; Talavera-López, C.; Ochaya, S.; Butler, C. E.; Messenger, L. A.; Lewis, M. D.; Llewellyn, M. S.; Marinkelle, C. J.; Tyler, K. M.; Miles, M. A.; Andersson, B. Comparative genomic analysis of human infective Trypanosoma cruzi lineages with the bat-restricted subspecies T. cruzi marinkellei. BMC Genomics 13: 531, 2012.
Freitas, J. M.; Lages-Silva, E.; Crema, E.; Pena, S. D. J.; Macedo, A. M. Real time PCR strategy for the identification of major lineages of Trypanosoma cruzi directly in chronically infected human tissues. Int J Parasitol 35: 411-41, 2005.
Freitas, J. M.; Augusto-Pinto, L.; Pimenta, J. R.; Bastos-Rodrigues, L.; Gonçalves, V. F.; Teixeira, S. M.; Chiari, E.; Junqueira, A. C.; Fernandes, O.; Macedo, A. M.; Machado, C. R.; Pena, S. D. Ancestral genomes, sex and the population structure of Trypanosoma cruzi. PLoS Pathog 2: e24, 2006.
Fullwood, M. J.; Wei, C. L.; Liu, E. T.; Ruan, Y. Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses. Genome Res. 19: 521-532, 2009.
Galardini, M.; Biondi, E. G.; Bazzicalupo, M.; Mengoni, A. CONTIGuator: A Bacterial Genomes Finishing Tool for Structural Insights on Draft Genomes. Source Code for Biology and Medicine 6: 11, 2011.
Hartmann, C.; Hotz, H. R.; McAndrew, M.; Clayton, C. Effect of multiple downstream splice sites on polyadenylation in Trypanosoma brucei. Mol. Biochem. Parasitol. 93: 149-152, 1998.
Henriksson, J.; Porcel, B.; Rydaker, M.; Ruiz, A.; Sabaj, V.; Galanti, N. et al. Chromosome specific markers reveal conserved linkage groups in spite of extensive chromosomal size variation in Trypanosoma cruzi. Mol. Biochem. Parasitol. 73: 63-74, 1995.
Herrera, C.; Bargues, M. D.; Fajardo, A.; Montilla, M.; Triana, O.; Vallejo, G. A.; Guhl, F. Identifying four Trypanosoma cruzi I isolate haplotypes from different geographic regions in Colombia. Infect Genet Evol 7: 535-539, 2007.
Hotez, P. J.; Molyneux, D. H.; Fenwick, A. et al. Control of neglected tropical diseases. N Engl J Med 357: 1018-1027, 2007.
Huang, X.; Madan, A. CAP3: A DNA sequence assembly program. Genome Res. 9: 868-877, 1999.
Ivens, A. C.; Peacock, C. S.; Worthey, E. A.; Murphy, L.; Aggarwal, G.; Berriman, M.; Sisk, E.; Rajandream, M. A. et al. The genome of the kinetoplastid parasite, Leishmania major. Science 309: 436-442, 2005.
Jackson, A. P.; Sanders, M.; Berry, A.; McQuillan, J.; Aslett, M. A.; Quail, M. A.; Chukualim, B.; Capewell, P.; MacLeod, A.; Melville, S. E.; Gibson, W.; Barry, J. D.; Berriman, M.; Hertz-Fowler, C. The genome sequence of Trypanosoma brucei gambiense, causative agent of chronic human African trypanosomiasis. PLoS Negl Trop Dis 4: e658, 2010.
Junqueira, C.; Guerrero, A. T.; Galvão-Filho, B.; Andrade, W. A.; Salgado, A. P.; Cunha, T. M.; Robert, C.; Campos, M. A.; Penido, M. L.; Mendonça-Previato, L.; Previato, J. O.; Ritter, G.; Cunha, F. Q.; Gazzinelli, R. T. Trypanosoma cruzi adjuvants potentiate T cell-mediated immunity induced by a NY-ESO-1 based antitumor vaccine. PLoS One 7, 2012.
Kangussu-Marcolino, M. M.; de Paiva, R. C.; Araújo, P. R.; Mendonça-Neto, R. P.; Lemos, L.; Bartholomeu, D. C.; Mortara, R. A.; DaRocha, W. D.; Teixeira, S. M. R. Distinct genomic organization, mRNA expression and cellular localization of members of two amastin sub-families present in Trypanosoma cruzi. BMC Microbiology 13: 10, 2013.
Kim, D.; Pertea, G.; Trapnell, C.; Pimentel, H.; Salzberg, S. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology 14: R36, 2013.
Kim, K. S.; Teixeira, S. M.; Kirchhoff, L. V.; Donelson, J. E. Transcription and editing of cytochrome oxidase II RNAs in Trypanosoma cruzi. J Biol Chem 2, 1994.
Kirchhoff, L. V. Epidemiology of American Trypanosomiasis. In: Weiss, L. M.; Tanowitz, H. B.; Kirchhoff, L. V. (eds.) Advances in Parasitology: Chagas Disease. Elsevier, 2011, pp. 1-14.
Kirchhoff, L. V.; Hieny, S.; Shiver, G. M.; Snary, D.; Sher, A. Cryptic epitope explains the failure of a monoclonal antibody to bind to certain isolates of Trypanosoma cruzi. J. Immunol. 133: 2731-2735, 1984.
Kolev, N. G.; Franklin, J. B.; Carmi, S.; Shi, H.; Michaeli, S. et al. The Transcriptome of the Human Pathogen Trypanosoma brucei at Single-Nucleotide Resolution. PLoS Pathog 6(9), 2009.
Larkin, M. A.; Blackshields, G.; Brown, N. P.; Chenna, R.; McGettigan, P. A.; McWilliam, H.; Valentin, F.; Wallace, I. M.; Wilm, A.; Lopez, R.; Thompson, J. D.; Gibson, T. J.; Higgins, D. G. Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947-2948, 2007.
Li, H.; Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26: 589-595, 2010.
Li, H.; Handsaker, B.; Wysoker, A.; Fennell, T.; Ruan, J.; Homer, N.; Marth, G.; Abecasis, G.; Durbin, R.; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map (SAM) format and SAMtools. Bioinformatics 25: 2078-2079, 2009.
Liang, X.H.; Haritan, A.; Uliel, S.; Michaeli, S. Trans and cis splicing in
trypanosomatids: mechanism, factors, and regulation. Eukaryot Cell 2: 830-840,
2003.
Lima, M. T.; Jansen, A. M.; Rondinelli, E.; Gattass, C. R. Trypanosoma cruzi:
properties of a clone isolated from the CL strain. Parasitol. Res., 77: 77-81,
1990.
Lima, M. T.; Lenzi, H. L.; Gattass, C. R. Negative tissue parasitism in mice
injected with a non-infective clone of Trypanosoma cruzi. Parasitol. Res. 81: 6-12, 1995.
López-Estraño, C.; Tschudi, C.; Ullu, E. Exonic sequences in the 5' untranslated
region of alpha-tubulin mRNA modulate trans splicing in Trypanosoma brucei.
Mol Cell Biol 18: 4620-4628, 1995.
Machado, C. A.; Ayala, F. J. Nucleotide sequences provide evidence of genetic
exchange among distantly related lineages of Trypanosoma cruzi. Proc Natl
Acad Sci USA, 98:7396-7401, 2001.
Martínez-Calvillo, S.; Yan, S.; Nguyen, D.; Fox, M.; Stuart, K.; Myler, P. J.
Transcription of Leishmania major Friedlin chromosome 1 initiates in both
directions within a single region. Mol Cell 11: 1291-1299, 2003.
Martínez-Calvillo, S.; Nguyen, D.; Stuart, K.; Myler, P. J. Transcription initiation
and termination on Leishmania major chromosome 3. Eukaryot Cell 3: 506-517,
2004.
Mendes, T. A. O.; Lobo, F. P.; Rodrigues, T. S.; Rodrigues-Luiz, G. F.;
DaRocha, W. D.; Fujiwara, R. T.; Teixeira, S. M. R.; Bartholomeu, D. C.
Repeat-Enriched Proteins Are Related to Host Cell Invasion and Immune
Evasion in Parasitic Protozoa. Mol Biol Evol v. 30, p. 951-963, 2013.
Miles, M. A.; Cedillos, R. A.; Povoa, M. M.; Souza, A. A.; de Prata, A. A.;
Macedo, V. Do radically dissimilar Trypanosoma cruzi strains (zymodemes)
cause Venezuelan and Brazilian forms of Chagas disease? Lancet 317: 1338-1340, 1981.
Minning, T. A.; Bua, J.; Garcia, G. A.; McGraw, R. A.; Tarleton, R. L.
Microarray profiling of gene expression during trypomastigote to amastigote
transition in Trypanosoma cruzi. BMC Genomics, 131:55-64, 2003.
Minning, T. A.; Weatherly, D. B.; Atwood, J. 3rd; Orlando, R.; Tarleton, R. L. The
steady-state transcriptome of the four major life-cycle stages of Trypanosoma
cruzi. BMC Genomics, 10: 370, 2009.
Morel, C.; Chiari, E.; Camargo, E. A.; Mattei, D. M.; Romanha, A. J.; Simpson,
L. Strains and clones of Trypanosoma cruzi can be characterized by pattern of
restriction endonuclease. Proc Natl Acad Sci USA 77: 6810-6814, 1980.
Motta MC, Martins AC, de Souza SS, Catta-Preta CM, Silva R, Klein CC, de
Almeida LG, de Lima Cunha O, Ciapina LP, Brocchi M, Colabardini AC, de
Araujo Lima B, Machado CR, de Almeida Soares CM, Probst CM, de Menezes
CB, Thompson CE, Bartholomeu DC, Gradia DF, Pavoni DP, Grisard EC,
Fantinatti-Garboggini F, Marchini FK, Rodrigues-Luiz GF, Wagner G, Goldman
GH, Fietto JL, Elias MC, Goldman MH, Sagot MF, Pereira M, Stoco PH, de
Mendonça-Neto RP, Teixeira SM, Maciel TE, de Oliveira Mendes TA, Ürményi
TP, de Souza W, Schenkman S, de Vasconcelos AT. Predicting the proteins of
Angomonas deanei, Strigomonas culicis and their respective endosymbionts
reveals new aspects of the Trypanosomatidae family. PLoS One 8(4): e60209, 2013.
Najafabadi, H. S.; Lu, Z.; MacPherson, C.; Mehta, V.; Adoue, V.; Pastinen, T.;
Salavati, R. Global identification of conserved post-transcriptional regulatory
programs in trypanosomatids. Nucleic Acids Research, online, July, 2013.
Nozaki, T.; Cross, G. A. M. Effects of 3' untranslated and intergenic regions on
gene expression in Trypanosoma cruzi, Molecular and Biochemical
Parasitology, Volume 75, Issue 1, Pages 55-67, 1995.
Ochs DE, Otsu K, Teixeira SM, Moser DR, Kirchhoff LV. Maxicircle genomic
organization and editing of an ATPase subunit 6 RNA in Trypanosoma cruzi.
Mol Biochem Parasitol, 76(1-2), 1996.
Paiva, C. N.; Castelo-Branco, M. T.; Rocha, J. A.; Lannes-Vieira, J.; Gattass, C. R.
Trypanosoma cruzi: lack of T cell abnormalities in mice vaccinated with live
trypomastigotes. Parasitol Res, p. 1012-1017, 1999.
Pays, E.; Vanhamme, L.; Pérez-Morga, D. Antigenic variation in Trypanosoma
brucei: facts, challenges and mysteries. Current Opinion in Microbiology, vol. 7,
p. 369–374, 2004.
Peacock, C. S.; Seeger, K.; Harris, D.; Murphy, L.; Ruiz, J. C.; Quail, M. A.;
Peters, N.; Adlem, E.; Tivey, A. et al. Comparative genomic analysis of three
Leishmania species that cause diverse human disease. Nat Genet. 39(7): 839-47, 2007.
Pena, S. D. J.; Machado, C. R.; Macedo, A. M. Trypanosoma cruzi: ancestral
genomes and population structure. Mem. Inst. Oswaldo Cruz, Rio de Janeiro,
2011.
Pitcovsky, T. A.; Buscaglia, C. A.; Mucci, J.; Campetella, O. A functional
network of intramolecular cross-reacting epitopes delays the elicitation of
neutralizing antibodies to Trypanosoma cruzi trans-sialidase. J Infect Dis 186:
397–404, 2002.
Pollevick, G. D.; Affranchino, J. L.; Frasch, A. C. C.; Sanchez, D. O. The
complete sequence of a shed acute-phase antigen of Trypanosoma cruzi. Mol
Biochem Parasitol 47: 247–250, 1991.
Porcel, B. M.; Aslund, L.; Pettersson, U.; Andersson, B. Trypanosoma
cruzi: a putative vacuolar ATP synthase subunit and a CAAX prenyl protease-encoding gene, as examples of gene identification in genome projects. Exp.
Parasitol. 95, 176–186, 2000.
Porcile PE, Santos MR, Souza RT, Verbisck NV, Brandão A, Urmenyi T, Silva
R, Rondinelli E, Lorenzi H, Levin MJ, Degrave W, Franco da Silveira J. A
refined molecular karyotype for the reference strain of the
Trypanosoma cruzi genome project (clone CL Brener) by assignment of
chromosome markers. Gene. 2003 Apr 10;308:53-65.
Pyrrho, A. S.; Moraes, J. L.; Peçanha, L. M.; Gattass, C. R. Trypanosoma
cruzi: IgG1 and IgG2b are the main immunoglobulins produced by
vaccinated mice. Parasitol Res, p. 333-337, 1998.
Ray DS. Conserved sequence blocks in kinetoplast minicircles from diverse
species of trypanosomes. Mol Cell Biol, 9(3), 1989.
Raymond, F.; Boisvert, S.; Roy, G. et al. Genome sequencing of the lizard
parasite Leishmania tarentolae reveals loss of genes associated to the
intracellular stage of human pathogenic species. Nucleic Acids Res. 40: 1131-47, 2012.
Rassi Jr., A.; Rassi, A.; Marin-Neto, J. A. Chagas disease. Lancet 375: 1388, 2010.
Real F, Vidal RO, Carazzolle MF, Mondego JM, Costa GG, Herai RH, Würtele
M, de Carvalho LM, E Ferreira RC, Mortara RA, Barbiéri CL, Mieczkowski P, da
Silveira JF, Briones MR, Pereira GA, Bahia D. The Genome Sequence
of Leishmania amazonensis: Functional Annotation and Extended Analysis of
Gene Models. DNA Res. 2013 Jul 15.
Risso, M. G.; Pitcovsky, T. A.; Caccuri, R. L.; Campetella, O.; Leguizamon, M.
S. Immune system pathogenesis is prevented by the neutralization of the
systemic trans-sialidase from Trypanosoma cruzi during severe infections.
Parasitology 134: 503–510, 2007.
Rochette, A.; McNicoll, F.; Girard, F. et al. Characterization and developmental
gene regulation of a large gene family encoding amastin surface proteins in
Leishmania spp, Mol Biochem Parasitol 140, pp. 205–220, 2005.
Ronaghi, M. Improved Performance of Pyrosequencing Using Single-Stranded
DNA-Binding Protein. Analytical Biochemistry, 286, 2, 2000.
Rice, P.; Longden, I.; Bleasby, A. EMBOSS: The European Molecular Biology
Open Software Suite. Trends in Genetics, vol. 16, no. 6, pp. 276-277, 2000.
Schauer, R.; Reuter, G.; Muhlpfordt, H.; Andrade, A. F.; Pereira, M. E. The
occurrence of N-acetyl- and N-glycoloylneuraminic acid in Trypanosoma cruzi.
Hoppe Seylers Z Physiol Chem 364: 1053–1057, 1983.
Schofield, C. J.; Jannin, J.; Salvatella, R. The future of Chagas disease
control. Trends in Parasitology - Vol. 22, p 583-588, 2006.
Shapiro, T. A.; Kinetoplast DNA maxicircles: networks within networks. PNAS,
v. 16, p. 7809-7813, 1993.
Siegel, T. N.; Tan, K. S.; Cross, G. A. Systematic study of sequence motifs for
RNA trans-splicing in Trypanosoma brucei. Mol. Cell Biol. 25:9586-9594, 2005.
Siegel, T. N.; Gunasekera, K.; Cross, G. A. M.; Ochsenreiter, T. Gene expression in
Trypanosoma brucei: lessons from high-throughput RNA sequencing, Trends in
Parasitology, In Press, Corrected Proof, 2011.
Singh, N.; Chikara, S.; Sundar, S. SOLiD™ Sequencing of Genomes of Clinical
Isolates of Leishmania donovani from India Confirm Leptomonas Co-Infection
and Raise Some Key Questions. PLoS One, v. 8, n. 2, 2013.
Soares, M. B.; Goncalves, R. et al. Balanced cytokine-producing pattern in
mice immunized with an avirulent Trypanosoma cruzi. An Acad Bras Cienc, p.
167-172, 2003.
Souto, R. P.; Fernandes, O.; Macedo, A. M.; Campbell, D. A.; Zingales, B. DNA
markers define two major phylogenetic lineages of Trypanosoma cruzi, Mol.
Biochem. Parasitol. 83, pp. 141–152, 1996.
Souza, R. T.; Lima, F. M.; Barros, R. M.; Cortez, D. R.; Santos, M. F.; Cordero,
E. M.; Ruiz, J. C.; Goldenberg, S.; Teixeira, M. M. G.; Franco da Silveira, J.
Genome Size, Karyotype Polymorphism and Chromosomal Evolution in
Trypanosoma cruzi. PLoS ONE 6(8): e23042, 2011.
Souza, W. Novel Cell Biology of Trypanosoma cruzi. In: American
Trypanosomiasis. World Class Parasites: Volume 7. Edited by Tyler, K. M.; Miles, M. A.
Boston: Springer; 13-24, 2003.
Stajich, J. E.; Block, D.; Boulez, K.; Brenner; Tamura, K.; Peterson, D.;
Peterson, N.; Stecher, G.; Nei, M.; Kumar, S. MEGA5: Molecular Evolutionary
Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and
Maximum Parsimony Methods. Molecular Biology and Evolution, 2011.
Altschul, S. F.; Madden, T. L.; Schäffer, A. A.; Zhang, J.; Zhang, Z.; Miller,
W.; Lipman, D. J. Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic Acids Res. 25:3389-3402, 1997.
Schuler, G. D. Sequence mapping by electronic PCR. Genome Res 7: 541–550, 1997.
Teixeira, S. M. R.; da Rocha, W. D. Control of gene expression and genetic
manipulation in the Trypanosomatidae. Genet Mol Res 2: 148-158, 2003.
Teixeira, S.M.R.; Russell, D.G.; Kirchhoff, L.V.; Donelson, J.E. A differentially
expressed gene family encoding "amastin", a surface glycoprotein of
Trypanosoma cruzi amastigotes. J. Biol. Chem. 269: 20509-20516, 1994.
Thorvaldsdóttir, H.; Robinson, J. T.; Mesirov, J. P. Integrative Genomics Viewer
(IGV): high-performance genomics data visualization and exploration. Briefings
in Bioinformatics 2012.
Tibayrenc, M.; Ayala, F. J. Towards a population genetics of microorganisms:
the clonal theory of parasitic protozoa. Parasitol Today 7: 228-232, 1991.
Verdun, R. E.; Di Paolo, N.; Urmenyi, T. P.; Rondinelli, E.; Frasch, A. C.;
Sanchez, D. O. Gene discovery through expressed sequence tag
sequencing in Trypanosoma cruzi. Infect. Immun. 66, 5393–5398, 1998.
Wang, Z.; Gerstein, M.; Snyder, M. RNA-Seq: a revolutionary tool for
transcriptomics. Nat Rev Genet 10(1): 57–63, 2009.
Weatherly, D. B.; Boehlke, C.; Tarleton, R. L. Chromosome level assembly of
the hybrid Trypanosoma cruzi genome. BMC Genomics 10: 255, 2009.
Weinkauf, C.; Salvador, R.; Pereiraperrin, M. Neurotrophin receptor TrkC
is an entry receptor for Trypanosoma cruzi in neural, glial and epithelial cells.
Infect Immun 79: 4081–4087, 2011.
Westenberger, S. J.; Cerqueira, G. C.; El-Sayed, N. M.; Zingales, B.; Campbell,
D. A.; Sturm, N. R. Trypanosoma cruzi mitochondrial maxicircles display
species- and strain-specific variation and possess a conserved element in the
non-coding region. BMC Genomics. 7:60. doi: 10.1186/1471-2164-7-60, 2006.
WHO. A human rights-based approach to neglected tropical diseases. WHO,
2013. Available at http://www.who.int/tdr/publications/documents/humanrights.pdf (accessed 10/10/2013).
WHO. Chagas disease (American Trypanosomiasis). Fact Sheet No. 340.
Available at http://www.who.int/mediacentre/factsheets/fs340/en/ (accessed 28/06/2013).
Yeo, M.; Mauricio, I. L.; Messenger, L. A.; Lewis, M. D.; Llewellyn, M. S. et
al. Multilocus Sequence Typing (MLST) for Lineage Assignment and High
Resolution Diversity Studies in Trypanosoma cruzi. PLoS Negl Trop
Dis 5(6): e1049, 2011.
Zingales, B.; Pereira, M. E.; Almeida, K. A.; Umezawa, E. S.; Nehme, N. S.;
Oliveira, R. P.; Macedo, A.; Souto, R. P. Biological parameters and molecular
markers of clone CL Brener, the reference organism of the Trypanosoma cruzi
genome project. Mem Inst Oswaldo Cruz. 92(6):811-4, 1997.
Zingales B, Stolf BS, Souto RP, Fernandes O, Briones MR. Epidemiology,
biochemistry and evolution of Trypanosoma cruzi lineages based on ribosomal
RNA sequences. Mem Inst Oswaldo Cruz. 94:159–164, 1999.
Zingales, B. et al. A new consensus for Trypanosoma cruzi intraspecific
nomenclature: second revision meeting recommends TcI to TcVI. Mem. Inst.
Oswaldo Cruz, Rio de Janeiro, v. 104, n. 7, Nov. 2009.
Predicting the Proteins of Angomonas deanei,
Strigomonas culicis and Their Respective Endosymbionts
Reveals New Aspects of the Trypanosomatidae Family
Maria Cristina Machado Motta1, Allan Cezar de Azevedo Martins1, Silvana Sant’Anna de Souza1,2, Carolina
Moura Costa Catta-Preta1, Rosane Silva2, Cecilia Coimbra Klein3,4,5, Luiz Gonzaga Paula de Almeida3,
Oberdan de Lima Cunha3, Luciane Prioli Ciapina3, Marcelo Brocchi6, Ana Cristina Colabardini7, Bruna de
Araujo Lima6, Carlos Renato Machado9, Célia Maria de Almeida Soares10, Christian Macagnan Probst11,12,
Claudia Beatriz Afonso de Menezes13, Claudia Elizabeth Thompson3, Daniella Castanheira Bartholomeu14,
Daniela Fiori Gradia11, Daniela Parada Pavoni12, Edmundo C. Grisard15, Fabiana Fantinatti-Garboggini13,
Fabricio Klerynton Marchini12, Gabriela Flávia Rodrigues-Luiz14, Glauber Wagner15, Gustavo
Henrique Goldman7, Juliana Lopes Rangel Fietto16, Maria Carolina Elias17, Maria Helena S. Goldman18,
Marie-France Sagot4,5, Maristela Pereira10, Patrícia H. Stoco15, Rondon Pessoa de Mendonça-Neto9,
Santuza Maria Ribeiro Teixeira9, Talles Eduardo Ferreira Maciel16, Tiago Antônio de Oliveira Mendes14,
Turán P. Ürményi2, Wanderley de Souza1, Sergio Schenkman19*, Ana Tereza Ribeiro de Vasconcelos3*
1 Laboratório de Ultraestrutura Celular Hertha Meyer, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
2 Laboratório de Metabolismo Macromolecular Firmino Torres de Castro, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
3 Laboratório Nacional de Computação Científica, Laboratório de Bioinformática, Petrópolis, Rio de Janeiro, Brazil
4 BAMBOO Team, INRIA Grenoble-Rhône-Alpes, Villeurbanne, France
5 Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558, Villeurbanne, France
6 Departamento de Genética, Evolução e Bioagentes, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, São Paulo, Brazil
7 Departamento de Ciências Farmacêuticas, Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, São Paulo, Brazil
8 Laboratório Nacional de Ciência e Tecnologia do Bioetanol, Campinas, São Paulo, Brazil
9 Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
10 Laboratório de Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Goiás, Brazil
11 Laboratório de Biologia Molecular de Tripanossomatídeos, Instituto Carlos Chagas/Fundação Oswaldo Cruz, Curitiba, Paraná, Brazil
12 Laboratório de Genômica Funcional, Instituto Carlos Chagas/Fundação Oswaldo Cruz, Curitiba, Paraná, Brazil
13 Centro Pluridisciplinar de Pesquisas Químicas, Biológicas e Agrícolas, Universidade Estadual de Campinas, Campinas, São Paulo, Brazil
14 Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
15 Laboratórios de Protozoologia e de Bioinformática, Departamento de Microbiologia, Imunologia e Parasitologia, Centro de Ciências Biológicas, Universidade Federal de Santa Catarina, Florianópolis, Santa Catarina, Brazil
16 Departamento de Bioquímica e Biologia Molecular, Centro de Ciências Biológicas e da Saúde, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
17 Laboratório Especial de Ciclo Celular, Instituto Butantan, São Paulo, São Paulo, Brazil
18 Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, São Paulo, Brazil
19 Departamento de Microbiologia, Imunologia e Parasitologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, São Paulo, Brazil
Abstract
Endosymbiont-bearing trypanosomatids have been considered excellent models for the study of cell evolution because the host protozoan co-evolves with an intracellular bacterium in a mutualistic relationship. Such protozoa inhabit a single invertebrate host during their entire life cycle and exhibit special characteristics that group them in a particular phylogenetic cluster of the Trypanosomatidae family; they are therefore classified as monoxenics. In an effort to better understand this symbiotic association, we used DNA pyrosequencing and a reference-guided assembly to predict 16,960 and 12,162 open reading frames (ORFs) in two symbiont-bearing trypanosomatids, Angomonas deanei (previously named Crithidia deanei) and Strigomonas culicis (first known as Blastocrithidia culicis), respectively. Identification of each ORF was based primarily on TriTrypDB using tblastn and, when necessary, each ORF was confirmed with getorf from EMBOSS and Newbler 2.6. The monoxenic organisms revealed conserved housekeeping functions when compared to other trypanosomatids, especially Leishmania major. However, major differences were found in ORFs corresponding to the cytoskeleton, the kinetoplast, and the paraflagellar structure. The monoxenic organisms also contain a large number of genes for cytosolic calpain-like and surface gp63 metalloproteases and a reduced number of compartmentalized cysteine proteases in comparison to other TriTryp organisms, reflecting adaptations to the presence of the symbiont. The assembled bacterial endosymbiont sequences exhibit a high A+T content, with a total of 787 and 769 ORFs for the Angomonas deanei and Strigomonas culicis endosymbionts, respectively, and indicate that these organisms share a common ancestor related to the Alcaligenaceae family. Importantly, both symbionts contain enzymes that complement essential host cell biosynthetic pathways, such as those for amino acid, lipid and purine/pyrimidine metabolism. These findings increase our understanding of the intricate symbiotic relationship between the bacterium and the trypanosomatid host and provide clues to better understand eukaryotic cell evolution.
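The abstract states that ORFs were identified with tblastn against TriTrypDB and confirmed with getorf from EMBOSS. To illustrate what an ORF search does, the Python sketch below scans all six reading frames for ATG-to-stop stretches above a length threshold. It is a simplified stand-in written for this explanation, not EMBOSS getorf's algorithm or defaults; the function names, the `min_len` parameter and its 300-nt default are assumptions.

```python
"""Simplified six-frame ORF scanner (illustrative sketch; not EMBOSS getorf)."""

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 300) -> list:
    """Find ATG...stop ORFs of at least `min_len` nt (stop codon included).

    Returns (strand, start, end) tuples with 0-based, half-open coordinates
    on the scanned strand; '-' coordinates refer to the reverse complement.
    """
    seq = seq.upper()
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # first ATG of the currently open stretch, if any
            for i in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, start, i + 3))
                    start = None  # close the stretch at the stop codon
    return orfs


if __name__ == "__main__":
    # Toy sequence containing one 9-nt ORF on the forward strand.
    print(find_orfs("ATGAAATAG", min_len=9))  # [('+', 0, 9)]
```

A homology-based step such as tblastn complements a scan like this one, because length alone cannot separate real genes from long random open stretches.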
PLOS ONE | www.plosone.org
1
April 2013 | Volume 8 | Issue 4 | e60209
Predicting Proteins of A. deanei and S. culicis
Citation: Motta MCM, Martins ACdA, de Souza SS, Catta-Preta CMC, Silva R, et al. (2013) Predicting the Proteins of Angomonas deanei, Strigomonas culicis and
Their Respective Endosymbionts Reveals New Aspects of the Trypanosomatidae Family. PLoS ONE 8(4): e60209. doi:10.1371/journal.pone.0060209
Editor: John Parkinson, Hospital for Sick Children, Canada
Received October 16, 2012; Accepted February 22, 2013; Published April 3, 2013
Copyright: © 2013 Motta et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro (FAPERJ), Fundação de Amparo à
Pesquisa do Estado de São Paulo (FAPESP) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). The work of CCK as part of her PhD is
funded by the ERC AdG SISYPHE coordinated by MFS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of
the manuscript.
Competing Interests: The co-author Maria Carolina Elias is a PLOS ONE Editorial Board member. This does not alter the authors’ adherence to all the PLOS ONE
policies on sharing data and materials.
* E-mail: [email protected] (ATRdV); [email protected] (SS)
Introduction
Protists of the Trypanosomatidae family have been intensively studied because some of them cause human diseases such as Chagas’ disease, African sleeping sickness, and leishmaniasis, which have a high incidence in Latin America, Sub-Saharan Africa, and parts of Asia and Europe, together affecting approximately 33 million people. Some species are also important in veterinary medicine, seriously affecting animals of economic interest such as horses and cattle. In addition, some members of the Phytomonas genus infect and kill plants of considerable economic interest such as coconut, oil palm, and cassava. These organisms circulate between invertebrate and vertebrate or plant hosts. In contrast, monoxenic species, which predominate in this family, inhabit a single invertebrate host during their entire life cycle [1].
Among the trypanosomatids, six species found in insects bear a single obligate intracellular bacterium in their cytoplasm [2], with Angomonas deanei and Strigomonas culicis (previously named Crithidia deanei and Blastocrithidia culicis, respectively) being the species best characterized by ultrastructural and biochemical approaches [3]. In this obligatory association, the endosymbiont is unable to survive and replicate once isolated from the host, whereas aposymbiotic protozoa are unable to colonize insects [4,5]. The symbiont is surrounded by two membrane units and has a reduced peptidoglycan layer, which is essential for cell division and morphological maintenance [6]. The lack of a typical gram-negative cell wall could facilitate the intense metabolic exchange between the host cell and the symbiotic bacterium.
Biochemical studies revealed that the endosymbiont contains enzymes that complete essential metabolic pathways of the host protozoan for amino acid production and heme biosynthesis, such as the enzymes of the urea cycle that are absent in the protozoan [7,8,9,10,11]. Furthermore, the bacterium enhances the formation of polyamines, which results in high rates of cell proliferation in endosymbiont-bearing trypanosomatids compared to other species of the family [12]. Conversely, the host cell supplies phosphatidylcholine, which composes the endosymbiont envelope [5], and ATP produced through the activity of protozoan glycosomes [13].
The synchrony in cellular division is another striking feature of this symbiotic relationship. The bacterium divides in coordination with the host cell structures, especially the nucleus, with each daughter cell carrying only one symbiont [14]. The presence of the prokaryote causes ultrastructural alterations in the host trypanosomatid, which exhibits a reduced paraflagellar structure and a typical kinetoplast DNA network [15,16,17]. Endosymbiont-harboring strains exhibit a different surface charge and carbohydrate composition from the aposymbiotic cells obtained after antibiotic treatment [18,19]. Furthermore, the presence of the symbiotic bacterium influences the protozoan interaction with the insect host, which seems to be mediated by gp63 proteases, sialomolecules, and mannose-rich glycoconjugates [20,21].
Molecular data support the grouping of all endosymbiont-containing trypanosomatids together in a single phylogenetic branch. Moreover, studies based on rRNA sequencing suggest that symbionts from different protozoan species share high identities and most likely derive from an ancestral β-proteobacterium of the genus Bordetella, which belongs to the Alcaligenaceae family [2,22,23]. Taken together, these results suggest that a single evolutionary event gave rise to all endosymbiont-bearing trypanosomatids, recapitulating the process that led to the formation of the mitochondrion in eukaryotic cells [24].
In this work, we analyzed the predicted protein sequences of A. deanei and S. culicis and their respective symbionts. This is the first time that genome databases have been generated from endosymbiont-containing trypanosomatids, which represent an excellent biological model to study eukaryotic cell evolution and the bacterial origin of organelles. The analysis presented here also clarifies aspects of the evolutionary history of the Trypanosomatidae family and helps us to understand how these protozoa maintain a close symbiotic relationship.
Materials and Methods
Materials and methods are described in the Text S1.
Nucleotide Sequence Accession Numbers
The sequences of Angomonas deanei, Strigomonas culicis, Candidatus
Kinetoplastibacterium crithidii and Candidatus Kinetoplastibacterium blastocrithidii were assigned as PRJNA169008,
PRJNA170971, CP003978 and CP003733, respectively, in the
DDBJ/EMBL/GenBank.
Results and Discussion
General Characteristics
454 pyrosequencing generated a total of 3,624,411 reads with an average length of 365 bp for A. deanei and a total of 2,666,239 reads with an average length of 379 bp for S. culicis (Table 1). Using this strategy, a total of 16,957 and 12,157 ORFs were obtained for the A. deanei and S. culicis genomes, respectively, while their endosymbionts contained 787 and 769 ORFs, respectively. The total number of ORFs includes the non-protein-coding tRNA and rRNA genes. Tables 1 and 2 present the numbers of known proteins, hypothetical ORFs and partial ORFs for the two trypanosomatids and for their endosymbionts, respectively.
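Multiplying the read count by the average read length gives a rough estimate of the raw sequence yield behind each assembly. Because the reported average lengths are rounded, the totals are approximations; the function name below is hypothetical and the calculation is a sketch, not part of the authors' pipeline.

```python
"""Approximate raw sequencing yield from read count and mean read length."""


def approximate_yield_bp(n_reads: int, mean_read_length_bp: int) -> int:
    """Total bases, estimated as number of reads x average read length."""
    return n_reads * mean_read_length_bp


if __name__ == "__main__":
    # Read counts and average lengths reported for the 454 runs (Table 1).
    runs = {"A. deanei": (3_624_411, 365), "S. culicis": (2_666_239, 379)}
    for species, (reads, length) in runs.items():
        total = approximate_yield_bp(reads, length)
        print(f"{species}: ~{total / 1e9:.2f} Gb")
```

Both runs therefore correspond to roughly 1 to 1.3 Gb of raw sequence each.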
The tRNA genes representing all 20 amino acids were identified in both trypanosomatids and their respective symbionts. At least one copy of the rRNA genes (18S, 5.8S and 28S) was identified in the genomes of A. deanei and S. culicis. We found that the bacterial endosymbiont genomes also contain at least three copies of the rRNA operon.
Table 1. Protein Reference Sequence-Guided Assembly data of A. deanei and S. culicis genomes.
Parameter | A. deanei | S. culicis
Reads | 3,624,411 | 2,666,239
Average read length (bp) | 365 | 379
Steps | 3 | 5
Genes in contigs (protein reference sequence) | 12,469 | 9,902
Genes in exclusive contigs | 4,435 | 2,202
Number of known protein ORFs | 7,912 | 6,192
Number of hypothetical ORFs | 8,791 | 5,700
Number of partial ORFs | 206 | 217
Total number of genes (including tRNAs and rRNAs) | 16,957 | 12,157
doi:10.1371/journal.pone.0060209.t001
Figure 1. Venn diagram illustrating the distribution of MCL
protein clusters. The diagram shows the cluster distribution
comparing endosymbiont-bearing trypanosomatids (group A), Leishmania sp. (group B) and Trypanosoma sp. (group C). Protein clusters
with less clear phylogenetic distributions are identified as others.
doi:10.1371/journal.pone.0060209.g001
General Protein Cluster Analysis
A total of 16,648 clusters were identified. Of those, 2,616 (16.4%) contained proteins from all species analyzed. To provide a more comprehensive view of the phylogenetic distribution, we separated the species into three groups: endosymbiont-bearing trypanosomatids (A, s = 2 species), Leishmania sp. (B, s = 5) and Trypanosoma sp. (C, s = 4). A protein cluster was considered present in a group when at most zero, two or one species of that group, respectively, were missing. The protein cluster distribution is shown in Figure 1.
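The group-presence rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the species identifiers are placeholders (the five Leishmania species are not named in this section, so `Leishmania_sp1` to `Leishmania_sp5` are hypothetical), and sending a cluster to "others" when a group is only partly represented beyond its tolerance is our guess at how the "others" category of Figure 1 arises, not a rule stated by the authors.

```python
"""Assign an MCL protein cluster to the groups of Figure 1 (illustrative sketch).

Assumptions not taken from the paper: species identifiers are placeholders,
and a group that is partially represented beyond its tolerance sends the
cluster to "others".
"""

# group -> (member species, maximum number of members allowed to be missing)
GROUPS = {
    "A": (("A_deanei", "S_culicis"), 0),
    # Hypothetical names: the five Leishmania species are not listed here.
    "B": (tuple(f"Leishmania_sp{i}" for i in range(1, 6)), 2),
    "C": (("T_brucei", "T_cruzi", "T_congolense", "T_vivax"), 1),
}


def classify_cluster(species_in_cluster: set) -> str:
    """Return a Venn label such as 'ABC', 'AB', 'C', or 'others'."""
    label = ""
    for name, (members, max_missing) in GROUPS.items():
        n_present = sum(1 for sp in members if sp in species_in_cluster)
        n_missing = len(members) - n_present
        if n_missing <= max_missing:
            label += name  # group counted as present
        elif n_present > 0:
            return "others"  # assumption: partial presence is ambiguous
        # otherwise no member is present and the group is simply absent
    return label or "others"


if __name__ == "__main__":
    all_species = {sp for members, _ in GROUPS.values() for sp in members}
    print(classify_cluster(all_species))  # ABC
    print(classify_cluster({"A_deanei", "S_culicis", "T_brucei", "T_cruzi", "T_vivax"}))  # AC
```

The asymmetric tolerances (0, 2 and 1 missing species) follow the text; they make a cluster count as present in a large group despite gaps caused by incomplete assemblies, while the two-species group A must be complete.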
In this way, 2,979 protein clusters (17.9%) were identified in all
groups, with 130 (0.8%) identified only in groups A and B (AB
group), 31 (0.2%) only in groups A and C (AC group), and 501
(3.2%) only in groups B and C (BC group). The AB group
represents the proteins that are absent in the Trypanosoma sp.
branch. These proteins are mainly related to general metabolic
function (p = 46 proteins), hypothetical conserved (p = 37) or
transmembrane/surface proteins (p = 33). The AC group is fourfold smaller than the AB group, in accordance with the closer
relationship between endosymbiont-bearing trypanosomatids and
Leishmania sp. [25]. The proteins in the AC group are mainly related to general metabolic function (p = 11), transmembrane/surface proteins (p = 8) and hypothetical conserved proteins (p = 7), and the relative distribution between these categories is very similar to the distribution in the AB group. The BC group is almost four-fold larger than the AB group and mainly consists of conserved hypothetical proteins. One possible explanation for these different levels of conservation is that organisms from the genera Trypanosoma and Leishmania inhabit insect and mammalian hosts, while the symbiont-bearing protozoa are mainly insect parasites. Thus, different surface proteins would be involved in host/protozoa interactions, and distinct metabolic proteins would be required for survival in these diverse environments.
Only a small fraction of protein clusters (n = 54, 0.3%) was exclusive to
group A. This finding is in striking contrast to protein
clusters identified only in group B (n = 889, 5.3%) or only in group
C (n = 679, 4.5%), which represent specializations of the Leishmania
or Trypanosoma branches. This small set is mainly composed of
hypothetical proteins without similar proteins in the GenBank
database. Only three of the group A clusters are similar to
bacterial proteins, with two of these similar to Bordetella (clusters
04518 and 05756). The third one is similar to the bacterial-type
glycerol dehydrogenase of Crithidia sp. (cluster 07344).
Of all the clusters that are present in all species except for one (n = 1,274, 7.6%), 694 (54.5%) are missing in S. culicis, followed by T. congolense (n = 211, 16.6%), A. deanei (n = 201, 15.8%) and T. vivax (n = 104, 8.0%). The fact that endosymbiont-bearing species are better represented in these sets could be due to proteins left unidentified in the assembly and/or cluster analysis. This is reinforced by the fact that among clusters containing proteins from just one species (n = 9,477; 56.9%), most (73.9%) are from species with genomes that are not completely assembled (T. vivax, n = 1,881, 19.8%; T. congolense, n = 1,845, 19.5%; A. deanei, n = 1,745, 18.4%; S. culicis, n = 1,530, 16.1%). T. brucei and T. cruzi also account for significant numbers of clusters with only a single species (n = 1,094, 11.5% and n = 1,071, 11.3%, respectively), and these clusters mainly consist of multigenic surface proteins.
Table 2. General characteristics of the A. deanei and S. culicis symbionts.
Parameter | A. deanei symbiont | S. culicis symbiont
Length (bp) | 821,813 | 820,037
G+C (%) | 30.96 | 32.55
Number of known protein CDSs | 640 | 637
Number of hypothetical CDSs | 94 | 78
Coding region (% of genome size) | 88 | 87
Average CDS length (bp) | 987 | 1,004
rRNA | 9 | 9
rRNA 16S | 3 | 3
rRNA 23S | 3 | 3
rRNA 5S | 3 | 3
tRNA | 44 | 45
Total number of genes | 787 | 769
doi:10.1371/journal.pone.0060209.t002
Wolbachia and T. asinigenitalis (E). Alignments were performed with the ACT program based on tblastx analyses. Red (direct similarity) and blue lines (indirect similarity) connect similar regions with at least 700 bp and a score cutoff of 700. The numbers on the right indicate the size of the entire sequence for each organism.
doi:10.1371/journal.pone.0060209.g002
Our data support the idea that endosymbiont-bearing trypanosomatids share a larger proportion of their genes with Leishmania species, in accordance with previous phylogenetic studies [2,25]. Only one fifth of all trypanosomatid protein clusters are shared among most of the species analyzed here. This proportion increases to one fourth if we only analyze the Leishmania and Trypanosoma genera; however, the clusters specific to endosymbiont-bearing kinetoplastids are a relatively small proportion (0.6%) of all clusters, indicating that relatively little gene specialization occurred in these species following this evolutionary event.
Genomic Characteristics of the A. deanei and S. culicis Endosymbionts
The endosymbiont genomes. Table 2 summarizes the genome analyses of both symbionts. The genome of the A. deanei endosymbiont contains 821,813 bp, with almost 31% G+C content and 787 genes. Of these, 640 (81.3%) were characterized as known CDSs, 94 (11.9%) as hypothetical, and 53 (6.7%) as rRNA or tRNA genes. The average CDS length is 987 bp, and coding regions account for 88% of the genome, indicating that the genome is highly compact. There are three copies of each rRNA and 44 tRNAs, suggesting functional translation machinery.
The endosymbiont of S. culicis has a genome of 820,037 bp and 769 genes: 637 (83.5%) coding for known proteins, 78 (9.5%) annotated as hypothetical proteins, and 54 (6.0%) as rRNA or tRNA genes. Its G+C content (32.6%) is similar to, but slightly higher than, that of the A. deanei endosymbiont (30.96%). Coding sequences make up 88% and 87% of the A. deanei and S. culicis endosymbiont genomes, respectively, leaving few non-coding regions.
A direct comparison between the two endosymbionts indicated
that they share 507 genes that meet the criteria for inclusion in a
cluster as described in the Materials and Methods. This represents
approximately 70% of the annotated genes in both genomes,
indicating a certain degree of genetic similarity. Figure 2A shows
the full alignment of the A. deanei and S. culicis symbionts. This
alignment indicates the occurrence of an inversion involving
approximately one half of the genomes. However, this inversion
would be validated by experimental work. The observed
differences agree with phylogenetic analyses suggesting the
classification of these symbionts as different species, Candidatus
Kinetoplastibacterium crithidii and Candidatus Kinetoplastibacterium blastocrithidii [2,23].
The origins of symbionts in trypanosomatids. Previous
phylogenetic studies based on sequencing of the small-subunit
ribosomal DNA suggested that symbionts of trypanosomatids
descended from a common ancestor, a b-proteobacteria of the
Bordetella genus [2,22,23]. Comparisons of the endosymbiont
genomes with the KEGG database revealed eight organisms that
share high numbers of similar CDSs: Bordetella petrii, A. xylosoxidans,
Bordetella avium, Bordetella parapertussis, Pusillimonas, Bordetella bronchiseptica and Taylorella equigenitalis. All these species are phylogenet-
Figure 2. Genome alignments. The figure shows the alignment of
the A. deanei endosymbiont (Endo-A. deanei) and the S. culicis
endosymbiont (Endo-S. culicis) (A); between Endo-A. deanei and T.
asinigenitalis (B), T. equigenitalis (C), or Wolbachia (D); and between
PLOS ONE | www.plosone.org
4
April 2013 | Volume 8 | Issue 4 | e60209
Predicting Proteins of A. deanei and S. culicis
ically related to b-proteobacteria belonging to the Alcaligenaceae
family. The genus Taylorella consists of two species, T. equigenitalis
and T. asinigenitalis, which are microaerophilic, slow-growing
gram-negative bacteria belonging to the family Alcaligenaceae
[26,27]. T. equigenitalis is an intracellular facultative pathogen in
horses that causes contagious equine metritis (CEM), a sexually
transmitted infection [28].
Based on these facts, a clustering analysis was performed to compare these genomes and establish their genetic similarity. The analysis compared the genomes of the A. deanei and S. culicis endosymbionts, T. equigenitalis MCE9, T. asinigenitalis MCE3, B. petrii DSM 12804, A. xylosoxidans A8 and Wolbachia pipientis (wMel).

For the A. deanei endosymbiont, the highest numbers of shared clusters are observed for A. xylosoxidans (490 clusters) and B. petrii (483 clusters), followed by T. asinigenitalis (376 clusters) and T. equigenitalis (375 clusters). However, when genome length is taken into account, T. equigenitalis and T. asinigenitalis have the greatest proportions of genes in clusters (24.1% and 24.67% of their annotated genes, respectively). The corresponding values for A. xylosoxidans and B. petrii are 7.59% and 9.61%. Note that the A. xylosoxidans plasmids pA81 and pA82 are not included in these comparisons.

The S. culicis endosymbiont shares a high proportion of clusters (74%) with the other genomes. Of its 714 annotated genes (rRNA and tRNA genes were not taken into account), 544 (76.19%) were similar to genes of the other microorganisms. It shares the highest numbers of clusters with A. xylosoxidans (501 clusters) and B. petrii (495 clusters), followed by T. asinigenitalis (390) and T. equigenitalis (388 clusters).

Using W. pipientis wMel, an endosymbiont of Drosophila melanogaster, as an out-group, we found 70 shared clusters for the A. deanei endosymbiont and 73 for the S. culicis endosymbiont. Wolbachia also shares a low number of clusters with T. asinigenitalis (79) and T. equigenitalis (81).
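Raw cluster counts grow with genome size, so the comparison above also expresses them as a percentage of each genome's annotated genes. The minimal sketch below shows that normalization. It uses the S. culicis endosymbiont figures given in the text: 544 of 714 annotated genes, with rRNA and tRNA genes excluded. The helper name is hypothetical.

```python
def percent_in_clusters(genes_in_clusters: int, annotated_genes: int) -> float:
    """Share (%) of a genome's annotated genes that fall into shared clusters."""
    if annotated_genes <= 0:
        raise ValueError("annotated_genes must be positive")
    if not 0 <= genes_in_clusters <= annotated_genes:
        raise ValueError("genes_in_clusters must lie between 0 and annotated_genes")
    return 100.0 * genes_in_clusters / annotated_genes


# S. culicis endosymbiont: 544 clustered genes out of 714 annotated genes
print(round(percent_in_clusters(544, 714), 2))  # 76.19
```

For a large genome such as B. petrii, a similar number of shared clusters yields a much smaller percentage. This is why the Taylorella genomes rank highest once the counts are normalized.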
The genomes of T. equigenitalis MCE9 and T. asinigenitalis MCE3 contain 1,695,860 and 1,638,559 bp, respectively. The A. deanei and S. culicis symbiont genomes are therefore reduced compared with those of Taylorella, which are in turn reduced compared with those of Bordetella or Achromobacter [26,27]. Alignments indicate that the Taylorella genomes and the kinetoplastid symbionts share similar sequences (Figure 2B and C), corroborating the results of the clustering analyses. Much less similarity is observed between the A. deanei endosymbiont and W. pipientis wMel, or between W. pipientis and T. asinigenitalis, using the same alignment parameters (Figure 2D and E). Both Taylorella genomes are AT-rich (37.4% and 38.3% G+C for T. equigenitalis and T. asinigenitalis, respectively), a characteristic also shared by both symbionts. It is therefore possible that adaptation to intracellular life involved substantial changes in base composition, as most symbiotic bacteria are AT-rich [29,30].
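Figures such as 30.96% (A. deanei endosymbiont) or 37.4% (T. equigenitalis) are the fraction of G and C among the sequenced bases. The A+T content is the complement. The minimal sketch below computes this value. Ignoring ambiguous symbols such as N is an assumption of the sketch; the original pipeline may have handled them differently.

```python
def gc_content(seq: str) -> float:
    """Percentage of G+C among unambiguous bases (A, C, G, T); other symbols such as N are ignored."""
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        raise ValueError("sequence contains no A, C, G or T")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / unambiguous


toy = "ATGCATATAT"      # hypothetical fragment, not taken from the genomes
print(gc_content(toy))  # 20.0, i.e. 80.0% A+T
```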
The similarity, and in places identity, between the endosymbiont genomes and those of Taylorella, Bordetella and Achromobacter reinforces the hypothesis that both endosymbionts originated from an ancestor of the Alcaligenaceae group. Both endosymbionts are similar, to different degrees, to T. equigenitalis, T. asinigenitalis, B. petrii, A. xylosoxidans and other species of this family. In absolute numbers, B. petrii and A. xylosoxidans share the most clusters with the symbionts. However, when genome length is taken into account, the Taylorella species share the highest proportions of clusters with the A. deanei and S. culicis endosymbionts. A phylogenomic analysis using 235 orthologs was performed to establish the evolutionary history of A. xylosoxidans A8, B. petrii DSM 12804, T. asinigenitalis MCE3, T. equigenitalis MCE9, Ca. K. blastocrithidii and Ca. K. crithidii. The results indicated that the symbionts of both trypanosomatid species are closely related to the Alcaligenaceae family (Figure S1). Pseudomonas aeruginosa PA7, a gammaproteobacterium, was used as the outgroup. These data corroborate the results of Alves et al. 2011 [11].
The genomes of both trypanosomatid symbionts are only slightly larger than those of Buchnera sp. [31], but they are several fold larger than those of symbiotic bacteria with extremely reduced genomes [32]. Analysis of the B. pertussis and B. parapertussis genomes revealed a process of gene loss during host adaptation [33,34]. This process was proposed to be associated with mobile DNA elements such as insertion sequences (IS) and with the presence of pseudogenes [33,34]. However, the mechanism(s) responsible for the genome reduction observed in the two symbionts studied here require further investigation. Our data open the way for future studies of the relationship between endosymbiosis in trypanosomatids and the origin of organelles in eukaryotic cells.
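The genome reduction relative to Taylorella can be expressed as a ratio using the genome lengths given in the text and in Table 2. The minimal sketch below prints each symbiont's genome length as a fraction of each Taylorella genome. The dictionary and helper names are hypothetical.

```python
# Genome lengths (bp) quoted in the text and in Table 2.
GENOME_SIZES_BP = {
    "A. deanei endosymbiont": 821_813,
    "S. culicis endosymbiont": 820_037,
    "T. equigenitalis MCE9": 1_695_860,
    "T. asinigenitalis MCE3": 1_638_559,
}

SYMBIONTS = ("A. deanei endosymbiont", "S. culicis endosymbiont")
TAYLORELLA = ("T. equigenitalis MCE9", "T. asinigenitalis MCE3")


def relative_size(query: str, reference: str) -> float:
    """Genome length of `query` as a fraction of the genome length of `reference`."""
    return GENOME_SIZES_BP[query] / GENOME_SIZES_BP[reference]


for symbiont in SYMBIONTS:
    for reference in TAYLORELLA:
        print(f"{symbiont} / {reference}: {relative_size(symbiont, reference):.2f}")
```

All four ratios fall between about 0.48 and 0.50. In other words, each symbiont genome is roughly half the size of a Taylorella genome.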
Host Trypanosomatid Characteristics
The microtubule cytoskeleton and flagellum of the host trypanosomatids. The cytoskeleton is composed of structures such as the subpellicular microtubule corset, the axoneme, the basal body and the paraflagellar rod [35]. It therefore governs several characteristics of trypanosomatids, such as cell shape, the positioning of structures, flagellar beating and host colonization. The presence of the symbiont has been related to unique characteristics of the host trypanosomatid.
Six members of the tubulin superfamily (α, β, γ, δ, ε and ζ) are present in A. deanei and S. culicis. This is consistent with the presence of flagella, since δ- and ε-tubulins occur in organisms that possess basal bodies and flagella [36]. γ-Tubulin is localized in the basal body of A. deanei [14], as in other trypanosomatids [35]. As in other trypanosomatids, five centrins were also identified in A. deanei and S. culicis. Furthermore, as in algal genomes, symbiont-containing trypanosomatids encode ε-tubulin, which may be related to the replication and inheritance of the centriole and basal bodies [37,38]. Interestingly, symbiont-containing trypanosomatids are unique in lacking subpellicular microtubules in areas where the mitochondrion touches the plasma membrane [15]. However, database searches do not explain this atypical microtubule distribution. Moreover, no classical eukaryotic microtubule-associated proteins (MAPs) or intermediate filament homologues were identified in symbiont-bearing or other trypanosomatids, except for TOG/MOR1 and Asp.
Actin and homologues of other proteins involved in the binding and nucleation of actin filaments are present in A. deanei and S. culicis. However, the Arp2/3 complex, which is involved in actin nucleation, is absent from the symbiont-bearing species. As actin seems to be necessary for endocytosis in trypanosomatids [39], the absence of some proteins involved in actin nucleation may be related to the low endocytosis rates of these protozoa (unpublished data). Indeed, both symbiont-bearing trypanosomatids have low nutritional requirements, as the symbiotic bacterium completes essential metabolic routes of the host cell [3].
Trypanosomatids are the only organisms from the orders Euglenida and Kinetoplastida that have a paraflagellar rod. This structure is continuously associated with the axoneme and contains two major proteins, designated PFR1 and PFR2 [35]. Importantly, only PFR1 was identified in A. deanei and S. culicis. PFR2 may have been missed because the PFR genes are highly repetitive and difficult to assemble. Nevertheless, these species have a reduced paraflagellar rod located in the proximal region of the flagellum [15,16], although A. deanei shows the same pattern of flagellar beating described for other trypanosomatids [40]. The paraflagellar rod components (PFC) 4, PFC 10, PFC 16 and PFC 18 were detected in the A. deanei database, and in S. culicis PFC 11 was also identified. Other minor components of the paraflagellar rod could not be detected. Consistent with this, RNA interference (RNAi) knockdown of some PFCs, such as PFC3, does not impair flagellar movement in T. brucei [41], unlike depletion of PFC4 and PFC6 [42].
Several other minor flagellar proteins detected in other trypanosomatids are absent from A. deanei and S. culicis, especially flagellar membrane proteins and proteins involved in intraflagellar transport (kinesins). Symbiont-containing species have adenylate kinase B (ADKB) but not ADKA, in contrast to other trypanosomatids, which express both. These proteins help maintain the ATP supply to the distal portion of the flagellum [43,44].
Taken together, the differences in the composition and function of the cytoskeleton in symbiont-containing trypanosomatids seem to represent adaptations to harboring the endosymbiont. Further exploration of these differences could lead to a better understanding of how endosymbiosis was established.
The kinetoplast. The kinetoplast is an enlarged portion of the single mitochondrion. It contains the mitochondrial DNA (kDNA), an unusual arrangement of catenated circles that form a network. The kinetoplast shape and the kDNA topology vary according to species and developmental stage. In endosymbiont-containing trypanosomatids, the morphology and topology of the kDNA network differ from those of other species of the same family. Both species present a loose kDNA arrangement. In A. deanei, the kinetoplast has a trapezoid-like shape with a characteristic transversal electron-dense band, whereas in S. culicis the disk-shaped structure is wider at the center than at the extremities [2,17].
Differences in kDNA arrangement are related to low-molecular-weight basic proteins such as the kinetoplast-associated proteins (KAPs), which take part in the organization and segregation of the kDNA network [45,46]. Our data indicate that KAP4 and KAP3 homologues are present in A. deanei, while KAP4 and KAP2 homologues and a ScKAP-like protein are found in S. culicis (Table S1). In addition, AdKAP4 and ScKAP4 share a conserved nine-amino-acid domain in the N-terminal region (amino acid positions 10 to 16), most likely a mitochondrial import signal [47,48] (Figure S2). Furthermore, ScKAP2 has a conserved domain called the High Mobility Group (HMG) box, indicating that this protein may be involved in protein-protein interactions. These KAPs might be related to the typical kDNA condensation of symbiont-bearing trypanosomatids.
Housekeeping genes. Histones, which are responsible for structuring chromatin, are highly conserved proteins that appeared in the eukaryotic branch of evolution. Although well conserved, trypanosomatid histones differ from those of other eukaryotes in their N- and C-terminal sequences, which are sites of post-translational modification. Phylogenetic analysis revealed that the histones and histone variants of both A. deanei and S. culicis cluster in a separate branch, between the Trypanosoma and Leishmania species (Figure 3A). A similar phylogenetic distribution is seen for dihydrofolate reductase-thymidylate synthase when the analysis is performed with nucleotide sequences (Figure 3B). Nevertheless, the post-translational modification sites of the symbiont-bearing species are conserved relative to those of other trypanosomes (supplementary Figure S3). In A. deanei and S. culicis, the proteins related to chromatin assembly are also maintained, including histones and histone-modifying enzymes (Tables S2–S7 and Figure S4 of the supporting information). For a more detailed analysis of the housekeeping genes of A. deanei and S. culicis, see Text S1.
At least 914 ORFs in A. deanei and 643 ORFs in S. culicis can be assigned to DNA replication, repair, transcription, translation and signal transduction (Table 3). Most of these genes are exclusive to the protozoan and absent from the endosymbiont (Table 4), indicating that these processes are carried out by the host organism (supplementary Tables S8–S13). As in other trypanosomes, the host genomes contain a conserved spliced-leader RNA (see Figure S5 for more information). A total of 133 and 130 proteins with similar functions are detectable in the endosymbionts of A. deanei and S. culicis, respectively, with up to 95% amino acid identity to proteins of Bordetella sp. and A. xylosoxidans.
Similar DNA repair proteins are present among both the eukaryotic and the prokaryotic predicted sequences. These findings demonstrate that the endosymbionts have conserved essential housekeeping proteins despite their genome reduction. Some differences in mismatch repair (MMR) were found between the symbiont-bearing trypanosomatid genomes. Microsatellite instability is considered the molecular fingerprint of a deficient MMR system, so we compared the abundance of tandem repeats in the genomes of A. deanei and S. culicis and their respective endosymbionts. The genomes of S. culicis and its endosymbiont are more repetitive than those of A. deanei and its endosymbiont (Figure 4A). This higher repetitive content is due not only to a higher number of microsatellite loci (Figure 4B) but also to longer microsatellite sequences. These data suggest that the microsatellites of S. culicis and its endosymbiont evolved faster than those of A. deanei and its endosymbiont.

Interestingly, some components of the MMR machinery present in A. deanei are missing in S. culicis. One example is exonuclease I (ExoI), a 5'-3' exonuclease implicated in the excision step of the DNA mismatch repair pathway (Table S9). Several studies have correlated silencing of ExoI and/or mutations in the ExoI gene, together with microsatellite instability, with the development of lymphomas and colorectal cancer [49,50,51]. We therefore speculate that deficiencies in the MMR machinery of S. culicis may be related to the high proportion of microsatellites in its genome. An association between microsatellite instability and MMR deficiency has already been described for T. cruzi strains [52,53].

The same variability pattern is observed for each symbiont, even though the MMR machinery appears to be complete in both symbiotic bacteria (Table S10). It is tempting to speculate that the parasite and its endosymbiont are exposed to the same environment. They may therefore be subjected to similar selective pressures, such as those imposed by external oxidative conditions.
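Figure 4 uses two measures. The percentage of repetitive nucleotides is the number of microsatellite nucleotides divided by the number of assembled nucleotides. Microsatellite density is the number of loci divided by the genome length, multiplied by 100. The sketch below computes both for a single sequence. The detection rule is an assumption of this sketch and not the tool used in the study: a perfect tandem repeat of a 1-6 bp unit, present in at least four copies, found with a regular expression. The sketch also pools all repeat lengths, whereas Figure 4A breaks them down by repeat length.

```python
import re

# Assumption for this sketch: a microsatellite is a perfect tandem repeat of a
# 1-6 bp unit occurring at least MIN_COPIES times in a row.
MIN_COPIES = 4
# The lazy {1,6}? makes the engine try the shortest repeat unit first;
# \1 must then repeat at least MIN_COPIES - 1 more times.
MICROSAT = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (MIN_COPIES - 1))


def find_microsatellites(seq: str) -> list:
    """Return (start, end, unit) for non-overlapping repeats, scanning left to right."""
    return [(m.start(), m.end(), m.group(1)) for m in MICROSAT.finditer(seq.upper())]


def microsatellite_stats(seq: str) -> dict:
    """Compute the two Figure 4 measures for one sequence (all repeat lengths pooled)."""
    if not seq:
        raise ValueError("empty sequence")
    loci = find_microsatellites(seq)
    repeat_nt = sum(end - start for start, end, _ in loci)
    return {
        "loci": len(loci),
        # Panel A style: repetitive nucleotides / assembled nucleotides, in percent
        "percent_repetitive": 100.0 * repeat_nt / len(seq),
        # Panel B style: number of loci / genome length, x 100
        "density": 100.0 * len(loci) / len(seq),
    }


toy = "AC" * 4 + "G" + "T" * 8 + "G"  # hypothetical 18-bp sequence
print(find_microsatellites(toy))       # [(0, 8, 'AC'), (9, 17, 'T')]
print(microsatellite_stats(toy))
```

Real genome-scale surveys usually set a separate minimum copy number for each unit length and also count imperfect repeats. A single threshold is used here only to keep the example short.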
A. deanei and S. culicis have 607 and 421 putative kinase-encoding genes, respectively (Table 5). Of the A. deanei kinases, 31 were classified in the AGC family, 31 as atypical, 49 as CAMK, 15 as CK1, 108 as CMGC, 64 as STE, 1 as TKL and 81 as other, while 227 could not be classified in any of these families. As in other trypanosomes, no typical tyrosine kinases (TKs) are present in A. deanei or S. culicis, although tyrosine residues are subject to phosphorylation [54,55]. Several phosphatases have also been described in trypanosomes, pointing toward a regulatory role in the development of these organisms. The T. brucei protein tyrosine phosphatase TbPTP1 is associated with the cytoskeleton and has been reported to be involved in the parasite's life cycle [56]. Similar sequences are found in the A. deanei genome, including PTP1, which is not found in the S. culicis database. Additionally, both genomes contain a large number of other PTPs, including ectophosphatases (Table S14).
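The kinase family counts above and in Table 5 can be checked against the stated totals and converted into proportions. The sketch below does this. The data layout is hypothetical, and "Unclassified" stands for the "No hits found" row of Table 5.

```python
# Kinase family counts from Table 5 as (A. deanei, S. culicis).
KINASE_FAMILIES = {
    "AGC": (31, 23),
    "Atypical": (31, 21),
    "CAMK": (49, 39),
    "CK1": (15, 8),
    "CMGC": (108, 77),
    "STE": (64, 31),
    "TKL": (1, 0),
    "Other": (81, 58),
    "Unclassified": (227, 164),  # "No hits found" in Table 5
}


def family_shares(column: int) -> dict:
    """Percentage of kinases per family for one species (0 = A. deanei, 1 = S. culicis)."""
    total = sum(counts[column] for counts in KINASE_FAMILIES.values())
    return {family: round(100 * counts[column] / total, 1)
            for family, counts in KINASE_FAMILIES.items()}


totals = tuple(sum(counts[i] for counts in KINASE_FAMILIES.values()) for i in (0, 1))
print(totals)                            # (607, 421)
print(family_shares(0)["Unclassified"])  # 37.4
```

More than a third of the predicted kinases in each species could not be assigned to a family.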
Figure 3. Phylogenetic analysis of histones of A. deanei, S. culicis and other trypanosomatids. Histone protein (panel A) and nucleotide (panel B) sequences were aligned with the MUSCLE tool using 10 iterations in the Geneious package [120]. Trees were constructed with the Geneious Tree Builder, using the Jukes-Cantor genetic distance model with the neighbor-joining method and no out-groups. The consensus trees were generated from 100 bootstrap replicates of all detected histone genes, as shown below. Scale bars are indicated for each consensus tree. The trees in panel A are based on a collection of sequences from all trypanosomatids. The nucleotide sequences used for dihydrofolate reductase-thymidylate synthase are: T. cruzi, XM_810234; T. brucei, XM_841078; T. vivax, HE573023; L. mexicana, FR799559; L. major, XM_001680805; L. infantum, XM_001680805; and C. fasciculata, M22852.
doi:10.1371/journal.pone.0060209.g003
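The Figure 3 trees rely on the Jukes-Cantor distance. This model corrects the observed proportion of differing sites, p, for multiple substitutions: d = -b ln(1 - p/b), with b = 3/4 for nucleotides. The sketch below computes this distance for a pair of aligned sequences. Skipping gapped columns (pairwise deletion) is an assumption and may not match how Geneious treats gaps. The neighbor-joining step that builds the tree from the distance matrix is not shown.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (pairwise deletion,
    an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def jukes_cantor(seq_a: str, seq_b: str, n_states: int = 4) -> float:
    """Jukes-Cantor distance d = -b * ln(1 - p / b), with b = (n_states - 1) / n_states.

    n_states = 4 for nucleotides; 20 gives the analogous correction for amino acids.
    Returns infinity when p >= b (saturation).
    """
    p = p_distance(seq_a, seq_b)
    b = (n_states - 1) / n_states
    if p >= b:
        return math.inf
    return -b * math.log(1 - p / b)


print(round(jukes_cantor("ACGT", "ACGA"), 4))  # p = 0.25 -> 0.3041
```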
The A. deanei genome encodes enzymes involved in RNAi, a mechanism described in various organisms that promotes the specific degradation of mRNA. RNAi involves two endoribonucleases: Dicer, which recognizes and cleaves double-stranded RNA, and Slicer, a member of the Argonaute (Ago) protein family (RNase H-type) [58]. Cleavage of the double-stranded RNA gives rise to a complex that specifically cleaves mRNA molecules homologous to the double-stranded sequence. A. deanei contains genes coding for Dicer-like protein II (AGDE14022) and Ago1 (AGDE11548), which are homologous to enzymes of T. brucei and Leishmania braziliensis (Ngo et al., 1998; Lye et al., 2010). In addition, A. deanei contains RNA interference factor (RIF) 4 (AGDE09645), which has an exonuclease domain of the DnaQ superfamily, as described in T. brucei. A fragmented RIF5 sequence was also found (AGDE15656). These proteins were recently shown to interact with Ago1 in T. brucei [59], suggesting that RNAi might be active in A. deanei. None of these sequences were found in the S. culicis database.
Two major signal transduction pathways have been described in trypanosomatids: the cyclic AMP-dependent route and the mitogen-activated protein kinase (MAPK) pathway [57]. The major components of these pathways, including the phosphatidylinositol signaling system and the mTOR and MAPK signaling pathways, were identified in A. deanei and S. culicis (Table 6). These pathways may regulate cellular activities such as gene expression, mitosis, differentiation and cell survival/apoptosis.
Most genes encoding heat shock proteins are present in
symbiont-bearing species, as was previously described in other
trypanosomatids (Table S15). Genes for redox molecules and
antioxidant enzymes, which are part of the oxidative stress
response, are also present in the A. deanei and S. culicis genomes.
Both contain slightly more copies of ascorbate peroxidase,
methionine sulfoxide reductase, glucose-6-phosphate dehydrogenase, and trypanothione reductase genes than L. major. In
particular, several genes related to the oxidative stress response
are present in higher copy numbers in symbiont-bearing
trypanosomatids than in L. major (Figure 5).
Table 3. Numbers of ORFs identified in A. deanei and S. culicis and their symbionts, according to the mechanisms of DNA replication and repair, signal transduction, transcription and translation.
Number of ORFs
Mechanism | A. deanei | S. culicis | A. deanei symbiont | S. culicis symbiont
Replication and Repair | 178 | 148 | 56 | 54
  Base excision repair | 34 | 34 | 9 | 9
  DNA replication | 54 | 32 | 11 | 11
  Homologous recombination | 11 | 11 | 16 | 15
  Mismatch repair | 28 | 29 | 12 | 12
  Non-homologous end-joining | 8 | 7 | – | –
  Nucleotide excision repair | 43 | 35 | 8 | 7
Signal Transduction | 136 | 46 | 1 | 1
  Phosphatidylinositol signaling system | 23 | 17 | – | –
  mTOR signaling pathway | 113 | 29 | – | –
  Two-component system | – | – | 1 | 1
Transcription | 96 | 61 | 3 | 3
  Basal transcription factors | 15 | 4 | – | –
  RNA polymerase | 28 | 16 | 3 | 3
  Spliceosome | 53 | 41 | – | –
Translation | 504 | 388 | 73 | 72
  Aminoacyl-tRNA biosynthesis | 63 | 56 | 25 | 25
  mRNA surveillance pathway | 43 | 45 | – | –
  Ribosomal proteins | 231 | 152 | 48 | 47
  Ribosome biogenesis in eukaryotes | 84 | 66 | – | –
  RNA transport | 83 | 69 | – | –
TOTAL | 914 | 643 | 133 | 130
doi:10.1371/journal.pone.0060209.t003
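In Table 3, each mechanism total is the sum of its sub-pathway rows, with "–" counted as zero, and the grand total is the sum of the four mechanisms. The sketch below re-adds the rows as a consistency check. The nested-dictionary layout is hypothetical.

```python
# Rows of Table 3 as (A. deanei, S. culicis, A. deanei symbiont, S. culicis symbiont);
# None stands for "–".
TABLE_3 = {
    "Replication and Repair": {
        "Base excision repair": (34, 34, 9, 9),
        "DNA replication": (54, 32, 11, 11),
        "Homologous recombination": (11, 11, 16, 15),
        "Mismatch repair": (28, 29, 12, 12),
        "Non-homologous end-joining": (8, 7, None, None),
        "Nucleotide excision repair": (43, 35, 8, 7),
    },
    "Signal Transduction": {
        "Phosphatidylinositol signaling system": (23, 17, None, None),
        "mTOR signaling pathway": (113, 29, None, None),
        "Two-component system": (None, None, 1, 1),
    },
    "Transcription": {
        "Basal transcription factors": (15, 4, None, None),
        "RNA polymerase": (28, 16, 3, 3),
        "Spliceosome": (53, 41, None, None),
    },
    "Translation": {
        "Aminoacyl-tRNA biosynthesis": (63, 56, 25, 25),
        "mRNA surveillance pathway": (43, 45, None, None),
        "Ribosomal proteins": (231, 152, 48, 47),
        "Ribosome biogenesis in eukaryotes": (84, 66, None, None),
        "RNA transport": (83, 69, None, None),
    },
}


def column_sums(rows) -> tuple:
    """Add rows column by column, treating None ("–") as zero."""
    sums = [0, 0, 0, 0]
    for row in rows:
        for i, value in enumerate(row):
            sums[i] += value or 0
    return tuple(sums)


category_totals = {name: column_sums(rows.values()) for name, rows in TABLE_3.items()}
grand_total = column_sums(category_totals.values())
print(category_totals)
print(grand_total)  # (914, 643, 133, 130)
```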
Table 4. Summary of the origin of ORFs found in A. deanei and S. culicis.
A. deanei
Functional classification | Prokaryotes* | Eukaryotes** | Symbiont P/E***
Replication and Repair
  Base excision repair | 5 | 11 | 4/0
  Nucleotide excision repair | 2 | 16 | 9/0
  Non-homologous end-joining | 1 | 5 | N
  Mismatch repair | 2 | 13 | 8/0
  Homologous recombination | 2 | 9 | 10/0
  DNA replication | 3 | 22 | 10/0
Signal Transduction
  Two-component system | N | N | 1
  Phosphatidylinositol signaling system | 0 | 16 | N
  mTOR signaling pathway | 0 | 8 | N
  MAPK signaling pathway - yeast | 0 | 1 | N
Transcription
  Spliceosome | 0 | 20 | N
  RNA polymerase | 0 | 16 | 3/0
  Basal transcription factors | 0 | 5 | N
Translation
  RNA transport | 0 | 31 | N
  Ribosome biogenesis in eukaryotes | 0 | 27 | N
  Ribosome | 0 | 75 | 48/0
  mRNA surveillance pathway | 0 | 17 | N
  Aminoacyl-tRNA biosynthesis | 0 | 22 | 23
S. culicis
Functional classification | Prokaryotes* | Eukaryotes** | Symbiont P/E***
Replication and Repair
  Base excision repair | 2 | 6 | 5/0
  Nucleotide excision repair | 2 | 10 | 7/0
  Non-homologous end-joining | 1 | 1 | N
  Mismatch repair | 1 | 5 | 8/0
  Homologous recombination | 1 | 4 | 11/0
  DNA replication | 2 | 15 | 9/0
Signal Transduction
  Two-component system | N | N | 1
  Phosphatidylinositol signaling system | 0 | 11 | N
  mTOR signaling pathway | 0 | 8 | N
  MAPK signaling pathway - yeast | 0 | 0 | N
Transcription
  Spliceosome | 0 | 13 |
  RNA polymerase | 0 | 11 |
  Basal transcription factors | 0 | 2 | 3/0
Translation
  RNA transport | 0 | 19 | N
  Ribosome biogenesis in eukaryotes | 0 | 20 | N
  Ribosome | 0 | 53 | 46/0
  mRNA surveillance pathway | 0 | 16 | N
  Aminoacyl-tRNA biosynthesis | 0 | 18 | 23
*Number of genes with identity to prokaryotes.
**Number of genes with identity to eukaryotes.
***Ratio of the number of genes with identity to prokaryotes/eukaryotes.
doi:10.1371/journal.pone.0060209.t004
Table 6. Representative ORFs involved in the signal
transduction pathways in A. deanei and S. culicis.
Cell cycle control in host trypanosomes. In eukaryotes,
DNA replication is coordinated with cell division by a cyclin-CDK
complex that triggers DNA duplication during the S phase of the
cell cycle. Multiple copies of the CRK gene (cdc2-related protein
kinase) are found in A. deanei and four genes coding for two
different CRKs are present in S. culicis. Both proteins exhibit
structural features of the kinase subunits that make up the CDK
complex, as they contain the cyclin-binding PSTAIRE motif, an
ATP-binding domain and a catalytic domain. These motifs and
domains are not the same in different CRKs (Figure S6), strongly
Table 5. Kinase families identified in trypanosomatids.
AGC
31
23
Atypical
31
21
CAMK
49
39
CK1
15
8
CMGC
108
77
STE
64
31
TKL
1
0
Other
81
58
No hits found
227
164
TOTAL
607
421
S. culicis
doi:10.1371/journal.pone.0060209.t005
PLOS ONE | www.plosone.org
S. culicis
AGDE02036
STCU01612
Diacylglycerol kinase
AGDE02361
STCU00226
CDP-diacylglycerol-inositol-3phosphatidyltransferase
AGDE04835
STCU01286
Myo-inositol-1(or 4) monophosphatase
AGDE08470
STCU02993
Phospholipase C
AGDE12052
STCU02439
Phosphatidylinositol 4-phosphate
5-kinase alpha
AGDE09669
STCU03909
Inositol-1,4,5-trisphosphate (IP3) 5-phosphatase
AGDE06690
nd
phosphatidate cytidylyltransferase
AGDE09922
nd
Mitogen-activated protein kinase 5
AGDE00259
STCU00603
Protein kinase A
AGDE06073
STCU01525
TP53 regulating kinase
AGDE08400
nd
Serine/threonine-protein kinase CTR1
AGDE00613
nd
Casein kinase
AGDE11868
STCU01611
Phosphoinositide-specific phospholipase C
nd
STCU09903
suggesting that these CRKs might control different stages of the
cell cycle. A. deanei contains four genes coding for cyclins. Three of
these genes are homologues to mitotic cyclin from S. cerevisiae and
T. brucei. However, none of them contain the typical destruction
domain present in T. brucei mitotic cyclin [60]. The fourth codes
for a S. cerevisiae Clb5 homolog, an S-phase cyclin. These data
indicate that more than one CRK and more than one cyclin would
be involved in the cell cycle control of symbiont-containing
trypanosomatids, suggesting that tight regulation must occur to
guarantee the precise maintenance of only one symbiont per cell
[14].
Cell cycle control in the endosymbionts. Bacterial cell
division is a highly regulated event that mainly depends on two
structures, the peptidoglycan layer and the Z ring. The first step in
the segregation of the bacterium is the formation of a polymerized
Z ring at the middle of the cell. This structure acts as a platform
for the recruitment of other essential proteins named Filament
Temperature Sensitive (Fts), which are mainly involved in the
formation and stabilization of the Z ring [61,62] and in
establishing the peptidoglycan septum formation site in most
bacteria [63] (Figure 6A).
Two fts sequences were identified in A. deanei and S. culicis
symbionts based on Bordetella genes (Table 7). One of them is FtsZ,
which requires integral membrane proteins such as Zip A and
FtsA for anchoring. However, these sequences are absent in the
symbionts. FtsZ should also interact with FtsE, which is absent in
both symbionts. This protein is homologous to the ATP-binding
cassette of ABC transporters and co-localizes with the division
septum [64]. The lack of these proteins could be related to the
absence of a classical Z ring in these symbionts. The other
sequence is FtsK that docks FtsQ, FtsB and FtsL, which are related
to the formation of the peptidoglycan layer in E. coli and B. subtilis
[65,66,67], but these proteins are absent in symbionts, as in most
bacteria that exhibit reduced peptidoglycan production [64].
RodA, a homologous integral membrane protein involved in
bacterial cell growth, is detected in the endosymbionts. RodA
could replace FtsW, which is absent in both symbionts. FtsW is
The Coordinated Division of the Bacterium during the
Host Protozoan Cell Cycle
A. deanei
A. deanei
Calmodulin
nd: not determined.
doi:10.1371/journal.pone.0060209.t006
Figure 4. Microsatellite content in the genomes of A. deanei, S.
culicis, and their endosymbionts. Panel (A) shows the percentage of
repetitive nucleotides for each repeat length. The total numbers of
nucleotides are derived from microsatellite sequences divided by the
total number of assembled nucleotides. Panel (B) shows the microsatellite density. The values indicate the number of microsatellite loci
divided by the genome length6100.
doi:10.1371/journal.pone.0060209.g004
Kinase family
Product
10
April 2013 | Volume 8 | Issue 4 | e60209
Predicting Proteins of A. deanei and S. culicis
Figure 5. Oxidative stress-related genes in the genomes of A. deanei, S. culicis and L. major. The figure shows the number of ORFs for the
indicated enzymes for each species.
doi:10.1371/journal.pone.0060209.g005
Figure 6. Schematic representation of the cell division machinery found in the endosymbionts. Panel (A) indicates the basic model
derived from a gram-negative bacterium with the localization of each component (shown on the right). Panel (B) represents the components found
in the endosymbiont of A. deanei, and Panel (C) shows the steps in the assembly of the Z-ring. The missing components of the A. deanei
endosymbiont are drawn in red.
doi:10.1371/journal.pone.0060209.g006
PLOS ONE | www.plosone.org
11
April 2013 | Volume 8 | Issue 4 | e60209
Predicting Proteins of A. deanei and S. culicis
Table 7. Members of the Fts family and PBPs that are present in endosymbionts of A. deanei and S. culicis.
Function
Protein
A. deanei
S. culicis
Stabilization and attachment of FtsZ polymers to the inner membrane
FtsA
nd
nd
FtsE
nd
nd
ZipA
nd
nd
FtsK
CKCE00084
CKBE00632
FtsQ
nd
nd
FtsB
Nd
nd
FtsL
nd
nd
FtsN
nd
nd
Lipid II flippase
FtsW(RodA)
CKCE 00486
CKBE00079
Forms a dynamic cytoplasmic ring structure at midcell
FtsZ
CKCE00034
CKBE00683
Penicillin binding proteins (PBPs)
PBP1A
CKCE00524
CKBE00119
PBP2
CKCE00487
CKBE00080
Interaction with peptidogycan synthases PBPs
FtsI/PBP3
CKCE00487
CKBE00080
PBP4
nd
nd
PBP5/dacC
CKCE00510
CKBE00105
PBP6
nd
nd
PBP6B
nd
nd
PBP7
nd
nd
nd: not determined.
doi:10.1371/journal.pone.0060209.t007
essential for the localization of FtsI (PBP3) in the Z ring [68], which is absent in the symbiotic bacteria.
Endosymbionts have only one bifunctional synthase (PBP1A), while E. coli has PBP1A, PBP1B, and PBP1C. Cells require at least one of these synthases for viability. The peptidoglycan layer is functional in trypanosomatid symbionts. This was shown by treatment with β-lactam antibiotics, which affects bacterial division, generates filamentous structures and culminates in cell lysis. PBP1 and PBP2 have also been detected at the symbiont envelope [6]. PBP1B interacts with the two essential division proteins FtsN and PBP3/FtsI, which are absent in the symbiont. PBP1B can also interact with PBP2, which is identified in both symbiont databases (see Table 7).
A sequence encoding a minor PBP described in E. coli, known as a putative PBP precursor (PBP5/dacC), was also identified in the symbionts. This PBP is involved in the regulation of peptidoglycan structure, along with three other minor PBPs described in E. coli, but these are absent from the symbiont (Table 7). On the other hand, all the enzymes involved in the synthesis of activated nucleotide precursors for the assembly of the peptidoglycan layer are present in the symbiont genome. The exception is Braun’s lipoprotein (Lpp), which forms the lipid-anchored disaccharide-pentapeptide monomer subunit [69]. In E. coli strains, mutations in Lpp genes result in a significant reduction of the permeability barrier, although only small effects on cell growth and metabolism were observed in these cells [70,71].
Taken together, we consider that gene loss in the dcw cluster [72] (represented in Figure 6) explains the lack of the FtsZ ring in the endosymbiont during its division process [73]. Moreover, the symbiont envelope contains a reduced peptidoglycan layer and lacks a septum during division. This may be related to the facilitation of metabolic exchanges, as well as to the control of division by the host protozoan [6]. These losses are understandable given that the host trypanosomatid controls the number of symbiotic bacteria per cell. This phenomenon has been described for obligatory intracellular bacteria that co-evolve in eukaryotic cells, as well as for the organelles of prokaryotic origin, the chloroplast and the mitochondrion [74,75].
Metabolic Co-evolution of the Bacterium and the Host Trypanosomatid
Symbiosis in trypanosomatids is characterized as a mutual association in which both partners benefit. These symbiont-bearing protozoa have low nutritional requirements because intense metabolic exchanges occur between the partners. Our data corroborate previous biochemical and ultrastructural analyses showing that the bacterium has enzymes and metabolic precursors that complete important biosynthetic pathways of the host [76].
Oxidative phosphorylation. FoF1-ATP synthase and the entire mitochondrial electron transport chain are present in A. deanei and S. culicis, although some subunits are missing (Table 8). These species have a rotenone-insensitive NADH:ubiquinone oxidoreductase in complex I, as do other trypanosomatids [77]. Ten of the twelve complex II (succinate:ubiquinone reductase) subunits identified in T. cruzi [78] are also present in both trypanosomatids. Many subunits of complex III (cytochrome c reductase) are found in A. deanei and S. culicis. In addition, these protozoa contain genes for cytochrome c, as previously suggested by biochemical studies in other symbiont-containing trypanosomatids [3,79].
Both symbionts contain sequences with hits for all subunits of complex I, NADH:ubiquinone oxidoreductase, similar to E. coli (Table 8). Complexes II (succinate:ubiquinone reductase) and III (cytochrome c reductase), including cytochrome c, and complex IV (cytochrome c oxidase) are not found in either symbiont. However, we detected the presence of cytochrome d, as found in Allochromatium vinosum, and also a cytochrome d oxidase with a sequence close to that of B. parapertussis. All portions of the FoF1-ATP synthase were identified in the symbionts, although not every subunit of each portion was found.
Table 8. Respiratory chain complexes identified in the predicted proteome of A. deanei, S. culicis and their respective endosymbionts.
Complex | A. deanei | A. deanei endosymbiont | S. culicis | S. culicis endosymbiont
Complex I | 33 | 0 | 33 | 0
Complex II | 10 | 0 | 10 | 0
Complex III | 5 | 0 | 4 | 0
Complex IV | 10 | 2* | 2 | 2*
Complex V | 10 | 8 | 3 | 8
*The complex IV of the endosymbionts might be a cytochrome d ubiquinol oxidase, identified in both organisms, instead of a classical cytochrome c oxidase.
doi:10.1371/journal.pone.0060209.t008
Lipid metabolism. The sphingophospholipid (SPL) content of A. deanei and its symbiont has been previously described. Phosphatidylcholine (PC) is the major SPL in the host, whereas cardiolipin predominates in the symbiotic bacterium [5,80]. The synthetic pathway of phosphatidylglycerol from glycerol phosphate is present in both host trypanosomatids (Table S16). The Kennedy pathways, which synthesize PC and phosphatidylethanolamine (PE) from CDP-choline and CDP-ethanolamine respectively, are incomplete in A. deanei and S. culicis. Moreover, the methylation pathway (Greenberg pathway), which converts PE into PC, seems to be absent in both trypanosomatids, even though one enzyme sequence was identified in A. deanei.
The symbiont of A. deanei exhibits two routes for PE synthesis, starting from CDP-diacylglycerol and producing phosphatidylserine as an intermediate (Table S17). Interestingly, this last step of the pathway is not found in the S. culicis endosymbiont. Importantly, both symbionts lack genes that encode proteins of PC biosynthetic pathways, reinforcing the idea that this phospholipid is mainly obtained from the host protozoa [5]. Phosphatidylglycerophosphatase A produces the intermediate phosphatidylglycerol required for cardiolipin biosynthesis. Remarkably, this enzyme was not found in either protozoan but is present in both symbionts. As cardiolipin is present in the inner membranes of host mitochondria, the symbionts may complete cardiolipin biosynthesis.
Pathways for sphingolipid production, including the synthesis of ceramide from sphingosine-1P, are present in A. deanei, while S. culicis lacks enzymes of this pathway (Table S16). Both host trypanosomatids have glycerol kinase and 3-glycerophosphate acyltransferase, enzymes for the synthesis of 1,2-diacyl-sn-glycerol and triacylglycerol from D-glycerate. In endosymbionts, glycerolipid metabolism seems to be reduced to two enzymes, 3-glycerophosphate acyltransferase and 1-acylglycerol-3-phosphate O-acyltransferase (Table S17), suggesting metabolic complementation between partners.
Furthermore, both hosts contain enzymes of the biosynthesis pathway for ergosterol production from zymosterol, as well as the sterol biosynthesis pathway that produces lanosterol from farnesyl-PP. These pathways are only complete in A. deanei. The symbionts do not have enzymes for sterol biosynthesis, in accordance with our previous biochemical analysis [80].
Figure 7. Main metabolic exchanges between host and endosymbionts. Schematic representation of the amino acids, vitamins, and cofactors exchanged between A. deanei and S. culicis and their respective symbionts. Dotted lines indicate pathways that have or might have contributions from both partners. Metabolites inside one of the circles (symbiont or host) indicate that this partner holds candidate genes coding for enzymes of the whole biosynthetic pathway. *Candidate genes were only found for the symbiont of S. culicis and not for the symbiont of A. deanei. BCAA (branched-chain amino acids) are leucine, isoleucine and valine.
doi:10.1371/journal.pone.0060209.g007
Metabolism of amino acids, vitamins, cofactors and hemin. Symbiosis in trypanosomatids is characterized by intensive metabolic exchanges. These exchanges reduce the nutritional requirements of these trypanosomatids compared with species that lack the symbiotic bacterium and with aposymbiotic strains. Several biochemical studies have analyzed the biosynthetic pathways involved in this intricate relationship, as recently reviewed [76], and our genomic data corroborate these findings. A schematic description of the potential metabolic interactions concerning the metabolism of amino acids, vitamins, cofactors, and hemin is provided in Figure 7.
Both symbiotic bacteria have genes potentially encoding all the enzymes necessary for lysine, phenylalanine, tryptophan and tyrosine synthesis, in agreement with previous experimental data [40]. Tyrosine is required in the growth medium of A. deanei [81], but it is not essential for S. oncopelti or S. culicis [41,82,83]. In the symbiotic bacteria, we found enzymes involved in tyrosine synthesis, as well as indications that phenylalanine and tyrosine can be interconverted. In fact, protozoan growth is very slow in the absence of phenylalanine and tryptophan [81], which may indicate that larger amounts of these amino acids are required for rapid cell proliferation.
Our data indicate that branched-chain amino acid (BCAA) synthesis mainly occurs in the symbionts. The exception is the last step, catalyzed by the branched-chain amino acid aminotransferase found in the host protozoan.
Among the pathways that (might) involve contributions from both partners, two have previously been characterized in detail: the urea cycle and heme synthesis. The urea cycle is complete in both symbiont-harboring trypanosomatids. The symbiotic bacteria contribute two enzymes: ornithine carbamoyltransferase, which converts ornithine to citrulline, and ornithine acetyltransferase, which converts acetylornithine into ornithine. Conversely, aposymbiotic strains and symbiont-free Crithidia species need exogenous arginine or citrulline for cell proliferation [8,68]. Our genomic data corroborate these studies.
Unlike symbiont-free trypanosomatids, A. deanei and S. culicis do not require any source of heme for growth. This is because the bacterium contains the enzymes required to produce heme precursors that complete the heme synthesis pathway in the host cell [7,9,10,11,84]. Our results support the idea that heme biosynthesis is mainly accomplished by the endosymbiont. The last three steps of this pathway are performed by the host trypanosomatid and, in most cases, also by the bacterium, as described in [11]. Furthermore, this metabolic route may represent the result of extensive gene loss and multiple lateral gene transfer events in trypanosomatids [11].
According to our genomic analyses, the symbiotic bacteria also carry out the synthesis of histidine, folate, riboflavin, and coenzyme A. However, one step is missing in the middle of each pathway, which makes these pathways candidates for metabolic interchange with the host. For folate and coenzyme A biosynthesis, one candidate gene was found in the host trypanosomatid. Moreover, none of these four metabolites is required in the growth medium of A. deanei and S. culicis [85], suggesting that these pathways are fully functional.
Candidate genes for the ubiquinone biosynthetic pathway were found in the S. culicis endosymbiont but not in the A. deanei endosymbiont. For the route with chorismate as precursor, only the first of nine steps is missing in the S. culicis endosymbiont; moreover, a candidate gene for that step is found in the S. culicis genome. Only a few steps of these pathways are absent in the A. deanei and S. culicis host organisms. In L. major, ubiquinone ring synthesis has been described as using either acetate (via chorismate, as in prokaryotes) or aromatic amino acids (as in mammalian cells) as precursors [45].
Methionine is considered essential for the growth of A. deanei, S. culicis and S. oncopelti [41,81,82]. Four enzymes are involved in methionine synthesis from either pyruvate or serine via cysteine. We were unable to identify one of them in the genomes of A. deanei and S. culicis. No candidate to complement this pathway was found in the symbiotic bacteria.
Purine and pyrimidine metabolism for nucleotide production. Trypanosomatids are not able to synthesize the purine ring de novo [86,87,88]. We observed that endosymbiont-bearing trypanosomatids contain sequences encoding ectonucleotidases from the E-NTPDase family and the adenosine deaminase family (Table S18). These enzymes are required for the hydrolysis and deamination of extracellular nucleotides [89,90]. Interestingly, sequences encoding 5′-nucleotidases are not found in either symbiont-bearing trypanosomatid. The absence of this enzyme may be related to the presence of the endosymbiont, which can supply adenosine to the host cell. We found all genes of the de novo pathway in the symbionts, indicating that they are able to complement the purine requirements of the host (Figure 8). However, we cannot rule out two alternatives: adenosine may enter the cell through nucleoside monophosphate carriers, or it may be generated by other enzymes with the same function as 5′-nucleotidase. On the other hand, the lack of 5′-nucleotidase in A. deanei and S. culicis may be related to the fact that these protozoa are exclusively insect parasites. Consistent with this idea, several studies have shown the importance of ectonucleotidases in the establishment of infection by some trypanosomatid species [91]. High ectonucleotidase activity produces adenosine, a known immune system inhibitor. This leads to high susceptibility to Leishmania infection because adenosine can induce anti-inflammatory effects on the host [92,93].
Nucleoside transporters can take up nucleosides and nucleobases generated by ectonucleotidase activity. Genes encoding nucleoside transporters are present in both trypanosomatid genomes (Table S19), enabling cells to obtain exogenous purines from the medium. Furthermore, A. deanei and S. culicis contain intracellular enzymes that can convert purines to nucleotides. These include adenine phosphoribosyltransferase, hypoxanthine-guanine phosphoribosyltransferase, adenylate kinase, AMP deaminase, inosine monophosphate dehydrogenase and GMP synthetase. These data indicate that these organisms can convert intracellular purines into nucleotides. In contrast, both endosymbionts lack all the genes encoding enzymes related to purine salvage. Nevertheless, the symbiotic bacteria have genes encoding all the enzymes expected to participate in the de novo synthesis of purine nucleotides, as previously proposed [94,95]. One interesting possibility is that the symbiotic bacterium supplies the host trypanosomatid with purines. Consistent with this idea, the endosymbiont participates in the de novo purine nucleotide pathway of A. deanei. The aposymbiotic strain can use glycine only for pyrimidine nucleotide production, not for the synthesis of purine nucleotides [87].
Protozoa are generally, but not universally, considered capable of synthesizing pyrimidines from the precursors glutamine and aspartic acid. Our results indicate that both symbiont-bearing trypanosomatids carry out de novo pyrimidine synthesis (Table S19). Interestingly, in silico analyses also revealed the presence of all the genes for de novo pyrimidine synthesis in both symbiont genomes, but not for the pyrimidine salvage pathway. A previous report indicated that A. deanei was able to synthesize purine and pyrimidine nucleotides from glycine (“de novo” pathway) and purine nucleotides from adenine and guanine (“salvage” pathway). Adenine was incorporated into both adenine and guanine nucleotides, whereas guanine was only incorporated into guanine nucleotides, suggesting a metabolic block at the level of GMP reductase [87].
Deoxyribonucleotides are derived from the corresponding ribonucleotides by direct reduction of the 2′-carbon atom of the D-ribose portion, forming the 2′-deoxy derivative. This reaction requires a pair of hydrogen atoms donated by NADPH via the intermediate carrier protein thioredoxin. The disulfide form of thioredoxin is reduced by NADPH in a reaction catalyzed by thioredoxin reductase. This provides the reducing equivalents for ribonucleotide reductase, as observed for the endosymbionts, which could therefore provide 2′-deoxy derivatives. In folate metabolism, the formation of thymine nucleotides requires methylation of dUMP to produce dTMP. This reaction is catalyzed by thymidylate synthase, which is present in A. deanei, S. culicis, and their respective endosymbionts. Figure 8 summarizes purine and pyrimidine metabolism in A. deanei and S. culicis, considering the metabolic complementarity between the protozoan and the endosymbiont.
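The salvage argument in this section can be posed as a reachability question on a small metabolite graph. Labelled adenine reaches both adenine and guanine nucleotides, while labelled guanine stays in guanine nucleotides, which points to a block at GMP reductase [87]. The sketch below is a toy model, not the authors' analysis, and rests on several assumptions:
- It includes only the interconversions of the host salvage enzymes listed in this section.
- It omits co-substrates.
- It deliberately leaves out the IMP-to-AMP branch (adenylosuccinate synthetase and lyase), which the text does not mention.
- The GMP reductase edge is a hypothetical addition used only for comparison.

```python
from collections import deque

# Host salvage conversions named in the text, as (substrate, product) pairs.
# Co-substrates (PRPP, ATP, NAD+, glutamine) are omitted for clarity.
HOST_SALVAGE = {
    "adenine phosphoribosyltransferase": [("adenine", "AMP")],
    "hypoxanthine-guanine phosphoribosyltransferase": [
        ("hypoxanthine", "IMP"),
        ("guanine", "GMP"),
    ],
    "adenylate kinase": [("AMP", "ADP")],
    "AMP deaminase": [("AMP", "IMP")],
    "inosine monophosphate dehydrogenase": [("IMP", "XMP")],
    "GMP synthetase": [("XMP", "GMP")],
}

# Hypothetical addition for comparison: GMP reductase (GMP -> IMP), the step
# that reference [87] suggests is blocked in A. deanei.
WITH_GMP_REDUCTASE = {**HOST_SALVAGE, "GMP reductase": [("GMP", "IMP")]}


def reachable(start, enzymes):
    """Breadth-first search: every metabolite reachable from `start`."""
    graph = {}
    for conversions in enzymes.values():
        for substrate, product in conversions:
            graph.setdefault(substrate, []).append(product)
    seen = {start}
    queue = deque([start])
    while queue:
        for product in graph.get(queue.popleft(), []):
            if product not in seen:
                seen.add(product)
                queue.append(product)
    return seen


for base in ("adenine", "guanine"):
    print(base, sorted(reachable(base, HOST_SALVAGE) - {base}))
# adenine ['ADP', 'AMP', 'GMP', 'IMP', 'XMP']
# guanine ['GMP']
```

In this model, adding the hypothetical GMP reductase edge lets guanine reach IMP and XMP as well. The missing GMP-to-IMP step alone is therefore enough to reproduce the one-way labelling pattern.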
Figure 8. Purine production, acquisition, and utilization in A. deanei and S. culicis. The figure illustrates the production, acquisition and utilization of purines in the host trypanosomatids, considering the presence of endosymbiont enzymes. This model suggests that the trypanosomatid acquires purines from the symbiont, which synthesizes them de novo. Some ecto-localized proteins, such as apyrase (APY) and adenosine deaminase (ADA), could be responsible for the generation of extracellular nucleosides, nucleobases, and purines. Nucleobases and purines could be acquired by the parasite through membrane transporters (T) or by diffusion. After “purine salvage pathway” processing, they could be incorporated into DNA, RNA, and kDNA molecules. Abbreviations: NTP (nucleoside triphosphate), NDP (nucleoside diphosphate), NMP (nucleoside monophosphate), N (nucleobase), ADO (adenosine), INO (inosine).
doi:10.1371/journal.pone.0060209.g008
In this way, both symbiont-containing protozoa express a unique complement of nutritionally indispensable salvage and interconversion enzymes that enable the acquisition of purines from the medium. Extracellular purines can be taken up from the medium through the action of ectonucleotidases and nucleoside transporters.
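The complementation logic used throughout this section, and in the categories of Figure 7, rests on two rules. A pathway is potentially complete when every step has a candidate gene in at least one partner. It is a complementation candidate when neither partner covers every step alone. A minimal sketch of that bookkeeping follows. The `Step` record, the `classify` helper, and the step names and presence flags are all hypothetical placeholders. They mimic two patterns from the text: folate/coenzyme A (a mid-pathway gap in the symbiont filled by a host candidate) and methionine (one step with no candidate in either partner). They are not the study's gene calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    in_host: bool       # candidate gene found in the trypanosomatid genome
    in_symbiont: bool   # candidate gene found in the endosymbiont genome


def classify(steps):
    """Return (status, gaps) for a pathway given per-step gene candidates."""
    gaps = [s.name for s in steps if not (s.in_host or s.in_symbiont)]
    if gaps:
        return "incomplete", gaps
    # Which partner, if any, covers every step on its own?
    alone = [
        partner
        for partner, covered in (
            ("host", all(s.in_host for s in steps)),
            ("symbiont", all(s.in_symbiont for s in steps)),
        )
        if covered
    ]
    if alone:
        return "complete in " + " and ".join(alone), []
    return "complete only by complementation", []


# Hypothetical 5-step pathway: the symbiont lacks the middle step, which
# has a candidate gene in the host (the folate/coenzyme A pattern).
folate_like = [
    Step("step_1", False, True),
    Step("step_2", False, True),
    Step("step_3", True, False),
    Step("step_4", False, True),
    Step("step_5", False, True),
]

# Hypothetical 4-step pathway: the host lacks one enzyme and the symbiont
# offers no candidate (the methionine pattern).
methionine_like = [
    Step("step_a", True, False),
    Step("step_b", True, False),
    Step("step_c", False, False),
    Step("step_d", True, False),
]

print(classify(folate_like))      # ('complete only by complementation', [])
print(classify(methionine_like))  # ('incomplete', ['step_c'])
```

A boolean presence table like this ignores annotation confidence and whether intermediates can actually cross the symbiont envelope. A pathway labelled complete here therefore remains only a prediction.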
Factors Involved in Protozoa-host Interactions
Monoxenic trypanosomatids only parasitize invertebrates, especially insects belonging to the orders Diptera and Hemiptera [1]. These organisms have been found in the Malpighian tubules, in the hemolymph and hemocoel, and in the midgut, which is considered the preferential site for protozoan multiplication and colonization [1,96,97]. S. culicis, for example, is able to colonize the insect midgut, invade the hemocoel and reach the salivary glands [97,98]. The presence of the symbiotic bacterium has been shown to influence the interactions of trypanosomatid cells with insect cell lines, explanted guts and host insects [4,20]. This seems to occur because the endosymbiont influences three features of the host: its glycoprotein and polysaccharide composition, the exposure of carbohydrates on the protozoan plasma membrane, and the surface charge [18,19,20,21].
Several glycosyltransferases from the two major families (GT-A and GT-B [99]) are present in both the A. deanei and S. culicis genomes (Table S20). So are members of family 25 (glycosyltransferases involved in lipo-oligosaccharide protein biosynthesis). Other glycosyltransferases lack characteristic domains and thus cannot be classified as GT-A or GT-B; these are also found in the A. deanei and S. culicis genomes. Importantly, 1,2-fucosyltransferase is present in A. deanei but not in the S. culicis dataset. Fucose residues were also found in high amounts on glycoinositolphospholipid (GIPL) molecules of A. deanei, unlike observations for other trypanosomatids (data not published). The role of fucose is unknown. However, fucose and arabinose are transferred to the lipophosphoglycan (LPG) of Leishmania when the culture medium is supplemented with these sugars [100], suggesting that fucose might have a specific role in A. deanei-insect interactions.
Another glycosyltransferase found in both the A. deanei and S. culicis genomes is involved in the N-glycosylation of asparagine residues. This is the dolichyl-diphosphooligosaccharide-protein glycosyltransferase (DDOST), an oligosaccharyltransferase (OST) that is not classified in any of the above-mentioned families. The A. deanei and S. culicis DDOSTs contain the STT3 domain, a subunit required for the activity of the oligosaccharyltransferase (OTase) protein complex, and they are orthologous to the human DDOST. These OTase complexes transfer lipid-linked oligosaccharides to the asparagine side chain of acceptor polypeptides in the endoplasmic reticulum [101]. This suggests that N-glycosylation is conserved among the trypanosomatids.
Five different galactofuranosyltransferase (GalfT) sequences are also present in the endosymbiont-bearing trypanosomatids. All of them contain the proposed catalytic site, indicating genetic redundancy. Redundancy of GalfTs is commonly observed in many trypanosomatid species, as different transferases are used for each linkage type [102]. β-galactofuranose (β-Galf) has been shown to participate in trypanosome-host interactions [103]. The presence of GalfTs in A. deanei and S. culicis might therefore also indicate a role in the interaction with the insect host. However, no enzymes involved in the synthesis of β-Galf-containing glycoconjugates were detected in our A. deanei dataset, despite reports of enzymes involved in β-Galf synthesis in Crithidia spp. [104,105,106].
Surface proteins and protease gene families. One remarkable characteristic of trypanosomatid genomes is the large expansion of gene families encoding surface proteins [107]. Experimental data indicated that these genes encode surface proteins involved in interactions with the hosts. We selected eight gene families encoding surface proteins present in T. cruzi, T. brucei and Leishmania spp. and searched for homologous sequences in the genomes of the two symbiont-bearing trypanosomatids. Because the draft assemblies of these genomes are still fragmented, we also used a read-based analysis to search for sequences with homology to these multigene families. Misassemblies are well known to occur frequently for tandemly repeated genes, as most repetitive copies collapse into only one or two copies. This comparison used 3,624,411 reads (1,595 Mb of sequence) from the A. deanei genome and 2,666,239 reads (924 Mb) from the S. culicis genome. In A. deanei and S. culicis, we identified gene families encoding amastins, gp63, and cysteine peptidases (Table S21). As expected, we found no sequences homologous to three families: the mucin-like glycoproteins typical of T. cruzi [108], the variant surface glycoprotein (VSG) characteristic of African trypanosomes, and the trans-sialidases present in the genomes of all Trypanosoma species.
Calpain-like cysteine peptidases constitute the largest gene family identified in the A. deanei (85 members) and S. culicis (62 members) genomes, and they are also abundant in other trypanosomatids [46]. Some members of this family carry the N-terminal fatty acid acylation motif. This indicates that some of these peptidases are associated with membranes, as has also been shown for other members of the family [109,110]. The relatively large number of calpain-like peptidases may be related to the presence of the endosymbiont, which would require more complex regulation of the cell cycle and of intracellular organelle distribution [14]. Cytosolic calpains have been found to regulate cytoskeletal remodeling, signal transduction, and cell differentiation [46].
A second large gene family in the A. deanei and S. culicis genomes encoding surface proteins with proteolytic activity is gp63. In our genomic analyses, we identified 37 and 9 genes homologous to the gp63 of Leishmania and Trypanosoma spp. in the genomes of A. deanei and S. culicis, respectively. This group of zinc metalloproteases, also known as major surface protease (MSP) or leishmanolysin, has been characterized in various species of Leishmania and Trypanosoma [111]. Extensive studies in Leishmania indicate that these proteins are involved in several aspects of host-parasite interaction [112]. These include resistance to complement-mediated lysis, cell attachment, entry, and survival in macrophages. Gene deletion studies in T. brucei indicated that the TbMSP of bloodstream trypanosomes acts in concert with phospholipase C to remove the variant surface glycoprotein from the membrane. This removal is required for parasite differentiation into the procyclic insect form [113]. Gp63-like molecules have been observed on the cell surface of symbiont-harboring trypanosomatids [114]. Importantly, symbiont-containing A. deanei displays twice as many leishmanolysin-like molecules at its surface as the aposymbiotic strain, which is unable to colonize insects [4]. As anti-gp63 antibodies decrease protozoan-insect interactions [21], our results reinforce the idea that such interactions drove the expansion of this gene family in endosymbiont-bearing organisms.
In contrast, only two copies of lysosomal cathepsin-like cysteine peptidases were identified in each of the A. deanei (AGDE05983 and AGDE10254) and S. culicis (STCU01417 and STCU06430) genomes. The two A. deanei sequences encode identical cathepsin-B-like proteins, whereas the two S. culicis genes encode proteases of the cathepsin-L-like group. This class of cysteine peptidase is represented by cruzain (cruzipain), the major lysosomal proteinase of T. cruzi. It is expressed by parasites found in insect and vertebrate hosts and is encoded by a large gene family [115,116]. In T. cruzi, these enzymes have important roles in various aspects of the host/parasite relationship and in intracellular digestion as a nutrient source [115]. Conversely, the low copy number of this class of lysosomal peptidase in symbiont-containing trypanosomatids seems to be related to their low nutritional requirements.
Amastins constitute a third large gene family in the A. deanei and S. culicis genomes that encodes surface proteins. Amastin genes were first described in T. cruzi [117]. They have also been identified in various Leishmania species [118], in A. deanei and in another related insect parasite, Leptomonas seymouri [119]. In Leishmania, amastins constitute the largest gene family, and their expression is regulated during the parasite life cycle. As amastin has no sequence similarity to any other known protein, its function remains unknown. In this work, we identified 31 amastin genes in the genome of A. deanei and 14 in S. culicis. As in Leishmania, members of all four amastin subfamilies were identified in the symbiont-containing species (see Figure S7).
Conclusion
The putative proteome of symbiont-bearing trypanosomatids reveals that these microorganisms have unique features compared with other protozoa of the same family, and that they are most closely related to Leishmania species. The most relevant features are the differences in genes related to the cytoskeleton and to paraflagellar and kinetoplast structures. A unique pattern of peptidase gene organization is also notable, and it may be related to the presence of the symbiont and to the monoxenic lifestyle. The symbiotic bacteria of A. deanei and S. culicis are phylogenetically related and share a common ancestor, most likely a β-proteobacterium of the family Alcaligenaceae. The genomic content of these symbionts is highly reduced, indicating gene loss and/or transfer to the host cell nucleus. In addition, we confirmed that both bacteria contain genes encoding enzymes that complement several metabolic routes of the host trypanosomatids, supporting the fitness of the symbiotic relationship.
Supporting Information
Figure S1 Evolutionary history of endosymbionts obtained through a phylogenomic approach. The figure shows the analyses using the Neighbor-joining (NJ) (A) and Maximum parsimony (MP) (B) methods. For NJ and MP, the percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1,500 replicates) is shown next to the branches. The scale bar represents amino acid substitutions per site.
(TIF)
Figure S2 Amino acid alignment of Kinetoplast-Associated Proteins. Panel (A) shows the KAP4 ClustalW alignment of A. deanei (AdKAP-4), S. culicis (ScKAP-4) and C. fasciculata (CfKAP4). Panel (B) shows the ClustalW alignment of KAP2 of S. culicis and C. fasciculata (CfKAP2-2, GenBank Q9TY84, and CfKAP2-1, GenBank Q9TY83). Shading indicates similarity: black, 100%; gray, 80 to 99%; light gray, 60 to 79%; white, less than 59%.
(TIF)
Figure S3 Comparison of the histone sequences of A. deanei and S. culicis with other trypanosomes. Residues in red are lysines that are acetylated in T. cruzi and T. brucei, and residues in green are lysines that are methylated in these species [121]. Residues in blue are predicted phosphorylation sites upon DNA damage, as shown in T. brucei [122].
(TIF)
Figure S4 Phylogenetic tree of sirtuins from trypanosomatids. The numbers represent bootstrap values. The proteins from each species are grouped into nuclear and mitochondrial Sir2. The grouping is based on the sequences of S. cerevisiae (nuclear) and on similarity with S. coelicolor and S. enterica.
(TIF)
Figure S5 Phylogenetic tree of spliced leader (SL) sequences of A. deanei and S. culicis. A neighbor-joining tree (1000 bootstraps) was obtained with MEGA 5.0. It uses the SL gene from the A. deanei and S. culicis genome sequences and sequences retrieved from GenBank (S. culicis DQ860203.1, L. pyrrhocoris JF950600.1, H. samuelpessoai X62331.1, H. mariadeanei AY547468.1, A. deanei EU099545.1, T. rangeli AF083351 and T. cruzi AY367127).
(TIF)
Table S11 Identified ORFs involved in DNA transcription and RNA splicing in the genomes of A. deanei and S. culicis.
(DOC)
Table S12 Transcription related proteins in the endosymbionts
of A. deanei and S. culicis.
(DOC)
Figure S6 Comparison between the amino acid sequences of S. culicis CRK sequences. The figure shows a
ClustalW alignment with the ATP binding domains boxed in
yellow, PSTAIRE motifs boxed in blue, and the catalytic domain
boxed in pink. Red residues indicate the observed variations in the
amino acids involved in the activity.
(TIF)
Table S13 Main ORFs detected participating in ribosomal
biogenesis and translation in A. deanei and S. culicis.
(DOC)
Table S14
Table S15 Number of heat shock and stress response proteins in A. deanei and S. culicis.
(DOC)
Figure S7 Tree showing the distribution of amastin subfamilies in A. deanei. The amastins are grouped as delta-amastins (red), gamma-amastins (yellow), alpha-amastins (dark blue) and beta-amastins (light blue).
(TIF)
Table S16
Glycerophospholipids (GPL) enzymes of A.
deanei and S. culicis endosymbionts.
(DOC)
Table S17
protein (KAPs) in A. deanei and S. culicis.
(DOC)
Table S2 Histone acetyltransferases of the MYST
family present in A. deanei and S. culicis compared to
other trypanosomes.
(DOC)
Table S18 Ectonucleotidases families and identification
of ORFs found in A. deanei and S. culicis.
(DOC)
Table S19 ORFs encoding enzymes involved in purine
and pyrimidine metabolism of A. deanei, S. culicis and
their symbionts.
(DOC)
Table S3 Distribution of Sirtuins in the protozoan and
endosymbiont species.
(DOC)
Table S4 Histone deacetylase identified in A. deanei
and S. culicis.
(DOC)
Table S20 Glycosyltransferases found in A. deanei and S. culicis.
(DOC)
Table S5 Histone methyltransferases in A. deanei and S. culicis.
(DOC)
Table S21 Surface proteins of A. deanei and S. culicis.
(DOC)
Text S1
Histone chaperones identified in A. deanei and S. culicis.
(DOC)
(DOC)
Table S7
Glycerophospholipids (GPL) enzymes of A. deanei and
S. culicis1.
(DOC)
Table S1 ORFs identified as Kinetoplast-associated
Table S6
Identified phosphatases in A. deanei and S. culicis.
(DOC)
Bromodomain proteins found in A. deanei and S. culicis.
(DOC)
Table S8 Components of the kDNA replication machinery identified in A. deanei and S. culicis, and similar endosymbiont ORFs.
(DOC)
Table S9 Identified ORFs related to DNA replication and DNA repair in A. deanei and S. culicis.
(DOC)
Table S10 DNA replication and repair ORFs found in the A. deanei and S. culicis endosymbionts.
(DOC)
Acknowledgments
We would like to dedicate this paper to professors Erney Camargo and Marta Teixeira. They have made important contributions to the study of basic aspects of the biology of trypanosomatids, especially those harboring an endosymbiont, and identified several new species of this relevant and interesting group of eukaryotic microorganisms.
Author Contributions
Conceived and designed the experiments: MCMM WS SS ATRV. Analyzed the data: MCMM ACAM SSAS CMCCP RS CCK LGPA OLC LPC MB ACC BAL CRM CMAS CMP CBAM CET DCB DFG DPP ECG FFG FKM GFRL GW GHG JLRF MCE MHSG MFS MP PHS RPMN SMRT TEFM TAOM TPÜ WS SS ATRV. Contributed reagents/materials/analysis tools: ATRV LGPA OLC WS. Wrote the paper: MCMM SS ATRV.
References
1. Wallace FG (1966) The trypanosomatid parasites of insects and arachnids. Experimental Parasitology 18: 124–193.
2. Teixeira MM, Borghesan TC, Ferreira RC, Santos MA, Takata CS, et al. (2011) Phylogenetic validation of the genera Angomonas and Strigomonas of trypanosomatids harboring bacterial endosymbionts with the description of new species of trypanosomatids and of proteobacterial symbionts. Protist 162: 503–524.
3. Edwards C, Chance B (1982) Evidence for the presence of two terminal oxidases in the trypanosomatid Crithidia oncopelti. Journal of General Microbiology 128: 1409–1414.
4. Fampa P, Correa-da-Silva MS, Lima DC, Oliveira SM, Motta MC, et al. (2003) Interaction of insect trypanosomatids with mosquitoes, sand fly and the respective insect cell lines. International Journal for Parasitology 33: 1019–1026.
PLOS ONE | www.plosone.org
17
April 2013 | Volume 8 | Issue 4 | e60209
Predicting Proteins of A. deanei and S. culicis
5. de Azevedo-Martins AC, Frossard ML, de Souza W, Einicker-Lamas M, Motta MC (2007) Phosphatidylcholine synthesis in Crithidia deanei: the influence of the endosymbiont. FEMS Microbiology Letters 275: 229–236.
6. Motta MCM, Leal LHM, Souza WD, De Almeida DF, Ferreira LCS (1997) Detection of Penicillin-binding Proteins in the Endosymbiont of the Trypanosomatid Crithidia deanei. The Journal of Eukaryotic Microbiology 44: 492–496.
7. Chang KP, Chang CS, Sassa S (1975) Heme biosynthesis in bacterium-protozoon symbioses: enzymic defects in host hemoflagellates and complemental role of their intracellular symbiotes. Proceedings of the National Academy of Sciences of the United States of America 72: 2979–2983.
8. Camargo EP, Freymuller E (1977) Endosymbiont as supplier of ornithine carbamoyltransferase in a trypanosomatid. Nature 270: 52–53.
9. Galinari S, Camargo EP (1978) Trypanosomatid protozoa: survey of acetylornithinase and ornithine acetyltransferase. Experimental Parasitology 46: 277–282.
10. Salzman TA, Batlle AM, Angluster J, de Souza W (1985) Heme synthesis in Crithidia deanei: influence of the endosymbiote. The International Journal of Biochemistry 17: 1343–1347.
11. Alves JM, Voegtly L, Matveyev AV, Lara AM, da Silva FM, et al. (2011) Identification and phylogenetic analysis of heme synthesis genes in trypanosomatids and their bacterial endosymbionts. PLoS One 6: e23518.
12. Frossard ML, Seabra SH, DaMatta RA, de Souza W, de Mello FG, et al. (2006) An endosymbiont positively modulates ornithine decarboxylase in host trypanosomatids. Biochemical and Biophysical Research Communications 343: 443–449.
13. Motta MC, Soares MJ, Attias M, Morgado J, Lemos AP, et al. (1997) Ultrastructural and biochemical analysis of the relationship of Crithidia deanei with its endosymbiont. European Journal of Cell Biology 72: 370–377.
14. Motta MC, Catta-Preta CM, Schenkman S, Azevedo Martins AC, Miranda K, et al. (2010) The bacterium endosymbiont of Crithidia deanei undergoes coordinated division with the host cell nucleus. PLoS One 5: e12415.
15. Freymuller E, Camargo EP (1981) Ultrastructural differences between species of trypanosomatids with and without endosymbionts. The Journal of Protozoology 28: 175–182.
16. Gadelha C, Wickstead B, de Souza W, Gull K, Cunha-e-Silva N (2005) Cryptic paraflagellar rod in endosymbiont-containing kinetoplastid protozoa. Eukaryotic Cell 4: 516–525.
17. Cavalcanti DP, Thiry M, de Souza W, Motta MC (2008) The kinetoplast ultrastructural organization of endosymbiont-bearing trypanosomatids as revealed by deep-etching, cytochemical and immunocytochemical analysis. Histochemistry and Cell Biology 130: 1177–1185.
18. Dwyer DM, Chang KP (1976) Surface membrane carbohydrate alterations of a flagellated protozoan mediated by bacterial endosymbiotes. Proceedings of the National Academy of Sciences of the United States of America 73: 852–856.
19. Oda LM, Alviano CS, Filho FCS, Angluster J, Roitman I, et al. (1984) Surface Anionic Groups in Symbiote-Bearing and Symbiote-Free Strains of Crithidia deanei. The Journal of Eukaryotic Microbiology 31: 131–134.
20. d'Avila-Levy CM, Silva BA, Hayashi EA, Vermelho AB, Alviano CS, et al. (2005) Influence of the endosymbiont of Blastocrithidia culicis and Crithidia deanei on the glycoconjugate expression and on Aedes aegypti interaction. FEMS Microbiology Letters 252: 279–286.
21. d'Avila-Levy CM, Santos LO, Marinho FA, Matteoli FP, Lopes AH, et al. (2008) Crithidia deanei: influence of parasite gp63 homologue on the interaction of endosymbiont-harboring and aposymbiotic strains with Aedes aegypti midgut. Experimental Parasitology 118: 345–353.
22. Du Y, Maslov DA, Chang KP (1994) Monophyletic origin of beta-division proteobacterial endosymbionts and their coevolution with insect trypanosomatid protozoa Blastocrithidia culicis and Crithidia spp. Proceedings of the National Academy of Sciences of the United States of America 91: 8437–8441.
23. Du Y, McLaughlin G, Chang KP (1994) 16S ribosomal DNA sequence identities of beta-proteobacterial endosymbionts in three Crithidia species. Journal of Bacteriology 176: 3081–3084.
24. Martin W, Hoffmeister M, Rotte C, Henze K (2001) An overview of endosymbiotic models for the origins of eukaryotes, their ATP-producing organelles (mitochondria and hydrogenosomes), and their heterotrophic lifestyle. Biological Chemistry 382: 1521–1539.
25. Hollar L, Lukes J, Maslov DA (1998) Monophyly of endosymbiont containing trypanosomatids: phylogeny versus taxonomy. The Journal of Eukaryotic Microbiology 45: 293–297.
26. Hebert L, Moumen B, Duquesne F, Breuil MF, Laugier C, et al. (2011) Genome sequence of Taylorella equigenitalis MCE9, the causative agent of contagious equine metritis. Journal of Bacteriology 193: 1785.
27. Hebert L, Moumen B, Pons N, Duquesne F, Breuil MF, et al. (2012) Genomic characterization of the Taylorella genus. PLoS One 7: e29953.
28. Sugimoto C, Isayama Y, Sakazaki R, Kuramochi S (1983) Transfer of Haemophilus equigenitalis Taylor et al. 1978 to the genus Taylorella gen. nov. as Taylorella equigenitalis comb. nov. Current Microbiology 9: 155–162.
29. Moran NA, McCutcheon JP, Nakabachi A (2008) Genomics and evolution of heritable bacterial symbionts. Annual Review of Genetics 42: 165–190.
30. Toft C, Andersson SG (2010) Evolutionary microbial genomics: insights into bacterial host adaptation. Nature Reviews Genetics 11: 465–475.
31. Shigenobu S, Watanabe H, Hattori M, Sakaki Y, Ishikawa H (2000) Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature 407: 81–86.
32. McCutcheon JP, Moran NA (2012) Extreme genome reduction in symbiotic bacteria. Nature Reviews Microbiology 10: 13–26.
33. Parkhill J, Sebaihia M, Preston A, Murphy LD, Thomson N, et al. (2003) Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica. Nature Genetics 35: 32–40.
34. Cummings CA, Brinig MM, Lepp PW, van de Pas S, Relman DA (2004) Bordetella species are distinguished by patterns of substantial gene loss and host adaptation. Journal of Bacteriology 186: 1484–1492.
35. Gull K (1999) The cytoskeleton of trypanosomatid parasites. Annual Review of Microbiology 53: 629–655.
36. Berriman M, Ghedin E, Hertz-Fowler C, Blandin G, Renauld H, et al. (2005) The Genome of the African Trypanosome Trypanosoma brucei. Science 309: 416–422.
37. Beech PL, Heimann K, Melkonian M (1991) Development of the Flagellar Apparatus during the Cell-Cycle in Unicellular Algae. Protoplasma 164: 23–37.
38. Lange BM, Gull K (1996) Structure and function of the centriole in animal cells: progress and questions. Trends in Cell Biology 6: 348–352.
39. Garcia-Salcedo JA, Perez-Morga D, Gijon P, Dilbeck V, Pays E, et al. (2004) A differential role for actin during the life cycle of Trypanosoma brucei. The EMBO Journal 23: 780–789.
40. Gadelha C, Wickstead B, Gull K (2007) Flagellar and ciliary beating in trypanosome motility. Cell Motility and the Cytoskeleton 64: 629–643.
41. Portman N, Gull K (2010) The paraflagellar rod of kinetoplastid parasites: from structure to components and function. International Journal for Parasitology 40: 135–148.
42. Lacomble S, Vaughan S, Gadelha C, Morphew MK, Shaw MK, et al. (2009) Three-dimensional cellular architecture of the flagellar pocket and associated cytoskeleton in trypanosomes revealed by electron microscope tomography. Journal of Cell Science 122: 1081–1090.
43. Oberholzer M, Marti G, Baresic M, Kunz S, Hemphill A, et al. (2007) The Trypanosoma brucei cAMP phosphodiesterases TbrPDEB1 and TbrPDEB2: flagellar enzymes that are essential for parasite virulence. The FASEB Journal 21: 720–731.
44. Ginger ML, Portman N, McKean PG (2008) Swimming with protists: perception, motility and flagellum assembly. Nature Reviews Microbiology 6: 838–850.
45. Xu C, Ray DS (1993) Isolation of proteins associated with kinetoplast DNA networks in vivo. Proceedings of the National Academy of Sciences of the United States of America 90: 1786–1789.
46. Ersfeld K, Barraclough H, Gull K (2005) Evolutionary relationships and protein domain architecture in an expanded calpain superfamily in kinetoplastid parasites. Journal of Molecular Evolution 61: 742–757.
47. Avliyakulov NK, Lukes J, Ray DS (2004) Mitochondrial histone-like DNA-binding proteins are essential for normal cell growth and mitochondrial function in Crithidia fasciculata. Eukaryotic Cell 3: 518–526.
48. Cavalcanti DP, Shimada MK, Probst CM, Souto-Padron TC, de Souza W, et al. (2009) Expression and subcellular localization of kinetoplast-associated proteins in the different developmental stages of Trypanosoma cruzi. BMC Microbiology 9: 120.
49. Wei K, Clark AB, Wong E, Kane MF, Mazur DJ, et al. (2003) Inactivation of Exonuclease 1 in mice results in DNA mismatch repair defects, increased cancer susceptibility, and male and female sterility. Genes & Development 17: 603–614.
50. Wu Y, Berends MJ, Post JG, Mensink RG, Verlind E, et al. (2001) Germline mutations of EXO1 gene in patients with hereditary nonpolyposis colorectal cancer (HNPCC) and atypical HNPCC forms. Gastroenterology 120: 1580–1587.
51. Kim YR, Yoo NJ, Lee SH (2010) Somatic mutation of EXO1 gene in gastric and colorectal cancers with microsatellite instability. Acta Oncologica 49: 859–860.
52. Augusto-Pinto L, Teixeira SM, Pena SD, Machado CR (2003) Single-nucleotide polymorphisms of the Trypanosoma cruzi MSH2 gene support the existence of three phylogenetic lineages presenting differences in mismatch-repair efficiency. Genetics 164: 117–126.
53. Machado CR, Augusto-Pinto L, McCulloch R, Teixeira SM (2006) DNA metabolism and genetic diversity in Trypanosomes. Mutation Research 612: 40–57.
54. Andreeva AV, Kutuzov MA (2008) Protozoan protein tyrosine phosphatases. International Journal for Parasitology 38: 1279–1295.
55. Brenchley R, Tariq H, McElhinney H, Szoor B, Huxley-Jones J, et al. (2007) The TriTryp phosphatome: analysis of the protein phosphatase catalytic domains. BMC Genomics 8: 434.
56. Szoor B, Wilson J, McElhinney H, Tabernero L, Matthews KR (2006) Protein tyrosine phosphatase TbPTP1: a molecular switch controlling life cycle differentiation in trypanosomes. The Journal of Cell Biology 175: 293–303.
57. Huang H (2011) Signal transduction in Trypanosoma cruzi. Advances in Parasitology 75: 325–344.
58. Atayde VD, Tschudi C, Ullu E (2011) The emerging world of small silencing RNAs in protozoan parasites. Trends in Parasitology 27: 321–327.
59. Barnes RL, Shi H, Kolev NG, Tschudi C, Ullu E (2012) Comparative genomics reveals two novel RNAi factors in Trypanosoma brucei and provides insight into the core machinery. PLoS Pathogens 8: e1002678.
60. Van Hellemond JJ, Neuville P, Schwarz RT, Matthews KR, Mottram JC (2000) Isolation of Trypanosoma brucei CYC2 and CYC3 cyclin genes by rescue of a yeast G(1) cyclin mutant. Functional characterization of CYC2. The Journal of Biological Chemistry 275: 8315–8323.
61. Carballido-Lopez R, Errington J (2003) A dynamic bacterial cytoskeleton. Trends in Cell Biology 13: 577–583.
62. Pichoff S, Lutkenhaus J (2002) Unique and overlapping roles for ZipA and FtsA in septal ring assembly in Escherichia coli. The EMBO Journal 21: 685–693.
63. Harry E, Monahan L, Thompson L (2006) Bacterial cell division: the mechanism and its precision. International Review of Cytology 253: 27–94.
64. Margolin W (2005) FtsZ and the division of prokaryotic cells and organelles. Nature Reviews Molecular Cell Biology 6: 862–871.
65. Buddelmeijer N, Beckwith J (2004) A complex of the Escherichia coli cell division proteins FtsL, FtsB and FtsQ forms independently of its localization to the septal region. Molecular Microbiology 52: 1315–1327.
66. Chen JC, Beckwith J (2001) FtsQ, FtsL and FtsI require FtsK, but not FtsN, for co-localization with FtsZ during Escherichia coli cell division. Molecular Microbiology 42: 395–413.
67. Chen JC, Weiss DS, Ghigo JM, Beckwith J (1999) Septal localization of FtsQ, an essential cell division protein in Escherichia coli. Journal of Bacteriology 181: 521–530.
68. Mercer KL, Weiss DS (2002) The Escherichia coli cell division protein FtsW is required to recruit its cognate transpeptidase, FtsI (PBP3), to the division site. Journal of Bacteriology 184: 904–912.
69. Bouhss A, Trunkfield AE, Bugg TD, Mengin-Lecreulx D (2008) The biosynthesis of peptidoglycan lipid-linked intermediates. FEMS Microbiology Reviews 32: 208–233.
70. Ni Y, Chen R (2009) Extracellular recombinant protein production from Escherichia coli. Biotechnology Letters 31: 1661–1670.
71. Ni Y, Reye J, Chen RR (2007) lpp deletion as a permeabilization method. Biotechnology and Bioengineering 97: 1347–1356.
72. Mingorance J, Tamames J, Vicente M (2004) Genomic channeling in bacterial cell division. Journal of Molecular Recognition 17: 481–487.
73. Motta MC, Picchi GF, Palmie-Peixoto IV, Rocha MR, de Carvalho TM, et al. (2004) The microtubule analog protein, FtsZ, in the endosymbiont of trypanosomatid protozoa. The Journal of Eukaryotic Microbiology 51: 394–401.
74. Timmis JN, Ayliffe MA, Huang CY, Martin W (2004) Endosymbiotic gene transfer: organelle genomes forge eukaryotic chromosomes. Nature Reviews Genetics 5: 123–135.
75. Pyke KA (2010) Plastid division. AoB Plants 2010: plq016.
76. Motta MC (2010) Endosymbiosis in trypanosomatids as a model to study cell evolution. The Open Parasitology Journal 4: 139–147.
77. Opperdoes FR, Michels PA (2008) Complex I of Trypanosomatidae: does it exist? Trends in Parasitology 24: 310–317.
78. Morales J, Mogi T, Mineki S, Takashima E, Mineki R, et al. (2009) Novel mitochondrial complex II isolated from Trypanosoma cruzi is composed of 12 peptides including a heterodimeric Ip subunit. The Journal of Biological Chemistry 284: 7255–7263.
79. Edwards C (1984) Terminal oxidases of Crithidia oncopelti. FEMS Microbiology Letters 21: 319–322.
80. Palmie-Peixoto IV, Rocha MR, Urbina JA, de Souza W, Einicker-Lamas M, et al. (2006) Effects of sterol biosynthesis inhibitors on endosymbiont-bearing trypanosomatids. FEMS Microbiology Letters 255: 33–42.
81. Mundim MH, Roitman I, Hermans MA, Kitajima EW (1974) Simple nutrition of Crithidia deanei, a reduviid trypanosomatid with an endosymbiont. The Journal of Protozoology 21: 518–521.
82. Newton BA (1956) A synthetic growth medium for the trypanosomid flagellate Strigomonas (Herpetomonas) oncopelti. Nature 177: 279–280.
83. Newton BA (1957) Nutritional requirements and biosynthetic capabilities of the parasitic flagellate Strigomonas oncopelti. Journal of General Microbiology 17: 708–717.
84. Camargo EP, Coelho JA, Moraes G, Figueiredo EN (1978) Trypanosoma spp., Leishmania spp. and Leptomonas spp.: enzymes of ornithine-arginine metabolism. Experimental Parasitology 46: 141–144.
85. Gill JW, Vogel HJ (1963) A Bacterial Endosymbiote in Crithidia (Strigomonas) oncopelti: Biochemical and Morphological Aspects. The Journal of Eukaryotic Microbiology 10: 148–152.
86. Marr JJ, Berens RL, Nelson DJ (1978) Purine metabolism in Leishmania donovani and Leishmania braziliensis. Biochimica et Biophysica Acta 544: 360–371.
87. Ceron CR, Caldas RD, Felix CR, Mundim MH, Roitman I (1979) Purine metabolism in trypanosomatids. The Journal of Protozoology 26: 479–483.
88. Berens RL, Krugg EC, Marr JJ (1995) Purine and Pyrimidine Metabolism. In: Marr JJ, Muller M, editors. Biochemistry and Molecular Biology of Parasites. London: Academic Press. 89–117.
89. Zimmermann H (2000) Extracellular metabolism of ATP and other nucleotides. Naunyn-Schmiedeberg's Archives of Pharmacology 362: 299–309.
90. Plesner L (1995) Ecto-ATPases: identities and functions. International Review of Cytology 158: 141–214.
91. Sansom FM, Robson SC, Hartland EL (2008) Possible effects of microbial ecto-nucleoside triphosphate diphosphohydrolases on host-pathogen interactions. Microbiology and Molecular Biology Reviews 72: 765–781.
92. Maioli TU, Takane E, Arantes RM, Fietto JL, Afonso LC (2004) Immune response induced by New World Leishmania species in C57BL/6 mice. Parasitology Research 94: 207–212.
93. Marques da Silva C, Miranda Rodrigues L, Passos da Silva Gomes A, Mantuano Barradas M, Sarmento Vieira F, et al. (2008) Modulation of P2X7 receptor expression in macrophages from mineral oil-injected mice. Immunobiology 213: 481–492.
94. Rebora K, Desmoucelles C, Borne F, Pinson B, Daignan-Fornier B (2001) Yeast AMP pathway genes respond to adenine through regulated synthesis of a metabolic intermediate. Molecular and Cellular Biology 21: 7901–7912.
95. Zalkin H, Nygaard P (1996) Biosynthesis of purine nucleotides. In: Frederick Carl N, editor. Escherichia coli and Salmonella: cellular and molecular biology. 2nd ed. Washington, D.C.: ASM Press. 561–579.
96. Podlipaev SA (2000) Insect trypanosomatids: the need to know more. Memórias do Instituto Oswaldo Cruz 95: 517–522.
97. Correa-da-Silva MS, Fampa P, Lessa LP, Silva Edos R, dos Santos Mallet JR, et al. (2006) Colonization of Aedes aegypti midgut by the endosymbiont-bearing trypanosomatid Blastocrithidia culicis. Parasitology Research 99: 384–391.
98. Nascimento MT, Garcia MC, da Silva KP, Pinto-da-Silva LH, Atella GC, et al. (2010) Interaction of the monoxenic trypanosomatid Blastocrithidia culicis with the Aedes aegypti salivary gland. Acta Tropica 113: 269–278.
99. Lairson LL, Henrissat B, Davies GJ, Withers SG (2008) Glycosyltransferases: structures, functions, and mechanisms. Annual Review of Biochemistry 77: 521–555.
100. Mengeling BJ, Turco SJ (1998) Microbial glycoconjugates. Current Opinion in Structural Biology 8: 572–577.
101. Schwarz F, Aebi M (2011) Mechanisms and principles of N-linked protein glycosylation. Current Opinion in Structural Biology 21: 576–582.
102. Oppenheimer M, Valenciano AL, Sobrado P (2011) Biosynthesis of galactofuranose in kinetoplastids: novel therapeutic targets for treating leishmaniasis and Chagas' disease. Enzyme Research 2011: 415976.
103. de Lederkremer RM, Colli W (1995) Galactofuranose-containing glycoconjugates in trypanosomatids. Glycobiology 5: 547–552.
104. Moraes CT, Bosch M, Parodi AJ (1988) Structural characterization of several galactofuranose-containing, high-mannose-type oligosaccharides present in glycoproteins of the trypanosomatid Leptomonas samueli. Biochemistry 27: 1543–1549.
105. Mendelzon DH, Previato JO, Parodi AJ (1986) Characterization of protein-linked oligosaccharides in trypanosomatid flagellates. Molecular and Biochemical Parasitology 18: 355–367.
106. Mendelzon DH, Parodi AJ (1986) N-linked high mannose-type oligosaccharides in the protozoa Crithidia fasciculata and Crithidia harmosa contain galactofuranose residues. The Journal of Biological Chemistry 261: 2129–2133.
107. El-Sayed NM, Myler PJ, Blandin G, Berriman M, Crabtree J, et al. (2005) Comparative genomics of trypanosomatid parasitic protozoa. Science 309: 404–409.
108. El-Sayed NM, Myler PJ, Bartholomeu DC, Nilsson D, Aggarwal G, et al. (2005) The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science 309: 409–415.
109. Tull D, Vince JE, Callaghan JM, Naderer T, Spurck T, et al. (2004) SMP-1, a member of a new family of small myristoylated proteins in kinetoplastid parasites, is targeted to the flagellum membrane in Leishmania. Molecular Biology of the Cell 15: 4775–4786.
110. Galetovic A, Souza RT, Santos MR, Cordero EM, Bastos IM, et al. (2011) The repetitive cytoskeletal protein H49 of Trypanosoma cruzi is a calpain-like protein located at the flagellum attachment zone. PLoS One 6: e27634.
111. Yao C, Li Y, Donelson JE, Wilson ME (2010) Proteomic examination of Leishmania chagasi plasma membrane proteins: Contrast between avirulent and virulent (metacyclic) parasite forms. Proteomics Clinical Applications 4: 4–16.
112. Yao C, Donelson JE, Wilson ME (2003) The major surface protease (MSP or GP63) of Leishmania sp. Biosynthesis, regulation of expression, and function. Molecular and Biochemical Parasitology 132: 1–16.
113. Grandgenett PM, Otsu K, Wilson HR, Wilson ME, Donelson JE (2007) A function for a specific zinc metalloprotease of African trypanosomes. PLoS Pathogens 3: 1432–1445.
114. Nogueira de Melo AC, d'Avila-Levy CM, Dias FA, Armada JL, Silva HD, et al. (2006) Peptidases and gp63-like proteins in Herpetomonas megaseliae: possible involvement in the adhesion to the invertebrate host. International Journal for Parasitology 36: 415–422.
115. Cazzulo JJ (2002) Proteinases of Trypanosoma cruzi: potential targets for the chemotherapy of Chagas disease. Current Topics in Medicinal Chemistry 2: 1261–1271.
116. Caffrey CR, Lima AP, Steverding D (2011) Cysteine peptidases of kinetoplastid parasites. Advances in Experimental Medicine and Biology 712: 84–99.
117. Teixeira SM, Russell DG, Kirchhoff LV, Donelson JE (1994) A differentially expressed gene family encoding “amastin,” a surface protein of Trypanosoma cruzi amastigotes. The Journal of Biological Chemistry 269: 20509–20516.
118. Wu Y, El Fakhry Y, Sereno D, Tamar S, Papadopoulou B (2000) A new developmentally regulated gene family in Leishmania amastigotes encoding a homolog of amastin surface proteins. Molecular and Biochemical Parasitology 110: 345–357.
119. Jackson AP (2010) The evolution of amastin surface glycoproteins in trypanosomatid parasites. Molecular Biology and Evolution 27: 33–45.
120. Drummond AJ, Ashton B, Buxton S, Cheung M, A C, et al. (2011) Geneious v5.5.
121. Schenkman S, Pascoalino Bdos S, Nardelli SC (2011) Nuclear Structure of Trypanosoma cruzi. Advances in Parasitology 75: 251–283.
122. Glover L, Horn D (2012) Trypanosomal histone gammaH2A and the DNA damage response. Molecular and Biochemical Parasitology 183: 78–83.
Kangussu-Marcolino et al. BMC Microbiology 2013, 13:10
http://www.biomedcentral.com/1471-2180/13/10
RESEARCH ARTICLE
Open Access
Distinct genomic organization, mRNA expression
and cellular localization of members of two
amastin sub-families present in Trypanosoma cruzi
Monica Mendes Kangussu-Marcolino1, Rita Márcia Cardoso de Paiva2, Patrícia Rosa Araújo2,
Rondon Pessoa de Mendonça-Neto2, Laiane Lemos1, Daniella Castanheira Bartholomeu3, Renato A Mortara4,
Wanderson Duarte daRocha1* and Santuza Maria Ribeiro Teixeira2*
Abstract
Background: Amastins are surface glycoproteins (approximately 180 residues long) initially described in
Trypanosoma cruzi as particularly abundant during the amastigote stage of this protozoan parasite. Subsequently,
they have been found to be encoded by large gene families also present in the genomes of several species of
Leishmania and in other Trypanosomatids. Although most amastin genes are organized in clusters associated with
tuzin genes and are up-regulated in the intracellular stage of T. cruzi and Leishmania spp., distinct genomic
organizations and mRNA expression patterns have also been reported.
Results: Based on the analysis of the complete genome sequences of two T. cruzi strains, we identified a total of 14
copies of amastin genes in T. cruzi and showed that they belong to two of the four previously described amastin
subfamilies. Whereas δ-amastin genes are organized in two or more clusters with alternating copies of tuzin genes,
the two copies of β-amastins are linked together on a distinct chromosome. Most T. cruzi amastins have a similar
surface localization as determined by confocal microscopy and western blot analyses. Transcript levels for
δ-amastins were found to be up-regulated in amastigotes from several T. cruzi strains, except in the G strain, which
is known to have low infection capacity. In contrast, in all strains analysed, β-amastin transcripts are more abundant
in epimastigotes, the stage found in the insect vector.
Conclusions: Here we showed that both the number and the diversity of T. cruzi amastin genes are larger than previously predicted, and that their mode of expression during the parasite life cycle is more complex. Although most T. cruzi amastins have a similar surface localization, only δ-amastin genes have their expression up-regulated in amastigotes. The finding that a sub-group of this family is up-regulated in epimastigotes suggests that, in addition to their role in intracellular amastigotes, T. cruzi amastins may also serve important functions during the insect stage of the parasite life cycle. Most importantly, the down-regulation of δ-amastin expression in a strain with low infection capacity provides evidence that amastins may act as virulence factors.
Keywords: Trypanosoma cruzi, Amastigote, Amastin, mRNA
* Correspondence: [email protected]; [email protected]
1 Departamento de Bioquímica e Biologia Molecular, Universidade Federal do Paraná, Rua Quinze de Novembro, 1299, Centro, Curitiba, PR 80060-000, Brazil
2 Departamento de Bioquímica e Imunologia, Av. Antônio Carlos, 6627, Pampulha, Belo Horizonte, MG 31270-901, Brazil
Full list of author information is available at the end of the article
© 2013 Kangussu-Marcolino et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of
the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use,
distribution, and reproduction in any medium, provided the original work is properly cited.
Background
Trypanosoma cruzi, the protozoan parasite that is the etiologic agent of Chagas disease [1], undergoes four developmental stages during its complex life cycle: epimastigotes
and metacyclic trypomastigotes, present in the insect vector, and intracellular amastigotes and bloodstream trypomastigotes, present in the mammalian host. This parasite
must rely on a broad set of genes that allow it to multiply
in the insect gut, to differentiate into forms that are able to
invade and multiply inside a large number of distinct mammalian cell types and to circumvent the host immune system. To meet the challenges it faces during its life cycle,
complex regulatory mechanisms must control the expression of the T. cruzi repertoire of about 12,000 genes.
Among them, there are several large gene families encoding
surface proteins, which are key players directly involved in
host-parasite interactions (reviewed by Epting et al. [2]).
The amastin gene family was initially reported as a group of T. cruzi genes encoding 174-amino-acid transmembrane glycoproteins whose mRNAs are 60-fold more abundant in amastigotes than in epimastigotes or trypomastigotes [3]. The differential expression of amastin mRNAs during the T. cruzi life cycle has been attributed to cis-acting elements present in the 3’UTR, as well as to RNA-binding proteins that may recognize this sequence [4,5]. It is also known that amastin genes alternate with genes encoding a cytoplasmic protein named tuzin [6].
Completion of the genome sequences of several Trypanosomatids revealed that the amastin gene family is also present in various Leishmania species, as well as in two related insect parasites, Leptomonas seymouri and Crithidia spp. [7-9]. This gene family has also been reported to be much larger in the genus Leishmania than in other Trypanosomatids. Topology predictions based on sequences found in the genomes of L. major, L. infantum and T. cruzi indicate that all amastins have four transmembrane regions, two extracellular domains, and N- and C-terminal tails facing the cytosol [8]. Moreover, comparative analyses of amastin genes from six T. cruzi strains showed that the sequences encoding the less conserved, hydrophilic extracellular domain have higher intragenomic variability in strains belonging to T. cruzi group II and in hybrid strains than in T. cruzi I strains [10]. Based on phylogenetic analyses of amastin orthologs from various Trypanosomatids, it has been proposed that amastins can be classified into four sub-families, named α-, β-, γ- and δ-amastins. Importantly, in L. major and L. infantum, in which members of all four sub-families are found, amastin genes differ in their genomic positions and in the expression patterns of their mRNAs [8,9].
More than fifteen years after their discovery, the function of amastins remains unknown. Because of their predicted structure and their surface localization in the intracellular stage of T. cruzi and Leishmania spp., it has been proposed that amastins may play a role in host-parasite interactions within the mammalian cell. They could be involved in the transport of ions and nutrients across the membrane, or in cell signaling events that trigger parasite differentiation [9]. Their preferential expression in the intracellular stage also suggests that they may be relevant antigens during parasite infection. This prediction was confirmed by studies showing that amastin peptides elicit a strong immune response during Leishmania infection [11]. Amastin antigens are considered relevant immune biomarkers of cutaneous and visceral leishmaniasis, as well as protective antigens in mice [12].
Although complete genome sequences of two T. cruzi strains (CL Brener and Sylvio X-10) have been reported, their assemblies remain incomplete because of the unusually high repeat content of these genomes [13,14]. Therefore, the exact copy number of several multigene families, such as the amastin gene family, is not yet known. According to the current assembly [15], only four δ-amastins and two β-amastins were identified in the CL Brener genome. Herein, we used the entire data set of sequencing reads from the CL Brener [13] and Sylvio X-10 [14] genomes to analyze all sequences encoding amastin orthologues present in these two T. cruzi strains and to determine their copy number and genome organization. Expression of distinct amastin genes fused to the green fluorescent protein allowed us to examine the cellular localization of different members of both amastin sub-families. By determining the levels of transcripts corresponding to each sub-family in all three parasite stages of various strains, we showed that δ-amastin levels are up-regulated in amastigotes, whereas β-amastin transcripts are significantly increased in the epimastigote insect stage. Most importantly, the reduced expression of δ-amastins in amastigotes from strains known to have lower infection capacity suggests that amastins may be T. cruzi virulence factors.
Results and discussion
The amastin gene repertoire of Trypanosoma cruzi
In its current assembly, the T. cruzi (CL Brener) genome
contains 12 putative amastin sequences. Because of its hybrid nature and the high level of divergence between alleles,
this genome was assembled as two sets of contigs, each
corresponding to one haplotype, designated
Esmeraldo-like and non-Esmeraldo [13]. Therefore, the 12
amastin sequences annotated in the CL Brener genome
database actually correspond to 6 pairs of alleles. Based on
analyses of amastin sequences present in the genomes
of different species of Trypanosoma and Leishmania, as
well as in two related insect parasites (Leptomonas seymouri and Crithidia spp.), Jackson (2010) [9] proposed a
Kangussu-Marcolino et al. BMC Microbiology 2013, 13:10
http://www.biomedcentral.com/1471-2180/13/10
classification into four amastin sub-families named α-, β-,
γ- and δ-amastins. In the current annotation of the T. cruzi
CL Brener genome, two genes belonging to the β-amastin
sub-family and four genes belonging to the δ-amastin sub-family can be identified. A phylogenetic tree constructed
with all 12 amastin sequences annotated in the CL Brener
genome plus orthologous sequences obtained from the
genome database of the Sylvio X-10 strain and from the
partial genome sequence of the Esmeraldo strain shows a
clear division between β-amastin and δ-amastin sequences
(Figure 1). The tree also revealed the presence, in all three
genomes, of one divergent copy of δ-amastin, which,
in the CL Brener genome, corresponds to the two
alleles annotated as Tc00.1047053511071.40 and Tc00.1047053511903.50, named here δ-Ama40 and δ-Ama50. It
should be noted that, in the phylogeny proposed by Jackson
(2010) [9], a group of δ-amastins that includes all T. cruzi
amastins as well as amastins from Crithidia spp. was
placed in a branch named proto-δ-amastins,
from which all Leishmania δ-amastins subsequently
derived. It can also be seen from the analyses described
by Jackson (2010) [9] and from the phylogenetic tree shown in
Figure 1 that the two members of the β-sub-family, named
β1-amastin and β2-amastin, are highly divergent. Among the CL Brener δ-amastins, if we exclude the two divergent alleles (δ-Ama40 and δ-Ama50), the percentage of
identity ranges from 85% to 100% (see Additional file 1:
Figure S1A), whereas the average identities between the two CL
Brener β-amastins are 25% (between the two copies belonging to the Esmeraldo-like haplotype) and 18%
(between the two non-Esmeraldo β-amastins). Analyses of
additional δ-amastin sequences, which
were obtained from the individual reads generated during
the CL Brener genome sequencing (see next paragraph),
also show sequence identities ranging from 85 to 100%
when compared to the previously described δ-amastins.
Besides the low sequence similarity found between β- and δ-amastins, low sequence identity is also found between δ-Ama40/δ-Ama50
and the other members of the δ-amastin
sub-family. On the other hand, sequence identities between
corresponding members of the β-amastin or of the δ-amastin sub-families range from 83% up to 99%, even when
we compare amastins from two phylogenetically distant
strains such as CL Brener and Sylvio X-10 (Additional file
1: Figure S1A).
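The identity percentages above (Additional file 1: Figure S1A) are pairwise comparisons of aligned sequences. The Python sketch below shows one common way such a value can be computed from two pre-aligned sequences. It assumes that columns containing a gap in either sequence are excluded from the denominator, which may differ from the convention used to produce Figure S1A; the function names and toy fragments are hypothetical and are not amastin sequences.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns in which either sequence has a gap are skipped,
    so they count neither as matches nor as compared positions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """All-against-all percent identity for a set of aligned sequences."""
    names = list(aligned)
    return {
        (x, y): percent_identity(aligned[x], aligned[y])
        for x in names
        for y in names
    }


if __name__ == "__main__":
    # Toy, hypothetical aligned fragments (not real amastin sequences).
    toy = {"seqA": "MKC-WLA", "seqB": "MKCAWIA"}
    for pair, value in identity_matrix(toy).items():
        print(pair, round(value, 1))
```

With the gap column skipped, the toy pair is compared over six positions with one mismatch; a different gap convention would give a different number, which is why the convention should be stated alongside any identity matrix.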
In spite of the sequence divergence, an alignment of
polypeptide sequences belonging to all amastin sub-families shows increased amino acid conservation within
the putative hydrophobic transmembrane domains.
Within the predicted extracellular domains, two highly
conserved cysteine residues and one tryptophan residue, which are
part of the 10-amino-acid “amastin signature” [8], may
be critical for amastin function (Additional file 1: Figure
S1B). On the other hand, the more variable sequences
present in the two predicted extracellular, hydrophilic
domains suggest that this portion of the protein, which, in
amastigotes, is in contact with the host cell cytoplasm,
may interact with distinct host cell proteins.
Figure 1 Phylogenetic analyses of amastin sequences from different T. cruzi strains. Amastin amino acid sequences from CL Brener,
Esmeraldo and Sylvio X-10 strains were used to generate a tree rooted with an α-amastin sequence from Crithidia sp. Bootstrap values followed
by branch lengths are shown at the major basal nodes.
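The conservation displayed with WebLogo (Additional file 1: Figure S1B) is based on per-column information content. As an illustration only, the following Python sketch computes the information content, in bits, of each column of a protein alignment as log2(20) minus the Shannon entropy of the residues in that column. It ignores gaps and omits the small-sample correction that WebLogo applies, so its values will not exactly match the published logo; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20, gap: str = "-") -> float:
    """Information content (bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    non-gap residues; no small-sample correction is applied.
    """
    residues = [c for c in column.upper() if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def conservation_profile(aligned: list[str]) -> list[float]:
    """Per-position information content for equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same aligned length")
    return [column_information("".join(s[i] for s in aligned)) for i in range(length)]


if __name__ == "__main__":
    # Hypothetical alignment: column 1 fully conserved (C), column 2 mixed.
    profile = conservation_profile(["CW", "CW", "CL", "CF"])
    print([round(v, 2) for v in profile])
```

A fully conserved column, such as the invariant cysteines of the amastin signature, reaches the maximum of log2(20), about 4.32 bits, while variable columns score lower.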
Because the assembly of the CL Brener genome does not
include its complete sequence, we conducted a read-based
analysis to estimate the total number of amastin genes in
this strain of the parasite. It is well known that the assembly of the CL Brener genome is accurate only for non-repetitive regions; for tandemly repeated genes,
misassemblies frequently occurred because repetitive
copies usually collapse into one or two copies. Therefore,
we used the entire dataset of reads generated by the
Tri-Tryp consortium to select reads containing sequences
homologous to amastin and, based on 13× genome
coverage [13], we estimated a total of 14
amastin genes (2 β-amastins and 12 δ-amastins) in the CL
Brener genome. Similar analyses performed with sequencing reads generated by Franzen et al. (2011) [14] from
the genome of Sylvio X-10 indicated a comparable number of copies in the genome of this T. cruzi I strain.
In the current assembly of the CL Brener genome,
amastin genes are organized in three loci on
chromosomes 26, 32 and 34. Forty-one pairs of homologous chromosomes (corresponding to the Esmeraldo-like
and non-Esmeraldo haplotypes) have been assembled
using the majority of the contigs and scaffolds generated
by the Tri-Tryp consortium and inferences from synteny
maps with the fully assembled T. brucei genome [15].
Based on the chromosome assemblies described by
Weatherley et al. [15], three copies of δ-amastins are present on chromosome 34 as a tandem array alternating with copies of tuzin genes. Interestingly, the divergent
copy of δ-amastin (the Esmeraldo-like
δ-Ama40 allele and the non-Esmeraldo δ-Ama50 allele) is
found as a single sequence linked to one tuzin pseudogene
on chromosome 26. On a third chromosome (chromosome 32), the two
β-amastin copies are linked together, without
associated tuzin genes. This gene organization is consistent with
the analyses described by Jackson (2010) [9], who found
tuzin genes associated only with δ-amastins. To
confirm the proposed genomic organization of the CL Brener
genome and to verify whether a similar distribution pattern of amastin genes occurs in other T. cruzi strains,
we performed Southern blot hybridizations with chromosomal bands, separated by pulsed-field gel electrophoresis, from CL Brener (a T. cruzi VI strain), from the G, Sylvio X-10 and Dm28c
strains (all belonging to T. cruzi I) and from the Y strain
(a T. cruzi II strain). As shown in Figure 2A, the presence of two copies of β-amastins in a 900 kb chromosomal band, which is
similar to the predicted size of chromosome 32 [15], was
confirmed in all T. cruzi strains. Using a probe specific for δ-Ama40, we detected a chromosomal band
of 800 kb, similar to the size of chromosome 26, in all
strains except Sylvio X-10, in which we detected two
bands of similar sizes (Figure 2B). Since significant differences in the sizes of homologous chromosomal bands in
T. cruzi have been frequently described [16], it is possible
that the two bands detected in Sylvio X-10 correspond
to size variants of chromosome 26 in this strain.
Compared to β-amastins, the distribution pattern of
δ-amastins appears to be much more complex and variable: as in CL Brener, in the Dm28c and G strains a probe
specific for the δ-amastin sub-family, which recognizes neither β-amastins nor δ-Ama40/50, hybridizes with
sequences present in three chromosomal bands of
approximately 1.1, 1.3 and 2.3 Mb (Figure 2C). In the Sylvio
X-10, Colombiana and Y strains, these sequences were
found in only one or two chromosomal bands. Thus, our
analyses indicate that, whereas β-amastins
are located on chromosome 32, members of the δ-amastin
sub-family are scattered among at least three chromosomes in
the strains showing three hybridizing bands. Whether two of these chromosomes
correspond to allelic pairs with significant differences
in size still needs to be verified. This highly heterogeneous
distribution of δ-amastin sequences is also in
agreement with previous analyses described by Jackson
(2010) [9], which suggested that δ-amastin sequences are highly mobile. Based on analyses of genomic position as well as the phylogeny of Leishmania amastins, it
was proposed that independent movements of δ-amastin
genes occurred in the genomes of different Leishmania
species. Also consistent with these previous analyses,
when blots containing chromosomal bands were probed
with a sequence encoding one of the tuzin genes, a
hybridization pattern similar to that obtained with the
δ-amastin probes was observed (Figure 2D). Thus, for
most T. cruzi strains, our results are consistent with the
existence of more than one cluster containing linked copies of δ-amastin and tuzin genes and an additional locus
with two linked β-amastins. However, a complete
description of the genomic organization of amastin genes
could not be attained based solely on PFGE analyses and
gene copy number estimations. Further analyses based on
sequencing data generated from large inserts previously
mapped to specific T. cruzi chromosomes are warranted
to resolve this question.
Figure 2 Genomic localization of amastin genes in different T. cruzi strains. Chromosomal bands from different T. cruzi strains, separated by
pulsed-field gel electrophoresis (PFGE) and transferred to membranes, were hybridized with ³²P-labelled probes corresponding to β2-amastin (A),
δ-Ama40 (B), δ-amastin (C) and tuzin genes (D). T. cruzi strains or clones are Sylvio X-10 (Sylvio), Colombiana (Col.), G, Dm28c, Y and CL Brener
(CLBr). Sizes of yeast chromosomal bands (Sc) are indicated on the left.
Figure 3 Amastin mRNA expression during the T. cruzi life cycle in different parasite strains. Total RNA was extracted from epimastigote
(E), trypomastigote (T) and amastigote (A) forms of CL Brener, Y, G and Sylvio X-10. Electrophoresed RNAs (~10 μg/lane) were transferred to
nylon membranes and probed with ³²P-labelled sequences corresponding to δ-amastin, δ-Ama40, β1- and β2-amastins (top panels). Bottom
panels show hybridization of the same membranes with a fragment of the 24Sα rRNA.
Distinct patterns of amastin gene expression
Because analyses of amastin gene expression have been
limited to members of the δ sub-family and have not been conducted with different strains of the
parasite, we used northern blotting to evaluate the
expression profiles of members of the δ- and β-amastin
sub-families. We also compared the expression
levels of different amastin genes in parasite strains representative of T. cruzi I (Sylvio X-10 and G), T. cruzi II (Y)
and in CL Brener (a T. cruzi VI strain). As shown in
Figure 3, the levels of amastin transcripts derived from
the δ- and β-sub-families are differentially modulated
throughout the T. cruzi life cycle. Most importantly,
clear differences in expression levels were found when
different T. cruzi strains were compared: whereas in the CL
Brener, Y and Sylvio X-10 strains transcripts of δ-amastins are up-regulated in amastigotes, as previously
described in the initial characterization of amastins
performed with the Tulahuen strain (also a T. cruzi VI
strain) [6], the same was not observed with the G
strain. Even though δ-Ama40 has a more divergent
sequence and is transcribed from a different locus in the
genome, its expression, like that of other δ-amastins, is also up-regulated in amastigotes in all
strains analysed except the G strain. In contrast, in all
parasite strains, β1- and β2-amastin
transcripts are up-regulated in epimastigotes. As with
β2-amastin from CL Brener, two δ-Ama40 transcripts of different sizes were detected in the Y and G
strains. It can be speculated that the transcripts of
different sizes derived from the δ-Ama40 and β2-amastin
genes result from alternative mRNA processing
events. Recent RNA-seq analyses indicated
that alternative trans-splicing and polyadenylation, as a
means of regulating gene expression and creating
protein diversity, occur frequently in T. brucei [17].
Ongoing analyses of RNA-seq data will help elucidate the
mechanisms responsible for the size variations observed
for this subset of β- and δ-amastins. Moreover, the
striking difference in the expression of δ-amastins
observed in the G strain is also currently being investigated. Because the G strain has been extensively characterized as
a low-virulence strain [18], we speculated that members
of the δ-amastin sub-family may constitute virulence
factors that contribute to infection capacity and
parasite survival in the mammalian host. This hypothesis
has recently been verified by experiments in which we
over-expressed one δ-amastin gene in the G strain and
showed that the transfected parasites display accelerated
amastigote-to-trypomastigote differentiation during
in vitro infections, as well as accelerated parasite dissemination in
tissues after infection of mice [19]. It is also noteworthy
that both β-amastins exhibited increased levels in
epimastigotes of all strains analysed, indicating that this
amastin sub-family may be involved in parasite adaptation to the insect vector. These results are consistent
with previous reports describing microarray and qRT-PCR analyses of the steady-state T. cruzi transcriptome,
in which higher levels of β-amastins were detected in
epimastigotes than in amastigotes and trypomastigotes [20]. Similar findings were also described for
one Leishmania infantum amastin gene (LinJ34.0730),
whose transcript was detected at higher levels in
promastigotes after five days, in contrast to all other
amastin genes, which showed higher expression levels in
amastigotes [8]. The generation of knock-out parasites
with the β-amastin locus deleted, as well as pull-down assays
to investigate interactions between the distinct
T. cruzi amastins and host cell proteins, will help elucidate the function of these proteins.
To investigate the mechanisms controlling the expression of the different amastin sub-families, we also aligned the 3’ UTR sequences of β- and δ-amastins. Previous work has identified regulatory elements in the 3’ UTR of δ-amastins, as well as of
other T. cruzi genes, that control mRNA stability [4-6,21,22] and mRNA translation [23]. Since we observed
that the two groups of amastin genes have highly divergent 3’ UTR sequences (not shown), we are preparing luciferase reporter constructs to identify regulatory
elements that might be present in the β-amastin transcripts, as well as the factors responsible for the
differences observed in amastin gene expression among
distinct T. cruzi strains.
Amastin cellular localization
In our initial studies describing a member of the δ-amastin
sub-family, we showed that this glycoprotein localizes to
the plasma membrane of intracellular amastigotes [3]. Here
we examined the cellular localization of other members of
the amastin family by transfecting epimastigotes of the CL
Brener strain with the pTREXnGFP vector [24] containing sequences of two δ-amastins as well as β1- and β2-amastins in fusion with GFP. GFP fusion proteins allowed us to
examine the cellular localization of each individual member of the family. This approach was also necessary because several attempts to
express the recombinant full-length proteins
were unsuccessful, so it was not possible to
generate specific antibodies that could
unambiguously detect each member of the distinct amastin sub-families. Confocal images of stably transfected epimastigotes, shown in Figure 4, demonstrated that, whereas GFP
is expressed as a soluble protein present throughout the
parasite cytoplasm (Figure 4A-C), GFP fusions of β1- and
δ-amastins are clearly located at the cell surface
(Figure 4D-J). Interestingly, in addition to their surface localization, the GFP fusion of δ-Ama40 showed a punctate pattern in the parasite cytoplasm, and the β2-amastin GFP fusion showed a more diffuse distribution within the cytoplasm
(Figure 4G-I and M-O). Although all amastin sequences
present an N-terminal signal peptide, δ-Ama40
and δ-Ama50 have a C-terminal peptide that is not present
in other members of the amastin family (Additional file 2:
Figure S2). In spite of these differences, all amastin
sequences showed a cellular localization pattern that is consistent with the topology predicted for Leishmania amastins
as transmembrane proteins [8], as well as with our in silico
analyses, which confirm the presence of four hydrophobic
regions, a hallmark of all amastin sequences (Additional
file 1: Figure S1B). To further examine their cellular
localization, particularly for the δ-Ama40:GFP fusion,
which may be associated with intracellular vesicles, we performed co-localization analysis with the glycosomal protein
phosphoenolpyruvate carboxykinase (PEPCK) in immunofluorescence assays. As shown by the confocal images presented in Additional file 3: Figure S3, the GFP fusion
protein does not co-localize with anti-PEPCK antibodies,
indicating that the vesicles containing δ-Ama40 are not
associated with glycosomal components. Finally, we
performed immunoblot analyses of sub-cellular fractions of
the parasite and compared the presence of GFP fusions in
membrane-enriched and soluble fractions of transfected
epimastigotes (Figure 5). In agreement with the confocal
analyses, the immunoblot results show that all four amastins expressed as GFP fusion proteins are present in membrane-enriched fractions.
Figure 4 Subcellular localization of distinct amastins in fusion with GFP. Images of stably transfected epimastigotes of the CL Brener or G
strains obtained by confocal microscopy using 1000× magnification and 2.2× digital zoom. Panels (A-C), parasites transfected with a vector
containing only GFP; (D-F), parasites transfected with δ-amastinGFP; (G-I), parasites transfected with δ-Ama40GFP; (J-L), parasites transfected with β1-amastinGFP; (M-O), parasites transfected with β2-amastinGFP. DAPI staining is shown in panels (A, D, G, J and M), GFP fluorescence in panels
(B, E, H, K and N) and merged images in panels (C, F, I, L and O). (Bar = 10 μm).
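The four hydrophobic regions that characterize amastins were predicted in the source with SOSUI, whose algorithm is not reproduced here. As a simpler, swapped-in technique, the Python sketch below flags candidate hydrophobic segments with a Kyte-Doolittle hydropathy sliding window. The 19-residue window and the 1.6 threshold are commonly used defaults for transmembrane helix detection and are assumptions, not parameters taken from the source; segments are returned as 0-based, half-open residue intervals, and all names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window.

    Entry i averages residues i..i+window-1; sequences shorter than the
    window give an empty list. Non-standard residues raise KeyError.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


def hydrophobic_segments(seq: str, window: int = 19, threshold: float = 1.6) -> list[tuple[int, int]]:
    """Residue spans (0-based, half-open) covered by runs of windows above threshold."""
    profile = hydropathy_profile(seq, window)
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i  # first window of a hydrophobic run
        elif value <= threshold and start is not None:
            # last passing window is i - 1; its half-open end is (i - 1) + window
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window))
    return segments


if __name__ == "__main__":
    # Hypothetical test sequence: a 25-residue leucine stretch between lysines.
    print(hydrophobic_segments("K" * 20 + "L" * 25 + "K" * 20))
```

Applied to a full-length amastin, a profile like this would be expected to show four peaks if it agrees with the SOSUI prediction, although window-based methods tend to blur segment boundaries by a few residues.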
Conclusions
Taken together, the results presented here provide further
information on amastin sequence diversity, mRNA
expression and cellular localization, which may help elucidate the function of this highly regulated family of
T. cruzi surface proteins. Our analyses showed that the
number of members of this gene family is larger than
predicted from the analysis of the T. cruzi
genome and actually includes members of two distinct
amastin sub-families. Although most T. cruzi amastins
have a similar surface localization, as initially described,
not all amastin genes are up-regulated
in amastigotes: while we confirmed that transcript
levels of δ-amastins are up-regulated in amastigotes from
different T. cruzi strains, β-amastin transcripts are more
abundant in epimastigotes than in amastigotes or trypomastigotes. Together with the results showing that
expression of δ-amastin is down-regulated in the G strain, which is known to have lower infection capacity,
the data on amastin gene expression presented here indicate
that, besides a role in the intracellular amastigote stage,
T. cruzi amastins may also serve important functions in the
insect stage of this parasite. Hence, based on this more
detailed study of T. cruzi amastins, we should be able to
test several hypotheses regarding their functions using a
combination of protein interaction assays and parasite
genetic manipulation.
Figure 5 Distribution of amastin proteins in the parasite
membrane fractions. Immunoblot of total (T), membrane (M) and
cytoplasmic (C) fractions of epimastigotes expressing δ-Ama, δ-Ama40, β1- and β2-amastins in fusion with GFP. All membranes
were incubated with α-GFP antibodies.
Methods
Sequence analyses
Amastin sequences were obtained from the genome
databases of T. cruzi CL Brener, Esmeraldo and Sylvio
X-10 strains [25,26]. The sequences, listed in Additional
file 4: Table S1, were named according to the genome
annotation of CL Brener or the contig or scaffold ID for
the Sylvio X10/1 and. All coding sequences were translated and aligned using ClustalW [27]. Amino acid
sequences from CL Brener, Esmeraldo, Sylvio X-10
and Crithidia sp. (ATCC 30255) were subjected to
maximum-likelihood tree building using SeaView
version 4.4 [28], and the phylogenetic tree was rooted
with an α-amastin from Crithidia sp. WebLogo
3.2 was used to display the levels of sequence conservation throughout the protein [29]. Amino acid sequences
of one amastin from each sub-family were used to
predict transmembrane domains with SOSUI [30] and
signal peptides with SignalP 3.0 [31]. For copy
number estimations, individual reads from the genome
sequence of T. cruzi CL Brener [13] were aligned by reciprocal BLAST against each amastin coding sequence.
Unique reads showing at least 99.7% identity were
mapped onto the CDS, and the coverage of each nucleotide was determined. Coverage values were normalized
by z-score, and copy numbers were determined
from the ratios between the z-score and the
whole-genome coverage.
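The read-based copy number estimate can be sketched in Python under stated assumptions. Read hits are given as hypothetical 0-based, half-open (start, end, percent identity) tuples on a single CDS; hits below 99.7% identity are discarded, as described above; and the copy number is approximated as mean per-base coverage divided by the 13× whole-genome coverage. This plain ratio replaces the z-score normalization step, whose exact formula is not given, so it is an approximation and not the authors' procedure. All names are illustrative.

```python
from itertools import accumulate

# (start, end, percent identity); 0-based, half-open coordinates on the CDS
Hit = tuple[int, int, float]


def per_base_coverage(cds_length: int, hits: list[Hit], min_identity: float = 99.7) -> list[int]:
    """Number of retained reads covering each CDS position (difference-array method)."""
    diff = [0] * (cds_length + 1)
    for start, end, identity in hits:
        if identity < min_identity:
            continue  # discard reads below the identity cut-off
        start, end = max(0, start), min(cds_length, end)
        if start >= end:
            continue  # hit falls outside the CDS
        diff[start] += 1
        diff[end] -= 1
    return list(accumulate(diff[:cds_length]))


def estimate_copy_number(
    cds_length: int,
    hits: list[Hit],
    genome_coverage: float = 13.0,
    min_identity: float = 99.7,
) -> float:
    """Mean CDS coverage divided by whole-genome coverage (assumed single-copy baseline)."""
    if cds_length <= 0:
        raise ValueError("cds_length must be positive")
    coverage = per_base_coverage(cds_length, hits, min_identity)
    return (sum(coverage) / cds_length) / genome_coverage


if __name__ == "__main__":
    # Hypothetical data: 26 full-length high-identity reads and one rejected read.
    demo_hits = [(0, 100, 100.0)] * 26 + [(0, 100, 98.0)]
    print(estimate_copy_number(100, demo_hits))
```

Because collapsed repeats pile all reads onto one assembled copy, a CDS that attracts twice the genome-wide coverage is read as two copies; whether 13× refers to one haplotype or to the hybrid genome as a whole changes that baseline and should be checked before interpreting the ratio.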
Parasite culture
T. cruzi strains or clones, obtained from different sources,
were classified according to the nomenclature and genotyping protocols described in [32]. Epimastigote forms of
T. cruzi strains or clones Colombiana, G, Sylvio X-10,
Dm28c, Y and CL Brener were maintained at 28°C in liver
infusion tryptose (LIT) medium supplemented with 10%
fetal calf serum (FCS) as previously described [3]. Tissue
culture-derived trypomastigotes and amastigotes were
obtained after infection of LLC-MK2 or L6 cells with
metacyclic trypomastigotes generated in LIT medium as
previously described [3].
Pulsed-field gel electrophoresis and Southern blot analyses
Genomic DNA from 10⁷ epimastigotes, embedded in agarose blocks, was separated into chromosomal bands by pulsed-field gel electrophoresis (PFGE)
using the Gene Navigator System (Pharmacia) as described by Cano et al. (1995) [33], with the following
modifications: separation was done in 0.8% agarose gels
using a program with 5 phases of homogeneous pulses
(north/south, east/west) with interpolation for 135 h at
83 V. Phase 1 had a pulse time of 90 s (run time 30 h);
phase 2, 120 s (30 h); phase 3, 200 s (24 h); phase 4, 350 s
(25 h); phase 5, 800 s (26 h). Chromosomes from Saccharomyces cerevisiae (Bio-Rad) were used as molecular
size standards. Separated chromosomes were transferred to nylon filters and hybridized with ³²P-labelled
probes prepared as described in the following section.
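As a quick consistency check of the PFGE program described above, the five phase durations add up to the stated 135 h run. The Python snippet below tabulates the start and end time of each phase from the listed pulse times and run times; it does not model the pulse-time interpolation between phases, and the `Phase` type and `schedule` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    pulse_s: int  # pulse time in seconds
    run_h: int    # duration of the phase in hours


# The five phases listed in the text.
PHASES = [Phase(90, 30), Phase(120, 30), Phase(200, 24), Phase(350, 25), Phase(800, 26)]


def schedule(phases: list[Phase]) -> list[tuple[int, int, int]]:
    """Return (start_h, end_h, pulse_s) for each phase, in run order."""
    rows, elapsed = [], 0
    for phase in phases:
        rows.append((elapsed, elapsed + phase.run_h, phase.pulse_s))
        elapsed += phase.run_h
    return rows


if __name__ == "__main__":
    for start, end, pulse in schedule(PHASES):
        print(f"{start:>3}-{end:>3} h: {pulse} s pulses")
    print("total run time:", sum(p.run_h for p in PHASES), "h")
```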
RNA purification and Northern blot assays
Total RNA was isolated from approximately 5 × 10⁸
epimastigote, trypomastigote and amastigote forms
using the RNeasy® kit (Qiagen) following the manufacturer’s
recommendations. RNA samples (15 μg/lane) were separated by denaturing agarose gel electrophoresis, transferred to Hybond-N+ membranes and hybridized with
the ³²P-labelled fragments corresponding to each T. cruzi
amastin sequence as described [3]. The probes used
were PCR-amplified fragments from total genomic DNA
extracted from the CL Brener strain using the primers
described in Table 1, in addition to a PCR fragment generated by amplification of the insert cloned in plasmid
TcA21 (corresponding to δ-amastin) and a fragment of the 24Sα ribosomal RNA [6]. DNA fragments were labelled using the
Megaprime DNA-labeling kit (GE HealthCare) according
to the manufacturer’s protocol. All membranes were
hybridized in a 50% formamide buffer for 18 h at 42°C
and washed twice with 2× SSC/0.1% SDS at 42°C for 30
min each, as previously described [3]. The membranes
were exposed to X-ray films (Kodak) or scanned using the
STORM840 PhosphoImager (GE HealthCare).
Plasmid constructions
To express different amastin genes in fusion with GFP, we
initially constructed a plasmid named pTREXAmastinGFP.
The coding sequence of the TcA21 cDNA clone [3] (accession number U04339) was PCR-amplified using a forward
primer (5’-CATCTAGAAAGCAATGAGCAAAC-3’) and a
reverse primer (5’-CTGGATCCCTAGCATACGCAGAAGCAC-3’) containing the XbaI (TCTAGA) and BamHI (GGATCC) restriction sites, respectively. After digestion of the
PCR product with XbaI and BamHI, the fragment was
ligated with the vector fragment of pTREX-GFP [24] that
was previously cleaved with BamHI and XhoI. To generate
the GFP constructs with other amastin genes, their corresponding ORFs were PCR-amplified using the primers
listed in Table 1 and total genomic DNA that was purified
from epimastigote cultures of T. cruzi CL Brener according
to previously described protocols [3]. The PCR products
were initially cloned into pTZ (Qiagen), and the amastin
sequences, digested with the indicated enzymes, were purified from agarose gels with the Illustra GFX™ PCR DNA and Gel Band Purification Kit (GE Healthcare). The fragment
corresponding to the TcA21 amastin cDNA was removed
from pTREXAmastinGFP by digestion with XbaI/BamHI,
and the fragments corresponding to the other amastin
sequences were ligated into the same vector, generating
pTREXAma40GFP, pTREXAma390GFP and pTREXAma394GFP. All plasmids were purified using QIAGEN
plasmid purification kits and sequenced to confirm that the
amastin sequences were properly inserted, in frame with
the GFP sequence.
Table 1 Sequence of primers used to amplify amastin isoform ORFs.
Primer name | Gene ID | Primer sequence (5’-3’) | Restriction enzyme
pδ1-amastin (F) | Tc00.1047053511071.40 | TTGTTCTAGAGTAGGAAGCAATG | XbaI
pδ1-amastin (R) | Tc00.1047053511071.40 | CGCTGGATCCGAACCACGTGCA | BamHI
β1-amastin (F) | Tc00.1047053509965.390 | CCTAGGAGGATGTCGAAGAAGAAG | AvrII
β1-amastin (R) | Tc00.1047053509965.390 | AGATCTCGAGCACAATGAGGCCCAG | BglII
β2-amastin (F) | Tc00.1047053509965.394 | TCTAGATGGGCTTCGAAACGCTTGC | XbaI
β2-amastin (R) | Tc00.1047053509965.394 | GGATCCCCAGTGCCAGCAAGAAGACTG | BamHI
The restriction sites recognized by each enzyme (underlined in the original table) are TCTAGA (XbaI), GGATCC (BamHI), CCTAGG (AvrII) and AGATCT (BglII).
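The primer design in Table 1 can be checked programmatically. The Python sketch below verifies that each primer contains the recognition site of its listed enzyme and compares the 4-nt 5’ overhangs left by each enzyme. AvrII (C^CTAGG) and XbaI (T^CTAGA) both leave CTAG, and BglII (A^GATCT) and BamHI (G^GATCC) both leave GATC, so the β1-amastin fragment ends would be compatible with an XbaI/BamHI-opened vector. The text does not spell out this reasoning, so treat it as an interpretation; the helper names are hypothetical, and the equality test for compatibility is valid only for palindromic overhangs like these.

```python
# Recognition site (5'->3') and top-strand cut position for each enzyme.
ENZYMES = {
    "XbaI": ("TCTAGA", 1),   # T^CTAGA
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "AvrII": ("CCTAGG", 1),  # C^CTAGG
    "BglII": ("AGATCT", 1),  # A^GATCT
    "XhoI": ("CTCGAG", 1),   # C^TCGAG
}

# Primer sequences copied from the text and Table 1, with their listed enzyme.
PRIMERS = {
    "TcA21 (F)": ("CATCTAGAAAGCAATGAGCAAAC", "XbaI"),
    "TcA21 (R)": ("CTGGATCCCTAGCATACGCAGAAGCAC", "BamHI"),
    "pδ1-amastin (F)": ("TTGTTCTAGAGTAGGAAGCAATG", "XbaI"),
    "pδ1-amastin (R)": ("CGCTGGATCCGAACCACGTGCA", "BamHI"),
    "β1-amastin (F)": ("CCTAGGAGGATGTCGAAGAAGAAG", "AvrII"),
    "β1-amastin (R)": ("AGATCTCGAGCACAATGAGGCCCAG", "BglII"),
    "β2-amastin (F)": ("TCTAGATGGGCTTCGAAACGCTTGC", "XbaI"),
    "β2-amastin (R)": ("GGATCCCCAGTGCCAGCAAGAAGACTG", "BamHI"),
}


def overhang(enzyme: str) -> str:
    """5' overhang left by a palindromic site cut at `cut` on the top strand."""
    site, cut = ENZYMES[enzyme]
    # The bottom strand is cut symmetrically, at len(site) - cut.
    return site[cut:len(site) - cut]


def compatible(enzyme_a: str, enzyme_b: str) -> bool:
    """True if both enzymes leave the same (self-complementary) overhang."""
    return overhang(enzyme_a) == overhang(enzyme_b)


def primers_missing_site(primers: dict[str, tuple[str, str]] = PRIMERS) -> list[str]:
    """Names of primers that do not contain their enzyme's recognition site."""
    return [name for name, (seq, enzyme) in primers.items()
            if ENZYMES[enzyme][0] not in seq.upper()]


if __name__ == "__main__":
    print("primers lacking their site:", primers_missing_site())
    for a, b in [("AvrII", "XbaI"), ("BglII", "BamHI"), ("XbaI", "XhoI")]:
        print(a, b, overhang(a), overhang(b), compatible(a, b))
```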
Parasite transfections and fluorescence microscopy analyses
Epimastigotes of T. cruzi CL Brener, grown to a density
of 1 to 2 × 10⁷ parasites/mL, were transfected as described
by DaRocha et al., 2004 [24]. After electroporation, cells
were recovered in 5 mL LIT plus 10% FCS at 28°C for 24 h
and analysed by confocal microscopy using the Radiance2100 confocal system (Bio-Rad) with a 63×/100× NA 1.4 oil
immersion objective. To perform co-localization analyses,
transfected parasites expressing amastin-GFP fusions were
prepared for immunofluorescence assays by fixing the
cells for 20 minutes in 4% PFA-PBS at room temperature.
Parasites adhered to poly-L-lysine coverslips (Sigma) were
permeabilized with 0.1% Triton X-100-PBS for 2 minutes,
blocked with 4% BSA-PBS for 1 hour and incubated with
the primary antibody (rabbit polyclonal anti-phosphoenolpyruvate carboxykinase, anti-PEPCK, kindly
provided by Stenio Fragoso, Instituto Carlos Chagas,
Curitiba, Brazil) in blocking solution (5.0% non-fat dry
milk) for 1 hour, followed by incubation with a secondary
anti-rabbit IgG conjugated with Alexa546. Samples
were also stained with 0.1 μg/mL 4’,6-diamidino-2-phenylindole dihydrochloride (DAPI, from Sigma) at room
temperature for 5 min before confocal microscopy.
Parasite membrane fractionation and western blot analyses
Approximately 10⁹ epimastigotes growing at a cell density
of 2 × 10⁷ parasites/mL were harvested, washed with phosphate-buffered
saline (PBS) and resuspended in lysis buffer (20 mM HEPES,
10 mM KCl, 1.5 mM MgCl2, 250 mM sucrose,
1 mM DTT, 0.1 mM PMSF). After lysing the cells with five
cycles of freezing in liquid nitrogen and thawing at
37°C, an aliquot corresponding to the total protein (T)
extract was collected. The total cell lysate was centrifuged at
low speed (2,000 × g) for 10 min, and the supernatant
was subjected to ultracentrifugation (100,000 × g) for
one hour. The resulting supernatant was collected and
analysed as the soluble, cytoplasmic fraction (C), whereas the
pellet, corresponding to the membrane fraction (M), was
resuspended in lysis buffer. Volumes corresponding to
10 μg of total parasite protein extract (T) and of the cytoplasmic (C)
and membrane (M) fractions, mixed with Laemmli’s sample buffer, were loaded onto a 12% SDS–PAGE gel, transferred to Hybond-ECL™ membranes (GE HealthCare),
blocked with 5.0% non-fat dry milk and incubated
with anti-GFP antibody (Santa Cruz Biotechnology) or
anti-PEPCK antibody, followed by incubation with peroxidase-conjugated anti-rabbit IgG and the ECL Plus reagent
(GE HealthCare).
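The lysis buffer and cell numbers given above translate into simple bench arithmetic. The Python sketch below computes the mass of each buffer component for a chosen volume, the culture volume that holds 10⁹ cells at 2 × 10⁷ parasites/mL, and the volume needed to load 10 μg of protein. The molar masses are assumptions for the anhydrous salts and HEPES free acid (hydrated forms such as MgCl2·6H2O differ, so check the reagent label), and the protein concentration in the example is hypothetical.

```python
# Assumed molar masses (g/mol): anhydrous salts, HEPES free acid.
MOLAR_MASS = {
    "HEPES": 238.30,
    "KCl": 74.55,
    "MgCl2": 95.21,
    "sucrose": 342.30,
    "DTT": 154.25,
    "PMSF": 174.19,
}

# Final concentrations (mM) of the lysis buffer described in the text.
LYSIS_BUFFER_MM = {
    "HEPES": 20.0, "KCl": 10.0, "MgCl2": 1.5,
    "sucrose": 250.0, "DTT": 1.0, "PMSF": 0.1,
}


def grams_for_volume(volume_ml: float) -> dict[str, float]:
    """Grams of each component needed for `volume_ml` of lysis buffer."""
    litres = volume_ml / 1000.0
    return {
        name: (mm / 1000.0) * litres * MOLAR_MASS[name]  # mol/L * L * g/mol
        for name, mm in LYSIS_BUFFER_MM.items()
    }


def culture_volume_ml(cells: float, density_per_ml: float) -> float:
    """Culture volume containing the requested number of cells."""
    return cells / density_per_ml


def loading_volume_ul(protein_ug: float, concentration_ug_per_ul: float) -> float:
    """Sample volume that delivers `protein_ug` at a measured concentration."""
    return protein_ug / concentration_ug_per_ul


if __name__ == "__main__":
    for name, grams in grams_for_volume(100).items():
        print(f"{name}: {grams:.4f} g per 100 mL")
    print("culture volume:", culture_volume_ml(1e9, 2e7), "mL")
    print("load for 10 ug at 2 ug/uL:", loading_volume_ul(10, 2.0), "uL")
```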
Additional files
Additional file 1: Comparative sequence analysis of T. cruzi
amastins. (Figure S1A) Percentages of amino acid identities among all T.
cruzi amastin sequences present in the CL Brener and Sylvio X-10
genome databases. (Figure S1B) Conserved amino acid residues and
conserved domains among sequences corresponding to all amastin
genes present in the T. cruzi CL Brener genome are represented using
the WebLogo software. The x axis depicts the amino acid position; the
taller the letter, the lower the variability at that position. Predicted
transmembrane domains are underlined.
Additional file 2: Amino acid sequences of delta- and beta-amastins. (Figure S2) Predicted amino acid sequences of one
representative member of δ-amastin, δ-ama40, β1 and β2-amastins
present in the T. cruzi CL Brener genome.
Additional file 3: Subcellular localization of δ-Ama40 fused with
GFP. (Additional file 3: Figure S3) Permeabilized, stably transfected CL
Brener epimastigotes were incubated with anti-PEPCK antibody and a
secondary antibody conjugated to Alexa546. GFP (panels A and D), Alexa
546 (B and E) and merged (C and F) fluorescent images were obtained
by confocal microscopy of parasites expressing δ-Ama40GFP as described
in Figure 4. (Bar = 10 μm).
Additional file 4: Table S1. Amastin sequences presented in Figure 1.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
MMK-M, LL and WDR carried out the molecular genetic studies, microscopy
analyses, sequence alignments and phylogenetic analyses. RMCP and PRA
participated in molecular genetic studies. RPM-Neto and DCB participated in
the sequence and phylogenetic analyses. RAM participated in the
microscopy analyses. WDR and SMRT designed and coordinated the study
and drafted the manuscript. All authors have read and approved the final
manuscript.
Acknowledgements
This study was supported by funds from Conselho Nacional de
Desenvolvimento Científico e Tecnológico (CNPq, Brazil), Fundação de
Amparo a pesquisa do Estado de Minas Gerais (FAPEMIG, Brazil) and the
Instituto Nacional de Ciência e Tecnologia de Vacinas (INCTV, Brazil). DCB,
RAM and SMRT are recipients of CNPq fellowships. The work of WDDR,
MMKM and LL is supported by Fundação Araucária, Coordenação de
Aperfeiçoamento de Pessoal de Nível Superior (CAPES), PPSUS/MS and CNPq.
Author details
¹Departamento de Bioquímica e Biologia Molecular, Universidade Federal do
Paraná, Rua Quinze de Novembro, 1299, Centro Curitiba, PR 80060-000, Brazil.
²Departamento de Bioquímica e Imunologia, Av. Antônio Carlos, 6627,
Pampulha Belo Horizonte, MG 31270-901, Brazil. ³Departamento de
Parasitologia Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627,
Pampulha Belo Horizonte, MG 31270-901, Brazil. ⁴Departamento de
Microbiologia, Imunologia e Parasitologia, Escola Paulista de Medicina,
Universidade Federal de São Paulo, Brazil, São Paulo 04021-001, Brazil.
Kangussu-Marcolino et al. BMC Microbiology 2013, 13:10
http://www.biomedcentral.com/1471-2180/13/10
Received: 1 October 2012 Accepted: 14 January 2013
Published: 17 January 2013
References
1. Brener Z: Biology of Trypanosoma cruzi. Annu Rev Microbiol 1973, 27:347–382.
2. Epting CL, Coates BM, Engman DM: Molecular mechanisms of host cell
invasion by Trypanosoma cruzi. Exp Parasitol 2010, 126:283–291.
3. Teixeira SM, Russell DG, Kirchhoff LV, Donelson JE: A differentially expressed
gene family encoding “amastin,” a surface protein of Trypanosoma cruzi
amastigotes. J Biol Chem 1994, 269:20509–20516.
4. Coughlin BC, Teixeira SM, Kirchhoff LV, Donelson JE: Amastin mRNA
abundance in Trypanosoma cruzi is controlled by a 3’-untranslated
region position-dependent cis-element and an untranslated region-binding protein. J Biol Chem 2000, 275:12051–12060.
5. Araújo PR, Burle-Caldas GA, Silva-Pereira RA, Bartholomeu DC, Darocha WD,
Teixeira SM: Development of a dual reporter system to identify
regulatory cis-acting elements in untranslated regions of Trypanosoma
cruzi mRNAs. Parasitol Int 2011, 60:161–169.
6. Teixeira SM, Kirchhoff LV, Donelson JE: Post-transcriptional elements
regulating expression of mRNAs from the amastin/tuzin gene cluster of
Trypanosoma cruzi. J Biol Chem 1995, 270:22586–22594.
7. Wu Y, El Fakhry Y, Sereno D, Tamar S, Papadopoulou B: A new
developmentally regulated gene family in Leishmania amastigotes
encoding a homolog of amastin surface proteins. Mol Biochem Parasitol
2000, 110:345–357.
8. Rochette A, McNicoll F, Girard J, Breton M, Leblanc E, Bergeron MG,
Papadopoulou B: Characterization and developmental gene regulation of
a large gene family encoding amastin surface proteins in Leishmania
spp. Mol Biochem Parasitol 2005, 140:205–220.
9. Jackson AP: The evolution of amastin surface glycoproteins in
Trypanosomatid parasites. Mol Biol Evol 2010, 27:33–45.
10. Cerqueira GC, Bartholomeu DC, Darocha WD, Hou L, Freitas-Silva DM,
Machado CR, El-Sayed NM, Teixeira SM: Sequence diversity and
evolution of multigene families in Trypanosoma cruzi. Mol Biochem
Parasitol 2008, 157:65–72.
11. Rafati S, Hassani N, Taslimi Y, Movassagh H, Rochette A, Papadopoulou
B: Amastin peptide-binding antibodies as biomarkers of active
human visceral Leishmaniasis. Clin Vaccine Immunol 2006,
13:1104–1110.
12. Stober CB, Langue UG, Roberts MT, Gilmartin B, Francis R, Almeida R,
Peacock CS, McCann S, Blackwell JM: From genome to vaccines for
Leishmaniasis: screening 100 novel vaccine candidates against
murine Leishmania major infection. Vaccine 2006, 24:2602–2616.
13. El-Sayed NM, Myler PJ, Bartholomeu DC, Nilsson D, Aggarwal G, Tran
AN, Ghedin E, Worthey EA, Delcher AL, Blandin G, Westenberger SJ,
Caler E, Cerqueira GC, Branche C, Haas B, Anupama A, Arner E, Aslund
L, Attipoe P, Bontempi E, Bringaud F, Burton P, Cadag E, Campbell DA,
Carrington M, Crabtree J, Darban H, da Silveira JF, de Jong P, Edwards
K, et al: The genome sequence of Trypanosoma cruzi, etiologic agent
of Chagas disease. Science 2005, 309:409–415.
14. Franzén O, Ochaya S, Sherwood E, Lewis MD, Llewellyn MS, Miles MA,
Andersson B: Shotgun sequencing analysis of Trypanosoma cruzi I
Sylvio X10/1 and comparison with T cruzi VI CL Brener. PLoS Negl
Trop Dis 2011, 5:984–993.
15. Weatherly DB, Boehlke C, Tarleton RL: Chromosome level assembly of
the hybrid Trypanosoma cruzi genome. BMC Genomics 2009,
10:255–268.
16. Souza RT, Lima FM, Barros RM, Cortez DR, Santos MF, Cordero EM,
Ruiz JC, Goldenberg S, Teixeira MMG, Silveira JF: Genome Size,
Karyotype Polymorphism and Chromosomal Evolution in
Trypanosoma cruzi. PLoS One 2011, 6:e23042.
17. Nilsson D, Gunasekera K, Mani J, Osteras M, Farinelli L, Baerlocher L,
Roditi I, Ochsenreiter T: Spliced leader trapping reveals widespread
alternative splicing patterns in the highly dynamic transcriptome of
Trypanosoma brucei. PLoS Pathog 2010, 6(8):e1001037.
18. Yoshida N: Molecular basis of mammalian cell invasion by
Trypanosoma cruzi. An Acad Bras Cienc 2006, 78:87–111.
19. Cruz MC, Souza-Melo N, Vieira-da-Silva C, DaRocha WD, Bahia D,
Araújo PR, Teixeira SMR, Mortara RA: Trypanosoma cruzi: role of delta-amastin on extracellular amastigote cell invasion and differentiation.
PLoS One 2012, 7:e51804.
20. Minning TA, Weatherly DB, Atwood J, Orlando R, Tarleton RL:
The steady-state transcriptome of the four major life-cycle stages of
Trypanosoma cruzi. BMC Genomics 2009, 10:370–385.
21. Araújo PR, Teixeira SM: Regulatory elements involved in the post-transcriptional control of stage-specific gene expression in
Trypanosoma cruzi - A Review. Mem Inst Oswaldo Cruz 2011,
106:257–267.
22. Li ZH, De Gaudenzi JG, Alvarez VE, Mendiondo N, Wang H, Kissinger JC,
Frasch AC, Docampo R: A 43-nucleotide U-rich element in 3’-untranslated region of large number of Trypanosoma cruzi transcripts
is important for mRNA abundance in intracellular amastigotes. J Biol
Chem 2012, 287:19058–19069.
23. McNicoll F, Müller M, Cloutier S, Boilard N, Rochette A, Dubé M,
Papadopoulou B: Distinct 3’-untranslated region elements regulate
stage-specific mRNA accumulation and translation in Leishmania.
J Biol Chem 2005, 280:35238–35246.
24. Darocha WD, Silva RA, Bartholomeu DC, Pires SF, Freitas JM, Macedo
AM, Vazquez MP, Levin MJ, Teixeira SM: Expression of exogenous
genes in Trypanosoma cruzi: improving vectors and electroporation
protocols. Parasitol Res 2004, 92:113–120.
25. TriTryp DB: Kinetoplastid genomic resources Database. [http://triTrypdb.org/common/downloads/release-4.1/Tcruzi/fasta/TriTrypDB].
26. Aslett M, Aurrecoechea C, Berriman M, Brestelli J, Brunk BP, Carrington
M, Depledge DP, Fischer S, Gajria B, Gao X, Gardner MJ, Gingle A, Grant
G, Harb OS, Heiges M, Hertz-Fowler C, Houston R, Innamorato F, Iodice
J, Kissinger JC, Kraemer E, Li W, Logan FJ, Miller JA, Mitra S, Myler PJ,
Nayak V, Pennington C, Phan I, Pinney DF, et al: TriTrypDB: a
functional genomic resource for the Trypanosomatidae. Nucleic Acids
Res 2010, 38:457–462.
27. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA,
McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD,
Gibson TJ, Higgins DG: Clustal W and Clustal X version 2.0.
Bioinformatics 2007, 23:2947–2948.
28. Gouy M, Guindon S, Gascuel O: SeaView version 4: A multiplatform
graphical user interface for sequence alignment and phylogenetic
tree building. Mol Biol Evol 2010, 27:221–224.
29. Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence
logo generator. Genome Res 2004, 14:1188–1190.
30. Hirokawa T, Boon-Chieng S, Mitaku S: SOSUI: classification and
secondary structure prediction system for membrane proteins.
Bioinformatics 1998, 14:378–379.
31. Bendtsen JD, Nielsen H, von Heijne G, Brunak S: Improved prediction
of signal peptides: SignalP 3.0. J Mol Biol 2004, 340:783–795.
32. Zingales B, Andrade SG, Briones MR, Campbell DA, Chiari E,
Fernandes O, Guhl F, Lages-Silva E, Macedo AM, Machado CR, Miles
MA, Romanha AJ, Sturm NR, Tibayrenc M, Schijman AG: A new
consensus for Trypanosoma cruzi intraspecific nomenclature: second
revision meeting recommends TcI to TcVI. Mem Inst Oswaldo Cruz
2009, 104:1051–1054.
33. Cano MI, Gruber A, Vazquez M, Cortés A, Levin MJ, Gonzalez A, Degrave W,
Rondinelli E, Ramirez JL, Alonso C, Requena JM, Franco Da Silveira J:
Molecular karyotype of clone CL Brener chosen for the Trypanosoma
cruzi genome project. Mol Biochem Parasitol 1995, 71:273–278.
doi:10.1186/1471-2180-13-10
Cite this article as: Kangussu-Marcolino et al.: Distinct genomic
organization, mRNA expression and cellular localization of members of
two amastin sub-families present in Trypanosoma cruzi. BMC
Microbiology 2013 13:10.
Genome sequence of a highly attenuated clone of Trypanosoma cruzi identifies
SAPA repeats as a major virulence factor in this human parasite
Rondon Pessoa Mendonça-Netoᵃ, Caroline Junqueiraᵇ, Daniella C. Bartholomeuᶜ,
Wanderson D. daRochaᵈ, Monica Kangussu-Marcolinoᵈ, Gabriela F.R. Luizᶜ, Viviane
Santosᵃ, Luiz Gonzaga Paula de Almeidaᵉ, Edmundo Grisardᶠ, Ana Tereza
Vasconcelosᵉ, Sergio Schenkmanᶠ, Nilmar S. Morettiᶠ, Ricardo Tostes Gazzinelliᵃ,ᵇ and
Santuza M.R. Teixeiraᵃ*
ᵃDepartamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais,
Belo Horizonte, Minas Gerais, Brazil;
ᵇCentro de Pesquisas René Rachou, Fundação Oswaldo Cruz, 30190-002 Belo
Horizonte, Minas Gerais, Brazil;
ᶜDepartamento de Parasitologia, Universidade Federal de Minas Gerais, Belo Horizonte,
Minas Gerais, Brazil;
ᵈDepartamento de Bioquímica e Biologia Molecular, Universidade Federal do Paraná,
Curitiba, PR, Brazil;
ᵉLaboratório Nacional de Computação Científica, Petrópolis, RJ, Brazil;
Laboratório de Protozoologia e Bioinformática, Universidade Federal de Santa Catarina,
Florianópolis, SC, Brazil;
Departamento de Microbiologia, Imunologia e Parasitologia, Universidade Federal de
São Paulo, São Paulo, SP, Brazil
*Corresponding author: Santuza M.R. Teixeira
Departamento de Bioquímica e Imunologia, ICB, Universidade Federal de Minas Gerais
Av. Antônio Carlos 6627, 31270-901, Belo Horizonte, MG, Brasil
Tel: +55 (31) 3409-2665; Fax: +55 (31) 3409-2614
E-mail: [email protected]
Abstract
Trypanosoma cruzi, the etiologic agent of Chagas disease, belongs to a group of
organisms with a peculiar genome in which a massive expansion of surface protein gene
families is present and a large proportion of it is devoted to repetitive sequences. The
completion of the CL Brener reference strain genome revealed several new features
related to parasite virulence. CL-14 is an avirulent clone derived from the same T.
cruzi CL strain; however, in contrast to CL Brener, CL-14 is neither infective nor
pathogenic in vivo, even in newborn or immunodeficient mice. To
investigate the molecular determinants of T. cruzi virulence, we performed a direct
comparison of the CL Brener and CL-14 genomes, based on the available CL Brener
sequences and on sequences we generated from CL-14 using the 454 FLX platform.
Although neither genome is fully assembled, we found that they have highly
similar nuclear genome organization, almost 100% identical mitochondrial maxicircle
kDNA, and similar numbers of predicted coding sequences and of copies of
members of multigene families. PCR analyses as well as phylogenetic inferences
showed that CL-14 is also a hybrid strain that belongs to the same DTU as CL Brener
(TcVI). Southern blot analyses indicate a similar karyotype and, for most multigene
families, sequence identity between the two clones is higher than 99%. The only major
difference detected between these two genomes is related to a sub-group of the large
trans-sialidase gene family (TcTS), known to have a C-terminal domain with 12-amino-acid repeats called “shed acute phase antigen” or SAPA repeats. At least three
copies of TcTS containing a repetitive domain of 19 to 41 repeats, which are
highly immunogenic and increase the half-life of shed TcTS protein,
are present in the CL Brener genome, whereas in CL-14 only one copy containing 3
SAPA repeats was identified. This reduced number of SAPA repeats in the CL-14
TcTS, confirmed by Southern and western blot analyses, may constitute one of the
factors responsible for the differences in virulence between these two strains.
Key words: Trypanosoma cruzi, genome, CL-14, trans-sialidase, virulence
Introduction
Trypanosoma cruzi is the etiological agent of Chagas disease, a malady affecting
at least 8 million people throughout Latin America and for which there are only two
drugs available, both with poor efficacy and harmful side effects (WHO, 2010). T. cruzi
infection begins with metacyclic trypomastigotes that are released with the feces of the
triatomine vector during a blood meal. After reaching the host bloodstream through
skin cuts and mucosa, trypomastigotes invade different cell types in the mammalian
host. Once in the cytoplasm, they differentiate into replicative and non-flagellate
amastigotes, which undergo several rounds of binary division, before differentiating
again into trypomastigotes and bursting the host cell. Bloodstream trypomastigotes can
be ingested by the vector during another blood meal where they differentiate into
epimastigotes and replicate in the insect gut (Brener, 1973).
The T. cruzi population is highly heterogeneous, composed of a pool of strains
with distinct characteristics. This striking intra-specific variation has been extensively
documented by molecular analyses and biological characterization, which showed
distinct morphology, growth rate, curves of parasitemia, virulence, sensitivity to
drugs, antigenic profile, metacyclogenesis and tissue tropism (reviewed by Buscaglia
and DiNoia, 2003). Various studies on the genetic diversity observed among different
isolates recently converged to a classification that proposes the existence of six major
groups in the parasite population, also known as discrete typing units (DTUs) T. cruzi I
to VI (Zingales et al., 2009). These divergent lineages occupy distinct ecological
environments: T. cruzi I strains are more frequently associated with the silvatic cycle
whereas T. cruzi II strains are part of the domestic cycle of Chagas disease and are more
frequently isolated from chronic chagasic patients (Buscaglia and DiNoia, 2003).
Although T. cruzi evolution is predominantly clonal, several lines of evidence indicate that
genetic exchange between parasites has occurred in the past (Buscaglia and DiNoia,
2003; Gaunt et al., 2003; Freitas et al., 2006). Among the strains that are products of
hybridization events is the CL Brener, the reference strain chosen for the T. cruzi
genome project.
The complete sequence of the T. cruzi CL Brener genome, with an estimated
haploid size of 55 Mb and about 12,000 genes, revealed a highly repetitive genome (El-Sayed et al., 2005) with protein-coding genes organized in long, unidirectional
polycistronic transcription units. Because of its hybrid nature and repetitive content,
which prevented its complete assembly, the CL Brener genome is represented by two
datasets of contigs, each corresponding to one haplotype (El-Sayed et al., 2005;
Weatherly et al., 2009). To help identify the sequences belonging to each haplotype,
reads were generated from the genome of the cloned Esmeraldo strain, a member of T. cruzi II,
which represents one of the CL Brener parental strains (de Freitas et al., 2006).
Thus, in the annotation data of the CL Brener genome, the two haplotypes, which were
assembled separately, are referred to as “Esmeraldo-like” or “non-Esmeraldo-like” (El-Sayed et al., 2005; Aslett et al., 2010). Because trypanosomatid chromosomes do
not condense during mitosis and therefore cannot be counted cytologically, the total number
of CL Brener chromosomes has been estimated at 30 to 41 pairs from karyotype analyses
by pulsed-field gel electrophoresis and from genome assembly based on synteny with the
Trypanosoma brucei genome (Cano et al., 1995; Weatherly et al., 2009).
At the same time the first T. cruzi genome was published, draft sequences of the
genomes of two other human pathogens, members of the Trypanosomatid family,
Trypanosoma brucei and Leishmania major, were also disclosed (Berriman et al., 2005;
Ivens et al., 2005). Soon thereafter, other species of Leishmania and another T. brucei
subspecies had their genomes sequenced (Peacock et al., 2007; Jackson et al., 2010;
Raymond et al., 2011, Rogers et al., 2011). A draft genome sequence of Sylvio X-10, a
strain belonging to T. cruzi group I, which is the predominant agent of Chagas disease
in Central America and in the Amazon region has also been published (Franzén et al.,
2011). Although rarely isolated from humans in endemic areas in Southern countries of
Latin America where most cases of Chagas disease with mega-syndromes are found, T.
cruzi I strains are highly abundant among wild hosts and vectors (Zingales et al., 1998,
Buscaglia and DiNoia, 2003). The Sylvio X10 genome was found to be smaller than the
CL Brener genome, with fewer copies of several gene families encoding surface
molecules (Franzén et al., 2011; Andersson, 2011).
Here we describe the sequence analysis of the highly attenuated CL-14 clone,
which, similarly to the CL Brener, was derived from the CL strain of T. cruzi (Lima et
al., 1990). In contrast to CL Brener, CL-14 is neither infective nor pathogenic in vivo,
even when infecting newborn (Soares et al., 2003) or immunodeficient CD8−/− mice
(Junqueira et al., 2011) that are otherwise highly susceptible to T. cruzi infection.
Although inoculation of CL-14 in adult animals results in no detectable parasitemia or
tissue parasitism (Lima et al., 1995), it prevents the development of
parasitemia and mortality after challenge with the virulent CL strain (Lima et al., 1999,
Soares et al., 2003). Importantly, since vaccination with live CL-14 induces a potent
and long-lasting parasite-specific antibody and T-cell mediated immunity against
challenge with highly virulent strains of T. cruzi, the immunological adjuvant properties
of the CL-14 clone have been explored as a possible vaccine vector for induction of T
cell mediated immunity against other diseases (Junqueira et al., 2011, 2012). Aiming at
investigating the molecular basis of the non-virulent phenotype of CL-14, Atayde
and co-workers (2004) found that the expression of gp82, a stage-specific
glycoprotein involved in infection in vivo and host cell invasion in vitro, was greatly
reduced on the surface of metacyclic forms of CL-14. After performing a direct
comparison of the CL Brener and CL-14 genomes, we found that both genomes are
highly similar, with no substantial differences in genome organization, total number of
predicted coding sequences and in the number of copies among multi-gene families.
The absence of major differences at the genome level warrants further studies
focusing on gene expression to identify changes in the mRNA population or in the
proteome that could explain the differences in virulence between the two strains.
Materials and Methods
Parasite cultures and DNA sequencing
Epimastigotes were cultured at 28 °C in liver infusion tryptose (LIT) medium as
described by Camargo (1964). Three genomic libraries were constructed with total
DNA purified from epimastigotes: two shotgun libraries, each constructed with 5 μg of
total DNA, and a third paired-end library (3 kb span) constructed with 500 μg of DNA.
Each library was sequenced individually by high-throughput pyrosequencing
(Roche-454 FLX Titanium).
Genome assembly and sequence analyses
Whole-genome assembly was carried out using both de novo methods and comparative
genomic analyses with the T. cruzi CL Brener genome, using the Newbler
assembler and Perl scripts. Gene prediction and annotation were performed using GeneMarkS (Besemer et al., 2001) and best reciprocal BLAST hits to CL Brener sequences.
Individual genes were identified using reciprocal BLASTp and tBLASTn searches on
unassembled reads.
A total of 3,457,102 individual reads, totaling 1,506,882,872 nucleotides and
filtered to remove low-quality sequences, was submitted to the
different analyses.
Sequence alignments were created using Megablast and ClustalW. CL-14 contigs
were aligned against CL Brener coding sequences with Megablast, with the low-complexity
filter (which masks repetitive sequences) disabled. These alignments were parsed using Perl
(v5.10.1) and BioPerl (v1.6) scripts to accept only reciprocal best hits whose HSPs
covered at least 95% of the read length with at least 90% identity. Aligned sequences from CL-14
were used to build multiple global alignments in ClustalW with the IUB score
matrix, within each gene group. Overhangs were trimmed from these alignments, and
neighbor-joining trees were built from the results in MEGA (v4). A search for the three
known classes of immunostimulatory CpG DNA motifs (26) using the fuzznuc program
(EMBOSS package) was performed on the individual reads as described previously (Bartholomeu et al., 2009).
Phylogenetic analyses and PCR amplifications
In silico PCR analyses were performed with individual reads and then confirmed
by PCR amplification using total DNA from CL-14 as template, followed by gel
electrophoresis of the PCR products. Two nuclear markers, the mini-exon or Spliced
Leader (SL) (Burgos et al., 2007) and the 24Sα ribosomal subunit (Souto et al., 1996),
and one mitochondrial marker, cytochrome oxidase subunit II (COII), were used to
determine the classification of CL-14 as described by Freitas et al. (2006). The e-PCR
software (Schuler, 1997), allowing up to 2 mismatches and 2 gaps, was used to search
for the primer sequences F-AAGGTGCGTCGACAGTGTGG and R-TTTTCAGAATGGCCGAACAGT,
corresponding to the 24Sα ribosomal subunit, and F-CGTACCAATATAGTACAGAAACTG and
R-CTCCCCAGTGTGGCCTGGG, corresponding to the mini-exon genes. Analysis of the COII
sequences was performed by in silico PCR using F-CCATATATTGTTGCATTATT and
R-TTGTAATAGGAGTCATGTTT, followed by in silico digestion of the PCR products with the
AluI restriction enzyme. PCR amplifications were confirmed with DNA extracted from
epimastigote cultures of strains representative of T. cruzi groups I–VI and two
biological samples of CL-14. PCR reactions were performed with 0.75 U of
GoTaq DNA polymerase (Promega) in buffer containing 1.5 mM MgSO4, 40 μM
dNTPs and 10 pM of each primer. PCR products obtained with the COII primers were
digested with AluI, and all products were subjected to electrophoresis on 6%
polyacrylamide gels followed by silver staining.
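Electronic PCR locates the two primer sites on a template and reports the length of the region between them. e-PCR is a separate program, and the Python sketch below only illustrates the idea under simplifying assumptions. Primers are matched by Hamming distance, so substitutions are allowed but not the gaps that e-PCR tolerated in the study. Both strands are scanned. `in_silico_pcr` and its parameters are hypothetical, and the template is synthetic: it carries the 24Sα primer sites 84 bp apart.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def match_sites(template: str, primer: str, max_mm: int) -> list:
    """Start positions where primer matches template with <= max_mm substitutions."""
    k = len(primer)
    sites = []
    for i in range(len(template) - k + 1):
        mm = 0
        for a, b in zip(template[i:i + k], primer):
            if a != b:
                mm += 1
                if mm > max_mm:
                    break
        if mm <= max_mm:
            sites.append(i)
    return sites


def in_silico_pcr(template: str, fwd: str, rev: str,
                  max_mm: int = 2, max_len: int = 3000) -> list:
    """Sizes (bp) of products primed by fwd and rev on either strand."""
    fwd, rev_site = fwd.upper(), revcomp(rev.upper())
    sizes = []
    for strand in (template.upper(), revcomp(template.upper())):
        for f in match_sites(strand, fwd, max_mm):
            for r in match_sites(strand, rev_site, max_mm):
                size = r + len(rev_site) - f  # product spans both primers
                if r >= f and size <= max_len:
                    sizes.append(size)
    return sorted(sizes)


# Synthetic template carrying the 24Sα primer sites 84 bp apart
FWD = "AAGGTGCGTCGACAGTGTGG"
REV = "TTTTCAGAATGGCCGAACAGT"
template = "CCCC" + FWD + "A" * 84 + revcomp(REV) + "GGGG"
print(in_silico_pcr(template, FWD, REV))  # [125]
```

Running the same function on reads from each DTU and comparing product sizes mirrors the marker-based classification described above. A real implementation would also need gap-tolerant matching.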
Pulsed-field gel electrophoresis and Southern blot analyses
Epimastigotes were embedded in agarose blocks as described by [27]. Pulsed-field
gel electrophoresis (PFGE) was carried out as reported by [28]. Chromosomes from
Hansenula wingei (Bio-Rad) were used as molecular mass standards. Separated
chromosomes were transferred to nylon filters and hybridized with 32P-labeled probes as
described previously (Teixeira et al., 1995).
Results and Discussion
Genome sequencing and comparative karyotyping
Using the 454 technology and whole genome shot-gun sequencing we generated
a total of 1,507 Mb of sequences derived from 3,457,102 reads from three CL-14
genomic libraries and performed a comparative analysis with the genome sequences of
T. cruzi CL-Brener (Table 1). Based on a haploid nuclear genome size estimated in 55
Mb (Souza et al., 2011), the total nucleotide sequenced corresponds to 27 x coverage of
the CL-14 genome. A similar genome size was estimated for the CL Brener clone (ElSayed et al., 2005) and a comparison of chromosomal bands separated by pulse field gel
electrophoresis analysis showed a similar pattern between CL-14 and CL Brener (Fig
1A). The 60Mb haploid nuclear genome predicted for the CL Brener, estimation based
on the sequencing data as well as on fluorescent staining (Souza et al., 2011), is only
slightly larger than the genome size estimation for CL-14. Both genomes are however,
significantly larger than the genome of a T. cruzi I strain, Sylvio X10, which has 5.9 Mb
less of haploid sequences (Franzén et al. 2011). Most differences that account for the
reduced size of the Sylvio X10 genome are concentrated in the copy number of
members belonging to large gene families. The estimated GC content of 51 %, based on
the total reads of the CL-14 genome is also similar to the CL Brener genome but it is
higher than the GC content of the Sylvio X10 (48 %) genome.
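The coverage figure above is a single division: 1,506,882,872 bp ÷ 55,000,000 bp ≈ 27.4, which gives about 27× haploid coverage. The Python sketch below shows that calculation and a read-based GC estimate. Both function names are hypothetical. Excluding ambiguous bases (N) from the GC denominator is an assumption, and the two GC reads are toy sequences.

```python
def fold_coverage(total_bases: int, genome_size_bp: int) -> float:
    """Average sequencing depth: total sequenced bases / haploid genome size."""
    return total_bases / genome_size_bp


def gc_content(reads) -> float:
    """Percent G+C over all unambiguous (A/C/G/T) bases in the reads."""
    gc = acgt = 0
    for read in reads:
        read = read.upper()
        gc += read.count("G") + read.count("C")
        acgt += sum(read.count(base) for base in "ACGT")
    return 100.0 * gc / acgt if acgt else 0.0


# Figures reported above: 1,506,882,872 bases over a 55 Mb haploid genome
print(f"{fold_coverage(1_506_882_872, 55_000_000):.1f}x")  # 27.4x
print(gc_content(["GGCCAATT", "GCNN"]))  # 60.0 (toy reads)
```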
A draft sequence assembly of the CL-14 genome resulted in a total of 43,906
contigs (Table 1). Such a large number of contigs was expected, since over 50% of this
parasite genome consists of repeated sequences, which also hampered the complete
assembly of the CL Brener genome, likewise based on a whole-genome shotgun
sequencing strategy. The haploid CL Brener genome has an estimated
12,000 genes organized in long, polycistronically transcribed clusters (El-Sayed et
al., 2005). Preliminary analyses of assembled CL-14 contigs indicate a similar number
of genes with a similar genomic organization. Because of the large number of contigs,
it was not possible to make an accurate prediction of the total number of genes, since a
vast number of open reading frames were truncated. Moreover, as indicated
below, CL-14, like CL Brener, has a hybrid genome constituted by two distinct
haplotypes. Since the assembly tools did not discriminate between the two haplotypes,
we decided to base all further analyses described in the next sections solely on the
reads rather than on assembled contigs. However, to
investigate changes in the karyotype or the presence of large
chromosomal rearrangements, we hybridized chromosomal bands separated
by PFGE with different probes. A few changes in the chromosomal mapping of gp82 genes
have been reported by Atayde et al. (2004), who identified two
chromosomal bands hybridizing with a gp82 probe in the CL-14 clone that are absent
in the CL isolate. However, since the CL isolate may contain a mixed population of
different clones, we decided to compare the molecular karyotypes of CL-14 and the CL
Brener clone. The results shown in Fig. 1A and 1B indicate that CL-14 and CL
Brener chromosomes have similar patterns and also hybridize with GPI8, MASP,
Amastin and DGF-1 sequences in a similar way. Although a few differences could be
observed in the chromosomes containing MASP sequences, all 27 members of the
Amastin gene family are equally organized in the CL-14 and CL Brener genomes,
indicating that there are no major rearrangements between these two genomes.
Phylogenetic analysis
To determine which T. cruzi group CL-14 belongs to, we analyzed sequences
corresponding to the 24Sα subunit of the ribosomal DNA (rDNA) and the Spliced Leader
(SL) gene clusters, as well as sequences corresponding to the mitochondrial gene
cytochrome oxidase II (COII). Electronic PCR was performed using primers specific
for these sequences, and the sizes of the amplicons generated using the CL-14 reads as
template were compared with the expected sizes of the corresponding amplicons
derived from genomic sequences of strains representative of all six T. cruzi DTUs.
For the COII amplicon, we compared the sizes of the products of AluI digestion. As
shown in Table 1, the fragments resulting from the amplification of the 24Sα rDNA and
SL markers indicated that CL-14 would be classified as T. cruzi II, since it presents
amplicons of 150 bp for the SL and 125 bp for the 24Sα rDNA markers. However, AluI
digestion of the PCR product corresponding to the mitochondrial COII gene yields two
fragments of 81 and 294 bp, which is characteristic of strains belonging to T. cruzi III,
IV, V or VI. Taken together, these results, as well as those described in the next
section, indicate that, similar to CL Brener, CL-14 is a hybrid strain and must be
classified as T. cruzi VI. Since the two clones were isolated from the same strain, and
since the mitochondrial marker corresponds to T. cruzi III, we hypothesize that CL-14
is derived from the same hybridization event that occurred between ancestral strains
belonging to T. cruzi II and III and that, similar to CL Brener, it has retained a T. cruzi III
mitochondrion. The results obtained from the in silico analyses were confirmed by in
vitro amplification of DNA purified from epimastigote cultures of CL-14 and CL
Brener using primers that amplify the SL, the 24Sα rDNA and COII (Supplementary
Fig. 1).
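The AluI step can also be illustrated in Python. AluI recognizes the palindromic site AGCT and cuts between G and C (AG^CT), so fragment sizes follow directly from the site positions in a linear amplicon. The 375 bp amplicon below is synthetic. It was built only so that its single AluI site reproduces the 81 + 294 bp pattern described above, and it is not the real COII sequence. `alui_fragments` is a hypothetical helper.

```python
def alui_fragments(seq: str) -> list:
    """Fragment lengths after complete AluI digestion of a linear sequence.

    AluI recognizes AGCT and cuts between G and C (AG^CT), leaving blunt ends.
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find("AGCT")
    while pos != -1:
        cuts.append(pos + 2)            # cut position inside the site
        pos = seq.find("AGCT", pos + 1)
    edges = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(edges, edges[1:])]


# Synthetic 375 bp amplicon with a single AluI site
amplicon = "A" * 79 + "AGCT" + "T" * 292
print(alui_fragments(amplicon))  # [81, 294]
```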
In addition to the analyses of rDNA and SL genes, we performed sequence
alignments of two nuclear single-copy genes, msh2 and trypanothione reductase (TR),
as well as one mitochondrial gene, COII. The results shown in Figure 2
confirmed our prediction that CL-14 is phylogenetically very close to CL Brener and
that sequences belonging to the two distinct haplotypes (Esmeraldo-like and non-Esmeraldo-like) are present in the CL-14 genome. Sequence alignments between 392,310 reads
from the CL-14 genome that correspond to coding regions and the coding sequences
of both CL Brener haplotypes showed that 175,612 reads (44.8%) have a best
alignment with the Esmeraldo-like haplotype and 185,497 reads (47.3%) with the non-Esmeraldo-like haplotype. For a total of 31,201 reads (7.95%), it was not possible to distinguish
between the two haplotypes.
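Each coding-region read is assigned to whichever CL Brener haplotype gives the better alignment. The Python sketch below shows that best-hit logic and recomputes the shares from the read counts reported above. It assumes that the alignment score is a bitscore and that exact ties count as ambiguous. `assign_haplotype` and `percentages` are hypothetical helpers, and the toy scores are invented for illustration.

```python
from collections import Counter


def assign_haplotype(esm_score: float, non_esm_score: float) -> str:
    """Best-hit assignment of one read; equal scores are left unassigned."""
    if esm_score > non_esm_score:
        return "Esmeraldo-like"
    if non_esm_score > esm_score:
        return "non-Esmeraldo-like"
    return "ambiguous"


def percentages(counts) -> dict:
    """Share of each category, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {k: round(100.0 * v / total, 1) for k, v in counts.items()}


# Toy per-read bitscores (Esmeraldo-like, non-Esmeraldo-like); hypothetical
toy = [(210.0, 180.0), (150.0, 199.0), (175.0, 175.0)]
print(Counter(assign_haplotype(e, n) for e, n in toy))

# Read counts reported in the text
reported = {"Esmeraldo-like": 175_612, "non-Esmeraldo-like": 185_497, "ambiguous": 31_201}
print(sum(reported.values()))  # 392310
print(percentages(reported))
# {'Esmeraldo-like': 44.8, 'non-Esmeraldo-like': 47.3, 'ambiguous': 8.0}
```

The three categories add up exactly to the 392,310 reads analyzed. At one decimal place, the ambiguous fraction (7.95%) rounds to 8.0.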
Mitochondrial maxicircle genome assembly
Kinetoplastid parasites have a mitochondrial genome organized in a peculiar
structure known as the kinetoplast, whose DNA is termed kinetoplast DNA (kDNA).
T. cruzi kDNA consists of thousands of variable, catenated minicircles of 0.5–5.0 kb
and dozens of catenated maxicircles of approximately 20 kb, of which 15 kb correspond
to coding sequences (Ruvalcaba-Trejo and Sturm, 2011; Junqueira et al., 2005).
Most maxicircle genes contain open reading frame (ORF) frameshifts, which are
corrected at the RNA level by a complex process of uridine insertions and deletions
known as RNA editing, which depends on guide RNAs (gRNAs) encoded mainly by
minicircle sequences (Hajduk et al., 1993). A total of 1,724 reads was found to align with
maxicircle sequences derived from CL Brener. A consensus sequence generated
from 14 assembled contigs is shown in Fig. 3. The assembly and annotation of the CL-14
maxicircle show that it has approximately 20.6 kb and contains, besides the 12S and
9S ribosomal RNA (rRNA) genes, all 18 open reading frames previously identified in
the maxicircles of the CL Brener and Esmeraldo strains. The assembly of the CL-14
maxicircle also showed that these three mitochondrial genomes have a high level of
synteny. RNA editing is a hallmark of genes encoded by trypanosome mitochondrial
maxicircle DNA. ORF analyses of maxicircle DNA from the three T. cruzi strains
showed that 9 genes are extensively edited and 3 genes have smaller changes in their
ORFs due to insertions and deletions of uridines. Alignment analyses of CL-14 maxicircle
genes with CL Brener sequences indicate that a similar number of genes undergo
RNA editing and thus depend on this post-transcriptional modification to generate
functional mitochondrial mRNAs.
Comparative analyses of multigene families
A search for all 22,570 protein-coding genes predicted in the CL Brener genome showed that all of them are present in the CL-14 genome. This indicates that the gene content of the two genomes is highly similar. Only three genes showed lower coverage (less than 95%) by CL-14 reads:
- Tc00.1047053506215.10 and Tc00.1047053511215.90, annotated as hypothetical proteins;
- Tc00.1047053509351.4, encoding ribosomal protein L38.
Because gene content alone did not distinguish the two strains, we investigated whether differences in copy number within the various multigene families might underlie the phenotypic differences observed between them. These comparisons used sequence reads with more than 97.5% identity to several gene families described in the CL Brener genome (Table 3). They indicated that the two genomes share a similar set of genes and show no large differences in the copy number of members of multigene families.
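The copy-number comparison rests on one idea: once reads are filtered by identity, the number of reads matching a gene family scales with the number of family members in the genome. The following is a minimal sketch of that estimate. It assumes uniform sequencing coverage and a single typical member length per family. The functions, their parameters and the numbers in the example are hypothetical and are not the values behind Table 3.

```python
def passes_identity(matches: int, aligned_length: int, cutoff: float = 97.5) -> bool:
    """Keep a read only if its identity to the family exceeds the cut-off (percent)."""
    return 100.0 * matches / aligned_length > cutoff


def estimate_copies(n_reads: int, read_length: int, member_length: int,
                    mean_coverage: float) -> float:
    """Estimate gene family copy number from the reads assigned to it.

    n_reads: reads aligning to the family above the identity cut-off
    read_length: read length in bp
    member_length: typical length of one family member in bp (assumed)
    mean_coverage: genome-wide average sequencing depth (fold)
    """
    bases_on_family = n_reads * read_length
    bases_per_copy = member_length * mean_coverage  # bases expected from a single copy
    return bases_on_family / bases_per_copy


if __name__ == "__main__":
    # Hypothetical alignments: (matching bases, aligned length) for reads hitting one family
    alignments = [(100, 100), (99, 100), (97, 100), (98, 100)]
    kept = sum(passes_identity(m, length) for m, length in alignments)
    print(kept)                                      # 3
    print(estimate_copies(6000, 100, 2000, 30.0))    # 10.0
```

Dividing by the genome-wide coverage lets the same estimate be compared across the two strains even if they were sequenced to different depths.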
Differences in the sequences encoding SAPA repeats of trans-sialidases
While aligning the reads against the CL Brener reference genome, we noticed one major difference in a subgroup of the large trans-sialidase gene family (TcTS). Members of TcTS group I are known to have a C-terminal domain with 12-amino-acid repeats called "shed acute phase antigen" or SAPA repeats. These repeats are highly immunogenic and increase the half-life of shed TcTS protein. The CL Brener genome contains at least three copies of TcTS with a repetitive domain of 19 to 41 repeats, whereas in CL-14 only one copy, containing 3 SAPA repeats, was identified. The reduced number of SAPA repeats in CL-14 TcTS was confirmed by two approaches (Fig. 4A and C):
- Southern blot analysis using probes containing SAPA sequences;
- western blot analysis using anti-SAPA antibodies.
We also confirmed this deletion in the sequences encoding the C-terminal SAPA repeats by PCR. DNA purified from CL Brener and CL-14 was amplified using primers annealing to the regions flanking these repeats. As shown in Fig. 4B, CL-14 DNA gave a discrete band of only about 500 bp. In contrast, CL Brener DNA generated a large smear, corresponding to sequences with repetitive domains of different sizes. Because they elicit a strong humoral response, TcTS containing SAPA repeats are considered virulence factors involved in the mechanisms the parasite uses to evade the host immune response. The lack of a large repetitive domain in the TcTS of CL-14 may thus be one of the factors explaining the differences in virulence between these two strains.
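The contrast between a discrete band and a smear follows from simple arithmetic. Each 12-amino-acid SAPA repeat is encoded by 36 bp, so the PCR product grows by 36 bp per repeat. As a rough sketch, assume the ~500 bp CL-14 product consists of flanking sequence plus exactly its 3 repeats, which leaves about 392 bp of flank. The expected CL Brener products for 19 to 41 repeats would then range from about 1.1 kb to 1.9 kb. The flank length is an inference made here for illustration, not a measured value, and the helper names are hypothetical.

```python
BP_PER_CODON = 3
AA_PER_SAPA_REPEAT = 12
BP_PER_REPEAT = AA_PER_SAPA_REPEAT * BP_PER_CODON  # 36 bp per repeat


def flank_length(product_bp: int, n_repeats: int) -> int:
    """Infer the non-repetitive part of an amplicon from its size and repeat count."""
    return product_bp - n_repeats * BP_PER_REPEAT


def product_size(flank_bp: int, n_repeats: int) -> int:
    """Expected amplicon size for a given number of SAPA repeats."""
    return flank_bp + n_repeats * BP_PER_REPEAT


if __name__ == "__main__":
    flank = flank_length(500, 3)  # assumes the ~500 bp CL-14 band carries 3 repeats
    print(flank)                                          # 392
    print([product_size(flank, n) for n in (3, 19, 41)])  # [500, 1076, 1868]
```

A mixture of templates spread across this 1.1 to 1.9 kb range would be consistent with a smear rather than a single band.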
Acknowledgments
This work is supported by funds from CAPES, CNPq, Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG, Brazil) and the Instituto Nacional de Ciência e Tecnologia de Vacinas (INCTV).
References
Atayde VD, Neira I, Cortez M, Ferreira D, Freymüller E, Yoshida N. Molecular basis of
non-virulence of Trypanosoma cruzi clone CL-14. Int J Parasitol. 34(7):851-60.
2004.
Berriman, M.; Ghedin, E.; Hertz-Fowler, C.; Blandin, G.; Renauld, H.; Bartholomeu, D. C.; Lennard, N. J.; Caler, E. et al. The genome of the African trypanosome Trypanosoma brucei. Science 309, pp. 416–422, 2005.
Besemer J, Lomsadze A, Borodovsky M. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Research 29: 2607-2618. 2001.
Brener Z. Biology of Trypanosoma cruzi. Annu Rev Microbiol 27:347-382. 1973.
Burgos, J.M. et al. Direct molecular profiling of minicircle signatures and lineages of Trypanosoma cruzi bloodstream populations causing congenital Chagas disease. International Journal for Parasitology 37 (12), pp. 1319–1327, 2007.
Buscaglia CA and Di Noia JM. Trypanosoma cruzi clonal diversity and the epidemiology of Chagas' disease. Microbes Infect 5:419-427. 2003.
Camargo, E.P. Growth and differentiation in Trypanosoma cruzi. I. Origin of metacyclic trypanosomes in liquid media. Rev Inst Med Trop Sao Paulo, 6: p. 93-100. 1964.
El-Sayed, N.M., et al., The genome sequence of Trypanosoma cruzi, etiologic agent of
Chagas disease. Science, 309(5733): p. 409-15. 2005.
Franzén, O.; Ochaya, S.; Sherwood, E.; Lewis, M. D.; Llewellyn, M. S. et al. Shotgun
Sequencing Analysis of Trypanosoma cruzi I Sylvio X10/1 and Comparison
with T. cruzi VI CL Brener. PLoS Negl Trop Dis 5(3): e984, 2011.
Freitas, J. M.; Augusto-Pinto, L.; Pimenta, J. R.; Bastos-Rodrigues, L.; Gonçalves, V. F.; Teixeira, S. M.; Chiari, E.; Junqueira, A. C.; Fernandes, O.; Macedo, A. M.; Machado, C. R.; Pena, S. D. Ancestral genomes, sex and the population structure of Trypanosoma cruzi. PLoS Pathog 2: e24, 2006.
Gaunt, M. W.; Yeo, M.; Frame, I. A.; Stothard, J. R.; Carrasco, H. J.; Taylor, M. C.; Mena, S. S.; Veazey, P.; Miles, G. A. J.; Acosta, N.; Arias, A. R.; Miles, M. A. Mechanism of genetic exchange in American trypanosomes. Nature 421: 936-939, 2003.
Ivens, A.C.; Peacock, C. S.; Worthey, E. A.; Murphy, L.; Aggarwal, G.; Berriman, M.; Sisk, E.; Rajandream, M. A. et al. The genome of the kinetoplastid parasite, Leishmania major. Science 309, pp. 436–442, 2005.
Jackson, A. P.; Sanders, M.; Berry, A.; McQuillan, J.; Aslett, M. A.; Quail, M. A.; Chukualim, B.; Capewell, P.; MacLeod, A.; Melville, S. E.; Gibson, W.; Barry, J. D.; Berriman, M.; Hertz-Fowler, C. The genome sequence of Trypanosoma brucei gambiense, causative agent of chronic human African trypanosomiasis. PLoS Negl Trop Dis 4: e658, 2010.
Junqueira C, Santos LI, Galvão-Filho B, Teixeira SM, Rodrigues FG, DaRocha WD, Chiari E, Jungbluth AA, Ritter G, Gnjatic S, Old LJ, Gazzinelli RT. Trypanosoma cruzi as an effective cancer antigen delivery vector. Proc Natl Acad Sci U S A 108(49):19695-700, 2011.
Junqueira, C.; Gerrero, A. T.; Galvão-Filho, B.; Andrade, W. A.; Salgado, A. P.; Cunha, T. M.; Robert, C.; Campos, M. A.; Penido, M. L.; Mendonça-Previato, L.; Previato, J. O.; Ritter, G.; Cunha, F. Q.; Gazzinelli, R. T. Trypanosoma cruzi adjuvants potentiate T cell-mediated immunity induced by a NY-ESO-1 based antitumor vaccine. PLoS One, vol. 7, 2012.
Lima, M. T.; Jansen, A. M.; Rondinelli, E.; Gattass, C. R. Trypanosoma cruzi:
properties of a clone isolated from the CL strain. Parasitol. Res., 77: 77-81,
1990.
Lima, M. T.; Lenzi, H. L.; Gattass, C. R. Negative tissue parasitism in mice injected
with a non-infective clone of Trypanosoma cruzi. Parasitol. Res. 81: 6-12, 1995.
Machado, C. A. and Ayala, F. J. Nucleotide sequences provide evidence of genetic exchange among distantly related lineages of Trypanosoma cruzi. Proc. Natl. Acad. Sci. U.S.A. 98: 7396-7401, 2001.
Peacock, C. S.; Seeger, K.; Harris, D.; Murphy, L.; Ruiz, J. C.; Quail, M. A.; Peters, N.;
Adlem, E.; Tivey, A. et al. Comparative genomic analysis of three Leishmania
species that cause diverse human disease. Nat Genet. 39(7):839-47, 2007.
Raymond F., Boisvert S., Roy G., et al. Genome sequencing of the lizard parasite Leishmania tarentolae reveals loss of genes associated to the intracellular stage of human pathogenic species. Nucleic Acids Res. 40: 1131-47, 2012.
Soares, M. B.; Goncalves, R.; et al. Balanced cytokine-producing pattern in mice immunized with an avirulent Trypanosoma cruzi. An Acad Bras Cienc, p. 167-172, 2003.
Souza, R. T.; Lima, F. M.; Barros, R. M.; Cortez, D. R.; Santos, M. F.; Cordero, E. M.;
Ruiz, J. C.; Goldenberg, S.; Teixeira, M. M. G.; Franco da Silveira, J.; Genome
Size, Karyotype Polymorphism and Chromosomal Evolution in Trypanosoma
cruzi. PLoS ONE 6(8): e23042, 2011.
Teixeira, S.M., L.V. Kirchhoff, and J.E. Donelson, Post-transcriptional elements
regulating expression of mRNAs from the amastin/tuzin gene cluster of
Trypanosoma cruzi. J Biol Chem. 270(38): p. 22586-94. 1995
Weatherly, D.B., C. Boehlke, and R.L. Tarleton, Chromosome level assembly of the
hybrid Trypanosoma cruzi genome. BMC Genomics, 10: p. 255. 2009.
Westenberger, S. J.; Cerqueira, G. C.; El-Sayed, N. M.; Zingales, B.; Campbell, D. A.; Sturm, N. R. Trypanosoma cruzi mitochondrial maxicircles display species- and strain-specific variation and possess a conserved element in the non-coding region. BMC Genomics 7: 60, 2006.
WHO. Chagas disease (American trypanosomiasis) fact sheet. Weekly Epidemiological Record. 2010, World Health Organization: Geneva. p. 334-336.
Zingales B, Stolf BS, Souto RP, Fernandes O, Briones MR. Epidemiology,
biochemistry and evolution of Trypanosoma cruzi lineages based on ribosomal
RNA sequences. Mem Inst Oswaldo Cruz. 94:159–164. 1999
Zingales B, et al. Second Satellite Meeting. A new consensus for Trypanosoma cruzi intraspecific nomenclature: second revision meeting recommends TcI to TcVI. Mem Inst Oswaldo Cruz 104: 1051-1054, 2009.
Table 1: Summary of sequencing data for the CL Brener and CL-14 genomes.
T. cruzi group   COII              SL    24S rDNA
Tc I             30, 81 and 264    150   110
Tc II            81, 82 and 212    150   125
Tc III           81 and 294        200   110
Tc IV            81 and 294        200   125
Tc V             81 and 294        150   110 and 125
Tc VI            81 and 294        150   125
CL-14            81 and 294        150   125

Table 2 – Molecular marker profiles derived from amplification of two nuclear genes (SL and rRNA) and one mitochondrial gene (COII) in different T. cruzi strains. Expected fragment lengths, in base pairs, are shown for strains belonging to each T. cruzi group and for the in silico generated products using CL-14 sequences.
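The CL-14 row comes from in silico products, which can be obtained by locating primer-binding sites in the assembled sequence. The following is a minimal sketch of exact-match in silico PCR. It assumes perfect primer matches on a linear template. The several COII fragment sizes per group suggest that this marker also involves a restriction digestion step, which the sketch does not model. The primer and template sequences below are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, fwd: str, rev: str):
    """Return sorted sizes (bp) of products from exact forward/reverse primer matches.

    The reverse primer binds the top strand as its reverse complement; a product
    spans from the start of a forward site to the end of a downstream reverse site.
    """
    template = template.upper()
    fwd, rev_rc = fwd.upper(), revcomp(rev.upper())
    fwd_sites = [i for i in range(len(template) - len(fwd) + 1)
                 if template.startswith(fwd, i)]
    rev_sites = [j for j in range(len(template) - len(rev_rc) + 1)
                 if template.startswith(rev_rc, j)]
    return sorted(j + len(rev_rc) - i
                  for i in fwd_sites for j in rev_sites
                  if j >= i + len(fwd))  # reverse site must lie downstream


if __name__ == "__main__":
    template = "TTACGTACGGGGGGGGCATGCATT"            # hypothetical 24 bp template
    print(in_silico_pcr(template, "ACGTAC", "TGCATG"))  # [20]
```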
Multigene family      CL-14 copies   CL Brener copies   Identity (%)
Trans-sialidase           1463            1481              99.80
MASP                      1399            1465              99.87
Mucin                      999             992              97.82
RHS                        773             777              99.74
DGF                        565             569              99.84
GP63                       491             449              99.73
RNA helicase               156             157              99.68
Kinesin                    102             102              98.78
Tuzin                       83              83              99.76
Cruzain (calpain)           67              66              99.05
Dynein heavy chain          45              45              99.35
Amastin                     27              27              99.69
GAPDH                       21              20              99.74
MSH2                         2               2             100
PGP                          2               2             100

Table 3 – Number of predicted copies of members of multigene families and average identity between homologous genes.
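The identity column summarizes pairwise comparisons between homologous CL-14 and CL Brener genes. The following is a minimal sketch of such a summary. It assumes identity is computed over aligned columns and averaged across gene pairs. Definitions vary, for example in how gap columns are counted. The toy sequences and function names are hypothetical.

```python
from statistics import mean


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences have a gap ('-') are ignored; a gap opposite
    a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def average_identity(pairs) -> float:
    """Mean percent identity over aligned (CL-14, CL Brener) gene pairs."""
    return mean(percent_identity(a, b) for a, b in pairs)


if __name__ == "__main__":
    pairs = [("ACGTACGTAC", "ACGTACGTAC"), ("ACGTACGTAC", "ACGTACGTAA")]
    print(average_identity(pairs))  # 95.0
```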
Figure 1: Separation of chromosomal bands and Southern blot analysis of T. cruzi CL-14 and CL Brener strains. Panel A shows ethidium bromide staining of chromosomal bands from CL-14 and CL Brener epimastigotes separated by pulsed-field gel electrophoresis. Hybridizations were performed with GPI8 and MASP DNA probes, corresponding to a single-copy gene and a multigene family, respectively. Panel B shows bands from digested DNA that hybridized with 32P-labelled probes corresponding to a member of the amastin gene family from CL Brener and to a member of the DGF gene family.
Figure 2: Unrooted neighbour-joining trees based on the predicted amino acid sequences of trypanothione reductase (TR), the mismatch repair protein MSH2 and cytochrome c oxidase subunit II (COII). Sequences were obtained from the genome databases of the T. cruzi CL Brener, CL-14 and Sylvio X10 clones. For the two nuclear genes (TR and MSH2), sequences corresponding to the two alleles (Esmeraldo-like and non-Esmeraldo-like) are shown. Bootstrap values were calculated over 1,000 trees from pseudo-replicate datasets.
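Bootstrap values of this kind are obtained by building many pseudo-replicate alignments, then reporting how often each clade is recovered. Each pseudo-replicate is made by resampling alignment columns with replacement. The following is a minimal sketch of the resampling and the support calculation. The neighbour-joining step itself is left out, the function names are hypothetical, and the toy alignment only echoes the strain labels of the figure.

```python
import random


def bootstrap_replicates(alignment: dict, n_replicates: int = 1000, seed: int = 1):
    """Yield pseudo-replicate alignments by resampling columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(s) != length for s in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    for _ in range(n_replicates):
        cols = [rng.randrange(length) for _ in range(length)]  # same length as original
        yield {name: "".join(alignment[name][c] for c in cols) for name in names}


def support(clade_found_flags) -> float:
    """Bootstrap value: percentage of replicate trees that recover a given clade."""
    flags = list(clade_found_flags)
    return 100.0 * sum(flags) / len(flags)


if __name__ == "__main__":
    # Hypothetical toy alignment
    aln = {"CLBrener_esmo": "ACGTACGTAA",
           "CLBrener_nonesmo": "ACGTTCGTAA",
           "CL14_esmo": "ACGTACGTAG"}
    reps = list(bootstrap_replicates(aln, n_replicates=1000))
    print(len(reps), len(reps[0]["CL14_esmo"]))  # 1000 10
    # Each replicate would be passed to a tree-building step (not shown), and
    # support() applied to flags recording whether a clade was recovered.
```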
Figure 3: Schematic representation of the T. cruzi CL-14 maxicircle, showing all 18 annotated protein-coding genes, the two ribosomal RNA (rRNA) genes and the repetitive region.
[Figure 4 image (panels A to C): samples from CL Brener (CL Br) and CL-14 trypomastigotes (Try) and epimastigotes (Epi); size markers (MW) in bp and kDa; a ~500 bp PCR band; Coomassie-stained gel and anti-SAPA and anti-TS western blots]
Figure 4 – In vitro analysis of the number of SAPA repeats. Panel A shows Southern blots of digested DNA hybridized with SAPA probes. Panel B shows PCR amplification of the SAPA-encoding region. Panel C shows western blots of total protein extracts probed with anti-TS or anti-SAPA antibodies.
[Supplementary figure 1 image: gel panels for rDNA, SL and COII amplification products, with fragment sizes marked in bp]
Supplementary figure 1: PCR amplification of microsatellite and maxicircle sequences from CL-14 and CL Brener.