Why does the SARS-Cov2 genome end in aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa? (2020)

FormerBandmate · on Feb 11, 2023

The top answer at the link explains it best:

Good observation! The 3' poly(A) tail is actually a very common feature of positive-strand RNA viruses, including coronaviruses and picornaviruses.

For coronaviruses in particular, we know that the poly(A) tail is required for replication, functioning in conjunction with the 3' untranslated region (UTR) as a cis-acting signal for negative strand synthesis and attachment to the ribosome during translation. Mutants lacking the poly(A) tail are severely compromised in replication.

subroutine · on Feb 11, 2023

Adding a poly(A) tail when engineering rna plasmids is so common that it's part of the standard feature library of most plasmid editors.

Here I have a screenshot of the ApE editor displaying one plasmid I made for a neurobio experiment involving the overexpression of two chimeric proteins (actin and profilin, respectively linked to green and red fluorophores; note the editor has autotagged the polyA tail feature):

https://ibb.co/XSCSKC9

userbinator · on Feb 12, 2023

Looks almost like a hex editor.

samus · on Feb 12, 2023

It kind of is. We are not quite advanced enough to have the equivalent of true programming languages for genetics.

digitaltrees · on Feb 12, 2023

But we will. GitHub for genetics would be awesome

sleepydog · on Feb 12, 2023

It reminds me of Wireshark

koeng · on Feb 12, 2023

ApE is one of those tools that is so old-school but just works. I used to hate using it but it's really grown on me - Snapgene is just too darn expensive and benchling isn't very power user friendly

x-shadowban · on Feb 12, 2023

Can you add any number of A's, or just 1, or shorten it by 1, and it still is functional?

subroutine · on Feb 12, 2023

The number of adenine repeats that confer functional properties is quite variable but it definitely needs to be more than "just 1". I've seen anywhere from 25-250 used in designed plasmids. The exact number people use in their engineered sequence is based on a number of factors, not all of them scientific in nature (e.g. companies charge per base pair to synthesize a bespoke polynucleotide; e.g. you copied the sequence from a previous clone into ApE and that sequence used 30 repeats and worked fine).
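
To make the "how many A's" question concrete, here is a minimal sketch of how a tool could measure a trailing run of A's and flag it as a tail. The function names and the default threshold are hypothetical (the threshold is borrowed from the low end of the 25-250 range above), and this is not a description of how ApE actually tags features.

```python
def trailing_poly_a_length(seq: str) -> int:
    """Count the consecutive A's at the 3' end of a sequence (case-insensitive)."""
    s = seq.upper()
    return len(s) - len(s.rstrip("A"))


def has_poly_a_tail(seq: str, min_len: int = 25) -> bool:
    # min_len=25 is an illustrative threshold taken from the range quoted
    # in the comment, not a biological constant.
    return trailing_poly_a_length(seq) >= min_len


if __name__ == "__main__":
    example = "ATTACG" + "a" * 33  # made-up sequence with a 33-A tail
    print(trailing_poly_a_length(example), has_poly_a_tail(example))
```

A single A at the end would count as a "tail" of length 1 here, which is why some minimum length is needed before calling it a feature.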

kevviiinn · on Feb 12, 2023

Iirc there's a protein that adds 12-15 slowly then another protein comes in and adds another 200+ when it detects the start of the tail. Not sure about the details about how or why it stops tho. At least when considering mRNA getting prepped for nuclear export

subroutine · on Feb 12, 2023

You are probably right, I'm not an expert on this process. But there's likely a difference between the innate eukaryotic cell polyadenylation process vs. how coronavirus accomplishes polyadenylation because coronavirus rna never enters the nucleus.

kolinko · on Feb 12, 2023

In this case it probably works most reliably with around this number of As, and less/more would decrease reliability, but it would still be functional.

Most of genetics is like that.

kevviiinn · on Feb 12, 2023

With more As you risk running into the upper limit on sequence length for the virus shell and with less you run the risk of quicker degradation and not enough expression

subroutine · on Feb 12, 2023

The first point being only a concern if you are a virus. If instead you are engineering say, an rna vaccine, you could package your transcript into lipofectamine instead of a coronavirus envelope ;) but point taken, there's always an upper limit.

kevviiinn · on Feb 12, 2023

Also a concern when engineering viruses for research or other treatments such as the AAVs

subroutine · on Feb 14, 2023

True. I should have said "if you are using a virus"

userbinator · on Feb 12, 2023

attachment to the ribosome during translation

There's been a lot of analogies with NOP slides in the comments here and there, but if you look at how the process of reading the genome works, this section is more like the leader/trailer on a tape:

https://en.wiktionary.org/wiki/leader#Noun "A piece of material at the beginning or end of a reel or roll to allow the material to be threaded or fed onto something, as a reel of film onto a projector or a roll of paper onto a rotary printing press."

https://en.wiktionary.org/wiki/trailer#Noun "A short blank segment of film at the end of a reel, for convenient insertion of the film in a projector."

worewood · on Feb 12, 2023

And this fits nicely because the genome replication really looks like a roll of film being played

kevviiinn · on Feb 12, 2023

Only if the film gets bent and twisted around itself, binding on itself to activate or deactivate sequences and with the read film actively getting spliced to create slightly different versions of the scenes

Honestly it's fuckin wild, there's a lot going on rather than just linear read->express

xg15 · on Feb 12, 2023

As a coder, I find it a bit frustrating how often explanations in biology contain fuzzy phrases like "plays a role in" or "is required for" or "works together with" or "modulates the activity of" - without ever explaining what role it plays, why it is required or how it works together.

So props to them for going a bit more into detail here - and also highlighting that the reason for the fuzzy phrases can often be that we literally don't know the details: The empirical basis may be "if this thing is removed then this other thing won't work", without us necessarily knowing why this is the case.

madhadron · on Feb 12, 2023

The other side of this is: what constitutes an explanation? What formal structure that you express observations in seems like knowing what's going on to you? Formal structures in fields tend to be matched to what the experimental abilities of the field are.

In programming, we designed our systems to give us what seem like hard bottoms in our formal models. Most programmers don't reason below the level of their structured programming language. Of the ones that do, most treat the processor instructions as a hard bottom. There are layers down and down until you have physicists working on semiconductor properties, but we have intentionally designed the layers so that you can comfortably rest on them.

In biology any formal structure you think in is logically poised over the abyss. What pins it in place is not that it is on philosophical bedrock, but the observations and experiments that the formal structure summarizes.

kevviiinn · on Feb 12, 2023

It isn't unknown, it's just complicated. There are numerous proteins involved, such as a protein that detects the poly(A) tail and, if it is not present, degrades the RNA by cleavage.

sublinear · on Feb 12, 2023

I think you'll find similar bullshittery in the details of every topic that isn't math. Think of it like finding "TODO" comments in old code.

eropple · on Feb 12, 2023

I mean, historically mathematics have had areas full of "HACK:" comments, too--ones retroactively applied to the whole of Newtonian physics, even, but it's still useful enough that we keep it around!

af3d · on Feb 12, 2023

Newtonian physics is not "a hack". It yields very precise results for simple gravitational interactions, at least up to the point where relativistic effects begin to dominate. Even Einstein's equations are not 100% "perfect". All mathematical models have their limitations. (Although some do produce better approximations than others.)

eropple · on Feb 12, 2023

A hack is in the eye of the beholder, and many hacks are load-bearing and totally fine for the entire timespan for which the thing they're stashed within is expected to be useful. I agree with you that all mathematical models have their limitations--but the choice to use a good-enough one is, over sufficient time and distance, a tradeoff that can be described as such.

sublinear · on Feb 13, 2023

All I meant is that the approximations in math are in the assumptions which is far easier to deal with in the grander scheme of things.

eropple · on Feb 13, 2023

Assumptions? Don't worry; we've got those too! If I had a nickel for all the times I've seen somebody pick 1 for "0, 1, or N" input cases? ;)

koheripbal · on Feb 11, 2023

Interestingly, vaccines also have a long repeating sequence at the end. It provides molecular stability.

lambdasquirrel · on Feb 12, 2023

Found a wikipedia article describing it:

https://en.wikipedia.org/wiki/Polyadenylation

ramesh31 · on Feb 11, 2023

The biological equivalent of an endline

qclibre22 · on Feb 12, 2023

The stop codon is more like the '\n' or '\0' in computing. Poly(A) tails protect against degradation and are used for nuclear export of RNA.

subroutine · on Feb 12, 2023

This is true, but perhaps worth noting that the nuclear transport function of polyA tails doesn't come into play for coronavirus. The payload of coronavirus is a positive-sense single-stranded RNA, which means it does not need to enter the nucleus for preprocessing and can basically just start replicating shortly after entering a cell. See diagram...

https://upload.wikimedia.org/wikipedia/commons/f/f4/Coronavi...

There might be a software analog to another polyA tail feature: the provision of a 'shelf-life'. Each replication cycle removes a few adenosines, and at a certain point the tail sequence is too short to recruit protection and the RNA is ushered into the degradation pathway.
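
One way to sketch that "shelf-life" idea in code is as a time-to-live counter, like the TTL field on a network packet. This is a toy model only: the function name, the loss of 3 adenosines per cycle, and the cutoff of 12 are made-up illustrative values, not measured ones.

```python
def cycles_until_degradation(
    tail_len: int,
    loss_per_cycle: int = 3,      # hypothetical: A's lost per cycle
    min_protected_len: int = 12,  # hypothetical: shortest tail still protected
) -> int:
    """Count how many cycles run before the tail drops below the cutoff."""
    if loss_per_cycle <= 0:
        raise ValueError("loss_per_cycle must be positive")
    cycles = 0
    while tail_len >= min_protected_len:
        tail_len -= loss_per_cycle  # each cycle trims the tail a little
        cycles += 1
    return cycles


if __name__ == "__main__":
    for start in (5, 33, 250):
        print(start, cycles_until_degradation(start))
```

A longer starting tail simply buys more cycles, which is the TTL analogy; real deadenylation is stochastic rather than a fixed decrement.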

NobleLie · on Feb 12, 2023

I don't think the genome has a specific length of the tail (33 is a consensus length)

During genomic assays, the poly a tail will not be a specific length, but a single consensus sequence is still provided.

This was also posted in the first comment:

> Similar to eukaryotic mRNA, the positive-strand coronavirus genome of ~30 kilobases is 5’-capped and 3’-polyadenylated. It has been demonstrated that the length of the coronaviral poly(A) tail is not static but regulated during infection; however, little is known regarding the factors involved in coronaviral polyadenylation and its regulation. Here, we show that during infection, the level of coronavirus poly(A) tail lengthening depends on the initial length upon infection and that the minimum length to initiate lengthening may lie between 5 and 9 nucleotides. By mutagenesis analysis, it was found that (i) the hexamer AGUAAA and poly(A) tail are two important elements responsible for synthesis of the coronavirus poly(A) tail and may function in concert to accomplish polyadenylation and (ii) the function of the hexamer AGUAAA in coronaviral polyadenylation is position dependent. Based on these findings, we propose a process for how the coronaviral poly(A) tail is synthesized and undergoes variation. Our results provide the first genetic evidence to gain insight into coronaviral polyadenylation.

Peng Y-H, Lin C-H, Lin C-N, Lo C-Y, Tsai T-L, Wu H-Y (2016) Characterization of the Role of Hexamer AGUAAA and Poly(A) Tail in Coronavirus Polyadenylation. PLoS ONE 11(10): e0165077
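
For readers who want to poke at a sequence themselves, here is a rough sketch that reports the tail length and how far upstream of the tail the last AGUAAA hexamer sits. The function name and return shape are my own invention, and the approach has a known limitation: a hexamer directly abutting the tail cannot be separated from it, because its trailing A's are counted as part of the tail.

```python
from typing import Optional, Tuple


def hexamer_gap(rna: str, hexamer: str = "AGUAAA") -> Optional[Tuple[int, int]]:
    """Return (tail_length, gap), or None if the hexamer is not found.

    gap is the number of nucleotides between the end of the last hexamer
    occurrence and the start of the trailing run of A's.
    """
    s = rna.upper().replace("T", "U")  # accept DNA-style input as well
    body = s.rstrip("A")               # everything before the trailing A run
    tail_len = len(s) - len(body)
    pos = body.rfind(hexamer)
    if pos == -1:
        return None
    gap = len(body) - (pos + len(hexamer))
    return tail_len, gap


if __name__ == "__main__":
    toy = "GGC" + "AGUAAA" + "CCGGUU" + "A" * 9  # made-up sequence
    print(hexamer_gap(toy))
```

Since the abstract says the hexamer's function is position dependent, the gap is the quantity one would vary in a toy experiment like this.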

subroutine · on Feb 12, 2023

The polyA tail isn't even coded by the genome, it's added after transcription by a processive polyadenylation multiprotein complex with the final tally being the result of partially stochastic processes. So yeah, agreed, the number of adenines is variable.

nomercy400 · on Feb 12, 2023

So it is more like a protective casing? Or like a car bumper, to protect what's inside?

amelius · on Feb 12, 2023

Can a drug target that sequence specifically?

moralestapia · on Feb 12, 2023

That sequence is present everywhere in your body and living beings, so no.

https://en.wikipedia.org/wiki/Polyadenylation

refurb · on Feb 12, 2023

Not if it's common to both viruses and human cells you can't.

TEP_Kim_Il_Sung · on Feb 12, 2023

IIRC adenine binds to thymine, and there are some viruses and bacteria that have alternative bases, and scientists have discovered 82 other possible ones.

If the virus could be bound with an artificial RNA strand that had a stronger bond than natural RNA, it could be denatured, and pooped out.

https://devries.chem.ucsb.edu/research/past/base-pairing

anonymouskimmer · on Feb 12, 2023

Poly-A binding proteins naturally exist. They are used to regulate translation and to sequester mRNA during heat shock stress, IIRC. This prevents mistranslation, again IIRC.

https://faseb.onlinelibrary.wiley.com/doi/10.1096/fasebj.31....

lsternlicht · on Feb 12, 2023

dimisdas · on Feb 12, 2023

Is regex a drug feature we are building towards?

kps · on Feb 12, 2023

You can't parse DNA with regex.

WirelessGigabit · on Feb 12, 2023

Not with that attitude.

amstan · on Feb 12, 2023

If stackoverflow taught me anything, you will only summon Zalgo by trying.

cornel_io · on Feb 12, 2023

They used to say that about email addresses. But hold my beer and I'll be back in like a month with some buggy half-assed crap that kinda does the job and only occasionally crashes the system!

kevviiinn · on Feb 12, 2023

In my experience genetics is a bit more complicated than an email address

Yes, that is an understatement

eyelidlessness · on Feb 12, 2023

Is it that simple? From a lay perspective (and granted I’m a little foggy because I actually have covid right now), I’d expect the answer is yes but with huge unintended consequences.

_kava · on Feb 12, 2023

Ok yes, theoretically you can make something targeting the polyA tail. But everything else your body makes will also get targeted because this is basically a marker of all RNA for translation.

Now making a drug that targets only viruses and not your body RNA? Possible but it is so hard not much progress has been made.

jkingsman · on Feb 12, 2023

If this sort of question fascinates you, you might like "Reverse Engineering the source code of the BioNTech/Pfizer SARS-CoV-2 Vaccine"[0], an article written with a tone that I've found to resonate with engineers and like-minded folk.

[0] https://berthub.eu/articles/posts/reverse-engineering-source...

a_w · on Feb 12, 2023

https://news.ycombinator.com/item?id=25598270

farkanoid · on Feb 12, 2023

What a fascinating article! Thanks for sharing.

I didn't realise there was so much crossover between embedded design and biology!

sirsinsalot · on Feb 12, 2023

That was exceptionally informative and accessible. Thank you

doctor_eval · on Feb 12, 2023

Fantastic article!

jtchang · on Feb 11, 2023

It's like a NOP slide for viruses: https://en.wikipedia.org/wiki/NOP_slide

Just kidding...sort of!

uxp100 · on Feb 12, 2023

User Zoe Sparks on that page covers why they don’t really feel it’s like a nop slide. I think that answer is a good supplement to the accepted one.

akira2501 · on Feb 12, 2023

I think they miss the point entirely. The environment in the cell is mostly mechanical, but it's also dominated by random forces. If you are unable to guarantee where you are going to "enter the sled" or when the "tail hits the ribozyme" then the nop sled seems to be an equivalent feature.

So.. to me, it's odd they invoke "legitimate code." The comparison I'd consider would be "combative code." For example, the old game "core wars." Thinking in that mindset, I can see several uses for a "nop sled" in "legitimate code."

gleenn · on Feb 11, 2023

I don't think this is nearly as true for viral genomes, but larger species have lots of protective sections of DNA to protect from mutations. If you lose a non-protein-coding section of DNA to mutation, no harm to the species occurs. In humans, only about 1.5% of our DNA codes for protein that is actually generated. Viruses are physically extremely tiny and must be very efficient in terms of storing their genetic material, so way more of it actually codes, but no doubt there are similar factors at play.

flobosg · on Feb 11, 2023

Viral genomes are very compressed. In fact, viruses usually have overlapping genes, where one genomic region can codify for more than one product. See the following review for more information: https://www.nature.com/articles/s41576-021-00417-w
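
A small sketch of why overlapping genes are possible at all: the same stretch of nucleotides reads as different codons depending on where you start. The nine-base sequence below is made up for illustration, and the table is the standard genetic code built in the usual TCAG order.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate complete codons starting at offset `frame` (0, 1 or 2).

    Raises KeyError on characters other than A, C, G, T.
    """
    s = dna.upper()
    codons = (s[i:i + 3] for i in range(frame, len(s) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


if __name__ == "__main__":
    toy = "ATGGCATAA"  # made-up sequence
    for frame in range(3):
        print(frame, translate(toy, frame))
```

Shifting the start by one base yields an entirely different peptide from the same bytes, so to speak, which is how one region can encode more than one product.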

HL33tibCe7 · on Feb 11, 2023

Read https://berthub.eu/articles/posts/reverse-engineering-source...

jboy55 · on Feb 12, 2023

It amazes me that the genome is only 29k long. If you were to write a computer virus now, it probably wouldn't be that short, let alone something that can infect and kill millions of people.
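
For a sense of scale, here is the back-of-the-envelope arithmetic, assuming the rounded "29k" figure and a naive 2 bits per base (four possible bases). Both constants are simplifications for illustration.

```python
GENOME_LENGTH_NT = 29_000  # the rounded "29k" from the comment
BITS_PER_BASE = 2          # four bases -> log2(4) = 2 bits each


def genome_size_bytes(n_bases: int, bits_per_base: int = BITS_PER_BASE) -> float:
    """Storage needed for n_bases at a fixed number of bits per base."""
    return n_bases * bits_per_base / 8


if __name__ == "__main__":
    size = genome_size_bytes(GENOME_LENGTH_NT)
    print(f"{size:.0f} bytes (~{size / 1024:.1f} KiB)")  # 7250 bytes (~7.1 KiB)
```

So the whole genome is on the order of 7 KiB under this encoding.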

IshKebab · on Feb 12, 2023

A computer virus that just replicates and causes damage can be a lot smaller than that.

tduberne · on Feb 12, 2023

Very interesting. I first wondered how nature can randomly generate such a pattern, and then realized we are just falling for our "built in" pattern recognition: it would feel much more "natural" for the stop sequence to be the encoding of some specific protein without any clearly recognizable pattern... But it would actually be more unlikely to appear/survive mutation than "any long-enough sequence of A".

I also like how it is established that this has an effect on replication, but that as far as I understand we do not understand the underlying process. Humbling.

roenxi · on Feb 11, 2023

File formats are really easy to figure out and are a big advantage for moving data around. Even without an academic theory, pretty much everyone in software starts to figure out the same tricks as soon as reliable transmission becomes a goal. I assume that at least one reason for this is that genomes are data, data likes to live in structured formats, and file terminators are more reliable for biology to process than encoding the length of the genome (although, biology being messy, I wouldn't be shocked if both were done). Evolution has a good grasp of engineering principles.

Are there probably desirable chemical properties? Yes. Is nature overloading each part of a genome with uses? More than likely. Has it figured out how to terminate a sequence? Obviously.

roughly · on Feb 12, 2023

So something to keep in mind when looking at biological systems, and especially when looking at genomics - evolution doesn't actually have a master plan or agency, it's just drift and reproduction. There's a lot of reuse and a lot of parsimony that can look elegant, but there's no 'design' process of evolution - it's a pile of things that looked enough like other stuff that over time with some tweaks they could take on dual roles. There's also no separation of concerns - DNA is a molecule, and it's acted upon by other molecules following the same rules of chemical and quantum interactions that affect everything else. Certain RNA sequences aren't translated but rather fold into functional molecules and enzymes, and protein folding and subsequent structure and function are affected by how fast the RNA is translated, which is affected by the population of available tRNA molecules. Genomics only looks like information - it's still chemistry.

mlhpdx · on Feb 12, 2023

I am entirely unqualified to answer, but I choose to believe it’s the equivalent of scratch (reserved stack) memory in an executable image. If I’m wrong, well at least I’m enjoying it.

psanford · on Feb 11, 2023

I can't see that and not think: base64 null byte padding.
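
For anyone who hasn't run into this: in base64, zero bytes encode to runs of "A", which is why a long string of A's looks so familiar. A quick check with the standard library:

```python
import base64

# Three zero bytes are 24 zero bits, i.e. four 6-bit groups of value 0,
# and value 0 maps to "A" in the base64 alphabet.
assert base64.b64encode(bytes(3)) == b"AAAA"

# 24 zero bytes become 32 A's, with no "=" padding needed.
encoded = base64.b64encode(bytes(24)).decode()
print(encoded, len(encoded))
```

Strictly speaking the A's are encoded data and "=" is the padding character, but the visual resemblance is the point.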

layer8 · on Feb 12, 2023

Yeah, I’d be worried if it was AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA===. ;)

personjerry · on Feb 11, 2023

The virus printer adds padding to make the genome a multiple of 64

nashashmi · on Feb 11, 2023

So upon multiple sequential infections, the poly(A) tail will shorten?

m0llusk · on Feb 12, 2023

Obligatory: https://uncyclopedia.com/wiki/AAAAAAAAA!

pimpampum · on Feb 12, 2023

Trying to Buffer overflow the cell.

Apocryphon · on Feb 12, 2023

Nominative determinism

zxcvbn4038 · on Feb 11, 2023

Maybe the scientist who made it died before he finished? ATTACGAAAAAAAAAAAAAAAAAAAAAAAA……

jMyles · on Feb 11, 2023

Look, if he died while engineering a virus, he wouldn't bother to code "AAAAAAA" he'd just say it!

naasking · on Feb 11, 2023

Clearly he died at his computer and his face landed on the keyboard.

mynameisash · on Feb 11, 2023

Well that's what it says.

DonHopkins · on Feb 11, 2023

Perhaps he was dictating?

https://www.youtube.com/watch?v=ZlIz0q8aWpA

akiselev · on Feb 11, 2023

Dictated, not transcribed.

dilznoofus · on Feb 11, 2023

Dictated, but not read

notfed · on Feb 12, 2023

Why the downvotes? Video is hilariously relevant.

superjan · on Feb 12, 2023

Perhaps they were dictating?

josefx · on Feb 12, 2023

Proof that god is dead?

layer8 · on Feb 12, 2023

[flagged]

vandahm · on Feb 12, 2023

[flagged]

ihatepython · on Feb 12, 2023

Movie should have been named CATTAGA

kazinator · on Feb 12, 2023

This ending is the result of the Lisp closing parenthesis being important enough to be directly mapped to a dedicated nucleotide base.

inDigiNeous · on Feb 12, 2023

[flagged]

sirsinsalot · on Feb 12, 2023

"Maybe" isn't a good replacement for "I don't know".

Other people do know and you can find this information out.

The body uses it in RNA for denoting the life cycle of reuse to avoid degradation.

The linked article tells you

otherme123 · on Feb 12, 2023

Of all the possible signatures of SARS-CoV-2 being lab-made, the polyA isn't one of them.

https://en.m.wikipedia.org/wiki/Polyadenylation

rmbyrro · on Feb 11, 2023

[flagged]

luckylion · on Feb 11, 2023

Had it started with aaaaaaaaaaaaa, I would've put my money on someone optimizing for the virus yellow pages.

dogma1138 · on Feb 11, 2023

Isn't the start and end of a genome rather arbitrary?

rightbyte · on Feb 12, 2023

Doesn't the replication have a direction though? I.e. start and end.

xwdv · on Feb 11, 2023

It’s a good thing people aren’t more intelligent, otherwise they would be able to construct and propagate deeper and more advanced conspiracies like this. Most people thankfully don’t even know what a genome is.