18.13 Manipulating genomes: gene editing

Analysing natural genomes rapidly leads to ideas about modifying genomes. Of course, since the dawn of agriculture, practical people have been involved in modifying the genomes of their cultivated plants and animals by a combination of artificial selection and selective breeding. Indeed, although we were unaware of it at the time, by selecting brews or ferments that produced the most satisfactory end products in brewing, baking and other food fermentations (cheese, salami, soy, miso) we have also been unconsciously applying selection pressure to the fungi and bacteria involved in those processes for a very long time (in the yeast world the process is called ‘domestication’).

During the twentieth century increasing knowledge of genetics enabled applied genetics to be much more formalised and very considerable advances were made in breeding improved varieties. Classical genetics of this sort puts the emphasis on the phenotype. What matters is the phenotypic characteristic of the new strain; those features that make it more useful or advantageous. In time, deeper analysis might establish the genetic basis of a phenotype or trait and enable genetic manipulation (mutation, controlled breeding) to further enhance the trait and/or combine it with others.

Automated DNA sequencing generates large volumes of genomic sequence data quite quickly. The consequence is that many genetic sequences are discovered well in advance of information about their function in the life of the organism. Molecular analysis enables us to start from the other end of this line of activity; we can seek to find the possible phenotypes that may obtain from a specific genetic sequence obtained by DNA sequencing. So, if classical, 20th century, genetics is considered to be forward genetics, proceeding from phenotype to genetic sequence; then the molecular genetics of the 21st century has come to be called reverse genetics. Reverse genetics attempts to link a specific genetic sequence with precise effects on the organism. But more than reversing the direction from which you view the objects the ‘…basic aspect of these approaches is that a complex system can be understood more thoroughly if considered as a whole…’ (Horgan & Kenny, 2011).

In practice the process proceeds from functional analysis by experimental design and can eventually lead to functional design. The essential flow of activity is: gene sequence → change or disrupt the DNA (deletion, inactivation by insertion, point mutation) → mutant phenotype → function → alter function → change sequence → new (improved?) phenotype.

This is called functional genomics; being the study of gene function on the genomic scale. In filamentous fungi it is a field of research that has made great advances in very recent years and which continues to advance at rapid pace. Transformation and gene manipulation systems have been developed and applied to many economically important filamentous fungi and oomycetes; overall, the integration of information from the various processes that occur within a cell provides a more complete picture of how genes give rise to biological functions and will ultimately help us to understand the biology of organisms, in both health and disease (Weld et al., 2006; Bunnik & Le Roch, 2013).

We have been using fungi to produce materials of commercial value for a long time. We mentioned ‘domestication’ in the first paragraph, and there has always been a drive to improve the fungal strains involved; this certainly applies to yeasts used in alcohol fermentation and baking, fungi used for cheese finishing, and a range of food fermentations around the world.

We are concentrating on 21st century mycology in this book but before we go much further we must emphasise that recombinant DNA technology was developed in several model filamentous fungi more than a generation ago. A few noteworthy examples are:

  • 1973: The first DNA-mediated transformation of a fungal species using genomic DNA without the use of vectors was carried out by Mishra & Tatum (1973), who transformed an inositol-requiring mutant strain of Neurospora crassa to inositol independence using DNA extracted from an inositol-independent strain.
  • 1979: Case et al. (1979) developed an efficient transformation system for Neurospora crassa that used sphaeroplasts and a recombinant Escherichia coli plasmid carrying the N. crassa qa-2+ gene (which encodes the enzyme dehydroquinase).
  • 1983: Ballance et al. (1983) performed the first auxotrophic marker transformation in Aspergillus nidulans when they relieved an auxotrophic requirement for uridine in a mutant strain of A. nidulans by transformation with a cloned segment of Neurospora crassa DNA containing the corresponding (i.e. homologous) gene coding for orotidine-5′-phosphate decarboxylase.
  • 1985: The first successful transformations of a filamentous industrial fungus were reported when Buxton et al. (1985) transformed sphaeroplasts of a mutant of Aspergillus niger defective in ornithine transcarbamylase function with plasmids carrying a functional copy of the argB gene of A. nidulans, and Kelly & Hynes (1985) transformed A. niger, which cannot use acetamide as a nitrogen or carbon source, with the amdS (acetamidase) gene of A. nidulans.

Restriction enzymes were discovered in 1970 and the ‘recombinant DNA technology’ toolkit emerged in the 1970s and 1980s. Several strategies have been used for these historical improvement projects, including:

  • Mutagenesis; meaning the use of mutagens to generate random deletions, insertions and point mutations, usually by creating large populations of mutagen treated organisms (forming a large library of mutants) using chemical mutagens (point mutations), gamma radiation (deletions) or DNA insertions (insertional knockouts). The hope is that in one step the treatment will produce strains with improved expression and secretion of the product of interest. This has been used successfully to improve productivity of:
    • α-amylase by Aspergillus oryzae (Section 17.16),
    • ‘cellulase’ by Trichoderma reesei, some mutants of which produce up to 40 g l⁻¹ total ‘cellulase’ activity of which half is the cellobiohydrolase known as CBH-I (see Section 17.22),
    • penicillin by Penicillium chrysogenum, strain development by mutagenesis and strain selection of which is shown in Table 17.9 (Section 17.15).
  • Site-directed mutagenesis is a more refined technique that can modify chosen parts of the sequence of interest, such as regulatory regions in the promoter of a gene or codon changes in the ORF to identify/modify specific amino acids to affect directly the protein function. The technique can also be used to create ‘gene knockouts’ by deleting a gene function (forming what is known as a null allele). Directed deletions have been created in every non-essential gene in the yeast genome (Winzeler et al., 1999) and methods are available for efficient gene targeting in filamentous fungi (Krappmann, 2007). A significant advantage of site-directed (or insertional) mutagenesis over random chemical or radiation mutagenesis is that the genes mutated by insertion are tagged (i.e. physically identified) by the transforming DNA (T-DNA), which is used to disrupt the genes. This means that the molecules are readily identifiable in vitro, and, if the inserted sequence carries an expressed phenotype distinct from the recipient (such as an antibiotic resistance, ability to use an exotic substrate or render a toxin harmless) then the successfully-transformed cells can be identified in vivo.
  • Knockouts are gene deletions; an alternative approach is to substitute genes at specific times and in specific cells with experimental sequences and this is called ‘gene knockin’. The method involves insertion of a protein coding cDNA ‘signal’ or ‘reporter’ sequence at a particular site and is particularly applicable to study the function of the regulatory sites (promoters, for example) controlling expression of the gene being replaced. This is accomplished by observing how the easily-observed reporter phenotype responds to regulation.

Gene knockouts and knockins are permanent sequence alterations. Several gene silencing techniques target the expression machinery and are generally temporary. This approach is often called gene knockdown since the effect is usually to grossly reduce expression of the gene. Gene silencing may use double-stranded RNA (an approach known as RNA interference, RNAi) or Morpholino oligos.

  • RNA interference relies on a specific cellular pathway (called the RNAi pathway) interacting with the introduced double-stranded RNAs (dsRNAs, typically over 200 nucleotides long), which are made to be complementary to some target messenger RNA (mRNA). An RNase-like enzyme called Dicer in this pathway generates small interfering RNAs (siRNAs) about 20-25 nucleotides long. The siRNAs assemble into complexes containing ribonuclease (known as RISCs, or RNA-induced silencing complexes). The siRNA strands guide the RISCs to their complementary target RNA molecules, which they cut and destroy; thereby systematically interfering with expression of the target gene, so that the effect of the absence of that gene activity can be catalogued.
  • Morpholino antisense oligos block access to the target mRNA without the need for mRNA degradation. Morpholinos contain standard nucleic acid bases, but instead of the bases being linked to ribose rings connected by phosphate groups, those bases are bound to morpholine rings linked through phosphorodiamidate groups. The latter are uncharged and therefore not ionised in the usual physiological pH range; this and the other structural differences mean that Morpholinos are not sensitive to the same enzymes or chemical reactions as natural polynucleotides, but they still bind to complementary sequences of RNA by standard nucleic acid base-pairing. Morpholinos (usually 25 bases in length) base pair with regions of the natural RNA and this binding blocks splicing and translation, and therefore expression of the target gene.
  • Natural genetic recombination; meaning classical ‘applied genetics’ involving cross-breeding to generate segregation and recombination of ‘desirable’ genes using the sexual cycle (although only a few of the fungi used in commercial industries reproduce sexually); the parasexual cycle (Section 7.8); heterokaryosis or protoplast fusion; combined with artificial selection of the required combination of useful traits. This approach has been used successfully to improve productivity of glucoamylase by Aspergillus niger and exoglucanase by Trichoderma reesei. Generally speaking, mutagenesis and recombination strategies increase productivity by less than two-fold in a single step.
  • Genetic manipulation; meaning the use of recombinant DNA technology to create a potentially unnatural fungal genotype that has commercially desirable characteristics, which is the main topic of the rest of this Section.
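The RNA interference steps described in the list above (a long dsRNA is cut into short siRNAs, which then guide the silencing complex to complementary mRNA) can be illustrated with a small sketch. This is a deliberately simplified model, not a description of the real enzymology: it cuts the sequence into tidy, non-overlapping windows, ignores the overhangs of real siRNA duplexes, and all function names are hypothetical.

```python
# Simplified sketch of RNAi logic: dice a long RNA into short windows,
# derive antisense guide strands, and locate their complementary sites.
# Function names are illustrative only; real Dicer products are not
# produced as neat non-overlapping windows.

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))


def dice(sense_strand, length=21):
    """Cut the sense strand into consecutive windows of `length` nt.

    A trailing piece shorter than `length` is discarded.
    """
    last_start = len(sense_strand) - length
    return [sense_strand[i:i + length] for i in range(0, last_start + 1, length)]


def guide_strands(target_mrna, length=21):
    """Antisense guides: the reverse complement of each diced window."""
    return [reverse_complement_rna(w) for w in dice(target_mrna, length)]


def target_site(mrna, guide):
    """Index in `mrna` where `guide` can base-pair, or -1 if there is none."""
    return mrna.find(reverse_complement_rna(guide))


if __name__ == "__main__":
    mrna = "AUGGCCUUAGGC"  # toy 12-nt message, diced into 4-nt windows
    for guide in guide_strands(mrna, length=4):
        print(guide, "pairs at position", target_site(mrna, guide))
```

The short window length in the usage example is only there to keep the output readable; the default of 21 nt sits within the 20-25 nucleotide range quoted above.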

All these approaches require that a recipient cell is transformed by uptake of the constructed DNA so that the latter can at least form a partial heterozygote that ideally undergoes homologous recombination and integrates the constructed DNA into the resident chromosome. The first barrier to successful transformation is the fungal cell wall, and most transformation techniques depend on three main ways of breaching the wall (which can be combined to improve efficiency) (Weld et al., 2006):

  • enzymic removal of the cell wall to create protoplasts (which lack all wall material) or sphaeroplasts (which retain a residual amount of the original wall);
  • use of electroporation by applying electric shocks; a brief electric pulse (lasting in the region of 1 to 20 ms) at a potential gradient of about 0.5 to 10 kV cm⁻¹ is applied to temporarily permeabilise cell membranes to enable entry of large, charged molecules across the hydrophobic membrane;
  • or by ‘shooting’ micrometre-sized particles (usually of dense, relatively inert, metals like tungsten or gold) coated with DNA or RNA into the cells; a process called biolistic transformation. The microparticles coated with DNA or RNA are introduced into cells by being accelerated to velocities of approximately 500 m s⁻¹ by the forces generated by explosion of gunpowder or by explosive expansion of cold helium gas.
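As a rough worked example of the electroporation figures quoted in the list above: field strength is the applied voltage divided by the distance between the electrodes, so the pulse voltage needed for a given field follows from the electrode gap. The gap widths used below (0.1, 0.2 and 0.4 cm) are common cuvette sizes and are an assumption for illustration, not values taken from the text.

```python
# Rough arithmetic sketch: pulse voltage needed to reach a given field strength.
# field (kV/cm) = voltage (kV) / electrode gap (cm)
# The cuvette gap widths are assumed values for illustration.

def required_voltage_kv(field_kv_per_cm, gap_cm):
    """Voltage (kV) needed across a gap (cm) to give the stated field (kV/cm)."""
    return field_kv_per_cm * gap_cm


if __name__ == "__main__":
    assumed_gaps_cm = [0.1, 0.2, 0.4]
    for field in (0.5, 10.0):  # the range of field strengths quoted above
        for gap in assumed_gaps_cm:
            volts = required_voltage_kv(field, gap)
            print(f"{field:>4} kV/cm across {gap} cm gap -> {volts:.2f} kV")
```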

Success was achieved with all these approaches, but by far the most important development was Agrobacterium tumefaciens-mediated transformation (AMT).

Agrobacterium tumefaciens is a Gram-negative bacterium which is a common plant pathogen that causes crown gall tumours on plants. This tumorous growth of the plant tissue is induced when the bacterium transfers some bacterial DNA (called T-DNA) to the host plant. T-DNA is located on a 200 kbp plasmid (the tumour-inducing or Ti plasmid). The T-DNA integrates into the plant genome, then T-DNA genes that encode enzymes to produce plant growth regulators are expressed, and their expression results in uncontrolled growth of the plant cells. However, for use as a cloning vector, the T-region of the Ti plasmid can be deleted and replaced by other DNA sequences because plasmid virulence, transfer and integration are controlled by genes elsewhere on the plasmid.

What is significant for our present discussion is that Agrobacterium tumefaciens is able to transfer its T-DNA to a very wide range of fungi and produces a significantly higher frequency of more stable transformants than alternative transformation methods (Michielse et al., 2005). AMT is a relatively simple system to work with, primarily because it does not require the production of protoplasts or sphaeroplasts. Indeed, a major attraction of AMT is the variety of starting materials that can be used: protoplasts, spores, mycelium, and pieces of fruit body tissues have all produced successful transformation. Even fungi that have not been transformed by other systems have been successfully transformed by co-cultivation with Agrobacterium. The approach seems to be applicable to the full range of fungi (zygomycetes, Ascomycota and Basidiomycota) and shows great potential for fungal biotechnology and medicine (Michielse et al., 2005; Sugui et al., 2005). Agrobacterium tumefaciens mediated transformation has been described as:

 ‘…one of the most transformative technologies for research on fungi developed in the last 20 years, a development arguably only surpassed by the impact of genomics...[AMT] has been widely applied in forward genetics, whereby generation of strain libraries using random T-DNA insertional mutagenesis, combined with phenotypic screening, has enabled the genetic basis of many processes to be elucidated. Alternatively, AMT has been fundamental for reverse genetics, where mutant isolates are generated with targeted gene deletions or disruptions, enabling gene functional roles to be determined…’ (Idnurm et al., 2017).

Despite the confident descriptions given above and the use of phrases like ‘relatively simple system’, applying a transformation system to an organism for the first time is often not as ‘simple’ as might be suggested. There are many variables that must be optimised and even after reliable transformation systems have been developed, there may still be difficulties to overcome before it is possible to analyse gene function. A major potential problem for genetic analysis of any filamentous fungus is the multinucleate nature of the hyphae. Multiple nuclei can confuse results because gene replacement and insertional mutagenesis rely on the isolation of homokaryotic transformants derived from a single transformation event to study loss of function mutants (Weld et al., 2006). The consequence is that methods must be carefully refreshed and optimised every time they are applied to a new organism.

Gene cloning involves inserting DNA molecules of interest into specialised carriers called vectors that enable replication within a host cell, producing many copies of the inserted piece of DNA carried by the vector. Cloning vectors are ‘engineered’ to contain one or several recognition sites for restriction enzymes. Digesting both the vector and the DNA to be cloned with the same restriction enzyme produces complementary ‘sticky ends’ in both molecules, allowing the foreign (or heterologous) DNA fragment to be inserted into the vector. A vector carrying an inserted fragment of DNA is known as a recombinant plasmid. The replicated molecules are called clones because all the copies made in the host cell are identical.
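The way that cutting vector and insert with the same restriction enzyme gives compatible ends can be sketched in a few lines. The example assumes EcoRI, which recognises GAATTC and cuts after the first G to leave an AATT overhang; the enzyme choice, the toy sequences and the function names are all illustrative assumptions, and only the top strand of a linear molecule is modelled.

```python
# Minimal sketch: digest a vector and an insert with the same enzyme,
# then join the compatible ends. Only the top strand is modelled.
# EcoRI (G^AATTC) is used as an assumed example enzyme.

SITE = "GAATTC"   # EcoRI recognition sequence
CUT_OFFSET = 1    # top-strand cut falls after the first base of the site


def digest(seq, site=SITE, offset=CUT_OFFSET):
    """Return the top-strand fragments of a linear sequence after complete digestion."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def ligate(left_arm, insert, right_arm):
    """Join three fragments whose ends were produced by the same enzyme."""
    return left_arm + insert + right_arm


if __name__ == "__main__":
    vector = "TTTTGAATTCAAAA"          # toy vector with one cloning site
    donor = "CCGAATTCGGGGGAATTCCC"     # toy donor DNA with two sites

    left_arm, right_arm = digest(vector)
    _, insert, _ = digest(donor)       # keep the internal fragment
    recombinant = ligate(left_arm, insert, right_arm)

    print(recombinant)
    print("sites regenerated:", recombinant.count(SITE))
```

Because both junctions carry the same overhang, the recognition site is restored on each side of the insert, which is why the inserted fragment can later be cut back out with the same enzyme.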

After harvesting from the host cell, the cloned DNA can be purified for further analysis. There are several types of cloning vector, which differ in origin, nature of host cell, and in their capacity for the size of inserted DNA they can carry. The simplest vectors are bacterial plasmids, which are circular, double‑stranded, DNA molecules that replicate in the host independently of the main bacterial chromosome. Commonly used plasmids can carry up to 15 kb of foreign DNA, or up to 25 kb can be accommodated in vectors derived from the bacteriophage (‘phage’) lambda (λ). This is a double stranded DNA virus that infects the bacterium Escherichia coli. The λ phage DNA molecule circularises after infection because it has complementary single stranded overlaps at each end known as cos (for cohesive end) sites. A completely artificial, larger capacity vector has been engineered by inserting cos sites into a plasmid. These are called cosmids. They can carry up to 45 kb of inserted DNA and have the additional advantages that they use a virus coat to infect host bacteria (a very efficient way of entering the host) but replicate like a plasmid and can be constructed to use plasmid‑derived markers for recombinant selection.

Yeast artificial chromosomes (YACs) can carry DNA inserts of up to 1 million base pairs (1 megabase = 1 Mb) in length. YAC vectors are plasmids that contain yeast centromere DNA, two yeast telomeres separated by a restriction site, and yeast replication origins (autonomous replication sequences, or ARS) as well as two selectable markers. Restriction enzyme digestion produces two fragments, one a telomere + selectable marker + cloning site, the other a telomere + selectable marker + replication origin + centromere + cloning site, which are mixed with the DNA to be cloned. Among the constructs which result will be some which behave like yeast chromosomes during mitosis. Any that are constructed with two centromeres, without a centromere, or lacking a telomere will fail to segregate. Consequently, the presence of both selectable markers coupled with proper mitotic segregation is sufficient to identify the desired constructs.
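The insert capacities quoted above for the different vector types can be collected into a simple lookup. This is only a sketch of how one might pick the smallest vector class for a given insert; the capacities are the approximate upper limits given in the text, and the function name is hypothetical.

```python
# Sketch: choose the smallest vector class able to carry an insert.
# Capacities (kb) are the approximate upper limits quoted in the text.

VECTOR_CAPACITY_KB = [
    ("plasmid", 15),
    ("lambda phage", 25),
    ("cosmid", 45),
    ("YAC", 1000),
]


def smallest_suitable_vector(insert_kb):
    """Return the first vector class whose capacity covers the insert, or None."""
    for name, capacity_kb in VECTOR_CAPACITY_KB:
        if insert_kb <= capacity_kb:
            return name
    return None


if __name__ == "__main__":
    for size_kb in (10, 20, 40, 300, 2000):
        print(size_kb, "kb ->", smallest_suitable_vector(size_kb))
```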

All yeast vectors are shuttle vectors, meaning that they can be propagated (that is, grown) in cell cultures of both yeast and the bacterium Escherichia coli. These vectors contain a bacterial plasmid backbone that contains all the functions required for maintenance and selection in E. coli. They also contain yeast chromosomal elements that determine their characteristics and behaviour within yeast cells. The main types of yeast vectors are:

  • Integrative plasmids (YIp); which are maintained as a single copy providing they have integrated successfully into the genome.
  • Replicative plasmids (YRp); which contain a chromosomal origin of replication (ARS), and because of this origin of replication are maintained autonomously at high copy number (which means 20 to 200 copies of the plasmid per yeast cell).
  • Centromeric plasmids (YCp); which contain both ARS and centromere sequences and are consequently maintained in the cell as a single copy autonomously replicating supplementary chromosome.
  • Episomal plasmids (YEp); which are also autonomously replicating plasmids (containing ARS) but carry the origin of replication from yeast’s own 2µ plasmid, so they are maintained at a copy number of about 20 to 50 copies per cell. This type of vector is used for gene over-expression purposes (as are YRps). Gene over-expression creates a gain of function mutation and requires the use of multicopy vectors and strong promoters.

The ideal vector carries easily selectable markers, which enable transfer and incorporation of the vector and its cargo-DNA to be detected, and easily controlled regulation of the cargo-DNA (which usually means a strong and readily-controlled promoter) so that expression of the genes in which you are interested can be controlled; for examples, see Meyer et al. (2011) and Gressler et al. (2015). The best way of finding out about useful vectors is to view the genomics website for the organism you want to study. At the very least this will give you references to research on the organism, which will direct you towards vectors and techniques that have already been used successfully.

By a very considerable distance, the most crucial development in recent years has been gene editing. The process depends on engineered nucleases, which can be designed to cut at any location in the genome of any species and introduce modified DNA sequences into the endogenous (host organism) sequence. There are three major classes of these engineered nuclease enzymes (we will describe the fourth gene editing system, the CRISPR-Cas system, separately below):

  • zinc-finger nucleases (ZFNs),
  • transcription activator-like effector nucleases (TALENs) and,
  • engineered meganucleases.

Engineered nucleases create site-specific double-strand breaks at desired locations in the genome. These fusion proteins serve as readily targetable ‘DNA scissors’ for gene editing applications that enable targeted genome modifications to be accomplished such as sequence insertion, deletion, repair and replacement in living cells. The induced double-strand breaks are repaired through nonhomologous end-joining or homologous recombination, and the whole process results in precisely targeted mutations (‘edits’) being incorporated into the experimental genome. This type of gene editing was selected by the journal Nature Methods as the 2011 Method of the Year (Anonymous, 2011). Fundamental to the use of engineered nucleases in genome editing is that the engineered enzymes produce double stranded breaks (DSBs) in the DNA of the target organism. Double strand breaks are cytotoxic lesions that threaten genome integrity and most organisms have mechanisms to repair DSBs (Ceccaldi et al., 2016).

The concept underlying ZFNs and TALENs technologies is that of a non-specific DNA cutting catalytic domain (obtained from an endonuclease with discrete and separate DNA recognition and cleaving sites) being linked to peptides that recognise specific DNA sequences such as zinc fingers (ZFNs) and transcription activator-like effectors (TALEs). Zinc finger motifs occur in several transcription factors. The C-terminal part of each finger is responsible for the specific recognition of a short region (about 3 base pairs) of the DNA sequence. Combining 6 to 8 zinc fingers whose recognition sites have been characterised produces a protein that can target around 20 base pairs of a specific gene. Although the nuclease portions of both ZFNs and TALENs constructs have similar properties, the difference between these engineered nucleases is in their DNA recognition peptide. ZFN ‘zinc fingers’ rely on a combination of cysteine and histidine residues to react with their metal ions so codons for those amino acids identify the nuclease target sequence.

Transcription Activator-Like Effectors (TALEs) are proteins secreted by Xanthomonas plant pathogenic bacteria that bind promoter sequences in the host and activate expression of plant genes that aid bacterial infection. They recognise plant DNA sequences through a central repeat domain consisting of a variable number of about 34 amino acid repeats. TALEs can be engineered to bind to practically any desired DNA sequence, so when combined with a nuclease, the TALENs (which are artificial, engineered, restriction enzymes) can cut DNA at the specific location(s) desired by the experimenter. TALEN constructs are used in a similar way to ZFNs but have three advantages in targeted mutagenesis: (i) DNA binding specificity is higher, (ii) off-target effects are lower, and (iii) construction of DNA-binding domains is easier.

Meganucleases, discovered in the late 1980s, are endonucleases characterised by a large recognition site (DNA sequences of 12 to 40 base pairs). Sites of this length generally occur only once in any given genome, so meganucleases are the most specific of the naturally occurring restriction enzymes. Such meganucleases are quite common, but the most valuable tools for gene engineering have been derived from the LAGLIDADG family of endonucleases, so-called for the conservation of a specific amino acid sequence motif which is defined by each letter as a code that identifies a specific residue (the motif is: Leucine-Alanine-Glycine-Leucine-Isoleucine-Aspartic acid-Alanine-Aspartic acid-Glycine). This motif binds to a specific DNA sequence; change the amino acid sequence and it will bind to a different DNA sequence. The ‘engineering’ aspect of this is that mutagenesis and high throughput screening methods have been used to create meganuclease variants that recognise a defined catalogue of unique DNA sequences. Others have been fused to various meganucleases to create hybrid enzymes that recognise a new sequence and yet others have had the DNA interacting amino acids of the meganuclease altered to design sequence specific meganucleases; all contributing to what are called rationally designed meganucleases. Meganucleases have the benefit of causing less toxicity in cells than ZFNs because of more stringent DNA sequence recognition; however, the construction of sequence-specific enzymes for all possible sequences is costly and time consuming. Nevertheless, it can be done. View https://en.wikipedia.org/wiki/Genome_editing to learn more about engineered nucleases.
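The link between recognition site length and specificity can be checked with some rough arithmetic. In a random sequence with equal base frequencies, a particular site of n base pairs is expected about once every 4^n positions, so the expected number of copies in a genome is roughly the genome size divided by 4^n. The sketch below makes several simplifying assumptions that are not from the text: random base composition, one strand only, and a hypothetical genome of 40 million base pairs.

```python
# Rough estimate of how often a specific n-bp site occurs by chance.
# Assumptions (illustrative only): equal base frequencies, random sequence,
# one strand counted, and an assumed genome size of 40 Mb.

ASSUMED_GENOME_BP = 40_000_000


def expected_occurrences(genome_bp, site_bp):
    """Expected chance occurrences of one specific site of `site_bp` base pairs."""
    return genome_bp / 4 ** site_bp


if __name__ == "__main__":
    for site_bp in (6, 12, 18, 24, 40):
        n = expected_occurrences(ASSUMED_GENOME_BP, site_bp)
        print(f"{site_bp:>2} bp site: about {n:.3g} expected copies")
```

Under these assumptions a 6 bp site of the kind recognised by many ordinary restriction enzymes turns up thousands of times, whereas sites towards the longer end of the meganuclease range (and the 18 to 24 bp recognised by arrays of 6 to 8 zinc fingers) are expected far less than once by chance.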

The CRISPR-Cas9-based system has become a common platform for genome editing in a variety of organisms. CRISPRs (Clustered Regularly Interspaced Short Palindromic Repeats) are genetic elements, which provide bacteria with adaptive immunity to viruses and plasmids. They consist of short sequences that originate as remnants of genes from past infections, sandwiched between unusual, repeated bacterial DNA sequences; the ‘clustered regularly interspaced short palindromic repeats’ that give CRISPR its name. The CRISPR-associated protein Cas9 is an endonuclease that uses a guide sequence within an RNA duplex, tracrRNA:crRNA, to form base pairs with DNA target sequences, enabling Cas9 to introduce a site-specific double-strand break in the DNA. The dual tracrRNA:crRNA was engineered as a single guide RNA (sgRNA) that retains two critical features:

  • a sequence at the 5′ side that determines the DNA target site by Watson-Crick base-pairing with the target DNA, and
  • a duplex RNA structure at the 3′ side that binds to Cas9.

From this, Doudna & Charpentier (2014) created a simple two-component system in which experimenter-determined changes in the guide sequence of the sgRNA direct Cas9 to target the specific DNA sequence of interest to the experimenter. Cas9-sgRNA-mediated DNA cleavage produces a blunt double-stranded break in the target DNA that triggers repair enzymes to disrupt or replace DNA sequences at or near the cleavage site. Catalytically inactive forms of Cas9 can also be used for programmable regulation of transcription and visualisation of genomic loci.
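How a guide sequence picks out a target can be sketched by scanning a DNA sequence for candidate sites. Two details used here are not stated in the text above and are added as background for the commonly used Streptococcus pyogenes Cas9: the target must be followed immediately by a short NGG motif (the PAM), and the blunt cut falls three base pairs upstream of that motif. The function name is hypothetical and only the strand as written is scanned.

```python
# Sketch: list candidate Cas9 target sites on one DNA strand.
# Assumes S. pyogenes Cas9 conventions (20-nt target followed by an NGG PAM,
# blunt cut 3 bp upstream of the PAM); these are background assumptions,
# not details given in the surrounding text.
import re


def find_cas9_targets(dna, guide_len=20):
    """Return (protospacer, pam, cut_index) for each site on the given strand.

    `cut_index` is the position in `dna` such that the break lies between
    dna[:cut_index] and dna[cut_index:].
    """
    dna = dna.upper()
    pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len  # lookahead finds overlapping sites
    hits = []
    for match in re.finditer(pattern, dna):
        protospacer, pam = match.group(1), match.group(2)
        cut_index = match.start() + guide_len - 3
        hits.append((protospacer, pam, cut_index))
    return hits


def guide_rna_spacer(protospacer):
    """RNA version of the protospacer, as it would appear at the 5' end of an sgRNA."""
    return protospacer.replace("T", "U")


if __name__ == "__main__":
    toy_dna = "TTT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AAA"
    for protospacer, pam, cut in find_cas9_targets(toy_dna):
        print("target:", protospacer, "PAM:", pam, "cut after index", cut - 1)
        print("sgRNA spacer:", guide_rna_spacer(protospacer))
```

A fuller search would also scan the reverse complement strand and score each candidate for similar sequences elsewhere in the genome; both are left out to keep the sketch short.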

The simplicity of the CRISPR-Cas9 system has made this a cost-effective and easy-to-use technology to precisely and efficiently target, edit, modify, regulate, and mark genomic loci of a wide array of cells and organisms. By introducing plasmids containing Cas genes and specifically constructed CRISPRs into living eukaryotic cells, the eukaryotic genome can be cut at any desired position. This is the quickest and cheapest method for gene editing and requires the least amount of expertise in molecular biology because it is RNA rather than protein that is engineered to guide the nuclease to the target. This is a major advantage that CRISPR has over the ZFN and TALEN methods; it can target different DNA sequences simply by changing its sgRNAs (each about 80 nucleotides long), while both ZFN and TALEN methods require construction and testing of the proteins created for targeting each DNA sequence. The CRISPR-Cas system was selected by the journal Science as its 2015 Breakthrough of the Year (McNutt, 2015); you can read about the latest developments in the ‘CRISPR revolution’ topic page written by Jon Cohen (a staff writer for Science) at this URL: http://www.sciencemag.org/topic/crispr, and Frederik Bussler’s ‘CRISPR.Report’ website at: https://www.crispr.report/cas-clover-the-clean-alternative-to-crispr-cas9/.

Gratifyingly, on 7th October 2020 The Royal Swedish Academy of Sciences awarded the Nobel Prize in Chemistry 2020 to Emmanuelle Charpentier and Jennifer A. Doudna ‘…for the development of a method for genome editing’ [https://www.nobelprize.org/prizes/chemistry/2020/press-release/].

Since 2015, the number and diversity of known CRISPR–Cas systems have markedly increased; these have been reviewed by Makarova et al. (2020). Gene editing technologies have been developed for application to animals (Dunn & Pinkert, 2014), plants (Mohanta et al., 2017) and fungi (Nødvig et al., 2015; Chen et al., 2017; Pudake et al., 2017; Zheng et al., 2017; Vonk et al., 2019; Wang & Coleman, 2019), and they all make fascinating reading.

Anzalone et al. (2019) describe a technique they call prime editing (or search-and-replace genome editing) as being:

 ‘…a versatile and precise genome editing method that directly writes new genetic information into a specified DNA site using a catalytically impaired Cas9 fused to an engineered reverse transcriptase, programmed with a prime editing guide RNA (pegRNA) that both specifies the target site and encodes the desired edit.’

The authors claim that prime editing greatly expands the scope and capabilities of genome editing, and in principle could correct about 89% of known pathogenic human genetic variants. So, what could it do for fungi?

Updated October, 2020