Current Knowledge Of Lepidoptera Genomes And Future Directions

Science often advances with the development of new technologies. Since the sequencing of the first human genome, DNA sequencing technologies have progressed considerably. We can now sequence complete genomes at a relatively low cost, and much of the analysis can be done within a small research group. As a result, genomes are being sequenced across multiple taxonomic groups, and research in genetics is quickly moving to a genomic scale. Studies that were once done with a few genetic markers now use data from complete genomes, an approach that expands the scope of scientific questions that can be addressed.

To highlight the current state of genome sequencing within the Arthropoda, the journal Current Opinion in Insect Science published an issue dedicated to reviews of selected insect taxa. This series of articles focused on available genome sequences and the future work necessary to accelerate the use of genomic technologies in entomological research. One review covered research within the Lepidoptera, the insect order comprising butterflies and moths. That article not only reviews the current state of Lepidoptera genome sequencing but also emphasizes future challenges, including suggestions for storing and distributing genomic data to the arthropod research community.


The Lepidoptera (butterflies and moths) is one of the most ecologically diverse insect orders, with more than 157,000 species described in 43 superfamilies. Most Lepidoptera belong to the taxonomic grouping Ditrysia, which contains approximately 98% of the described species (Figure 1). Their genomes are relatively small (~200–800 Mb, or up to about 1⁄4 the size of the human genome) and lack structural complexity. However, genomes have been sequenced and assembled for fewer than 80 species in fewer than 10 superfamilies, with most belonging to butterfly families and a few to moth families (Figure 1).

Figure 1. Phylogeny of Lepidoptera showing relationships among the major superfamilies and the number of assembled genomes (modified from Mitter et al. [4]). Orange highlights indicate superfamilies with at least one genome with a functional gene annotation; yellow indicates superfamilies with only a single genome and no functional annotation. The graph on the upper left shows the number of annotated genomes published per year since 2008. Republished with permission from Elsevier from

The growth of genome sequencing has led to larger phylogenomic datasets, but with many Lepidoptera families lacking complete genome assemblies, truly robust datasets cannot yet be compiled. Similarly, the function of many genes, particularly among insects, remains untested, though novel gene editing technologies are emerging quickly. Community support for Lepidoptera genomics is growing, with better management and dissemination of data. The field would still benefit from more consistent database standardization and from additional genome sequences that are more evenly distributed throughout the group.

One central repository for Lepidoptera genomes is Lepbase, which provides associated assembly statistics and gene annotations [1]. Platforms such as the i5k Workspace@NAL [2] host arthropod genomes and provide analytical assistance for users with limited bioinformatic experience. A number of other valuable databases are available, and users often need to search multiple sources to find a genome assembly of interest. A further complication is that sequencing projects can run in parallel without the researchers being aware of related work. To avoid potential conflicts, it is recommended that members of Lepbase or i5k be informed of genome sequencing projects so that the community stays updated.

For long-term data storage, it is good practice to archive completed or draft genome assemblies with the National Center for Biotechnology Information (NCBI) so that the data are screened and assigned an accession number for reference. It can be difficult to determine when a genome is “complete,” and several versions of a single species’ genome can be released at different draft stages, which often makes comparisons difficult. With an assigned accession number, if improvements are made to a released genome, users can archive different versions of the same genome sequence and ensure that downstream analyses are completed on a standardized set of genomic data.
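As a rough illustration of why versioned accession numbers help, the short Python sketch below groups assembly accessions by their base identifier and reports the newest version of each. It assumes the common NCBI assembly accession layout (a GCA_ or GCF_ prefix, nine digits, then a dot and a version number). The accessions in the example list are made up for illustration and do not refer to real Lepidoptera assemblies, and the function names are hypothetical.

```python
import re

# Assumed layout of an NCBI assembly accession: GCA_/GCF_ + 9 digits + "." + version.
ACCESSION_RE = re.compile(r"^(GC[AF]_\d{9})\.(\d+)$")


def parse_accession(accession):
    """Split a versioned accession into (base identifier, integer version)."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not a versioned assembly accession: {accession!r}")
    return match.group(1), int(match.group(2))


def latest_versions(accessions):
    """Return a dict mapping each base identifier to its highest-version accession."""
    latest = {}
    for accession in accessions:
        base, version = parse_accession(accession)
        # Versions start at 1, so 0 is a safe default for unseen identifiers.
        if version > latest.get(base, 0):
            latest[base] = version
    return {base: f"{base}.{version}" for base, version in latest.items()}


if __name__ == "__main__":
    # Hypothetical accessions, for illustration only.
    records = ["GCA_000000001.1", "GCA_000000001.2", "GCA_000000002.1"]
    for base, newest in sorted(latest_versions(records).items()):
        print(base, "->", newest)
```

Recording the full accession, including the version suffix, alongside each analysis makes it possible to tell later which draft of a genome the results were based on.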


The first Lepidoptera genome sequenced was that of the domesticated silkworm, Bombyx mori [3], a model species important for commercial silk production. The majority of Lepidoptera genomes have been sequenced within the past 5 years (Figure 1), and the number continues to grow as sequencing costs decrease and sequencing technologies improve. Broader sampling across major phylogenetic lineages is needed for the field of Lepidoptera genomics to move forward. Moreover, scientists should continue to make genomes publicly available, along with metadata that describe the assembly process and note any limitations, so that the genomes can be used more efficiently.

These findings are described in the article entitled Lepidoptera genomes: current knowledge, gaps and future directions, recently published in the journal Current Opinion in Insect Science. This work was conducted by Deborah A Triant, Scott D Cinel, and Akito Y Kawahara from the University of Florida in Gainesville, FL.


  1. Challis RJ, Kumar S, Kumar K, Dasmahapatra K, Jiggins CD, Blaxter M. Lepbase: the Lepidopteran genome database. bioRxiv 2016.
  2. Poelchau M, Childers C, Moore G, Tsavatapalli V, Evans J, Lee C-Y, Lin H, Lin J-W, Hackett K. The i5k Workspace@NAL—enabling genomic data access, visualization and curation of arthropod genomes. Nuc Acids Res 2015 43:D714-D719.
  3. Mita K, Kasahara M, Sasaki S, Nagayasu Y, Yamada T, Kanamori H. The Genome Sequence of Silkworm, Bombyx mori. DNA Res 2004 11:27-35.
  4. Mitter C, Davis DR, Cummings MP. Phylogeny and evolution of Lepidoptera. Annu Rev Entomol 2017 62:265-283.


