A Computer Vision Application For Galaxy Detection

Computer Vision is an interdisciplinary field that combines knowledge from disciplines such as Physics, Computer Science, and Electrical Engineering. Its main goal is to develop algorithms and systems capable of reproducing human vision skills. The fields most closely related to computer vision are image processing, image analysis, and machine vision.

The core applications of computer vision have historically been in the healthcare, automotive, and agriculture industries, mostly because of the large investments required to develop and deploy these systems. During the last 8 years, the situation has changed dramatically: the barriers to entry have decreased, and open-source libraries have proliferated. Today, students and professionals from low-income countries have access to a large stack of computer vision resources and can develop high-impact applications on short timescales.

Figure 1: Disciplines related to Computer Vision. Credit: Roberto E. Gonzalez.

Observational astronomy is a division of astronomy that is concerned with recording data about the observable Universe. Ground-based and space telescopes are used to observe objects ranging from planets to distant galaxies. Specialized telescope instruments collect raw data that is stored on remote servers and later processed using several image processing and analysis pipelines.

The usual tasks in processing astronomical images are removal of systematic effects, point-source detection, and image enhancement. Tools for these tasks are available in applications such as IRAF1 and libraries such as Astropy2, and astronomers and engineers use them routinely.

Nowadays, most of the data acquisition and processing tasks are fully automated. The community has developed several data reduction pipelines and frameworks that are freely available and can be easily used by professionals working at telescopes, universities, and outside academia.

Figure 2: A map showing some of what the Sloan Digital Sky Survey has discovered over the last twenty years. Image Credit: V. Belokurov, M. R. Blanton, A. Bonaca, X. Fan, M. C. Geha, R. H. Lupton, the SDSS Collaboration (https://www.sdss.org)

The Sloan Digital Sky Survey (SDSS3) is the largest astronomical survey ever executed, producing a catalog of about 500 million sources. The entire dataset exceeds 100 TB and includes images, spectra, and catalogs covering one-third of the celestial sky. Data reduction and analysis were initially done with customized pipelines developed by astronomers, data scientists, and engineers from several universities and institutes in the USA, and the work was later expanded by professionals from all over the world.

Although data reduction and preparation are mostly done with classical image processing methods, there is still a lot of room for improvement in data analysis and visualization. Computer vision looks like a promising way to facilitate the analysis of big data in Astronomy and to accelerate the discovery of structures and phenomena in the Universe. However, this is not an easy task: new methods and techniques from other fields (interdisciplinarity) are adopted slowly, usually several years behind the state of the art. Two factors may explain this. First, there are not many interdisciplinary scientists who bring knowledge from other fields. Second, knowledge tends to spread into other fields only after it becomes mature and well developed. For example, computer vision techniques developed in Computer Science, particularly those based on machine learning and deep learning, arrive in Astrophysics ~4-5 years after they are developed. This is easy to see by counting publications: astrophysics papers related to computer vision, deep learning, or machine learning number fewer than 300, and concepts such as Deep Learning, Faster R-CNN, and SSD have only appeared in papers since ~2017-2018.

In this context, the AstroCV4 repository is an invitation to join efforts to reduce this delay in knowledge transfer from Computer Vision to Astrophysics, especially now, with the rapid growth of knowledge in Computer Vision, new development frameworks, and cheaper GPU computing power.

As part of the AstroCV initiative, we trained a galaxy detection and identification model using a state-of-the-art SSD (single-shot detector) neural network built with the Darknet framework, and we developed a new data augmentation procedure to make the model robust against images coming from different filters and instruments. The training set is built from the Galaxy Zoo5 database, with classifications of elliptical, spiral, edge-on, and merging galaxies. Data augmentation is very important in any model training scenario: it helps improve results from small training sets and makes models more reliable under different conditions. In particular, astronomical images are taken in multiple filters and stored in FITS format, with raw CCD data for each pixel. The conversion from FITS to an RGB image is therefore not unique: it depends on the telescope's camera, the band filters, the reduction scheme, and the method used to scale photon counts to a color scale.
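To make the non-uniqueness of the FITS-to-RGB conversion concrete, here is a minimal sketch that renders the same object with four common intensity stretches (linear, square root, logarithmic, and asinh). It assumes three already-aligned bands held as plain 2D lists of photon counts; the stretch set, the softening factors (1000 and 10), and the min/max clipping are illustrative choices, not the exact conversions used in the AstroCV augmentation scheme.

```python
import math
from typing import Callable, Dict, List, Tuple

Band = List[List[float]]                     # one filter: 2D grid of raw counts
RGBImage = List[List[Tuple[int, int, int]]]  # 8-bit RGB pixels

# Illustrative stretches (assumed, not from the article): each maps a
# normalized flux x in [0, 1] to a display intensity in [0, 1].
STRETCHES: Dict[str, Callable[[float], float]] = {
    "linear": lambda x: x,
    "sqrt": math.sqrt,
    "log": lambda x: math.log1p(1000.0 * x) / math.log1p(1000.0),
    "asinh": lambda x: math.asinh(10.0 * x) / math.asinh(10.0),
}


def normalize(band: Band, lo: float, hi: float) -> Band:
    """Clip counts to [lo, hi] and rescale them linearly to [0, 1]."""
    span = (hi - lo) or 1.0  # avoid division by zero on a flat image
    return [[min(max((v - lo) / span, 0.0), 1.0) for v in row] for row in band]


def to_rgb(r: Band, g: Band, b: Band,
           stretch: Callable[[float], float], lo: float, hi: float) -> RGBImage:
    """Map three bands to 8-bit RGB pixels using a single stretch function."""
    nr, ng, nb = (normalize(band, lo, hi) for band in (r, g, b))
    return [
        [tuple(int(round(255 * stretch(c[y][x]))) for c in (nr, ng, nb))
         for x in range(len(r[0]))]
        for y in range(len(r))
    ]


def augment(r: Band, g: Band, b: Band) -> Dict[str, RGBImage]:
    """Return one RGB rendering of the same object per stretch."""
    pixels = [v for band in (r, g, b) for row in band for v in row]
    lo, hi = min(pixels), max(pixels)  # simple min/max clipping limits
    return {name: to_rgb(r, g, b, f, lo, hi) for name, f in STRETCHES.items()}


if __name__ == "__main__":
    band = [[0.0, 10.0], [50.0, 100.0]]  # toy 2x2 "galaxy", same in all bands
    for name, img in augment(band, band, band).items():
        print(name, img[0][1])  # how bright the faint 10-count pixel appears
```

The non-linear stretches brighten faint pixels relative to the linear one, so the same galaxy's outskirts look very different across variants. Adding every variant to the training set with the same bounding-box labels is one way to expose a detector to the spread of conversions it may meet in images from other telescopes.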

We built a data augmentation scheme that applies several color conversion methods to the same objects, which substantially improved detection in images from other telescopes and instruments, even though our training set came from the SDSS instrument only. In Figure 3, we show results for SDSS images, where the model reaches a recall of 90%. For images taken with different color filters and telescopes, however, results are worse, and recall may drop to as low as 20%. With our data augmentation procedure, recall improves by up to a factor of three. In Figure 4, we show results for an image from the Hubble Deep Field.

Figure 3: Galaxies found using our model in a typical SDSS image. Credit: Roberto Gonzalez
Figure 4: Galaxies in a Hubble Deep Field image with data augmentation (without data augmentation, only one third of the galaxies were found). Credit: Roberto Gonzalez
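The recall values quoted above are the fraction of real galaxies in an image that the detector finds. The article does not say how detections are matched to catalogued galaxies, so the sketch below assumes a common object-detection convention: a greedy one-to-one match in which a detection counts only if its bounding box overlaps a ground-truth box with an intersection-over-union (IoU) of at least 0.5. The box format, the 0.5 threshold, and the function names are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(truth: List[Box], detections: List[Box], threshold: float = 0.5) -> float:
    """Fraction of ground-truth boxes matched by a distinct detection.

    Each ground-truth box greedily claims the unused detection with the
    highest IoU; the match counts only if that IoU reaches `threshold`.
    """
    if not truth:
        raise ValueError("recall is undefined without ground-truth boxes")
    unused = list(detections)
    matched = 0
    for t in truth:
        best: Optional[Box] = max(unused, key=lambda d: iou(t, d), default=None)
        if best is not None and iou(t, best) >= threshold:
            matched += 1
            unused.remove(best)  # each detection may match only one galaxy
    return matched / len(truth)


if __name__ == "__main__":
    truth = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
    found = [(1, 1, 11, 11), (100, 100, 110, 110)]
    print(recall(truth, found))  # one of the three galaxies is recovered
```

Greedy matching keeps the sketch short; standard evaluation tools such as those for PASCAL VOC and COCO instead process detections in order of confidence score before matching them to ground truth.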

Roberto Gonzalez and Roberto Muñoz are former astronomers who moved to the computer vision industry at the Chilean company MetricArts6, so knowledge transfer between Astrophysics, Computer Science, and industry is a daily process for them. They think that interdisciplinarity and collaboration between the technology industry and academia are fundamental to leading in the Computer Vision and AI fields. However, this requires a change of thinking in traditional academia and traditional industry, where interdisciplinarity and knowledge transfer are given little value, especially in less developed countries.

These findings are described in the article entitled Galaxy detection and identification using deep learning and data augmentation, recently published in the journal Astronomy and Computing.


  1. Image Reduction and Analysis Facility http://iraf.noao.edu/
  2. http://www.astropy.org/
  3. https://www.sdss.org/
  4. https://github.com/astroCV
  5. https://www.galaxyzoo.org
  6. www.metricarts.com

About The Author

Roberto E. González

Roberto is the Chief Scientist at MetricArts and a Cluster HPC support scientist at Centro Astro-Ingenieria UC. He holds a PhD in Astrophysics and is an expert in parallel computing, HPC, computer vision, and big data. Roberto has broad interdisciplinary experience between Computer Science and Astrophysics, with publications in both fields. His expertise includes large-scale structure, cosmic-web metrics, cosmological hydrodynamic N-body simulations, and the Local Group in a cosmological context to probe LCDM. He has four years of experience lecturing science and engineering courses at Universidad Catolica, extensive collaborations with professors at universities abroad, and experience collaborating with industry and leading CORFO R&D projects.
