Stellar Classification Using Gaia Data

ExplanationStep 1

Building a Hertzsprung-Russell Diagram from Gaia Stellar Data

ExplanationStep 2

Project goal

The goal of this project is to use real Gaia stellar catalog data to build a Hertzsprung-Russell diagram.

The Hertzsprung-Russell (HR) diagram shows the relationship between a star's color, which traces its surface temperature, and its absolute brightness. It is one of the most important tools in astrophysics because it separates the major stellar populations, such as main sequence stars, red giants, and white dwarfs.

In this project, I will use Python to:

  • access Gaia data
  • clean a sample of stellar measurements
  • calculate stellar distances from parallax
  • estimate absolute magnitudes
  • plot an HR diagram
  • identify the main stellar regions
ExplanationStep 3

Importing libraries

CodeStep 4
Python
!pip install astroquery -q
CodeStep 5
Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from astroquery.gaia import Gaia
Output log 1
In preparation for Gaia DR4, the Gaia archive is in evolution. Unfortunately, it may be unstable at times and particular types of queries may time out. Please consider registering for a user account (https://www.cosmos.esa.int/web/gaia-users/register). For questions or advice, please contact the Gaia helpdesk (https://www.cosmos.esa.int/web/gaia/gaia-helpdesk).
ExplanationStep 6

Downloading Gaia stellar data

I query a sample of 15,000 stars from the Gaia DR3 source catalog (gaiadr3.gaia_source).

The selected columns are:

  • parallax, used to estimate distance
  • phot_g_mean_mag, apparent brightness in Gaia's G band
  • bp_rp, color index
  • parallax_error, used to check measurement quality

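Only stars with reliable parallaxes are selected: parallax > 2 mas keeps stars closer than about 500 pc, and parallax_over_error > 10 keeps stars whose parallax is measured to better than 10%. The query also requires valid photometry and restricts the sample to 4 ≤ G ≤ 18 and -0.5 ≤ BP-RP ≤ 4.0. These cuts help produce a cleaner Hertzsprung-Russell diagram.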
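The main cuts in the WHERE clause of the query below can also be mirrored client-side, which is handy for re-applying or tightening them on a cached DataFrame without querying the archive again. This is a minimal sketch, not part of the original notebook: the helper name `apply_quality_cuts` is hypothetical, it recomputes the parallax signal-to-noise from `parallax` and `parallax_error` rather than using Gaia's `parallax_over_error` column, and the last two toy rows are made up to fail the cuts.

```python
import pandas as pd


def apply_quality_cuts(df: pd.DataFrame,
                       min_parallax_mas: float = 2.0,
                       min_parallax_over_error: float = 10.0) -> pd.DataFrame:
    """Mirror the main cuts of the ADQL WHERE clause (hypothetical helper)."""
    # parallax / parallax_error above 10 means a fractional parallax error below 10%
    snr = df["parallax"] / df["parallax_error"]
    mask = (
        (df["parallax"] > min_parallax_mas)                    # closer than 1000 / 2 = 500 pc
        & (snr > min_parallax_over_error)
        & df[["phot_g_mean_mag", "bp_rp"]].notna().all(axis=1)
        & df["phot_g_mean_mag"].between(4, 18)                 # inclusive, like ADQL BETWEEN
        & df["bp_rp"].between(-0.5, 4.0)
    )
    return df[mask]


# First two rows reuse values from the notebook output; the last two are made up
# to fail the parallax cut (1.5 mas) and the precision cut (3.0 / 0.5 = 6).
toy = pd.DataFrame({
    "parallax":        [2.150371, 6.688006, 1.5, 3.0],
    "parallax_error":  [0.109505, 0.018479, 0.01, 0.5],
    "phot_g_mean_mag": [17.498400, 9.206958, 12.0, 12.0],
    "bp_rp":           [2.323648, 0.736701, 1.0, 1.0],
})
kept = apply_quality_cuts(toy)
print(len(kept))  # 2
```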

CodeStep 7
Python
query = """
SELECT TOP 15000
    source_id,
    ra,
    dec,
    parallax,
    parallax_error,
    phot_g_mean_mag,
    bp_rp,
    phot_bp_mean_mag,
    phot_rp_mean_mag
FROM gaiadr3.gaia_source
WHERE
    parallax > 2
    AND parallax_over_error > 10
    AND phot_g_mean_mag IS NOT NULL
    AND bp_rp IS NOT NULL
    AND phot_bp_mean_mag IS NOT NULL
    AND phot_rp_mean_mag IS NOT NULL
    AND phot_g_mean_mag BETWEEN 4 AND 18
    AND bp_rp BETWEEN -0.5 AND 4.0
"""

job = Gaia.launch_job_async(query)
results = job.get_results()

df = results.to_pandas()

print("Number of stars downloaded:", len(df))
df.head()
INFO:astroquery:Query finished.
Number of stars downloaded: 15000
Output log 3
            source_id         ra        dec  parallax  parallax_error  \
0  137341028218921216  45.215014  35.435171  2.150371        0.109505   
1  137354913848176384  45.082992  35.521805  4.414815        0.030129   
2  138830527172656128  46.077017  35.815947  3.279607        0.143288   
3  138839422049279488  45.376115  35.460993  3.351197        0.119281   
4  138847981919616896  45.533685  35.600378  6.688006        0.018479   

   phot_g_mean_mag     bp_rp  phot_bp_mean_mag  phot_rp_mean_mag  
0        17.498400  2.323648         18.742424         16.418776  
1        15.103880  2.107482         16.179569         14.072087  
2        17.698900  2.637417         19.086605         16.449188  
3        17.578335  2.333639         18.822166         16.488527  
4         9.206958  0.736701          9.490245          8.753544  
ExplanationStep 8

Calculating stellar distance and absolute magnitude

Gaia measures stellar parallax in milliarcseconds. Parallax is used to estimate the distance to a star.

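Distance in parsecs is calculated by inverting the parallax (in milliarcseconds):

distance = 1000 / parallax

Simple inversion is a reasonable estimate when the fractional parallax error is small, which the parallax_over_error > 10 cut is meant to ensure.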
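As a quick check on the formula, the sketch below converts the parallax of row 4 of the downloaded sample (6.688006 mas, error 0.018479 mas) to parsecs. It also uses the standard first-order error propagation, σ_d / d ≈ σ_parallax / parallax, to show that a parallax_over_error above 10 keeps the fractional distance error below about 10%; this propagation step is an illustration added here, not something computed in the original notebook.

```python
def parallax_to_distance_pc(parallax_mas: float) -> float:
    """Distance in parsecs from a parallax in milliarcseconds (simple inversion)."""
    return 1000.0 / parallax_mas


def fractional_distance_error(parallax_mas: float, parallax_error_mas: float) -> float:
    """First-order estimate: sigma_d / d is approximately sigma_parallax / parallax."""
    return parallax_error_mas / parallax_mas


# Row 4 of the sample: parallax = 6.688006 mas, parallax_error = 0.018479 mas
d = parallax_to_distance_pc(6.688006)
frac = fractional_distance_error(6.688006, 0.018479)
print(f"distance = {d:.2f} pc, fractional error = {frac:.4f}")
```

The distance agrees with the distance_pc value of about 149.52 pc computed for the same star in the notebook, and the fractional error is well under 1%.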

The apparent G magnitude describes how bright a star appears from Earth, which depends on both its luminosity and its distance. To compare stars physically, I calculate the absolute G magnitude: the apparent magnitude the star would have if it were at a standard distance of 10 parsecs.

Absolute magnitude is calculated as:

M_G = G - 5 * log10(distance / 10)

where G is the apparent magnitude (phot_g_mean_mag) and distance is in parsecs.
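Substituting distance = 1000 / parallax gives an equivalent form that works directly in milliarcseconds, M_G = G + 5 * log10(parallax) - 10. The sketch below, which assumes NumPy, checks that both forms agree for row 4 of the downloaded sample (G = 9.206958, parallax = 6.688006 mas); the function names are illustrative, not from the notebook.

```python
import numpy as np


def abs_mag_from_distance(g_mag, distance_pc):
    """M_G = G - 5 * log10(distance / 10 pc)."""
    return g_mag - 5 * np.log10(np.asarray(distance_pc) / 10.0)


def abs_mag_from_parallax(g_mag, parallax_mas):
    """Same quantity in terms of parallax in mas: M_G = G + 5 * log10(parallax) - 10."""
    return g_mag + 5 * np.log10(parallax_mas) - 10.0


g, plx = 9.206958, 6.688006  # row 4 of the downloaded sample
m1 = abs_mag_from_distance(g, 1000.0 / plx)
m2 = abs_mag_from_parallax(g, plx)
print(round(float(m1), 4), round(float(m2), 4))
```

Both forms give about 3.3334, matching the absolute_g_mag value the notebook computes for this star. The parallax form avoids an intermediate distance column but is otherwise identical.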

CodeStep 9
Python
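# Distance in parsecs (parallax is in milliarcseconds)
df["distance_pc"] = 1000 / df["parallax"]

# Absolute G magnitude from the distance modulus
df["absolute_g_mag"] = df["phot_g_mean_mag"] - 5 * np.log10(df["distance_pc"] / 10)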

df[["parallax", "distance_pc", "phot_g_mean_mag", "absolute_g_mag", "bp_rp"]].head()
Output log 1
   parallax  distance_pc  phot_g_mean_mag  absolute_g_mag     bp_rp
0  2.150371   465.036062        17.498400        9.160967  2.323648
1  4.414815   226.510081        15.103880        8.328442  2.107482
2  3.279607   304.914574        17.698900       10.278009  2.637417
3  3.351197   298.400881        17.578335       10.204334  2.333639
4  6.688006   149.521398         9.206958        3.333441  0.736701
ExplanationStep 10

Building the Hertzsprung-Russell diagram

The Hertzsprung-Russell diagram plots stellar color against absolute brightness.

In this diagram:

  • The x-axis shows BP-RP color index
  • The y-axis shows absolute G magnitude
  • Hotter, bluer stars appear on the left
  • Cooler, redder stars appear on the right
  • Brighter stars appear toward the top

The y-axis is inverted because lower magnitude values mean higher brightness.
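The inversion follows from how magnitudes are defined: a difference of Δm magnitudes corresponds to a brightness ratio of 10^(0.4 Δm), so 5 magnitudes is a factor of 100 and smaller numbers mean brighter stars. The short sketch below applies this to absolute G magnitudes, comparing row 4 of the sample (M_G ≈ 3.33) with the solar reference value of 4.67 used in the plot. The helper name is illustrative, and the result is a ratio of G-band luminosities, not bolometric luminosities.

```python
def brightness_ratio(mag_bright: float, mag_faint: float) -> float:
    """How many times brighter the first object is than the second, from magnitudes."""
    return 10 ** (0.4 * (mag_faint - mag_bright))


five_mag = brightness_ratio(0.0, 5.0)           # five magnitudes -> factor of 100
row4_vs_sun = brightness_ratio(3.333441, 4.67)  # row 4 vs the Sun, in the G band
print(five_mag, row4_vs_sun)
```

Row 4 comes out roughly three and a half times brighter than the Sun in the G band, consistent with it sitting slightly higher on the diagram.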

CodeStep 11
Python
import os

os.makedirs("figures", exist_ok=True)

# Approximate reference values for the Sun in the Gaia photometric system
sun_bp_rp = 0.82
sun_abs_g = 4.67

plt.figure(figsize=(8, 10))

hb = plt.hexbin(
    df["bp_rp"],
    df["absolute_g_mag"],
    gridsize=180,
    mincnt=1,
    bins="log"
)

plt.gca().invert_yaxis()

plt.xlabel("Gaia BP-RP Color Index")
plt.ylabel("Absolute G Magnitude")
plt.title("Hertzsprung-Russell Diagram from Gaia DR3 Data")

cbar = plt.colorbar(hb)
cbar.set_label("Number of stars per bin (log scale)")

plt.scatter(
    sun_bp_rp,
    sun_abs_g,
    s=160,
    marker="*",
    edgecolors="black",
    linewidths=0.8,
    label="Sun"
)

plt.text(sun_bp_rp + 0.08, sun_abs_g, "Sun", fontsize=11)

plt.text(1.7, 8.0, "Main Sequence", fontsize=12)
plt.text(1.4, 1.0, "Red Giants", fontsize=12)
plt.text(0.0, 12.0, "White Dwarf Region", fontsize=12)

plt.legend()

plt.savefig("figures/final_hr_diagram.png", dpi=300, bbox_inches="tight")
plt.show()
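[Figure: Hertzsprung-Russell Diagram from Gaia DR3 Data. Hexbin density of Gaia BP-RP color index vs absolute G magnitude, with the Sun marked; saved as figures/final_hr_diagram.png]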
ExplanationStep 12

The diagram shows that stars are not scattered uniformly across color and magnitude; they concentrate in a few well-defined sequences.

Most stars form a clear diagonal band called the main sequence. These stars are in a stable stage of stellar evolution, where they generate energy by fusing hydrogen in their cores.

The upper region contains brighter evolved stars, including red giants. These stars have expanded and become much more luminous than main sequence stars of similar color.

White dwarfs are expected in the lower-left region of the HR diagram. In this sample, that region is sparsely populated. White dwarfs are intrinsically faint, so the G ≤ 18 limit admits them only within a small nearby volume, and the query selected a general nearby-star sample rather than one targeted at white dwarfs.
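A rough way to see why the white dwarf region is so sparse is to ask how far away a star of a given absolute magnitude can be and still pass both the G ≤ 18 limit and the parallax > 2 mas cut. The sketch below inverts the absolute-magnitude formula for distance. The example values M_G = 12 (inside the white dwarf sequence) and M_G = 4.67 (the Sun) are illustrative choices, the helper name is hypothetical, and extinction is ignored, as in the rest of the notebook.

```python
def max_distance_pc(abs_g_mag: float, g_limit: float = 18.0,
                    min_parallax_mas: float = 2.0) -> float:
    """Largest distance (pc) at which a star stays within both selection cuts.

    From G = M_G + 5 * log10(d / 10): d_max = 10 * 10**((g_limit - M_G) / 5).
    The parallax cut caps this at 1000 / min_parallax_mas pc. Extinction is ignored.
    """
    d_mag_limit = 10.0 * 10 ** ((g_limit - abs_g_mag) / 5.0)
    return min(d_mag_limit, 1000.0 / min_parallax_mas)


for label, m in [("white dwarf, M_G = 12", 12.0), ("Sun, M_G = 4.67", 4.67)]:
    d = max_distance_pc(m)
    # Volume reached, relative to the full 500 pc radius allowed by the parallax cut
    print(f"{label}: d_max = {d:.0f} pc, volume fraction = {(d / 500.0) ** 3:.3f}")
```

Under these assumptions a white dwarf with M_G = 12 is only selected within about 160 pc, roughly 3% of the volume in which a Sun-like star can be selected.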

ExplanationStep 13

Results

The HR diagram built from Gaia DR3 data shows a clear main sequence. This confirms that the selected stars are not randomly distributed in color and luminosity.

Main findings:

  • Most stars in the sample lie on the main sequence.
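  • The Sun's reference position (BP-RP ≈ 0.82, M_G ≈ 4.67), plotted from standard solar values rather than taken from the sample, falls on the main sequence, as expected for a stable hydrogen-burning star.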
  • A population of brighter evolved stars appears above the main sequence.
  • The lower-left region corresponds to where white dwarfs are expected, although this sample contains only a few such objects.
  • The structure of the diagram is consistent with the standard Hertzsprung-Russell diagram used in stellar astrophysics.

This project shows that real Gaia catalog data can reproduce the main structure of stellar populations.

ExplanationStep 14

Limitations

This project uses a simplified analysis pipeline.

Main limitations:

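  • The sample contains 15,000 stars, not the full Gaia catalog. Because the query uses TOP 15000 without ORDER BY, these are simply the first matching rows the archive returns, not a random sample; the first rows in the output all lie near RA 45°, Dec 35°, so the sample may cover only a small part of the sky.
  • Distances are estimated by directly inverting the parallax. This estimate becomes biased as the fractional parallax error grows; the parallax_over_error > 10 cut keeps the effect small but does not remove it.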
  • Interstellar extinction and reddening are not corrected.
  • The query selects a general nearby-star sample, not a specialized sample for white dwarfs or red giants.
  • Stellar classification is done visually from the HR diagram rather than using formal astrophysical classification models.
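As an illustration of what a simple automated alternative to visual classification could look like, the sketch below labels points with hand-placed cuts in the (BP-RP, M_G) plane. The function `label_hr_region` and all of its thresholds (a linear white dwarf boundary M_G > 3.0 × (BP-RP) + 8.5, and giants taken as M_G < 4 with BP-RP > 0.9) are hypothetical, eyeballed values chosen for illustration, not calibrated boundaries from the literature; subgiants and other populations simply fall into the default class.

```python
import numpy as np


def label_hr_region(bp_rp, abs_g_mag, wd_slope=3.0, wd_intercept=8.5,
                    giant_max_mag=4.0, giant_min_color=0.9):
    """Crude HR-diagram region labels from hypothetical, uncalibrated cuts."""
    bp_rp = np.asarray(bp_rp, dtype=float)
    abs_g_mag = np.asarray(abs_g_mag, dtype=float)
    # Faint for their color: below a straight line well under the main sequence
    is_white_dwarf = abs_g_mag > wd_slope * bp_rp + wd_intercept
    # Bright and red: above the main sequence at colors redder than the Sun's
    is_giant = (abs_g_mag < giant_max_mag) & (bp_rp > giant_min_color)
    return np.select([is_white_dwarf, is_giant], ["white dwarf", "giant"],
                     default="main sequence")


colors = [0.0, 1.2, 0.82, 2.0]
mags = [12.0, 0.5, 4.67, 10.0]
labels = label_hr_region(colors, mags)
print(labels)
```

Applied to the notebook's DataFrame, this would be `df["region"] = label_hr_region(df["bp_rp"], df["absolute_g_mag"])`. Any real use would need thresholds checked against the plotted diagram, or a proper clustering or classification model.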

Despite these limitations, the project successfully reproduces the main features of the Hertzsprung-Russell diagram using real observational data.

ExplanationStep 15

Conclusion

In this project, I used Gaia DR3 stellar catalog data to build a Hertzsprung-Russell diagram.

I calculated stellar distances from parallax, estimated absolute G magnitudes, and plotted stellar color against intrinsic brightness. The resulting diagram showed the main sequence, evolved giant stars, and the expected white dwarf region.

The project demonstrates how large astronomical catalogs can be used to study stellar populations and understand the physical stages of stellar evolution.

ExplanationStep 16

Sources

  • Gaia DR3 stellar catalog
  • ESA Gaia Mission
  • Astroquery Python package
  • Gaia Archive