Notebook Presentation
Stellar Classification Using Gaia Data
Project goal
The goal of this project is to use real Gaia stellar catalog data to build a Hertzsprung-Russell diagram.
The Hertzsprung-Russell diagram shows the relationship between a star's color and its absolute brightness. It is one of the most important tools in astrophysics because it reveals major stellar populations such as main sequence stars, red giants, and white dwarfs.
In this project, I will use Python to:
- access Gaia data
- clean a sample of stellar measurements
- calculate stellar distances from parallax
- estimate absolute magnitudes
- plot an HR diagram
- identify the main stellar regions
Importing libraries
!pip install astroquery -q
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from astroquery.gaia import Gaia
In preparation for Gaia DR4, the Gaia archive is in evolution. Unfortunately, it may be unstable at times and particular types of queries may time out. Please consider registering for a user account (https://www.cosmos.esa.int/web/gaia-users/register). For questions or advice, please contact the Gaia helpdesk (https://www.cosmos.esa.int/web/gaia/gaia-helpdesk).
Downloading Gaia stellar data
I use a small sample of stars from Gaia DR3.
The selected columns are:
- parallax, used to estimate distance
- phot_g_mean_mag, apparent brightness in Gaia's G band
- bp_rp, color index
- parallax_error, used to check measurement quality
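Only stars with reliable parallaxes are selected. The query requires parallax_over_error > 10, which corresponds to a relative parallax uncertainty below 10%, and parallax > 2 mas, which corresponds to a nominal distance under 500 pc. It also restricts G to the range 4 to 18 and BP-RP to the range -0.5 to 4.0. This helps produce a cleaner Hertzsprung-Russell diagram.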
query = """
SELECT TOP 15000
source_id,
ra,
dec,
parallax,
parallax_error,
phot_g_mean_mag,
bp_rp,
phot_bp_mean_mag,
phot_rp_mean_mag
FROM gaiadr3.gaia_source
WHERE
parallax > 2
AND parallax_over_error > 10
AND phot_g_mean_mag IS NOT NULL
AND bp_rp IS NOT NULL
AND phot_bp_mean_mag IS NOT NULL
AND phot_rp_mean_mag IS NOT NULL
AND phot_g_mean_mag BETWEEN 4 AND 18
AND bp_rp BETWEEN -0.5 AND 4.0
"""
job = Gaia.launch_job_async(query)
results = job.get_results()
df = results.to_pandas()
print("Number of stars downloaded:", len(df))
df.head()
INFO:astroquery:Query finished.
INFO: Query finished. [astroquery.utils.tap.core]
Number of stars downloaded: 15000
            source_id         ra        dec  parallax  parallax_error  phot_g_mean_mag     bp_rp  phot_bp_mean_mag  phot_rp_mean_mag
0  137341028218921216  45.215014  35.435171  2.150371        0.109505        17.498400  2.323648         18.742424         16.418776
1  137354913848176384  45.082992  35.521805  4.414815        0.030129        15.103880  2.107482         16.179569         14.072087
2  138830527172656128  46.077017  35.815947  3.279607        0.143288        17.698900  2.637417         19.086605         16.449188
3  138839422049279488  45.376115  35.460993  3.351197        0.119281        17.578335  2.333639         18.822166         16.488527
4  138847981919616896  45.533685  35.600378  6.688006        0.018479         9.206958  0.736701          9.490245          8.753544
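The selection made in the query can also be mirrored locally, which is useful for tightening the cuts without querying the archive again. The following is a minimal sketch, assuming a DataFrame with the same column names as the query result. The helper name apply_quality_cuts and its default thresholds are illustrative and simply copy the values from the WHERE clause.

```python
import pandas as pd


def apply_quality_cuts(df, min_parallax=2.0, min_snr=10.0):
    """Hypothetical helper: keep rows that pass cuts matching the ADQL WHERE clause."""
    # Gaia's parallax_over_error column is parallax divided by parallax_error
    snr = df["parallax"] / df["parallax_error"]
    mask = (
        (df["parallax"] > min_parallax)
        & (snr > min_snr)
        # .between() is inclusive like SQL BETWEEN, and NaN fails it,
        # which mirrors the IS NOT NULL conditions on G and BP-RP
        & df["phot_g_mean_mag"].between(4, 18)
        & df["bp_rp"].between(-0.5, 4.0)
    )
    return df[mask].reset_index(drop=True)


# Two rows copied from the table above, plus one made-up row with a poor parallax
sample = pd.DataFrame(
    {
        "parallax": [2.150371, 6.688006, 3.0],
        "parallax_error": [0.109505, 0.018479, 1.0],
        "phot_g_mean_mag": [17.498400, 9.206958, 15.0],
        "bp_rp": [2.323648, 0.736701, 1.0],
    }
)
clean = apply_quality_cuts(sample)
print(len(clean), "of", len(sample), "rows pass the cuts")
```

Raising min_snr in this sketch trades sample size for a sharper diagram, without another archive query.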
Calculating stellar distance and absolute magnitude
Gaia measures stellar parallax in milliarcseconds. Parallax is used to estimate the distance to a star.
Distance in parsecs is calculated as:
distance = 1000 / parallax
The apparent G magnitude describes how bright a star appears from Earth. To compare stars physically, I calculate the absolute G magnitude, which estimates how bright the star would appear from a standard distance of 10 parsecs.
Absolute magnitude is calculated as:
M_G = G - 5 * log10(distance / 10)
df["distance_pc"] = 1000 / df["parallax"]
df["absolute_g_mag"] = df["phot_g_mean_mag"] - 5 * np.log10(df["distance_pc"] / 10)
df[["parallax", "distance_pc", "phot_g_mean_mag", "absolute_g_mag", "bp_rp"]].head()
   parallax  distance_pc  phot_g_mean_mag  absolute_g_mag     bp_rp
0  2.150371   465.036062        17.498400        9.160967  2.323648
1  4.414815   226.510081        15.103880        8.328442  2.107482
2  3.279607   304.914574        17.698900       10.278009  2.637417
3  3.351197   298.400881        17.578335       10.204334  2.333639
4  6.688006   149.521398         9.206958        3.333441  0.736701
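The two formulas above can be combined into one expression that works directly on the parallax in milliarcseconds: M_G = G + 5 * log10(parallax) - 10. The sketch below checks this against two rows of the table. The function name absolute_g_mag_from_parallax is an illustrative choice, not part of the notebook.

```python
import numpy as np


def absolute_g_mag_from_parallax(g_mag, parallax_mas):
    """Absolute G magnitude from apparent G magnitude and parallax in mas."""
    # Algebraically the same as G - 5 * log10((1000 / parallax) / 10)
    return g_mag + 5.0 * np.log10(parallax_mas) - 10.0


# First and last rows of the table above
m_first = absolute_g_mag_from_parallax(17.498400, 2.150371)
m_last = absolute_g_mag_from_parallax(9.206958, 6.688006)
print(m_first, m_last)
```

Both values should match the absolute_g_mag column above to within rounding, which confirms that the intermediate distance column is a convenience rather than a requirement.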
Building the Hertzsprung-Russell diagram
The Hertzsprung-Russell diagram plots stellar color against absolute brightness.
In this diagram:
- The x-axis shows BP-RP color index
- The y-axis shows absolute G magnitude
- Hotter, bluer stars appear on the left
- Cooler, redder stars appear on the right
- Brighter stars appear toward the top
The y-axis is inverted because lower magnitude values mean higher brightness.
import os
os.makedirs("figures", exist_ok=True)
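# Approximate solar color and absolute magnitude in the Gaia bands,
# used only to mark the Sun as a reference point on the diagram
sun_bp_rp = 0.82
sun_abs_g = 4.67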
plt.figure(figsize=(8, 10))
hb = plt.hexbin(
df["bp_rp"],
df["absolute_g_mag"],
gridsize=180,
mincnt=1,
bins="log"
)
plt.gca().invert_yaxis()
plt.xlabel("Gaia BP-RP Color Index")
plt.ylabel("Absolute G Magnitude")
plt.title("Hertzsprung-Russell Diagram from Gaia DR3 Data")
cbar = plt.colorbar(hb)
cbar.set_label("Stellar density")
plt.scatter(
sun_bp_rp,
sun_abs_g,
s=160,
marker="*",
edgecolors="black",
linewidths=0.8,
label="Sun"
)
plt.text(sun_bp_rp + 0.08, sun_abs_g, "Sun", fontsize=11)
plt.text(1.7, 8.0, "Main Sequence", fontsize=12)
plt.text(1.4, 1.0, "Red Giants", fontsize=12)
plt.text(0.0, 12.0, "White Dwarf Region", fontsize=12)
plt.legend()
plt.savefig("figures/final_hr_diagram.png", dpi=300, bbox_inches="tight")
plt.show()
The diagram shows that stars are not randomly distributed.
Most stars form a clear diagonal band called the main sequence. These stars are in a stable stage of stellar evolution, where they generate energy by fusing hydrogen in their cores.
The upper region contains brighter evolved stars, including red giants. These stars have expanded and become much more luminous than main sequence stars of similar color.
White dwarfs are expected in the lower-left region of the HR diagram. In this sample, that region is sparsely populated. White dwarfs are intrinsically faint, so the limit of G ≤ 18 in the query likely removes most of them beyond a modest distance, and the query selected a general nearby-star sample rather than a white-dwarf-focused one.
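The effect of the selection on intrinsically faint stars can be estimated from the distance modulus. The sketch below computes the largest distance at which a star of a given absolute magnitude still passes both the G ≤ 18 limit and the parallax > 2 mas cut. The function name and the example value M_G = 13 for a white dwarf are illustrative assumptions, and extinction is ignored.

```python
def max_visible_distance_pc(abs_mag, g_limit=18.0, min_parallax_mas=2.0):
    """Largest distance (pc) at which a star passes both selection cuts."""
    # Distance at which the apparent magnitude reaches the faint limit:
    # G - M = 5 * log10(d / 10)  ->  d = 10 ** ((G - M + 5) / 5)
    d_magnitude = 10.0 ** ((g_limit - abs_mag + 5.0) / 5.0)
    # Distance limit implied by the minimum parallax
    d_parallax = 1000.0 / min_parallax_mas
    return min(d_magnitude, d_parallax)


white_dwarf_reach = max_visible_distance_pc(13.0)  # illustrative M_G
sun_like_reach = max_visible_distance_pc(4.67)
print(white_dwarf_reach, sun_like_reach)
```

Under these assumptions a star with M_G = 13 stays in the sample only out to about 100 pc, while a Sun-like star is limited by the parallax cut at 500 pc. That is a factor of 5 in distance, or 125 in volume.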
Results
The HR diagram built from Gaia DR3 data shows a clear main sequence. This confirms that the selected stars are not randomly distributed in color and luminosity.
Main findings:
- Most stars in the sample lie on the main sequence.
- The Sun appears on the main sequence, as expected for a stable hydrogen-burning star.
- A population of brighter evolved stars appears above the main sequence.
- The lower-left region corresponds to where white dwarfs are expected, although this sample contains fewer such objects.
- The structure of the diagram is consistent with the standard Hertzsprung-Russell diagram used in stellar astrophysics.
This project shows that real Gaia catalog data can reproduce the main structure of stellar populations.
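The findings above are read off the plot by eye. A rough way to put numbers on them is to label each star with simple cuts in color and absolute magnitude. The sketch below is one way to do that. The thresholds are illustrative assumptions chosen to separate the three labeled regions on this plot, not calibrated boundaries, and the function name label_regions is hypothetical.

```python
import numpy as np


def label_regions(bp_rp, abs_g):
    """Assign a coarse HR-diagram region to each star (illustrative cuts only)."""
    bp_rp = np.asarray(bp_rp, dtype=float)
    abs_g = np.asarray(abs_g, dtype=float)
    # Assumed cut: well below the main sequence at a given color
    is_white_dwarf = abs_g > 10.0 + 2.6 * bp_rp
    # Assumed cut: red and much brighter than main sequence stars of that color
    is_giant = (bp_rp > 0.9) & (abs_g < 3.5)
    return np.where(
        is_white_dwarf,
        "white dwarf region",
        np.where(is_giant, "giant region", "main sequence or other"),
    )


# The Sun marker, plus one point near each text label on the plot
labels = label_regions([0.82, 1.2, 0.0], [4.67, 0.5, 12.0])
print(labels)
```

Applied to the notebook's data, pd.Series(label_regions(df["bp_rp"], df["absolute_g_mag"])).value_counts() would give a count per region. These counts depend on the assumed thresholds, so they are a sanity check on the visual reading rather than a classification.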
Limitations
This project uses a simplified analysis pipeline.
Main limitations:
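- The sample contains 15,000 stars, not the full Gaia catalog. The query uses TOP without a random ordering, and the first rows shown all lie near ra 45°, dec 35°, so the sample may be concentrated in one part of the sky rather than drawn evenly from it.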
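- Distances are estimated by inverting the parallax (1000 / parallax), which becomes biased as the relative parallax uncertainty grows. The parallax_over_error > 10 cut limits this effect but does not remove it.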
- Interstellar extinction and reddening are not corrected.
- The query selects a general nearby-star sample, not a specialized sample for white dwarfs or red giants.
- Stellar classification is done visually from the HR diagram rather than using formal astrophysical classification models.
Despite these limitations, the project successfully reproduces the main features of the Hertzsprung-Russell diagram using real observational data.
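The parallax-related limitation can be quantified with first-order error propagation. Since M_G = G + 5 * log10(parallax) - 10, a parallax error sigma translates into an absolute magnitude error of about (5 / ln 10) * sigma / parallax. The sketch below implements this under the assumption that the parallax error dominates. It ignores the photometric error on G and any extinction, and the function name is illustrative.

```python
import numpy as np


def abs_mag_uncertainty(parallax_mas, parallax_error_mas):
    """First-order uncertainty on M_G caused by the parallax error alone."""
    parallax_mas = np.asarray(parallax_mas, dtype=float)
    parallax_error_mas = np.asarray(parallax_error_mas, dtype=float)
    # dM/dparallax = 5 / (ln(10) * parallax)
    return 5.0 / np.log(10.0) * parallax_error_mas / parallax_mas


# A star exactly at the parallax_over_error = 10 threshold
worst_case = abs_mag_uncertainty(10.0, 1.0)
print(worst_case)
```

At the threshold used in the query this gives roughly 0.22 mag, which is small compared with the span of the diagram, so the vertical smearing from parallax errors should not hide the main features.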
Conclusion
In this project, I used Gaia DR3 stellar catalog data to build a Hertzsprung-Russell diagram.
I calculated stellar distances from parallax, estimated absolute G magnitudes, and plotted stellar color against intrinsic brightness. The resulting diagram showed the main sequence, evolved giant stars, and the expected white dwarf region.
The project demonstrates how large astronomical catalogs can be used to study stellar populations and understand the physical stages of stellar evolution.
Sources
- Gaia DR3 stellar catalog
- ESA Gaia Mission
- Astroquery Python package
- Gaia Archive