Tutorial 11a - Random Forest Regressor to Predict Optimal Lens Properties

Building a Random Forest Regressor to Predict Optimal Lens Properties

This notebook demonstrates how to build and train a random forest regressor to predict the radius of curvature of a plano-convex lens in order to minimize the RMS spot size.

Steps Included:

Data Preparation: Generate and preprocess a dataset of optimal lens designs.
Model Training: Train the random forest regressor using the prepared dataset.
Model Evaluation: Evaluate the performance of the trained model.
Prediction: Use the trained model to predict the optimal radius of curvature for new lens parameters to minimize the RMS spot size.

[1]:

import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from tqdm import tqdm

from optiland import analysis, materials, optic, optimization

Build a configurable plano-convex singlet

We start by defining a class, which parameterizes a plano-convex lens based on three properties:

Entrance Pupil Diameter (EPD) in mm
Index of refraction at 587 nm
Radius of curvature of the lens front surface

[2]:

class ConfigurableSinglet(optic.Optic):
    """A class to represent a configurable singlet lens.

    This class inherits from the `optic.Optic` class and allows for the creation
    of a singlet lens with configurable parameters: entrance pupil diameter (EPD),
    refractive index (n), and radius of curvature (R). The lens thickness is fixed
    at 1/5 of the entrance pupil diameter to maintain a reasonable aspect ratio.

    Attributes:
        epd (float): Entrance pupil diameter.
        n (float): Refractive index of the lens material.
        R (float): Radius of curvature of the lens.

    """

    def __init__(self, epd, n, R):
        super().__init__()

        mat = materials.IdealMaterial(n=n)

        # add surfaces
        self.surfaces.add(index=0, radius=np.inf, thickness=np.inf)
        self.surfaces.add(
            index=1,
            radius=R,
            thickness=epd / 5,
            is_stop=True,
            material=mat,
        )
        self.surfaces.add(index=2, radius=np.inf, thickness=15)
        self.surfaces.add(index=3)

        # add aperture
        self.set_aperture(aperture_type="EPD", value=epd)

        # add field
        self.fields.set_type(field_type="angle")
        self.fields.add(y=0)

        # add wavelength
        self.wavelengths.add(value=0.587, is_primary=True)

        # move image plane to paraxial focus
        self.image_solve()

Generate and draw a lens

This is an example of one possible lens configuration.

[3]:

lens = ConfigurableSinglet(epd=25, n=1.5, R=25)
lens.draw(num_rays=5)

../_images/examples_Singlet_RF_Model_RMS_Spot_Size_5_0.png

Data preparation

We build a dataset that we will use to train our random forest regressor. The ultimate goal of the predictive model will be to predict the radius of curvature of our plano-convex lens given the lens f-number and the refractive index (at 587 nm). We will run 2500 simulations, each with a different f-number and refractive index. We choose the following ranges of our variables:

F-number: 2 to 10
Refractive Index: 1.4 to 2.0

[4]:

fno = np.linspace(2, 10, 50)
n = np.linspace(1.4, 2.0, 50)

# Total number of simulations = 50^2 = 2500
fno, n = np.meshgrid(fno, n)
fno = fno.ravel()
n = n.ravel()

Generate helper function for optimization

This function will identify the optimal singlet, given the f-number and refractive index of the lens.

[5]:

def optimize_lens(lens):
    # Define empty optimization problem
    problem = optimization.OptimizationProblem()

    # Add target for RMS spot size - use on-axis field and a hexapolar distribution
    # Surface number is set to -1 to indicate the image plane
    input_data = {
        "optic": lens,
        "surface_number": -1,
        "Hx": 0,
        "Hy": 0,
        "num_rays": 5,
        "wavelength": 0.587,
        "distribution": "hexapolar",
    }
    problem.add_operand(
        operand_type="rms_spot_size",
        target=0,
        weight=10,
        input_data=input_data,
    )

    # Add target for focal length
    problem.add_operand(
        operand_type="f2",
        target=20,
        weight=1,
        input_data={"optic": lens},
    )

    # Allow the thickness to image plane and the radius of curvature to vary
    problem.add_variable(lens, "radius", surface_number=1)  # first lens surface
    problem.add_variable(lens, "thickness", surface_number=2)

    # Define optimizer
    optimizer = optimization.OptimizerGeneric(problem)

    # Optimize - ignore warnings if ray errors occur
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        optimizer.optimize(tol=1e-6)

    # Retrieve optimized rms spot size
    rms_spot_size = problem.operands[0].value

    # Retrieve optimized focal length
    f_opt = problem.operands[1].value

    return lens, rms_spot_size, f_opt

Optimize lenses

We will record all data in a pandas dataframe. Without loss of generality, we fix the lens focal length to 20 mm. During optimization, we also enforce this.

[6]:

df = pd.DataFrame({"FNO": fno, "n": n})

df["EPD"] = 20 / df["FNO"]

# Variables to store the optimized values
df["R_opt"] = np.nan
df["rms_spot_size"] = np.nan
df["f_opt"] = np.nan

Run optimization for 2500 systems. This takes 3-4 minutes on the author’s machine.

[7]:

for k, row in tqdm(df.iterrows(), total=len(df)):
    # Generate starting lens
    lens = ConfigurableSinglet(epd=row["EPD"], n=row["n"], R=15)

    # Optimize lens
    lens, rms_spot_size, f_opt = optimize_lens(lens)

    # Store optimized values
    df.loc[k, "R_opt"] = lens.surfaces.radii[1]
    df.loc[k, "rms_spot_size"] = rms_spot_size
    df.loc[k, "f_opt"] = f_opt

100%|██████████| 2500/2500 [03:38<00:00, 11.44it/s]

Not all systems may have been fully optimized, so we will exclude these when building the random forest model. We can check for optimization success by comparing the resulting focal length to our target of 20 mm.

[8]:

df["optimization_success"] = (
    np.abs(df["f_opt"] - 20) < 0.01
)  # choose a tolerance of 0.01 mm

# View the number of successes and failures
df["optimization_success"].value_counts()

[8]:

optimization_success
True     2352
False     148
Name: count, dtype: int64

Investigate correlations between lens properties and optimized results.

[9]:

# Get dataframe of successful optimizations
df_success = df[df.optimization_success]

# Compute correlation matrix (exclude optimization_success & focal length columns)
corr = df_success.drop(columns=["optimization_success", "f_opt"]).corr()

# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(9, 7))

# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(
    corr,
    mask=mask,
    cmap=cmap,
    center=0,
    square=True,
    linewidths=0.5,
    cbar_kws={"shrink": 0.5},
)

[9]:

<Axes: >

../_images/examples_Singlet_RF_Model_RMS_Spot_Size_17_1.png

We identify a few relationships:

Larger refractive indices generally result in larger optimal radii of curvature.
Larger f-numbers result in smaller achieved RMS spot sizes.

Plot relationship between FNO, refractive index, and achieved RMS spot size (mm)

[10]:

plt.scatter(df_success.FNO, df_success.rms_spot_size, c=df_success.n, alpha=0.5)
plt.xlabel("Image-space F-number")
plt.ylabel("RMS spot size (mm)")
cbar = plt.colorbar()
cbar.set_label("Refractive index (n)", rotation=270, labelpad=20)
plt.show()

../_images/examples_Singlet_RF_Model_RMS_Spot_Size_20_0.png

This plot shows that there appears to be an RMS spot size minimum, which varies with f-number. We also see that larger refractive indices appear to result in smaller RMS spot sizes, which appears as a gradient-like effect that can especially be seen at smaller f-numbers.

Split data into features and target variable

[11]:

X = df_success[["n", "FNO"]]
y = df_success["R_opt"]

Split the data into training and testing sets

[12]:

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
)

Train regression model

[13]:

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

[13]:

RandomForestRegressor(random_state=42)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

Evaluate the model

[14]:

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.3e}")

Mean Squared Error: 2.457e-07

Based on the resulting mean squared error on the test dataset, we can see that the model has successfully captured the relationships in the dataset.

We now choose new parameters for the refractive index and f-number and use our random forest model to predict the radius of curvature. We compute the RMS spot size and plot the spot diagram to confirm the resulting lens indeed reaches a minimal spot size.

[35]:

new_data = pd.DataFrame({"n": [1.864], "FNO": [5.45]})  # Example new data
prediction = model.predict(new_data)
print(f"Predicted Radius of Curvature: {prediction[0]:.2f} mm")

Predicted Radius of Curvature: 17.31 mm

[36]:

lens = ConfigurableSinglet(epd=20 / 5.45, n=1.864, R=prediction[0])
lens.draw(num_rays=5)

../_images/examples_Singlet_RF_Model_RMS_Spot_Size_34_0.png

[37]:

spot = analysis.SpotDiagram(lens)
print(
    f"RMS Spot Size: {spot.rms_spot_radius()[0][0]:.4f} mm",
)  # field index 0 and wavelength index 0

RMS Spot Size: 0.0052 mm

[38]:

spot.view()

../_images/examples_Singlet_RF_Model_RMS_Spot_Size_36_0.png

This notebook demonstrated how a random forest regressor can be trained to predict optimal lens parameters to achieve a minimal RMS spot size.
Although we chose a random forest model, other (likely simpler) models could also have been used.
Additional hyperparameter tuning or an expanded dataset would likely improve the model performance further.