Using JupyterLab to monitor coastal and marine environments in the Philippines on CopPhil
This notebook demonstrates how Sentinel-2 satellite imagery can be used to estimate and visualize key water quality indicators in Laguna de Bay, Philippines.
By applying spectral indices derived from Sentinel-2 reflectance bands, we can monitor parameters such as chlorophyll-a, cyanobacteria, turbidity, CDOM, DOC, and water color.
What this notebook covers
Connecting to the
/eodataarchive to access Sentinel-2 Level-2A productsLoading and preprocessing the required spectral bands
Calculating empirical water quality indices
Visualizing results with custom color scales and percentile-based clipping
Exploring how satellite-based monitoring can support environmental management ### Important Note The values derived in this notebook are based on empirical models. They provide useful relative indicators of water quality but cannot be considered absolute concentrations without in-situ calibration and validation. Ground measurements are essential for accurate monitoring and decision-making. ### References The empirical formulas for water quality proxies are adapted from the Sentinel Hub SE2WAQ.
Prerequisites
No. 1 Access to CopPhil site
You need a CopPhil hosting account, available at https://infra.copphil.philsa.gov.ph.
No. 2 Access to JupyterLab
JupyterLab used in this article is available here: https://jupyter.infra.copphil.philsa.gov.ph/hub/login?next=%2Fhub%2F.
No. 3 Working knowledge of JupyterLab
See article Introduction to JupyterLab on CopPhil
No. 4 Using API
This is an introductory example of obtaining EODATA using API calls, in Python:
Download satellite data using API on CopPhil
No. 5 Information on Sentinel-5P mission
Page Sentinel-5P mission shows basic information on Sentinel-5P mission, which is used in this article as a source of information.
No. 6 Have credentials to access EODATA ready
To obtain the necessary credentials to access EODATA, follow this article:
How to get credentials to access EODATA on CopPhil
You can generate up to 200 different keys per account.
Prepare your environment
import requests
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import boto3
import rasterio
import matplotlib.colors as mcolors
- We first establish a connection to the CopPhil S3 storage provided
by CloudFerro. This bucket (
/eodata) hosts the Sentinel-2 archive, accessible directly as.SAFEfolders.
If the connection succeeds, we can later open Sentinel-2 products directly from /eodata paths without downloading them locally.
#Enter your eodata credentials here:
access_key = ''
secret_key = ''
host = 'https://eodata.infra.copphil.philsa.gov.ph/'
# Initialize S3 resource
s3 = boto3.resource(
's3',
aws_access_key_id=access_key,
aws_secret_access_key=secret_key,
endpoint_url=host,
)
# Verify connection by accessing the 'eodata' bucket
bucket_name = 'eodata'
try:
bucket = s3.Bucket(bucket_name)
# Check if the bucket is accessible by listing its contents
objects = list(bucket.objects.limit(1)) # Limit to 1 object for testing
if objects:
print(f"Connection to bucket '{bucket_name}' successful. Objects found.")
else:
print(f"Connection to bucket '{bucket_name}' successful, but no objects found.")
except Exception as e:
print(f"Connection to bucket '{bucket_name}' failed:", e)
Connection to bucket 'eodata' successful. Objects found.
Search Sentinel-2 Products (Manila Bay)
We create a search query for the CopPhil OData API to find Sentinel-2 products that cover Manila Bay between the beggining of 2024 and today.
# search parameters: Sentinel-2 L2A over Manila Bay
catalogue_odata_url = "https://catalogue.infra.copphil.philsa.gov.ph/odata/v1"
collection_name = "SENTINEL-2"
product_type = "S2MSI2A" # Level-2A (surface reflectance)
# AOI: Manila Bay
aoi = "POLYGON((121.05 14.65, 121.50 14.65, 121.50 14.15, 121.05 14.15, 121.05 14.65))"
# Time window: wide window to get more scenes
search_period_start = "2024-01-01T00:00:00.000Z"
search_period_end = "2025-08-26T23:59:59.000Z"
# Optional: Cloud coverage set to <10% to get clearer images
# (cloudCover is a DoubleAttribute in the catalogue)
cloud_cover_max = 10.0
search_query = (
f"{catalogue_odata_url}/Products?"
f"$filter=Collection/Name eq '{collection_name}' "
f"and Attributes/OData.CSC.StringAttribute/any(att:att/Name eq 'productType' "
f"and att/OData.CSC.StringAttribute/Value eq '{product_type}') "
f"and Attributes/OData.CSC.DoubleAttribute/any(att:att/Name eq 'cloudCover' "
f"and att/OData.CSC.DoubleAttribute/Value lt {cloud_cover_max}) " #lt = less than
f"and OData.CSC.Intersects(area=geography'SRID=4326;{aoi}') "
f"and ContentDate/Start gt {search_period_start} "
f"and ContentDate/Start lt {search_period_end}"
f"&$top=1000"
)
print(search_query)
https://catalogue.infra.copphil.philsa.gov.ph/odata/v1/Products?$filter=Collection/Name eq 'SENTINEL-2' and Attributes/OData.CSC.StringAttribute/any(att:att/Name eq 'productType' and att/OData.CSC.StringAttribute/Value eq 'S2MSI2A') and Attributes/OData.CSC.DoubleAttribute/any(att:att/Name eq 'cloudCover' and att/OData.CSC.DoubleAttribute/Value lt 10.0) and OData.CSC.Intersects(area=geography'SRID=4326;POLYGON((121.05 14.65, 121.50 14.65, 121.50 14.15, 121.05 14.15, 121.05 14.65))') and ContentDate/Start gt 2024-01-01T00:00:00.000Z and ContentDate/Start lt 2025-08-26T23:59:59.000Z&$top=1000
Retrieve and Inspect Search Results
We send the query and parse the JSON response into a DataFrame for easier inspection. This gives us an overview of how many Sentinel-2 images from the search query are available.
response = requests.get(search_query).json()
result = pd.DataFrame.from_dict(response["value"])
print(f'Number of products which meets query filters: {len(result)}')
Number of products which meets query filters: 9
After confirming the images exist we can list paths to eodata:
products = json.loads(requests.get(search_query).text)
path_list = []
for item in products['value']:
path_list.append(item['S3Path'])
path_list
['/eodata/Sentinel-2/MSI/L2A/2024/11/22/S2A_MSIL2A_20241122T022011_N0511_R003_T51PTS_20241122T053347.SAFE',
'/eodata/Sentinel-2/MSI/L2A/2024/03/07/S2A_MSIL2A_20240307T021541_N0510_R003_T51PTR_20240307T053851',
'/eodata/Sentinel-2/MSI/L2A/2024/02/26/S2A_MSIL2A_20240226T021651_N0510_R003_T51PTR_20240226T053853',
'/eodata/Sentinel-2/MSI/L2A/2024/04/16/S2A_MSIL2A_20240416T021611_N0510_R003_T51PTS_20240416T075854',
'/eodata/Sentinel-2/MSI/L2A/2024/04/11/S2B_MSIL2A_20240411T021529_N0510_R003_T51PTR_20240411T043123',
'/eodata/Sentinel-2/MSI/L2A/2025/01/06/S2B_MSIL2A_20250106T022009_N0511_R003_T51PTS_20250106T041630.SAFE',
'/eodata/Sentinel-2/MSI/L2A/2024/04/16/S2A_MSIL2A_20240416T021611_N0510_R003_T51PTR_20240416T075854',
'/eodata/Sentinel-2/MSI/L2A/2024/02/26/S2A_MSIL2A_20240226T021651_N0510_R003_T51PUR_20240226T053853',
'/eodata/Sentinel-2/MSI/L2A/2025/08/04/S2B_MSIL2A_20250804T021529_N0511_R003_T51PTS_20250804T041948.SAFE']
And input the image folder path into output_dir, for extraction of
bands:
# Specify the directory where the files are located
output_dir = '/eodata/Sentinel-2/MSI/L2A/2025/01/06/S2B_MSIL2A_20250106T022009_N0511_R003_T51PTS_20250106T041630.SAFE'
# We can either use the full path above or use the path_list from the search results
#output_dir = path_list[x] # Replace x with the index of the desired product
Loading Sentinel-2 Bands (20m Resolution)
- This step defines a helper function to automatically locate and load
Sentinel-2 20m resolution bands from a
.SAFEproduct. The function searches for JP2 files in theR20mfolder and loads only the requested bands.
We then:
Specify the bands needed for water quality analysis (
B01, B02, B03, B04, SCL).Load them into memory as arrays.
Normalize reflectance values by dividing by 10,000 (converting digital numbers into surface reflectance).
This provides ready-to-use band arrays for further index calculations.
def find_and_load_s2_bands_20m(safe_folder, bands):
"""
Find and load Sentinel-2 bands at 20m resolution (JP2 files) inside a .SAFE folder.
Returns a dict {band_name: array}.
"""
band_data = {}
for root, dirs, files in os.walk(safe_folder): #walk through the directories
if "R20m" in root: # only look into the 20m folder
for file_name in files:
if file_name.lower().endswith(".jp2"):
for band in bands:
if f"{band}.jp2" in file_name or f"_{band}_" in file_name: # check if the band is in the filename
path = os.path.join(root, file_name) # full path to the file
with rasterio.open(path) as src:
band_data[band] = src.read(1)
print(f"Loaded {band}: {path}")
return band_data
# Example usage
bands_needed = ("B01","B02","B03","B04","SCL")
s2_bands_20m = find_and_load_s2_bands_20m(output_dir, bands_needed)
# Access arrays directly
b01 = s2_bands_20m.get("B01")
b02 = s2_bands_20m.get("B02")
b03 = s2_bands_20m.get("B03")
b04 = s2_bands_20m.get("B04")
scl = s2_bands_20m.get("SCL")
print("Bands loaded:", list(s2_bands_20m.keys()))
#band normalization
b01 = b01 / 10000.0
b02 = b02 / 10000.0
b03 = b03 / 10000.0
b04 = b04 / 10000.0
Loaded B01: /eodata/Sentinel-2/MSI/L2A/2025/01/06/S2B_MSIL2A_20250106T022009_N0511_R003_T51PTS_20250106T041630.SAFE/GRANULE/L2A_T51PTS_A040927_20250106T022850/IMG_DATA/R20m/T51PTS_20250106T022009_B01_20m.jp2
Loaded B02: /eodata/Sentinel-2/MSI/L2A/2025/01/06/S2B_MSIL2A_20250106T022009_N0511_R003_T51PTS_20250106T041630.SAFE/GRANULE/L2A_T51PTS_A040927_20250106T022850/IMG_DATA/R20m/T51PTS_20250106T022009_B02_20m.jp2
Loaded B03: /eodata/Sentinel-2/MSI/L2A/2025/01/06/S2B_MSIL2A_20250106T022009_N0511_R003_T51PTS_20250106T041630.SAFE/GRANULE/L2A_T51PTS_A040927_20250106T022850/IMG_DATA/R20m/T51PTS_20250106T022009_B03_20m.jp2
Loaded B04: /eodata/Sentinel-2/MSI/L2A/2025/01/06/S2B_MSIL2A_20250106T022009_N0511_R003_T51PTS_20250106T041630.SAFE/GRANULE/L2A_T51PTS_A040927_20250106T022850/IMG_DATA/R20m/T51PTS_20250106T022009_B04_20m.jp2
Loaded SCL: /eodata/Sentinel-2/MSI/L2A/2025/01/06/S2B_MSIL2A_20250106T022009_N0511_R003_T51PTS_20250106T041630.SAFE/GRANULE/L2A_T51PTS_A040927_20250106T022850/IMG_DATA/R20m/T51PTS_20250106T022009_SCL_20m.jp2
Bands loaded: ['B01', 'B02', 'B03', 'B04', 'SCL']
Visualizing Sentinel-2 True Color (RGB)
- We stack bands B04 (Red), B03 (Green), and B02 (Blue) into an RGB
composite to create a natural-color image.
- To improve contrast, the values are stretched between the **2nd and
98th percentiles**, which enhances visibility by removing extreme dark/bright pixels.
The result is a true-color Sentinel-2 image at 20 m resolution, plotted with Matplotlib.
# Stack into RGB
rgb = np.dstack([b04, b03, b02])
# stretch between 2nd–98th percentile to imporve contrast
rgb_min, rgb_max = np.percentile(rgb, (2, 98))
rgb_stretched = np.clip((rgb - rgb_min) / (rgb_max - rgb_min), 0, 1)
# Plot
plt.figure(figsize=(9,9))
plt.imshow(rgb_stretched)
plt.title("Sentinel-2 True Color (B04, B03, B02) - 20m")
plt.axis("off")
plt.show()
Extracting Water Pixels with the Scene Classification Layer (SCL)
Sentinel-2 Level-2A products include a Scene Classification Layer (SCL), which provides pixel-level labels such as vegetation, cloud, shadows, and water.
In this layer, water pixels are assigned the class value of 6.
Here we:
Create a
water_maskby selecting all pixels whereSCL == 6.Print out the total number of water pixels found.
If the count is 0, the selected scene does not contain any detected
water.
#Water pixels are classified with value 6 in the SCL band
water_mask = (scl == 6)
#print out number of water pixels, if it's 0, then the scene has no water.
print("Water pixels:", int(water_mask.sum()))
Water pixels: 7816831
Estimating Chlorophyll-a Concentration
Chlorophyll-a (Chl-a) is a key indicator of phytoplankton biomass and overall water quality. We calculate it here using an empirical model based on Sentinel-2 reflectance bands:
- Steps performed: - Compute Chl-a concentration from the **Green
(B03)** and Coastal Aerosol (B01) bands.
Clip values to the 95th percentile to reduce the influence of extreme outliers.
Apply the water mask (from SCL) to exclude land and other non-water pixels.
Define a custom color scale and class breaks (
[0, 5, 8, 11, 16]) based on the pixel value distribution in this scene.Visualize the Chl-a concentration map with a colorbar for interpretation.
# Calculating the empirical models for Chlorophyl concentration
chl_a = 4.26 * np.power((b03 / b01), 3.94)
# Clip the values to the 95th percentile to avoid outliers messing up the color scale
chl_a = np.clip(chl_a, 0, np.nanpercentile(chl_a, 95))
# Apply SCL water mask
chl_a = np.where(water_mask, chl_a, np.nan)
scaleChl_a = [0, 5, 8, 11, 16] #custom scale for this photo based on pixel value distribution
# Custom color scale
colorScale = [
(73/255, 111/255, 242/255), # blue
(130/255, 211/255, 95/255), # green
(254/255, 253/255, 5/255), # yellow
(253/255, 0/255, 4/255), # red
(142/255, 32/255, 38/255), # dark red
(217/255, 124/255, 245/255) # purple
]
custom_cmap = mcolors.ListedColormap(colorScale)
# Create norm to map the values to the corresponding bins
norm_chl = mcolors.BoundaryNorm(scaleChl_a, custom_cmap.N)
# plot
fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(chl_a, cmap=custom_cmap, norm=norm_chl)
ax.set_title("Chl-a (µg/L)")
# Add colorbar
plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04, label="Chl-a (µg/L)")
ax.axis("off")
plt.tight_layout()
plt.show()
Calculating and Visualizing Multiple Water Quality Parameters
In addition to Chlorophyll-a, we calculate several other empirical water quality indicators derived from Sentinel-2 reflectance bands:
Cyanobacteria (Cya)
\[Cya = 115530.31 \times \left(\frac{B03 \cdot B04}{B02}\right)^{2.38}\]Turbidity
\[Turb = 8.93 \times \frac{B03}{B01} - 6.39\]CDOM (Colored Dissolved Organic Matter)
\[CDOM = 537 \times e^{-2.93 \cdot \frac{B03}{B04}}\]DOC (Dissolved Organic Carbon)
\[DOC = 432 \times e^{-2.24 \cdot \frac{B03}{B04}}\]Color (optical water color proxy)
\[Color = 25366 \times e^{-4.53 \cdot \frac{B03}{B04}}\]
Processing steps:
Clip values at the 95th percentile to remove outliers.
Apply the SCL water mask to restrict analysis to water pixels.
Define custom bins for each parameter, based on the observed distribution.
Visualize all indices in a 2 × 3 subplot grid with consistent color scales.
This provides a comparative overview of different water quality proxies for the selected Sentinel-2 scene.
# Formulas for other water quality parameters
cya = 115530.31 * np.power(((b03 * b04) / b02), 2.38)
turb = 8.93 * (b03 / b01) - 6.39
cdom = 537 * np.exp(-2.93 * (b03 / b04))
doc = 432 * np.exp(-2.24 * (b03 / b04))
color = 25366 * np.exp(-4.53 * (b03 / b04))
# Clip the values to the 95th percentile to
cya = np.clip(cya, 0, np.nanpercentile(cya, 95))
cdom = np.clip(cdom, 0, np.nanpercentile(cdom, 95))
doc = np.clip(doc, 0, np.nanpercentile(doc, 95))
color = np.clip(color, 0, np.nanpercentile(color, 95))
turb = np.clip(turb, 0, np.nanpercentile(turb, 95))
# Apply water mask
cya = np.where(water_mask, cya, np.nan)
turb = np.where(water_mask, turb, np.nan)
cdom = np.where(water_mask, cdom, np.nan)
doc = np.where(water_mask, doc, np.nan)
color = np.where(water_mask, color, np.nan)
# Custom scales
scaleCya = [12, 900, 1300, 2100, 5200] # cyanobacteria
scaleTurb = [0, 3, 4, 5, 6] # turbidity
scaleCDOM = [0, 17, 20, 25, 35] # CDOM absorption
scaleDOC = [0, 30, 35, 40, 50] # dissolved organic carbon
scaleColor = [0, 120, 160, 210, 350] # water color
# Norms
norm_cya = mcolors.BoundaryNorm(scaleCya, custom_cmap.N)
norm_turb = mcolors.BoundaryNorm(scaleTurb, custom_cmap.N)
norm_cdom = mcolors.BoundaryNorm(scaleCDOM, custom_cmap.N)
norm_doc = mcolors.BoundaryNorm(scaleDOC, custom_cmap.N)
norm_col = mcolors.BoundaryNorm(scaleColor, custom_cmap.N)
# Plot all side by side in 2x3 grid
fig, axs = plt.subplots(2, 3, figsize=(18, 12))
im1 = axs[0,0].imshow(chl_a, cmap=custom_cmap, norm=norm_chl); axs[0,0].set_title("Chl-a (µg/L)")
im2 = axs[0,1].imshow(cya, cmap=custom_cmap, norm=norm_cya); axs[0,1].set_title("Cyanobacteria")
im3 = axs[0,2].imshow(turb, cmap=custom_cmap, norm=norm_turb); axs[0,2].set_title("Turbidity")
im4 = axs[1,0].imshow(cdom, cmap=custom_cmap, norm=norm_cdom); axs[1,0].set_title("CDOM")
im5 = axs[1,1].imshow(doc, cmap=custom_cmap, norm=norm_doc); axs[1,1].set_title("DOC")
im6 = axs[1,2].imshow(color, cmap=custom_cmap, norm=norm_col); axs[1,2].set_title("Color")
for ax, im, label in zip(axs.flat,
[im1, im2, im3, im4, im5, im6],
["Chl-a (µg/L)", "Cya", "Turbidity", "CDOM", "DOC", "Color"]):
plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04, label=label)
ax.axis("off")
plt.tight_layout()
plt.show()
Next Steps and Improvements
While this notebook demonstrates the potential of Sentinel-2 imagery for monitoring water quality in Laguna de Bay, several important improvements can make the analysis more robust and actionable:
- In-situ validation
The empirical indices used here must be calibrated and validated with ground-truth measurements to ensure quantitative accuracy.
- Time-series analysis
Extending the workflow to multiple Sentinel-2 acquisitions would allow detection of seasonal trends, algal blooms, and long-term water quality changes.
- Cloud and shadow masking
Refining the masking using the SCL layer or additional algorithms would improve reliability by excluding contaminated pixels.
This workflow provides a relative assessment of water quality proxies. For practical environmental management and decision-making, it should always be complemented with in-situ data collection and calibration.