Download satellite data using API on CopPhil

This example shows how to query the catalogue and download data from the CopPhil portal using the API. It is divided into two main parts:

  • Download a full Sentinel-2 product, which contains all bands and metadata files

  • Download selected Sentinel-2 bands using the ‘Nodes()’ approach

Prerequisites

No. 1 Access to CopPhil site

You need a CopPhil hosting account, available at https://infra.copphil.philsa.gov.ph.

No. 2 Access to JupyterLab

JupyterLab used in this article is available here: https://jupyter.infra.copphil.philsa.gov.ph/hub/login?next=%2Fhub%2F.

No. 3 Working knowledge of JupyterLab

See article Introduction to JupyterLab on CopPhil

No. 4 Information on Sentinel-2 mission

The page Sentinel-2 mission gives basic information on the Sentinel-2 mission and serves as a source of information for this article.

No. 5 Use Geo science kernel

The Geo science kernel comes with all of the Python libraries used in this article preinstalled:

Python 3 kernel and Geo science kernel in JupyterLab on CopPhil

What We Are Going To Cover

  • Preparing your environment

  • Searching for the Sentinel-2 L2A products

    • Building a query

    • Inspect results of the request

  • Downloading data

    • Authentication

    • Selecting the product to be downloaded

    • Unpacking data

  • Downloading multiple files

  • Downloading selected bands

    • Authentication

    • Establishing a session

    • Extracting ID and Name of the product

    • Downloading metadata file

    • Getting the location of individual bands in Sentinel-2 product

    • Download individual bands

  • Summary

Prepare your environment

Import the necessary Python libraries in a Code cell.

# HTTP requests
import requests

# JSON parser
import json

# XML parser
import xml.etree.ElementTree as ET

# data manipulation
import pandas as pd

# file manipulation
import zipfile
import os

Search for the Sentinel-2 L2A products

Build a query

To find the desired products, you need to define a few search filters:

collection_name

Sentinel-1, Sentinel-2 etc.

product_type

MSIL1C, MSIL2A

aoi

extent (coordinates) of the area of interest (WGS84)

search_period_start

time range - start date

search_period_end

time range - end date

max_cloud_cover

maximum cloud cover (%) of the image

# base URL of the product catalogue
catalogue_odata_url = "https://catalogue.infra.copphil.philsa.gov.ph/odata/v1"

# search parameters
collection_name = "SENTINEL-2"
product_type = "S2MSI2A"
aoi = "POLYGON((120.962986 14.598416, 120.995964 14.599182, 120.999658 14.563436, 120.960348 14.567522, 120.962986 14.598416))"
search_period_start = "2024-01-01T00:00:00.000Z"
search_period_end = "2024-09-30T00:00:00.000Z"
max_cloud_cover = 20

# Each any() clause is closed before the next filter starts
search_query = f"{catalogue_odata_url}/Products?$filter=Collection/Name eq '{collection_name}' and Attributes/OData.CSC.StringAttribute/any(att:att/Name eq 'productType' and att/OData.CSC.StringAttribute/Value eq '{product_type}') and Attributes/OData.CSC.DoubleAttribute/any(att:att/Name eq 'cloudCover' and att/OData.CSC.DoubleAttribute/Value le {max_cloud_cover}) and OData.CSC.Intersects(area=geography'SRID=4326;{aoi}') and ContentDate/Start gt {search_period_start} and ContentDate/Start lt {search_period_end}"

# Print the query with spaces percent-encoded
print(f"""\n{search_query.replace(' ', "%20")}\n""")

The result is:

https://catalogue.infra.copphil.philsa.gov.ph/odata/v1/Products?$filter=Collection/Name%20eq%20'SENTINEL-2'%20and%20Attributes/OData.CSC.StringAttribute/any(att:att/Name%20eq%20'productType'%20and%20att/OData.CSC.StringAttribute/Value%20eq%20'S2MSI2A')%20and%20Attributes/OData.CSC.DoubleAttribute/any(att:att/Name%20eq%20'cloudCover'%20and%20att/OData.CSC.DoubleAttribute/Value%20le%2020)%20and%20OData.CSC.Intersects(area=geography'SRID=4326;POLYGON((120.962986%2014.598416,%20120.995964%2014.599182,%20120.999658%2014.563436,%20120.960348%2014.567522,%20120.962986%2014.598416))')%20and%20ContentDate/Start%20gt%202024-01-01T00:00:00.000Z%20and%20ContentDate/Start%20lt%202024-09-30T00:00:00.000Z
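A long filter string is easy to mis-parenthesise. The following is a minimal sketch, not part of the original notebook, that assembles the filter clause by clause and percent-encodes it with the standard library. The helper name build_filter is hypothetical; the attribute names and values are the ones used in this article.

```python
from urllib.parse import quote


def build_filter(collection, product_type, aoi_wkt, start, end, max_cloud):
    """Assemble the OData $filter expression from individual clauses.

    build_filter is a hypothetical helper, not part of any CopPhil library.
    """
    clauses = [
        f"Collection/Name eq '{collection}'",
        "Attributes/OData.CSC.StringAttribute/any(att:att/Name eq 'productType' "
        f"and att/OData.CSC.StringAttribute/Value eq '{product_type}')",
        "Attributes/OData.CSC.DoubleAttribute/any(att:att/Name eq 'cloudCover' "
        f"and att/OData.CSC.DoubleAttribute/Value le {max_cloud})",
        f"OData.CSC.Intersects(area=geography'SRID=4326;{aoi_wkt}')",
        f"ContentDate/Start gt {start}",
        f"ContentDate/Start lt {end}",
    ]
    return " and ".join(clauses)


base_url = "https://catalogue.infra.copphil.philsa.gov.ph/odata/v1"
odata_filter = build_filter(
    "SENTINEL-2",
    "S2MSI2A",
    "POLYGON((120.962986 14.598416, 120.995964 14.599182, 120.999658 14.563436, "
    "120.960348 14.567522, 120.962986 14.598416))",
    "2024-01-01T00:00:00.000Z",
    "2024-09-30T00:00:00.000Z",
    20,
)

# Encode spaces as %20 but keep the OData punctuation readable
encoded_filter = quote(odata_filter, safe="/():;,='")
query_url = f"{base_url}/Products?$filter={encoded_filter}"
print(query_url)
```

Keeping each clause as its own list entry makes it straightforward to drop or add a filter (for example the cloud cover limit) without touching the rest of the expression.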

Inspect results of the request

Show the number of products that match the filters.

response = requests.get(search_query).json()
result = pd.DataFrame.from_dict(response["value"])
len(result)

The result is

6
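OData services can split large result sets into pages and point to the next page with an "@odata.nextLink" field. Whether the CopPhil catalogue does so for a given query is an assumption here. A minimal sketch of collecting every page, with a hypothetical helper collect_all_products and stand-in pages instead of real catalogue calls:

```python
def collect_all_products(first_page, fetch_page):
    """Gather the 'value' entries from every page of an OData response.

    fetch_page is any callable that takes a URL and returns the decoded
    JSON page as a dict, for example: lambda url: requests.get(url).json()
    """
    products = list(first_page.get("value", []))
    next_link = first_page.get("@odata.nextLink")
    while next_link:
        page = fetch_page(next_link)
        products.extend(page.get("value", []))
        next_link = page.get("@odata.nextLink")
    return products


# Stand-in pages for illustration; a real call would query the catalogue
fake_pages = {"page2": {"value": [{"Id": "c"}]}}
first = {"value": [{"Id": "a"}, {"Id": "b"}], "@odata.nextLink": "page2"}

all_products = collect_all_products(first, fake_pages.__getitem__)
print(len(all_products))
```

With the response from this article, the call would look like collect_all_products(response, lambda url: requests.get(url).json()).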

Get the lists of IDs and names of the images that match the query.

# Collect the Id of every product returned by the query
id_list = [element['Id'] for element in response['value']]
for product_id in id_list:
    print(product_id)

# Collect the Name of every product returned by the query
name_list = [element['Name'] for element in response['value']]
for name in name_list:
    print(name)

The result:

51b58ff8-4fd5-4fe3-b2f0-beebfee95bab
f5c26c11-d72d-4c91-80ec-fd7428c0d518
dfe46249-bc17-401f-bf9a-bcc05efe77c7
cbdd1d86-52ac-4ee7-aa33-306019d525db
af473459-5577-48ba-a852-2607f7fe8357
4acc616f-1f77-460b-867e-4d3452dee225
S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_20240211T050012.SAFE
S2A_MSIL2A_20240307T021541_N0510_R003_T51PTS_20240307T053851.SAFE
S2B_MSIL2A_20240302T021609_N0510_R003_T51PTS_20240302T043128.SAFE
S2B_MSIL2A_20240421T021529_N0510_R003_T51PTS_20240421T043611.SAFE
S2A_MSIL2A_20240416T021611_N0510_R003_T51PTS_20240416T075854.SAFE
S2A_MSIL2A_20240426T021611_N0510_R003_T51PTS_20240426T075954.SAFE

Get the name and ID of the first image that matches the query.

id_download = response['value'][0]['Id']
name_download = response['value'][0]['Name']
print(id_download, name_download)

The result:

51b58ff8-4fd5-4fe3-b2f0-beebfee95bab S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_20240211T050012.SAFE

Get the date and time when the image was acquired.

image_date = response['value'][0]['ContentDate']['Start']
print(image_date)

The result:

2024-02-11T02:18:29.024000Z
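The timestamp is returned as a string. If you want to compare or sort dates, it can be converted to a datetime object. A minimal sketch, with a hypothetical helper name parse_catalogue_time:

```python
from datetime import datetime


def parse_catalogue_time(stamp):
    """Convert a timestamp such as '2024-02-11T02:18:29.024000Z' to an aware datetime."""
    # Replace the trailing 'Z' so that fromisoformat() also works on Python < 3.11
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


sensed = parse_catalogue_time("2024-02-11T02:18:29.024000Z")
print(sensed.date(), sensed.time())
```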

Download data

Authentication

Log in to the portal. A status_code of 200 means that you have logged in successfully.

# Provide CopPhil account credentials - replace with your own data

username = 'your_username'
password = 'your_password'

auth_server_url = "https://auth.copphil.cloudferro.com/auth/realms/copphilinfra/protocol/openid-connect/token"
data = {
    "client_id": "copphil-public",
    "grant_type": "password",
    "username": username,
    "password": password,
}

response = requests.post(auth_server_url, data=data, verify=True, allow_redirects=False)
access_token = json.loads(response.text)["access_token"]
status_code = response.status_code
print(status_code)

The result:

200
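If the login fails, the response body contains no "access_token" key and reading it raises a KeyError that hides the real reason. The sketch below checks the response first. It assumes the server answers with a standard OAuth 2.0 JSON body ("error", "error_description", "expires_in"), and the helper name read_token_response is hypothetical.

```python
import json


def read_token_response(status_code, body_text):
    """Return (access_token, expires_in) or raise with the server's error message."""
    payload = json.loads(body_text)
    if status_code != 200 or "access_token" not in payload:
        reason = payload.get("error_description") or payload.get("error") or "unknown error"
        raise RuntimeError(f"Authentication failed ({status_code}): {reason}")
    # expires_in (seconds) is optional in OAuth 2.0, so it may be None
    return payload["access_token"], payload.get("expires_in")


# Stand-in response body for illustration
ok_body = '{"access_token": "abc123", "expires_in": 600}'
token, lifetime = read_token_response(200, ok_body)
print(token, lifetime)
```

With the request from this article, the call would be read_token_response(response.status_code, response.text).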

Select the product to be downloaded

From the list obtained in the previous step, take the name and ID of the image you want to download.

id_download

ID of the first product.

name_download

Name of the first product.

url_download = f'https://download.infra.copphil.philsa.gov.ph/odata/v1/Products({id_download})/$value'
url_download_using_token = url_download + '?token=' + access_token

# Download the file and save it; write() returns the number of bytes written
pulling = requests.get(url_download_using_token)
with open(f"{name_download}.zip", 'wb') as file:
    bytes_written = file.write(pulling.content)
print(bytes_written)

The result is the size of the saved file in bytes:

1156520968
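A product of more than 1 GB is held entirely in memory when it is read through the content attribute of the response. A minimal sketch of writing the download in chunks instead; the helper name save_stream is hypothetical and the demo uses stand-in bytes rather than a real download.

```python
import os
import tempfile


def save_stream(chunks, path):
    """Write an iterable of byte chunks to path and return the number of bytes written."""
    written = 0
    with open(path, "wb") as handle:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                written += handle.write(chunk)
    return written


# Stand-in data for illustration. With requests, the chunks would come from
#   requests.get(url, stream=True).iter_content(chunk_size=1024 * 1024)
demo_path = os.path.join(tempfile.gettempdir(), "demo_product.zip")
demo_size = save_stream([b"abc", b"", b"defg"], demo_path)
print(demo_size)
```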

Unpack data

Unzip the downloaded archive.

# Unzip the file
with zipfile.ZipFile(f'{name_download}.zip', 'r') as zip_ref:
    zip_ref.extractall(f'{name_download}')  # Extract the files into a folder named after the product

# Check if unzipped successfully
if os.path.exists(f'{name_download}'):
    print(f"Unzipped successfully into {name_download}")
else:
    print("Unzipping failed.")

The result:

Unzipped successfully into S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_20240211T050012.SAFE
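Checking that a folder exists does not show whether the archive itself was intact. A minimal sketch that tests the archive and lists its top-level entries before extraction; the helper name inspect_archive is hypothetical and the demo builds a tiny stand-in archive instead of using a real product.

```python
import os
import tempfile
import zipfile


def inspect_archive(zip_path):
    """Return (first corrupt member or None, sorted top-level entries) for a ZIP file."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        corrupt = archive.testzip()  # None when every member passes its CRC check
        top_level = sorted({name.split("/")[0] for name in archive.namelist()})
    return corrupt, top_level


# Build a tiny stand-in archive so that the sketch runs without a real product
demo_zip = os.path.join(tempfile.mkdtemp(), "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("DEMO.SAFE/MTD_MSIL2A.xml", "<root/>")
    archive.writestr("DEMO.SAFE/GRANULE/readme.txt", "placeholder")

corrupt_member, top_entries = inspect_archive(demo_zip)
print(corrupt_member, top_entries)
```

The list of top-level entries also shows whether the archive already contains a folder of its own, which helps you decide where to extract it.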

Download multiple files

Create a loop to download all images that match the query.

# Download each file using the IDs and corresponding names
for id_value, name_value in zip(id_list, name_list):
    # Construct the download URL for each file
    url_download = f'https://download.infra.copphil.philsa.gov.ph/odata/v1/Products({id_value})/$value'
    url_download_using_token = url_download + '?token=' + access_token

    # Download the file
    pulling = requests.get(url_download_using_token)

    # Save the file using the corresponding name
    with open(f"{name_value}.zip", 'wb') as file:
        file.write(pulling.content)

    print(f"Downloaded {name_value}.zip")

The output is:

Downloaded S2A_MSIL2A_20240307T021541_N0510_R003_T51PTS_20240307T053851.SAFE.zip
Downloaded S2A_MSIL2A_20240416T021611_N0510_R003_T51PTS_20240416T075854.SAFE.zip
Downloaded S2A_MSIL2A_20240426T021611_N0510_R003_T51PTS_20240426T075954.SAFE.zip
Downloaded S2B_MSIL2A_20240421T021529_N0510_R003_T51PTS_20240421T043611.SAFE.zip
Downloaded S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_20240211T050012.SAFE.zip
Downloaded S2B_MSIL2A_20240302T021609_N0510_R003_T51PTS_20240302T043128.SAFE.zip
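The catalogue response includes a Checksum field for every product, which can be used to verify each downloaded archive. The sketch below assumes that the listed values are MD5 digests in hexadecimal form and that each entry stores the digest under the key 'Value'; the helper names are hypothetical and the demo uses a small stand-in file.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 digest of a file without loading it into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_catalogue(path, checksum_entries):
    """True if the file's MD5 equals any 'Value' in the product's Checksum list."""
    local = md5_of_file(path)
    return any(entry.get("Value", "").lower() == local for entry in checksum_entries)


# Stand-in file and checksum entry for illustration
demo_file = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(demo_file, "wb") as handle:
    handle.write(b"hello")
demo_entries = [{"Value": hashlib.md5(b"hello").hexdigest()}]

print(matches_catalogue(demo_file, demo_entries))
```

For a real product, checksum_entries would be the 'Checksum' list of the matching element in response['value'].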

Download selected bands

Authentication

Log in to the portal. A status_code of 200 means that you have logged in successfully.

# Provide CopPhil account credentials - replace with your own data

username = 'your_username'
password = 'your_password'

auth_server_url = "https://auth.copphil.cloudferro.com/auth/realms/copphilinfra/protocol/openid-connect/token"
data = {
    "client_id": "copphil-public",
    "grant_type": "password",
    "username": username,
    "password": password,
}

response = requests.post(auth_server_url, data=data, verify=True, allow_redirects=False)
access_token = json.loads(response.text)["access_token"]
status_code = response.status_code
print(status_code)

The output:

200

Establish a session

Establish a session and take another look at the products found.

session = requests.Session()
session.headers["Authorization"] = f"Bearer {access_token}"

result.head(3)

The result:

  | @odata.mediaContentType | Id | Name | ContentType | ContentLength | OriginDate | PublicationDate | ModificationDate | Online | EvictionDate | S3Path | Checksum | ContentDate | Footprint | GeoFootprint
0 | application/octet-stream | 51b58ff8-4fd5-4fe3-b2f0-beebfee95bab | S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_2... | application/octet-stream | 1156520968 | 2024-02-11T06:09:02.000000Z | 2024-02-11T06:18:35.134676Z | 2024-03-13T10:30:49.000330Z | True | 9999-12-31T23:59:59.999999Z | /eodata/Sentinel-2/MSI/L2A/2024/02/11/S2B_MSIL... | [{'Value': 'fc89b2ffa223c038953f7e67ea9145c8',... | {'Start': '2024-02-11T02:18:29.024000Z', 'End'... | geography'SRID=4326;POLYGON ((120.205542908771... | {'type': 'Polygon', 'coordinates': [[[120.2055...
1 | application/octet-stream | f5c26c11-d72d-4c91-80ec-fd7428c0d518 | S2A_MSIL2A_20240307T021541_N0510_R003_T51PTS_2... | application/octet-stream | 1130852495 | 2024-03-07T06:30:50.000000Z | 2024-03-07T06:40:56.016476Z | 2024-03-07T09:51:23.141122Z | True | 9999-12-31T23:59:59.999999Z | /eodata/Sentinel-2/MSI/L2A/2024/03/07/S2A_MSIL... | [{'Value': '83ad987c68029bd4359e28dafc9e8d6a',... | {'Start': '2024-03-07T02:15:41.024000Z', 'End'... | geography'SRID=4326;POLYGON ((120.205542908771... | {'type': 'Polygon', 'coordinates': [[[120.2055...
2 | application/octet-stream | dfe46249-bc17-401f-bf9a-bcc05efe77c7 | S2B_MSIL2A_20240302T021609_N0510_R003_T51PTS_2... | application/octet-stream | 1131528020 | 2024-03-02T05:42:43.000000Z | 2024-03-02T05:54:12.768121Z | 2024-03-02T05:55:36.451258Z | True | 9999-12-31T23:59:59.999999Z | /eodata/Sentinel-2/MSI/L2A/2024/03/02/S2B_MSIL... | [{'Value': '02fce369b36758079e58eafce7ef1e9b',... | {'Start': '2024-03-02T02:16:09.024000Z', 'End'... | geography'SRID=4326;POLYGON ((120.205542908771... | {'type': 'Polygon', 'coordinates': [[[120.2055...

Extract Id and Name of the product

result.iloc[0, 1]

first row, second column: the Id of the first product

result.iloc[0, 2]

first row, third column: the Name of the first product

# Select identifier of the first product
product_identifier = result.iloc[0, 1] # Id
product_name = result.iloc[0, 2]       # Name
print(product_identifier, product_name)

The result:

51b58ff8-4fd5-4fe3-b2f0-beebfee95bab S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_20240211T050012.SAFE

Download metadata file

‘Nodes()’ represents hierarchical navigation or traversal within a nested data structure.

To use ‘Nodes()’ segments, you first need the product’s metadata file. For Sentinel-2 Level-2A products it is always named “MTD_MSIL2A.xml”.

product_name = product_name.replace('.SAFE', '')

url = f"{catalogue_odata_url}/Products({product_identifier})/Nodes({product_name})/Nodes(MTD_MSIL2A.xml)/$value"

Products({product_identifier})

This segment accesses a specific product in the Products entity set by its identifier.

Nodes({product_name})

This segment represents a folder or node within the product’s structure. Here, {product_name} specifies the name of this particular node.

Nodes(MTD_MSIL2A.xml)

This nested Nodes segment accesses another level within the node hierarchy. MTD_MSIL2A.xml is a file or a sub-node within {product_name}.

$value

The $value segment is a special OData directive used to retrieve the content.
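Because every path element becomes one Nodes() segment, the URL can be built from a list of path parts. A minimal sketch with a hypothetical helper nodes_url, using the product ID and name from this article:

```python
def nodes_url(base_url, product_id, path_parts):
    """Build an OData URL that walks path_parts with one Nodes() segment each."""
    nodes = "/".join(f"Nodes({part})" for part in path_parts)
    return f"{base_url}/Products({product_id})/{nodes}/$value"


example_url = nodes_url(
    "https://catalogue.infra.copphil.philsa.gov.ph/odata/v1",
    "51b58ff8-4fd5-4fe3-b2f0-beebfee95bab",
    ["S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_20240211T050012", "MTD_MSIL2A.xml"],
)
print(example_url)
```

The same helper works for deeper paths, since the number of Nodes() segments follows the length of the list.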

# The Nodes() segments let us traverse the directory tree and retrieve a single file from the product
url = f"{catalogue_odata_url}/Products({product_identifier})/Nodes({product_name})/Nodes(MTD_MSIL2A.xml)/$value"

# Follow redirects manually so that the session's Authorization header is sent with every request
response = session.get(url, allow_redirects=False)
while response.status_code in (301, 302, 303, 307):
    url = response.headers["Location"]
    response = session.get(url, allow_redirects=False)

# Stop here if the final response is an error instead of the file
response.raise_for_status()

# Save the file in the current working directory
outfile = "MTD_MSIL2A.xml"
with open(outfile, "wb") as f:
    f.write(response.content)

The file is saved as MTD_MSIL2A.xml in the current working directory; the code prints nothing.
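The redirect loop is written by hand because the requests library removes the Authorization header when a redirect leads to a different host. A minimal sketch of the same idea as a reusable function with a limit on the number of redirects; the helper name follow_redirects is hypothetical and the demo uses stand-in responses instead of real HTTP calls.

```python
from collections import namedtuple

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(get, url, max_hops=5):
    """Call get(url) repeatedly while the response is a redirect.

    get is any callable returning an object with .status_code and .headers,
    for example: lambda u: session.get(u, allow_redirects=False)
    """
    for _ in range(max_hops + 1):
        response = get(url)
        if response.status_code not in REDIRECT_CODES:
            return url, response
        url = response.headers["Location"]
    raise RuntimeError(f"More than {max_hops} redirects")


# Stand-in responses for illustration; the hosts are placeholders
FakeResponse = namedtuple("FakeResponse", "status_code headers")
fake_site = {
    "https://catalogue.example/file": FakeResponse(302, {"Location": "https://download.example/file"}),
    "https://download.example/file": FakeResponse(200, {}),
}

final_url, final_response = follow_redirects(fake_site.__getitem__, "https://catalogue.example/file")
print(final_url, final_response.status_code)
```

The limit protects against redirect loops, which the plain while loop would follow forever.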

Get the location of individual bands in Sentinel-2 product

Extract the file paths of several spectral bands from the Sentinel-2 metadata XML file. Each path is read from a nested element, split into its components, and stored as a list in band_location.

root[0][0][12][0][0][0]

The second-to-last index refers to the resolution (10/20/60 m) and the last index refers to the band.

band_location.append(f"{product_name}/{root[0][0][12][0][0][0].text}.jp2".split("/"))
last two values - [0][0]

Refers to Blue Band (10 meters resolution)

band_location.append(f"{product_name}/{root[0][0][12][0][0][1].text}.jp2".split("/"))
last two values - [0][1]

Refers to Green Band (10 meters resolution)

band_location.append(f"{product_name}/{root[0][0][12][0][0][2].text}.jp2".split("/"))
last two values - [0][2]

Refers to Red Band (10 meters resolution)

band_location.append(f"{product_name}/{root[0][0][12][0][0][3].text}.jp2".split("/"))
last two values - [0][3]

Refers to NIR Band (10 meters resolution)

For 20 meters resolution it would be:

band_location.append(f"{product_name}/{root[0][0][12][0][1][0].text}.jp2".split("/"))
last two values - [1][0]

Refers to Blue Band

band_location.append(f"{product_name}/{root[0][0][12][0][1][1].text}.jp2".split("/"))
last two values - [1][1]

Refers to Green Band

band_location.append(f"{product_name}/{root[0][0][12][0][1][2].text}.jp2".split("/"))
last two values - [1][2]

Refers to Red Band

band_location.append(f"{product_name}/{root[0][0][12][0][1][6].text}.jp2".split("/"))
last two values - [1][6]

Refers to NIR Band

# Pass the path of the xml document
tree = ET.parse(str(outfile))
# get the parent tag
root = tree.getroot()

# Get the location of individual bands in Sentinel-2 granule
band_location = []
band_location.append(f"{product_name}/{root[0][0][12][0][0][0].text}.jp2".split("/"))
band_location.append(f"{product_name}/{root[0][0][12][0][0][1].text}.jp2".split("/"))
band_location.append(f"{product_name}/{root[0][0][12][0][0][2].text}.jp2".split("/"))
band_location.append(f"{product_name}/{root[0][0][12][0][0][3].text}.jp2".split("/"))
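Positional indices depend on the exact layout of the XML file, so it is worth checking them against your own metadata file. A more defensive alternative is to look up the band paths by their name suffix. The sketch below assumes that the band paths are stored in elements named IMAGE_FILE; the helper name find_band_paths is hypothetical and the XML fragment is a simplified stand-in, not a real metadata file.

```python
import xml.etree.ElementTree as ET


def find_band_paths(root, wanted):
    """Map each wanted suffix (e.g. 'B02_10m') to the matching IMAGE_FILE text."""
    image_files = [el.text.strip() for el in root.iter()
                   if el.tag.endswith("IMAGE_FILE") and el.text]
    found = {}
    for suffix in wanted:
        for path in image_files:
            if path.endswith(suffix):
                found[suffix] = path
                break
    return found


# Simplified, hypothetical fragment standing in for MTD_MSIL2A.xml
sample = """<Product><Granule>
  <IMAGE_FILE>GRANULE/L2A_DEMO/IMG_DATA/R10m/T51PTS_DEMO_B02_10m</IMAGE_FILE>
  <IMAGE_FILE>GRANULE/L2A_DEMO/IMG_DATA/R10m/T51PTS_DEMO_B08_10m</IMAGE_FILE>
  <IMAGE_FILE>GRANULE/L2A_DEMO/IMG_DATA/R20m/T51PTS_DEMO_B02_20m</IMAGE_FILE>
</Granule></Product>"""

paths = find_band_paths(ET.fromstring(sample), ["B02_10m", "B08_10m"])
for suffix, path in paths.items():
    print(suffix, path)
```

With the real file, root would come from ET.parse(outfile).getroot(), and each path could be split on "/" in the same way as the entries of band_location.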

Download individual bands

Construct URLs to download each Sentinel-2 band file by navigating through nested Nodes() calls in the URL. Each band file is downloaded and saved locally, with the filename derived from the path.

# Build the URL for each file using Nodes() segments
bands = []
for band_file in band_location:
    url = f"{catalogue_odata_url}/Products({product_identifier})/Nodes({product_name})/Nodes({band_file[1]})/Nodes({band_file[2]})/Nodes({band_file[3]})/Nodes({band_file[4]})/Nodes({band_file[5]})/$value"
    # Follow redirects manually so that the session's Authorization header is sent with every request
    response = session.get(url, allow_redirects=False)
    while response.status_code in (301, 302, 303, 307):
        url = response.headers["Location"]
        response = session.get(url, allow_redirects=False)
    # Stop here if the final response is an error instead of the file
    response.raise_for_status()
    # Save the band in the current working directory
    outfile = band_file[5]
    with open(outfile, "wb") as f:
        f.write(response.content)
    bands.append(str(outfile))
    print("Saved:", band_file[5])

The output:

Saved: T51PTS_20240211T021829_B02_10m.jp2
Saved: T51PTS_20240211T021829_B03_10m.jp2
Saved: T51PTS_20240211T021829_B04_10m.jp2
Saved: T51PTS_20240211T021829_B08_10m.jp2

Summary

This notebook showed two ways to download data using the API:

  • downloading the full zipped product (a single file or multiple files)

  • downloading only selected bands.