Download satellite data using API on CopPhil
This example shows how to query the catalogue and download data from the CopPhil portal using the API. It is divided into two main parts:
Download full Sentinel-2 product which contains all bands and metadata files
Download selected Sentinel-2 bands using ‘Nodes()’ approach
Prerequisites
No. 1 Access to CopPhil site
You need a CopPhil hosting account, available at https://infra.copphil.philsa.gov.ph.
No. 2 Access to JupyterLab
JupyterLab used in this article is available here: https://jupyter.infra.copphil.philsa.gov.ph/hub/login?next=%2Fhub%2F.
No. 3 Working knowledge of JupyterLab
See article Introduction to JupyterLab on CopPhil
No. 4 Information on Sentinel-2 mission
The page Sentinel-2 mission gives basic information on the mission and is used in this article as a source of information.
No. 5 Use Geo science kernel
The Geo science kernel comes with all of the Python libraries used in this article preinstalled:
Python 3 kernel and Geo science kernel in JupyterLab on CopPhil
What We Are Going To Cover
Preparing your environment
Searching for the Sentinel-2 L2A products
Building a query
Inspect results of the request
Downloading data
Authentication
Selecting the product to be downloaded
Unpacking data
Downloading multiple files
Downloading selected bands
Authentication
Establishing a session
Extracting ID and Name of the product
Downloading metadata file
Getting the location of individual bands in Sentinel-2 product
Download individual bands
Summary
Prepare your environment
Import the necessary Python libraries in a code cell.
# HTTP requests
import requests
# JSON parser
import json
# XML parser
import xml.etree.ElementTree as ET
# data manipulation
import pandas as pd
# file manipulation
import zipfile
import os
Search for the Sentinel-2 L2A products
Build a query
To find the desired products, define the following filters:
- collection_name
Sentinel-1, Sentinel-2 etc.
- product_type
MSIL1C, MSIL2A
- aoi
extent (coordinates) of the area of interest (WGS84)
- search_period_start
time range - start date
- search_period_end
time range - end date
- max_cloud_cover
maximum cloud cover (%) of the image
# base URL of the product catalogue
catalogue_odata_url = "https://catalogue.infra.copphil.philsa.gov.ph/odata/v1"
# search parameters
collection_name = "SENTINEL-2"
product_type = "S2MSI2A"
aoi = "POLYGON((120.962986 14.598416, 120.995964 14.599182, 120.999658 14.563436, 120.960348 14.567522, 120.962986 14.598416))"
search_period_start = "2024-01-01T00:00:00.000Z"
search_period_end = "2024-09-30T00:00:00.000Z"
max_cloud_cover = 20
search_query = f"{catalogue_odata_url}/Products?$filter=Collection/Name eq '{collection_name}' and Attributes/OData.CSC.StringAttribute/any(att:att/Name eq 'productType' and att/OData.CSC.StringAttribute/Value eq '{product_type}') and Attributes/OData.CSC.DoubleAttribute/any(att:att/Name eq 'cloudCover' and att/OData.CSC.DoubleAttribute/Value le {max_cloud_cover}) and OData.CSC.Intersects(area=geography'SRID=4326;{aoi}') and ContentDate/Start gt {search_period_start} and ContentDate/Start lt {search_period_end}"
#search_query_cloud = f"{search_query} and Attributes/OData.CSC.DoubleAttribute/any(att:att/Name eq 'cloudCover' and att/OData.CSC.DoubleAttribute/Value le {max_cloud_cover})"
print(f"""\n{search_query.replace(' ', "%20")}\n""")
The cell prints the complete query URL, with spaces encoded as %20.
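Because the filter is long, it can help to assemble it from separate clauses so that each pair of parentheses is easy to check. The sketch below is one possible way to do this, with each attribute filter written as its own any(...) clause. The helper name `build_search_query` is hypothetical and not part of the CopPhil API; the clause syntax and the parameter values are taken from this article.

```python
def build_search_query(base_url, collection, product_type, aoi_wkt,
                       start, end, max_cloud_cover):
    """Join independent OData filter clauses with ' and ' (hypothetical helper)."""
    clauses = [
        f"Collection/Name eq '{collection}'",
        "Attributes/OData.CSC.StringAttribute/any("
        "att:att/Name eq 'productType' and "
        f"att/OData.CSC.StringAttribute/Value eq '{product_type}')",
        "Attributes/OData.CSC.DoubleAttribute/any("
        "att:att/Name eq 'cloudCover' and "
        f"att/OData.CSC.DoubleAttribute/Value le {max_cloud_cover})",
        f"OData.CSC.Intersects(area=geography'SRID=4326;{aoi_wkt}')",
        f"ContentDate/Start gt {start}",
        f"ContentDate/Start lt {end}",
    ]
    return f"{base_url}/Products?$filter=" + " and ".join(clauses)


query = build_search_query(
    "https://catalogue.infra.copphil.philsa.gov.ph/odata/v1",
    "SENTINEL-2",
    "S2MSI2A",
    "POLYGON((120.962986 14.598416, 120.995964 14.599182, "
    "120.999658 14.563436, 120.960348 14.567522, 120.962986 14.598416))",
    "2024-01-01T00:00:00.000Z",
    "2024-09-30T00:00:00.000Z",
    20,
)
print(query.replace(" ", "%20"))
```

Keeping one clause per list entry means a filter can be added or removed without touching the others.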
Inspect results of the request
Show the number of products found that match the filters.
response = requests.get(search_query).json()
result = pd.DataFrame.from_dict(response["value"])
len(result)
The result is
6
Get a list of the names and Ids of the images that match the query.
id_list = []
for element in response['value']:
    id_value = element['Id']  # Get the Id value
    id_list.append(id_value)  # Append it to the list
for product_id in id_list:
    print(product_id)
name_list = []
for element in response['value']:
    name_value = element['Name']  # Get the Name value
    name_list.append(name_value)  # Append it to the list
for name in name_list:
    print(name)
51b58ff8-4fd5-4fe3-b2f0-beebfee95bab
f5c26c11-d72d-4c91-80ec-fd7428c0d518
dfe46249-bc17-401f-bf9a-bcc05efe77c7
cbdd1d86-52ac-4ee7-aa33-306019d525db
af473459-5577-48ba-a852-2607f7fe8357
4acc616f-1f77-460b-867e-4d3452dee225
S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_20240211T050012.SAFE
S2A_MSIL2A_20240307T021541_N0510_R003_T51PTS_20240307T053851.SAFE
S2B_MSIL2A_20240302T021609_N0510_R003_T51PTS_20240302T043128.SAFE
S2B_MSIL2A_20240421T021529_N0510_R003_T51PTS_20240421T043611.SAFE
S2A_MSIL2A_20240416T021611_N0510_R003_T51PTS_20240416T075854.SAFE
S2A_MSIL2A_20240426T021611_N0510_R003_T51PTS_20240426T075954.SAFE
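The same lists can be built more compactly with list comprehensions, and pairing them in a dictionary keeps each Id next to its Name. This is only a sketch: the `response` dictionary below is a shortened stand-in shaped like the catalogue response, using two of the products listed above.

```python
# Shortened stand-in for the catalogue response (two entries only)
response = {
    "value": [
        {"Id": "51b58ff8-4fd5-4fe3-b2f0-beebfee95bab",
         "Name": "S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_20240211T050012.SAFE"},
        {"Id": "f5c26c11-d72d-4c91-80ec-fd7428c0d518",
         "Name": "S2A_MSIL2A_20240307T021541_N0510_R003_T51PTS_20240307T053851.SAFE"},
    ]
}

# One pass per list; the order follows the order of the response
id_list = [element["Id"] for element in response["value"]]
name_list = [element["Name"] for element in response["value"]]

# Map each Id to its Name so the two values cannot get out of step
name_by_id = dict(zip(id_list, name_list))

for product_id, product_name in name_by_id.items():
    print(product_id, product_name)
```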
Get the name and Id of the first image that matches the query.
id_download = response['value'][0]['Id']
name_download = response['value'][0]['Name']
print(id_download, name_download)
51b58ff8-4fd5-4fe3-b2f0-beebfee95bab S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_20240211T050012.SAFE
Get the time at which acquisition of the image started.
image_date = response['value'][0]['ContentDate']['Start']
print(image_date)
2024-02-11T02:18:29.024000Z
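The value is an ISO 8601 string in UTC. If it needs to be compared or reformatted, it can be converted to a `datetime` object. A minimal sketch using only the standard library, with the timestamp copied from the output above:

```python
from datetime import datetime

image_date = "2024-02-11T02:18:29.024000Z"

# Replace the trailing "Z" with an explicit offset so that
# fromisoformat() also accepts the string on Python versions before 3.11
acquired = datetime.fromisoformat(image_date.replace("Z", "+00:00"))

print(acquired.strftime("%Y-%m-%d"))
print(acquired.utcoffset())
```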
Download data
Authentication
Log in to the portal. A status_code of 200 means you have logged in successfully.
# Provide CopPhil account credentials - replace with your own data
username = 'your_username'
password = 'your_password'
auth_server_url = "https://auth.copphil.cloudferro.com/auth/realms/copphilinfra/protocol/openid-connect/token"
data = {
    "client_id": "copphil-public",
    "grant_type": "password",
    "username": username,
    "password": password,
}
response = requests.post(auth_server_url, data=data, verify=True, allow_redirects=False)
access_token = json.loads(response.text)["access_token"]
status_code = response.status_code
print(status_code)
The result:
200
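The cell above reads `access_token` before looking at the status code, so a failed login would surface as a `KeyError` rather than a clear message. The sketch below shows one way to check the status first. `extract_access_token` is a hypothetical helper, and the response bodies used here are illustrative, not real server output.

```python
import json


def extract_access_token(status_code, body_text):
    """Return the access token, or raise a readable error if login failed."""
    if status_code != 200:
        raise RuntimeError(f"Login failed with status {status_code}: {body_text}")
    body = json.loads(body_text)
    if "access_token" not in body:
        raise RuntimeError("Response did not contain an access_token field")
    return body["access_token"]


# Illustrative bodies only
ok_body = json.dumps({"access_token": "example-token"})
print(extract_access_token(200, ok_body))

try:
    extract_access_token(401, '{"error": "invalid_grant"}')
except RuntimeError as error:
    print(error)
```

In the notebook it could be called as `extract_access_token(response.status_code, response.text)`.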
Select the product to be downloaded
From the list produced in the previous step, take the name and Id of the image you want to download.
- id_download
Id of the first product.
- name_download
Name of the first product.
url_download = f'https://download.infra.copphil.philsa.gov.ph/odata/v1/Products({id_download})/$value'
url_download_using_token = url_download + '?token=' + access_token
# downloads the file
pulling = requests.get(url_download_using_token)
open(f"{name_download}.zip", 'wb').write(pulling.content)
Here is the result:
1156520968
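The product above is about 1.16 GB, and `pulling.content` holds all of it in memory before it is written. For large products, streaming the response to disk in chunks may be preferable. The sketch below assumes `session` behaves like a `requests.Session`; `download_to_file` is a hypothetical helper name.

```python
def download_to_file(session, url, path, chunk_size=1024 * 1024):
    """Stream the body of `url` to `path` and return the number of bytes written.

    `session` is assumed to behave like requests.Session: get() accepts
    stream=True and the response offers raise_for_status() and iter_content().
    """
    written = 0
    with session.get(url, stream=True) as response:
        response.raise_for_status()  # stop early on 4xx/5xx responses
        with open(path, "wb") as out:
            for chunk in response.iter_content(chunk_size=chunk_size):
                out.write(chunk)
                written += len(chunk)
    return written
```

In the notebook it could be called as `download_to_file(requests.Session(), url_download_using_token, f"{name_download}.zip")`.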
Unpack data
Unzip the downloaded archive.
# Unzipping the file
with zipfile.ZipFile(f'{name_download}.zip', 'r') as zip_ref:
    zip_ref.extractall(f'{name_download}')  # Extract the files into a folder named after the product
# Check if unzipped successfully
if os.path.exists(f'{name_download}'):
    print(f"Unzipped successfully into {name_download}")
else:
    print("Unzipping failed.")
The result:
Unzipped successfully into S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_20240211T050012.SAFE
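Checking that the target folder exists does not show whether the archive itself was complete. As a sketch, `zipfile.ZipFile.testzip()` can verify the members before extraction. `extract_checked` is a hypothetical helper, and the demonstration uses a tiny archive created on the fly instead of a real product.

```python
import os
import tempfile
import zipfile


def extract_checked(zip_path, target_dir):
    """Verify the archive, extract it, and return the member names."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        bad_member = archive.testzip()  # None when every member passes its CRC check
        if bad_member is not None:
            raise zipfile.BadZipFile(f"Corrupt member: {bad_member}")
        archive.extractall(target_dir)
        return archive.namelist()


# Self-contained demonstration in a temporary folder
with tempfile.TemporaryDirectory() as workdir:
    demo_zip = os.path.join(workdir, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("demo.SAFE/MTD_MSIL2A.xml", "<root/>")
    members = extract_checked(demo_zip, os.path.join(workdir, "out"))
    print(members)
```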
Download multiple files
Create a loop to download all images that match the query.
# Download each file using the IDs and corresponding names
for id_value, name_value in zip(id_list, name_list):
    # Construct the download URL for each file, using the Id of the current product
    url_download = f'https://download.infra.copphil.philsa.gov.ph/odata/v1/Products({id_value})/$value'
    url_download_using_token = url_download + '?token=' + access_token
    # Download the file
    pulling = requests.get(url_download_using_token)
    # Save the file using the corresponding name
    with open(f"{name_value}.zip", 'wb') as file:
        file.write(pulling.content)
    print(f"Downloaded {name_value}.zip")
The output is:
Downloaded S2A_MSIL2A_20240307T021541_N0510_R003_T51PTS_20240307T053851.SAFE.zip
Downloaded S2A_MSIL2A_20240416T021611_N0510_R003_T51PTS_20240416T075854.SAFE.zip
Downloaded S2A_MSIL2A_20240426T021611_N0510_R003_T51PTS_20240426T075954.SAFE.zip
Downloaded S2B_MSIL2A_20240421T021529_N0510_R003_T51PTS_20240421T043611.SAFE.zip
Downloaded S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_20240211T050012.SAFE.zip
Downloaded S2B_MSIL2A_20240302T021609_N0510_R003_T51PTS_20240302T043128.SAFE.zip
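When the loop is run more than once, it can be useful to skip products that are already on disk and to confirm that each URL carries the Id of its own product. The sketch below only plans the downloads and performs no network requests. `plan_downloads` is a hypothetical helper and the token is a placeholder.

```python
import os

DOWNLOAD_URL = "https://download.infra.copphil.philsa.gov.ph/odata/v1"


def plan_downloads(id_list, name_list, access_token, folder="."):
    """Return (url, file_path) pairs for products not yet saved in `folder`."""
    plan = []
    for product_id, product_name in zip(id_list, name_list):
        file_path = os.path.join(folder, f"{product_name}.zip")
        if os.path.exists(file_path):
            continue  # already downloaded in an earlier run
        url = f"{DOWNLOAD_URL}/Products({product_id})/$value?token={access_token}"
        plan.append((url, file_path))
    return plan


# Two products from the listing above
ids = ["51b58ff8-4fd5-4fe3-b2f0-beebfee95bab",
       "f5c26c11-d72d-4c91-80ec-fd7428c0d518"]
names = ["S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_20240211T050012.SAFE",
         "S2A_MSIL2A_20240307T021541_N0510_R003_T51PTS_20240307T053851.SAFE"]

for url, file_path in plan_downloads(ids, names, "example-token"):
    print(file_path, "<-", url)
```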
Download selected bands
Authentication
Log in to the portal. A status_code of 200 means you have logged in successfully.
# Provide CopPhil account credentials - replace with your own data
username = 'your_username'
password = 'your_password'
auth_server_url = "https://auth.copphil.cloudferro.com/auth/realms/copphilinfra/protocol/openid-connect/token"
data = {
    "client_id": "copphil-public",
    "grant_type": "password",
    "username": username,
    "password": password,
}
response = requests.post(auth_server_url, data=data, verify=True, allow_redirects=False)
access_token = json.loads(response.text)["access_token"]
status_code = response.status_code
print(status_code)
The output:
200
Establish a session
Establish a session and take another look at the products found.
session = requests.Session()
session.headers["Authorization"] = f"Bearer {access_token}"
result.head(3)
| | @odata.mediaContentType | Id | Name | ContentType | ContentLength | OriginDate | PublicationDate | ModificationDate | Online | EvictionDate | S3Path | Checksum | ContentDate | Footprint | GeoFootprint |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | application/octet-stream | 51b58ff8-4fd5-4fe3-b2f0-beebfee95bab | S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_2... | application/octet-stream | 1156520968 | 2024-02-11T06:09:02.000000Z | 2024-02-11T06:18:35.134676Z | 2024-03-13T10:30:49.000330Z | True | 9999-12-31T23:59:59.999999Z | /eodata/Sentinel-2/MSI/L2A/2024/02/11/S2B_MSIL... | [{'Value': 'fc89b2ffa223c038953f7e67ea9145c8',... | {'Start': '2024-02-11T02:18:29.024000Z', 'End'... | geography'SRID=4326;POLYGON ((120.205542908771... | {'type': 'Polygon', 'coordinates': [[[120.2055... |
1 | application/octet-stream | f5c26c11-d72d-4c91-80ec-fd7428c0d518 | S2A_MSIL2A_20240307T021541_N0510_R003_T51PTS_2... | application/octet-stream | 1130852495 | 2024-03-07T06:30:50.000000Z | 2024-03-07T06:40:56.016476Z | 2024-03-07T09:51:23.141122Z | True | 9999-12-31T23:59:59.999999Z | /eodata/Sentinel-2/MSI/L2A/2024/03/07/S2A_MSIL... | [{'Value': '83ad987c68029bd4359e28dafc9e8d6a',... | {'Start': '2024-03-07T02:15:41.024000Z', 'End'... | geography'SRID=4326;POLYGON ((120.205542908771... | {'type': 'Polygon', 'coordinates': [[[120.2055... |
2 | application/octet-stream | dfe46249-bc17-401f-bf9a-bcc05efe77c7 | S2B_MSIL2A_20240302T021609_N0510_R003_T51PTS_2... | application/octet-stream | 1131528020 | 2024-03-02T05:42:43.000000Z | 2024-03-02T05:54:12.768121Z | 2024-03-02T05:55:36.451258Z | True | 9999-12-31T23:59:59.999999Z | /eodata/Sentinel-2/MSI/L2A/2024/03/02/S2B_MSIL... | [{'Value': '02fce369b36758079e58eafce7ef1e9b',... | {'Start': '2024-03-02T02:16:09.024000Z', 'End'... | geography'SRID=4326;POLYGON ((120.205542908771... | {'type': 'Polygon', 'coordinates': [[[120.2055... |
Extract Id and Name of the product
- result.iloc[0, 1]
first row, second column: the Id of the first product
- result.iloc[0, 2]
first row, third column: the Name of the first product
# Select identifier of the first product
product_identifier = result.iloc[0, 1] # Id
product_name = result.iloc[0, 2] # Name
print(product_identifier, product_name)
The result:
51b58ff8-4fd5-4fe3-b2f0-beebfee95bab S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_20240211T050012.SAFE
Download metadata file
‘Nodes()’ segments navigate the hierarchy of folders and files inside a product.
To use ‘Nodes()’ segments, you first need the product's metadata file. For Sentinel-2 Level-2A products it is always named “MTD_MSIL2A.xml”.
product_name = product_name.replace('.SAFE', '')
url = f"{catalogue_odata_url}/Products({product_identifier})/Nodes({product_name})/Nodes(MTD_MSIL2A.xml)/$value"
- Products({product_identifier})
This segment accesses a specific product in the Products entity set by its identifier.
- Nodes({product_name})
This segment represents a folder or node within the product’s structure. Here, {product_name} specifies the name of this particular node.
- Nodes(MTD_MSIL2A.xml)
This nested Nodes segment accesses another level within the node hierarchy. MTD_MSIL2A.xml is a file or a sub-node within {product_name}.
- $value
The $value segment is a special OData directive used to retrieve the content.
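The segments described above follow a regular pattern: each path component is wrapped in Nodes(...), and $value is added at the end. A small sketch of that pattern, where `build_nodes_url` is a hypothetical helper and the Id and name are those of the first product in this article:

```python
def build_nodes_url(base_url, product_id, path_parts):
    """Wrap every path component in Nodes(...) and finish with $value."""
    nodes = "/".join(f"Nodes({part})" for part in path_parts)
    return f"{base_url}/Products({product_id})/{nodes}/$value"


catalogue_odata_url = "https://catalogue.infra.copphil.philsa.gov.ph/odata/v1"
product_identifier = "51b58ff8-4fd5-4fe3-b2f0-beebfee95bab"
product_name = "S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_20240211T050012"

metadata_url = build_nodes_url(
    catalogue_odata_url, product_identifier, [product_name, "MTD_MSIL2A.xml"]
)
print(metadata_url)
```

Because the helper accepts a list of any length, the same call shape also covers deeper paths such as the band files.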
# Nodes() segments let us traverse the directory tree and retrieve a single file from the product
url = f"{catalogue_odata_url}/Products({product_identifier})/Nodes({product_name})/Nodes(MTD_MSIL2A.xml)/$value"
response = session.get(url, allow_redirects=False)
# Follow redirects manually until the final location is reached
while response.status_code in (301, 302, 303, 307):
    url = response.headers["Location"]
    response = session.get(url, allow_redirects=False)
# Note: verify=False disables TLS certificate verification and triggers the InsecureRequestWarning shown below
file = session.get(url, verify=False, allow_redirects=True)
# Save the file in the current working directory
outfile = "MTD_MSIL2A.xml"
with open(outfile, "wb") as f:
    f.write(file.content)
The result:
/home/ubuntu/anaconda3/envs/xcube_env/lib/python3.12/site-packages/urllib3/connectionpool.py:1099: InsecureRequestWarning: Unverified HTTPS request is being made to host 'download.infra.copphil.philsa.gov.ph'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
warnings.warn(
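The manual redirect loop above has no upper bound on the number of hops and assumes that the Location header is always an absolute URL. The sketch below adds a hop limit and resolves relative locations. It assumes `session` behaves like a `requests.Session`; `resolve_redirects` is a hypothetical helper, and the status codes are the same four checked in the loop above.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307)


def resolve_redirects(session, url, max_hops=5):
    """Follow redirects by hand and return the final response."""
    response = session.get(url, allow_redirects=False)
    for _ in range(max_hops):
        if response.status_code not in REDIRECT_CODES:
            return response
        # Location may be relative, so resolve it against the current URL
        url = urljoin(url, response.headers["Location"])
        response = session.get(url, allow_redirects=False)
    if response.status_code in REDIRECT_CODES:
        raise RuntimeError(f"More than {max_hops} redirects, last URL: {url}")
    return response
```

In the notebook, `file = resolve_redirects(session, url)` could stand in for the loop and the final request, leaving certificate verification at its default.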
Get the location of individual bands in Sentinel-2 product
Parse the Sentinel-2 metadata XML file, read the file path of each spectral band from its nested elements, and store each path, split into its components, as a list in band_location.
- root[0][0][12][0][0][0]
The last-but-one index refers to the resolution (10/20/60 m); the last index refers to the band.
band_location.append(f"{product_name}/{root[0][0][12][0][0][0].text}.jp2".split("/"))
- last two values - [0][0]
Refers to Blue Band (10 meters resolution)
band_location.append(f"{product_name}/{root[0][0][12][0][0][1].text}.jp2".split("/"))
- last two values - [0][1]
Refers to Green Band (10 meters resolution)
band_location.append(f"{product_name}/{root[0][0][12][0][0][2].text}.jp2".split("/"))
- last two values - [0][2]
Refers to Red Band (10 meters resolution)
band_location.append(f"{product_name}/{root[0][0][12][0][0][3].text}.jp2".split("/"))
- last two values - [0][3]
Refers to NIR Band (10 meters resolution)
For 20 meters resolution it would be:
band_location.append(f"{product_name}/{root[0][0][12][0][1][0].text}.jp2".split("/"))
- last two values - [1][0]
Refers to Blue Band
band_location.append(f"{product_name}/{root[0][0][12][0][1][1].text}.jp2".split("/"))
- last two values - [1][1]
Refers to Green Band
band_location.append(f"{product_name}/{root[0][0][12][0][1][2].text}.jp2".split("/"))
- last two values - [1][2]
Refers to Red Band
band_location.append(f"{product_name}/{root[0][0][12][0][1][6].text}.jp2".split("/"))
- last two values - [1][6]
Refers to NIR Band
# Pass the path of the xml document
tree = ET.parse(str(outfile))
# get the parent tag
root = tree.getroot()
# Get the location of individual bands in Sentinel-2 granule
band_location = []
band_location.append(f"{product_name}/{root[0][0][12][0][0][0].text}.jp2".split("/"))
band_location.append(f"{product_name}/{root[0][0][12][0][0][1].text}.jp2".split("/"))
band_location.append(f"{product_name}/{root[0][0][12][0][0][2].text}.jp2".split("/"))
band_location.append(f"{product_name}/{root[0][0][12][0][0][3].text}.jp2".split("/"))
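Positional indices such as root[0][0][12] depend on the exact element order in the metadata file. A possible alternative is to search by tag name and match the band suffix. The sketch below assumes that band paths are stored in elements named IMAGE_FILE and that each path ends with the band and resolution, for example _B02_10m. The XML here is a trimmed, illustrative stand-in (the folder name L2A_EXAMPLE is a placeholder), and `find_band_paths` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative stand-in for MTD_MSIL2A.xml (not the real file)
sample_xml = """
<root>
  <General_Info>
    <Granule>
      <IMAGE_FILE>GRANULE/L2A_EXAMPLE/IMG_DATA/R10m/T51PTS_20240211T021829_B02_10m</IMAGE_FILE>
      <IMAGE_FILE>GRANULE/L2A_EXAMPLE/IMG_DATA/R10m/T51PTS_20240211T021829_B03_10m</IMAGE_FILE>
      <IMAGE_FILE>GRANULE/L2A_EXAMPLE/IMG_DATA/R10m/T51PTS_20240211T021829_B04_10m</IMAGE_FILE>
      <IMAGE_FILE>GRANULE/L2A_EXAMPLE/IMG_DATA/R10m/T51PTS_20240211T021829_B08_10m</IMAGE_FILE>
      <IMAGE_FILE>GRANULE/L2A_EXAMPLE/IMG_DATA/R20m/T51PTS_20240211T021829_B02_20m</IMAGE_FILE>
    </Granule>
  </General_Info>
</root>
"""


def find_band_paths(root, product_name, bands, resolution):
    """Return {band: path components} for IMAGE_FILE entries that match."""
    found = {}
    for element in root.iter("IMAGE_FILE"):
        path = element.text.strip()
        for band in bands:
            if path.endswith(f"_{band}_{resolution}"):
                # Same list shape as band_location: product name first, file name last
                found[band] = f"{product_name}/{path}.jp2".split("/")
    return found


root = ET.fromstring(sample_xml)
product_name = "S2B_MSIL2A_20240211T021829_N0510_R003_T51PTS_20240211T050012"
band_paths = find_band_paths(root, product_name, ["B02", "B03", "B04", "B08"], "10m")
for band, parts in band_paths.items():
    print(band, parts[-1])
```

Each value has the same shape as an entry of band_location, so `list(band_paths.values())` could be passed to the download loop unchanged.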
Download individual bands
Construct URLs to download each Sentinel-2 band file by navigating through nested Nodes() calls in the URL. Each band file is downloaded and saved locally, with the filename derived from the path.
# Build the URL for each file using Nodes() segments
bands = []
for band_file in band_location:
    url = f"{catalogue_odata_url}/Products({product_identifier})/Nodes({product_name})/Nodes({band_file[1]})/Nodes({band_file[2]})/Nodes({band_file[3]})/Nodes({band_file[4]})/Nodes({band_file[5]})/$value"
    response = session.get(url, allow_redirects=False)
    # Follow redirects manually until the final location is reached
    while response.status_code in (301, 302, 303, 307):
        url = response.headers["Location"]
        response = session.get(url, allow_redirects=False)
    # Note: verify=False disables TLS certificate verification and triggers the InsecureRequestWarning shown below
    file = session.get(url, verify=False, allow_redirects=True)
    # Save the band in the current working directory
    outfile = band_file[5]
    with open(outfile, "wb") as f:
        f.write(file.content)
    bands.append(str(outfile))
    print("Saved:", band_file[5])
The output:
/home/ubuntu/anaconda3/envs/xcube_env/lib/python3.12/site-packages/urllib3/connectionpool.py:1099: InsecureRequestWarning: Unverified HTTPS request is being made to host 'download.infra.copphil.philsa.gov.ph'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
warnings.warn(
Saved: T51PTS_20240211T021829_B02_10m.jp2
/home/ubuntu/anaconda3/envs/xcube_env/lib/python3.12/site-packages/urllib3/connectionpool.py:1099: InsecureRequestWarning: Unverified HTTPS request is being made to host 'download.infra.copphil.philsa.gov.ph'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
warnings.warn(
Saved: T51PTS_20240211T021829_B03_10m.jp2
/home/ubuntu/anaconda3/envs/xcube_env/lib/python3.12/site-packages/urllib3/connectionpool.py:1099: InsecureRequestWarning: Unverified HTTPS request is being made to host 'download.infra.copphil.philsa.gov.ph'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
warnings.warn(
Saved: T51PTS_20240211T021829_B04_10m.jp2
/home/ubuntu/anaconda3/envs/xcube_env/lib/python3.12/site-packages/urllib3/connectionpool.py:1099: InsecureRequestWarning: Unverified HTTPS request is being made to host 'download.infra.copphil.philsa.gov.ph'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
warnings.warn(
Saved: T51PTS_20240211T021829_B08_10m.jp2
Summary
This notebook showed two ways to download data using the API:
obtaining the full zipped product (for a single file or for multiple files), and
downloading selected bands using ‘Nodes()’ segments.