Datasets for Geospatial Data Science¶
A quick resource guide for satellite imagery, object detection, and geospatial ML.
In an ideal world for geospatial machine learning, you have three things:
A good model zoo to start quickly.
A training environment that runs without heavy setup.
Datasets to practice on.
In practice:
Hugging Face == model zoo and datasets.
Google Colab == training environment.
Data == I do not know yet.
Platforms to Practice Geospatial ML¶
These platforms provide datasets, competitions, or model sharing useful for computer vision and satellite imagery tasks.
Hugging Face – datasets and pretrained models
CodaLab – popular competition platform (widely used in China)
Geospatial ML – curated geospatial machine learning resources
AIcrowd – ML competitions and benchmarks
Papers With Code – tasks, datasets, and model leaderboards
Roboflow Universe – community datasets for computer vision
OpenML – open datasets for machine learning experiments
Satellite Imagery Sources (Raster Data)¶
For georeferenced rasters suited to GIS or deep learning (CNNs), there are specialized SpatioTemporal Asset Catalog (STAC) repositories: https://developmentseed.org/scaling_science/docs/Open_data_code.html or https://stacspec.org/en/about/datasets/
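STAC catalogs that expose a search API are usually queried by area, date range, and collection. The snippet below is a minimal sketch of building such a query body, assuming the catalog implements the STAC API item search (a JSON body sent with POST to the catalog's `/search` endpoint). The bounding box, dates, and the collection id `sentinel-2-l2a` are illustrative assumptions, not values taken from the catalogs linked above.

```python
import json
from datetime import date


def build_stac_search(bbox, start, end, collections, limit=10):
    """Build a JSON body for a STAC API item search (POST <api-root>/search)."""
    west, south, east, north = bbox
    # Simple sanity check; it does not handle boxes crossing the antimeridian.
    if not (west < east and south < north):
        raise ValueError("bbox must be (west, south, east, north)")
    return {
        "bbox": [west, south, east, north],
        # Closed RFC 3339 interval written as "start/end".
        "datetime": f"{start.isoformat()}T00:00:00Z/{end.isoformat()}T23:59:59Z",
        "collections": list(collections),
        "limit": limit,
    }


body = build_stac_search(
    bbox=(2.25, 48.81, 2.42, 48.90),  # rough box around Paris, illustrative only
    start=date(2023, 6, 1),
    end=date(2023, 6, 30),
    collections=["sentinel-2-l2a"],  # assumed collection id; check the catalog
)
print(json.dumps(body, indent=2))
```

The body can then be sent with any HTTP client. The request itself is left out so that the sketch does not depend on a particular catalog being reachable.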
Reddit...¶
Ready-to-use aerial and satellite imagery datasets are not easy to find. This Reddit thread is a good starting point for finding countries with NAIP-level aerial imagery coverage: https://www.reddit.com/r/gis/comments/1eowe3f/countries_with_naiplevel_imagery/ The French IGN BD ORTHO portal is also an excellent source: https://geoservices.ign.fr/bdortho
Building & Urban Detection¶
SpaceNet (building segmentation) – satellite imagery for building footprint extraction.
Other nodes¶
USDA Gateway: Provides free access to high-quality aerial ortho-imagery (NAIP) for the United States.
EarthData (NASA): For global satellite-based raster datasets.
Radiant MLHub: Specifically designed for georeferenced training data and raw imagery for Earth observation. https://registry.opendata.aws/radiant-mlhub/
iSAID (Instance Segmentation in Aerial Images Dataset) https://captain-whu.github.io/iSAID/
Large Remote Sensing Object Detection
xView https://xviewdataset.org/ is one of the largest satellite object detection datasets.
European Land Data
Copernicus Urban Atlas – detailed land use and land cover maps for European cities. https://land.copernicus.eu/local/urban-atlas
DOTA (Dataset for Object Detection in Aerial Images) https://captain-whu.github.io/DOTA/dataset.html
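DOTA labels objects with oriented boxes rather than axis-aligned ones, so a small conversion step is often needed before feeding the labels to a standard detector. The snippet below is a rough sketch, assuming the commonly documented annotation layout of eight corner coordinates followed by a category name and a difficulty flag, with optional metadata lines at the top of the file. The sample text is made up for illustration and is not taken from the dataset.

```python
def parse_dota_line(line):
    """Parse one DOTA-style annotation line; return None for non-object lines."""
    parts = line.split()
    if len(parts) < 10:
        return None  # metadata lines such as "imagesource:..." or "gsd:..."
    try:
        coords = [float(value) for value in parts[:8]]
        difficult = int(parts[9])
    except ValueError:
        return None
    # Pair up x and y values into four (x, y) corners.
    corners = list(zip(coords[0::2], coords[1::2]))
    return {"corners": corners, "category": parts[8], "difficult": difficult}


def to_axis_aligned(corners):
    """Smallest axis-aligned box (xmin, ymin, xmax, ymax) around the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


# Made-up sample in the assumed format.
sample = """imagesource:GoogleEarth
gsd:0.146
2753 2408 2861 2385 2888 2468 2805 2502 plane 0
"""

objects = [obj for obj in map(parse_dota_line, sample.splitlines()) if obj]
for obj in objects:
    print(obj["category"], to_axis_aligned(obj["corners"]))
```

Converting to axis-aligned boxes loses the orientation, so this is only suitable when the downstream model does not predict rotated boxes.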
Conclusion¶
Reddit was the best place to find datasets that can be used right away.
Targeted remote sensing tags also help: use a platform's search bar with terms like remote_sensing, satellite, or landcover.
The French IGN geoservices portal has been the best and easiest source to access so far: <https://geoservices.ign.fr/bdortho>.
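The keyword tip above works best when spelling variants are treated as the same term, since tags may appear as remote_sensing, remote-sensing, or Remote Sensing. The snippet below is a minimal sketch of that idea applied to a locally saved list of dataset entries. The `catalog` entries and their field names are hypothetical and do not reflect any platform's actual API.

```python
import re


def normalize(text):
    """Lowercase and drop spaces, hyphens, and underscores."""
    return re.sub(r"[\s_\-]+", "", text.strip().lower())


def matches(entry, keywords):
    """True if any normalized keyword occurs in the entry's name or tags."""
    haystack = [normalize(entry["name"])] + [normalize(t) for t in entry["tags"]]
    wanted = [normalize(k) for k in keywords]
    return any(k in h for k in wanted for h in haystack)


# Hypothetical entries, for illustration only.
catalog = [
    {"name": "city-buildings", "tags": ["Remote Sensing", "segmentation"]},
    {"name": "movie-reviews", "tags": ["text", "sentiment"]},
    {"name": "LandCover-tiles", "tags": ["classification"]},
]

keywords = ["remote_sensing", "satellite", "landcover"]
hits = [entry["name"] for entry in catalog if matches(entry, keywords)]
print(hits)
```

Substring matching is deliberately loose here. It can return false positives for short keywords, so a stricter whole-tag comparison may be preferable for larger lists.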