Datasets for Geospatial Data Science


A quick resource guide for satellite imagery, object detection, and geospatial ML.

In an ideal world for geospatial machine learning, you have three things:

A good model zoo to start quickly.

A training environment that runs without heavy setup.

Datasets to practice on.

For example:

Hugging Face == model zoo and datasets.

Google Colab == training environment.

Data == I do not know yet.

Platforms to Practice Geospatial ML

These platforms provide datasets, competitions, or model sharing useful for computer vision and satellite imagery tasks.

Hugging Face – datasets and pretrained models

CodaLab – popular competition platform (widely used in China)

Geospatial ML – curated geospatial machine learning resources

AIcrowd – ML competitions and benchmarks

Papers With Code – tasks, datasets, and model leaderboards

Roboflow Universe – community datasets for computer vision

OpenML – open datasets for machine learning experiments

Satellite Imagery Sources (Raster Data)

For georeferenced rasters for GIS or deep learning (CNNs), there are specialized SpatioTemporal Asset Catalog (STAC) repositories: https://developmentseed.org/scaling_science/docs/Open_data_code.html or https://stacspec.org/en/about/datasets/
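As a rough illustration of how a STAC catalog is queried, the sketch below builds the JSON body for a STAC API item search (a POST to the catalog's /search endpoint) using only the Python standard library. The helper names build_stac_search and search_stac, the Paris bounding box, the Earth Search URL, and the sentinel-2-l2a collection id are assumptions chosen for illustration; they are not taken from the catalogs linked above.

```python
import json
from urllib import request


def build_stac_search(bbox, start, end, collections, limit=10):
    """Build the JSON body for a STAC API item search (POST /search)."""
    if len(bbox) != 4:
        raise ValueError("bbox must be [west, south, east, north]")
    return {
        "bbox": list(bbox),              # WGS84 lon/lat order
        "datetime": f"{start}/{end}",    # closed RFC 3339 interval
        "collections": list(collections),
        "limit": limit,
    }


def search_stac(api_url, body, timeout=30):
    """POST the body to <api_url>/search and return the parsed GeoJSON response."""
    req = request.Request(
        api_url.rstrip("/") + "/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


body = build_stac_search(
    bbox=[2.25, 48.81, 2.42, 48.90],      # assumed area: roughly Paris
    start="2023-06-01T00:00:00Z",
    end="2023-06-30T23:59:59Z",
    collections=["sentinel-2-l2a"],       # assumed collection id
    limit=5,
)
print(json.dumps(body, indent=2))

# Network call left commented out; the catalog URL is an assumption:
# items = search_stac("https://earth-search.aws.element84.com/v1", body)["features"]
```

Each returned item is a GeoJSON feature whose assets point at the actual raster files, so the search step stays cheap and the imagery is only downloaded once you have picked the scenes you want.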

Reddit...

Ready-to-use aerial and satellite imagery datasets are not easy to find. This Reddit thread is a good resource for finding countries with NAIP-level aerial imagery coverage: https://www.reddit.com/r/gis/comments/1eowe3f/countries_with_naiplevel_imagery/ One standout example is the French IGN BD ORTHO: https://geoservices.ign.fr/bdortho

Building & Urban Detection

SpaceNet – building segmentation: satellite imagery for building footprint extraction.

Other Sources

USDA Gateway: Provides free access to high-quality aerial ortho-imagery (NAIP) for the United States.

EarthData (NASA): For global satellite-based raster datasets.

Radiant MLHub: Specifically designed for georeferenced training data and raw imagery for Earth observation. https://registry.opendata.aws/radiant-mlhub/

iSAID (Instance Segmentation in Aerial Images Dataset) https://captain-whu.github.io/iSAID/

Large Remote Sensing Object Detection

xView – one of the largest satellite object detection datasets. https://xviewdataset.org/

European Land Data

Copernicus Urban Atlas – detailed land use and land cover maps for European cities. https://land.copernicus.eu/local/urban-atlas

DOTA (Dataset for Object Detection in Aerial Images) https://captain-whu.github.io/DOTA/dataset.html
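To give a feel for what working with DOTA looks like, here is a minimal sketch of a label parser. It assumes the commonly documented DOTA annotation layout, in which each object is one text line of eight corner coordinates followed by a category name and a difficulty flag. The sample line is made up for illustration, and the names OrientedBox, parse_dota_line and to_axis_aligned are hypothetical helpers, not part of any official toolkit.

```python
from dataclasses import dataclass


@dataclass
class OrientedBox:
    corners: list    # four (x, y) tuples, in the order given in the file
    category: str
    difficult: int


def parse_dota_line(line):
    """Parse one annotation line; return None for header or malformed lines."""
    parts = line.split()
    if len(parts) < 10:
        return None  # e.g. metadata lines such as "gsd:..." at the top of a file
    try:
        coords = [float(v) for v in parts[:8]]
        difficult = int(parts[9])
    except ValueError:
        return None
    corners = list(zip(coords[0::2], coords[1::2]))
    return OrientedBox(corners, parts[8], difficult)


def to_axis_aligned(box):
    """Collapse an oriented box to (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in box.corners]
    ys = [y for _, y in box.corners]
    return min(xs), min(ys), max(xs), max(ys)


# Made-up sample line in the assumed format
sample = "2753 2408 2861 2385 2888 2468 2805 2502 plane 0"
box = parse_dota_line(sample)
print(box.category, to_axis_aligned(box))
```

Converting to axis-aligned boxes loses the orientation, but it is a quick way to feed the data into detectors that only accept horizontal boxes.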

Conclusion

Reddit was the best place to find datasets that can be used right away.

Use targeted remote sensing keywords: in the search bar, try terms like remote_sensing, satellite, or landcover.
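The same keyword search can be scripted. The sketch below assumes the search bar in question is the Hugging Face dataset search and uses the Hub's public datasets endpoint through the standard library. The helper names dataset_search_url and search_datasets are hypothetical, and the reliance on an "id" field in each result is an assumption about the response format.

```python
import json
from urllib import parse, request

# Assumed target: the public Hugging Face Hub datasets endpoint
HF_DATASETS_API = "https://huggingface.co/api/datasets"


def dataset_search_url(keyword, limit=5):
    """Build the search URL for one keyword."""
    query = parse.urlencode({"search": keyword, "limit": limit})
    return f"{HF_DATASETS_API}?{query}"


def search_datasets(keyword, limit=5, timeout=30):
    """Fetch matching dataset ids (network required)."""
    with request.urlopen(dataset_search_url(keyword, limit), timeout=timeout) as resp:
        return [entry["id"] for entry in json.load(resp)]


keywords = ("remote_sensing", "satellite", "landcover")
urls = [dataset_search_url(k) for k in keywords]
for url in urls:
    print(url)

# Network call left commented out:
# for k in keywords:
#     print(k, search_datasets(k))
```

Running several spellings of a tag (for example landcover and land_cover) is worthwhile, since community datasets are tagged inconsistently.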

The French IGN geoservices portal has been the best and easiest source to access so far: https://geoservices.ign.fr/bdortho
