Datasets for Data Science¶
Intro¶
A quick resource guide for satellite imagery, object detection, and geospatial ML.
In an ideal world for geospatial machine learning, you have three things:
A good model zoo to start quickly.
A training environment that runs without heavy setup.
Datasets to practice on.
For example:
Hugging Face == model zoo and datasets.
Google Colab == training environment.
Data == I do not know yet.
Platforms to Practice Geospatial ML¶
These platforms provide datasets, competitions, or model sharing useful for computer vision and satellite imagery tasks.
Hugging Face – datasets and pretrained models
CodaLab – popular competition platform (widely used in China)
Geospatial ML – curated geospatial machine learning resources
AIcrowd – ML competitions and benchmarks
Papers With Code – tasks, datasets, and model leaderboards
Roboflow Universe – community datasets for computer vision
OpenML – open datasets for machine learning experiments
Satellite Imagery Sources (Raster Data)¶
Reddit...¶
Ready-to-use aerial and satellite imagery datasets are not easy to find. This Reddit thread is a good starting point for finding countries with aerial imagery coverage comparable to NAIP (the USDA National Agriculture Imagery Program): https://www.reddit.com/r/gis/comments/1eowe3f/countries_with_naiplevel_imagery/
Building & Urban Detection¶
SpaceNet (Building Segmentation) – satellite imagery for building footprint extraction.
Other notes¶
iSAID (Instance Segmentation in Aerial Images Dataset) https://captain-whu.github.io/iSAID/
Large Remote Sensing Object Detection
xView – one of the largest satellite object detection datasets.
European Land Data
Copernicus Urban Atlas – detailed land use and land cover maps for European cities. https://land.copernicus.eu/local/urban-atlas
DOTA (Dataset for Object Detection in Aerial Images) https://captain-whu.github.io/DOTA/dataset.html
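To get a feel for one of these datasets before training anything, it helps to read a label file by hand. The sketch below is a minimal example, assuming the DOTA text label layout: optional `imagesource:` and `gsd:` header lines, then one object per line as eight corner coordinates, a category name, and a difficulty flag. The names `DotaObject` and `parse_dota_labels` are hypothetical helpers, and the sample values are made up for illustration; check the dataset page for the authoritative format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DotaObject:
    corners: List[Tuple[float, float]]  # four (x, y) corners of the quadrilateral
    category: str
    difficult: int

    def bbox(self) -> Tuple[float, float, float, float]:
        """Axis-aligned box (xmin, ymin, xmax, ymax) enclosing the quadrilateral."""
        xs = [x for x, _ in self.corners]
        ys = [y for _, y in self.corners]
        return min(xs), min(ys), max(xs), max(ys)


def parse_dota_labels(text: str) -> Tuple[Dict[str, str], List[DotaObject]]:
    """Parse the text of one label file into (header metadata, objects)."""
    meta: Dict[str, str] = {}
    objects: List[DotaObject] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) < 9:
            # Assumed header lines such as "imagesource:GoogleEarth" or "gsd:0.15".
            if ":" in line:
                key, value = line.split(":", 1)
                meta[key.strip()] = value.strip()
            continue
        coords = [float(v) for v in parts[:8]]
        corners = list(zip(coords[0::2], coords[1::2]))
        # The difficulty flag is treated as optional and defaults to 0.
        difficult = int(parts[9]) if len(parts) > 9 else 0
        objects.append(DotaObject(corners, parts[8], difficult))
    return meta, objects


# Illustrative sample only; these numbers are not taken from the real dataset.
sample = """imagesource:GoogleEarth
gsd:0.15
2753 2408 2861 2385 2888 2468 2805 2502 plane 0
3445 3391 3484 3409 3478 3422 3437 3402 large-vehicle 1
"""

meta, objects = parse_dota_labels(sample)
for obj in objects:
    print(obj.category, obj.difficult, obj.bbox())
```

The `bbox()` helper collapses each oriented quadrilateral into an axis-aligned box, which is convenient when a detector only accepts horizontal boxes. It discards the orientation, so keep `corners` if you plan to train an oriented detector.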