Datasets for Data Science.

A quick resource guide for satellite imagery, object detection, and geospatial ML.

In an ideal world for geospatial machine learning, you have three things:

A good model zoo to start quickly.

A training environment that runs without heavy setup.

Datasets to practice on.

For example:

Hugging Face == model zoo and datasets.

Google Colab == training environment.

Data == I do not know yet.

Platforms to Practice Geospatial ML

These platforms provide datasets, competitions, or model sharing useful for computer vision and satellite imagery tasks.

Hugging Face – datasets and pretrained models

CodaLab – popular competition platform (widely used in China)

Geospatial ML – curated geospatial machine learning resources

AIcrowd – ML competitions and benchmarks

Papers With Code – tasks, datasets, and model leaderboards

Roboflow Universe – community datasets for computer vision

OpenML – open datasets for machine learning experiments

Satellite Imagery Sources (Raster Data)

Ready-to-use aerial and satellite imagery datasets are not easy to find. This Reddit thread is a good starting point for finding countries whose aerial imagery coverage is comparable to NAIP, the US National Agriculture Imagery Program: https://www.reddit.com/r/gis/comments/1eowe3f/countries_with_naiplevel_imagery/

Building & Urban Detection

SpaceNet – building segmentation: satellite imagery for building footprint extraction.

Other notes

iSAID (Instance Segmentation in Aerial Images Dataset) https://captain-whu.github.io/iSAID/

Large Remote Sensing Object Detection

xView – one of the largest satellite object detection datasets.

https://xviewdataset.org/
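To give a feel for working with xView labels, here is a minimal sketch that reads bounding boxes out of a GeoJSON structure. It assumes the property names used by the public xView baseline utilities (`bounds_imcoords` as an "xmin,ymin,xmax,ymax" string, plus `type_id` and `image_id`); check them against the file you actually download. The sample features, the file name `example.tif`, and the helper `parse_xview_features` are made up for illustration.

```python
from typing import Any, Dict, List


def parse_xview_features(geojson: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Extract one record per labeled object from an xView-style GeoJSON dict.

    Assumes each feature stores its pixel box as a comma-separated string
    under properties["bounds_imcoords"]; features without a box are skipped.
    """
    records = []
    for feature in geojson.get("features", []):
        props = feature.get("properties", {})
        raw_box = props.get("bounds_imcoords")
        if not raw_box:
            continue  # no usable bounding box on this feature
        xmin, ymin, xmax, ymax = (float(v) for v in raw_box.split(","))
        records.append(
            {
                "image_id": props.get("image_id"),
                "type_id": props.get("type_id"),
                "bbox": (xmin, ymin, xmax, ymax),
            }
        )
    return records


# Hypothetical sample mimicking the assumed layout (not real xView data).
sample = {
    "type": "FeatureCollection",
    "features": [
        {"properties": {"bounds_imcoords": "100,200,150,260",
                        "type_id": 18, "image_id": "example.tif"}},
        {"properties": {"bounds_imcoords": "",
                        "type_id": 73, "image_id": "example.tif"}},
    ],
}

records = parse_xview_features(sample)
for rec in records:
    xmin, ymin, xmax, ymax = rec["bbox"]
    print(rec["image_id"], rec["type_id"], xmax - xmin, ymax - ymin)
```

With a real label file you would load it via `json.load` and pass the resulting dict to the same function.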

European Land Data

Copernicus Urban Atlas – detailed land use and land cover maps for European cities. https://land.copernicus.eu/local/urban-atlas

DOTA (Dataset for Object Detection in Aerial Images) https://captain-whu.github.io/DOTA/dataset.html
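DOTA annotates objects with oriented bounding boxes, so the label files look different from the usual axis-aligned format. The sketch below parses the plain-text layout as I understand it: eight corner coordinates, then a category name and a difficulty flag, with optional header lines such as `imagesource:` and `gsd:`. The sample text and the helper names (`parse_dota_line`, `parse_dota_labels`, `polygon_area`) are illustrative, not part of any official toolkit.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OrientedBox:
    corners: List[Tuple[float, float]]  # four (x, y) points in pixel coordinates
    category: str
    difficult: int


def parse_dota_line(line: str) -> OrientedBox:
    """Parse one 'x1 y1 x2 y2 x3 y3 x4 y4 category difficult' line."""
    parts = line.split()
    if len(parts) < 9:
        raise ValueError(f"expected at least 9 fields, got {len(parts)}")
    coords = [float(v) for v in parts[:8]]
    corners = list(zip(coords[0::2], coords[1::2]))
    difficult = int(parts[9]) if len(parts) > 9 else 0
    return OrientedBox(corners, parts[8], difficult)


def parse_dota_labels(text: str) -> List[OrientedBox]:
    """Parse a whole label file, skipping blank and short header lines."""
    boxes = []
    for line in text.splitlines():
        if len(line.split()) < 9:
            continue  # e.g. 'imagesource:...' or 'gsd:...' headers
        boxes.append(parse_dota_line(line))
    return boxes


def polygon_area(corners: List[Tuple[float, float]]) -> float:
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for i, (x1, y1) in enumerate(corners):
        x2, y2 = corners[(i + 1) % len(corners)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical label text in the assumed layout (not real DOTA data).
sample_text = """imagesource:GoogleEarth
gsd:0.5
10 10 50 10 50 30 10 30 plane 0
0 0 4 0 4 3 0 3 small-vehicle 1
"""

boxes = parse_dota_labels(sample_text)
for box in boxes:
    print(box.category, box.difficult, polygon_area(box.corners))
```

Keeping the four corners as they are, rather than converting to an axis-aligned box, preserves the orientation information that makes DOTA useful for rotated-object detectors.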
