Data Collection From Web APIs

A curated list of example code to collect data from Web APIs using DataPrep.Connector.

How to Contribute?

You can contribute to this project in two ways. Please check the contributing guide.

Add your example code on this page
Add a new configuration file to this repo

Why Contribute?

Your contribution will benefit ~100K DataPrep users.
Your contribution will be recognized on Contributors.

Index

Art
Business
Calendar
Crime
Finance
Geocoding
Jobs
Lifestyle
Music
Networking
News
Science
Shopping
Social
Sports
Travel
Video
Weather

Art

Harvard Art Museum -- Collect Museums' Collection Data
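The examples in this section assume that a variable named `api_key` already exists. One way to provide it without hard-coding the key is to read it from an environment variable. The sketch below is only an illustration: the helper `load_api_key` and the variable name `HARVARD_API_KEY` are made up for this example and are not part of DataPrep.

```python
import os


def load_api_key(var_name="HARVARD_API_KEY"):
    """Read an API key from the environment.

    The default variable name is a hypothetical choice for this example.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the examples."
        )
    return key


# Usage (assumes the variable has been exported in your shell):
# api_key = load_api_key()
# dc = connect('harvardartmuseum', _auth={'access_token': api_key})
```

Keeping the key out of the notebook makes it easier to share example code without leaking credentials.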

<details> <summary>Find the objects that have dog in their titles and were made in 1990.</summary>

from dataprep.connector import connect

# You can get "api_key" by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})

df = await dc.query('object', title='dog', yearmade=1990)
df[['title', 'division', 'classification', 'technique', 'department', 'century', 'dated']]

| | title | division | classification | technique | department | century | dated |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Paris (black dog on street) | Modern and Contemporary Art | Photographs | Gelatin silver print | Department of Photographs | 20th century | 1990s |
| 1 | Pregnant Woman with Dog | Modern and Contemporary Art | Photographs | Gelatin silver print | Department of Photographs | 20th century | 1990 |
| 2 | Pompeii Dog | Modern and Contemporary Art | Prints | Drypoint | Department of Prints | 20th century | 1990 |

</details> <details> <summary>Find 10 people that are Dutch.</summary>

from dataprep.connector import connect

# You can get "api_key" by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})

df = await dc.query('person', q='culture:Dutch', size=10)
df[['display name', 'gender', 'culture', 'display date', 'object count', 'birth place', 'death place']]

| | display name | gender | culture | display date | object count | birth place | death place |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Joris Abrahamsz. van der Haagen | unknown | Dutch | 1615 - 1669 | 7 | Arnhem or Dordrecht, Netherlands | The Hague, Netherlands |
| 1 | François Morellon de la Cave | unknown | Dutch | 1723 - 65 | 1 | None | None |
| 2 | Cornelis Vroom | unknown | Dutch | 1590/92 - 1661 | 3 | Haarlem(?), Netherlands | Haarlem, Netherlands |
| 3 | Constantijn Daniel van Renesse | unknown | Dutch | 1626 - 1680 | 2 | Maarssen | Eindhoven |
| 4 | Dirck Dalens, the Younger | unknown | Dutch | 1654 - 1688 | 3 | Amsterdam, Netherlands | Amsterdam, Netherlands |

</details> <details> <summary>Find all exhibitions that take place at a Harvard Art Museums venue after 2020-01-01.</summary>

from dataprep.connector import connect

# You can get "api_key" by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})

df = await dc.query('exhibition', venue='HAM', after='2020-01-01')
df

| | title | begin date | end date | url |
| --- | --- | --- | --- | --- |
| 0 | Painting Edo: Japanese Art from the Feinberg Collection | 2020-02-14 | 2021-07-18 | https://www.harvardartmuseums.org/visit/exhibitions/5909 |

</details> <details> <summary>Find 5 records for publications that were published in 2013.</summary>

from dataprep.connector import connect

# You can get "api_key" by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})

df = await dc.query('publication', q='publicationyear:2013', size=5)
df[['title','publication date','publication place','format']]

| | title | publication date | publication place | format |
| --- | --- | --- | --- | --- |
| 0 | 19th Century Paintings, Drawings & Watercolours | January 23, 2013 | London | Auction/Dealer Catalogue |
| 1 | "With Éclat" The Boston Athenæum and the Orig... | 2013 | Boston, MA | Book |
| 2 | "Review: Fragonard's Progress of Love at the F... | 2013 | London | Article/Essay |
| 3 | Alternative Narratives | February 2013 | None | Article/Essay |
| 4 | Victorian & British Impressionist Art | July 11, 2013 | London | Auction/Dealer Catalogue |

</details> <details> <summary>Find 5 galleries that are on floor (Level) 2 in the Harvard Art Museums building.</summary>

from dataprep.connector import connect

# You can get "api_key" by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})

df = await dc.query('gallery', floor=2, size=5)
df[['id','name','theme','object count']]

| | id | name | theme | object count |
| --- | --- | --- | --- | --- |
| 0 | 2200 | European and American Art, 17th–19th century | The Emergence of Romanticism in Early Nineteen... | 20 |
| 1 | 2210 | West Arcade | None | 6 |
| 2 | 2340 | European and American Art, 17th–19th century | The Silver Cabinet: Art and Ritual, 1600–1850 | 73 |
| 3 | 2460 | East Arcade | None | 2 |
| 4 | 2700 | European and American Art, 19th century | Impressionism and the Late Nineteenth Century | 19 |

</details>
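The snippets above call `await dc.query(...)` at the top level, which works in Jupyter or IPython but not in a plain Python script. In a script, the call can be wrapped in a coroutine and run with `asyncio.run`. The sketch below shows only that pattern: `StubConnector` is a hypothetical stand-in written for this example so it runs without a network connection or API key, and it returns a list of dicts rather than a DataFrame.

```python
import asyncio


class StubConnector:
    """Hypothetical stand-in for the object returned by connect(); not part of DataPrep."""

    async def query(self, table, **params):
        # Pretend to call the API; echo the request back as a single row.
        await asyncio.sleep(0)
        return [{"table": table, **params}]


async def main():
    # In real use this would be:
    #   dc = connect('harvardartmuseum', _auth={'access_token': api_key})
    dc = StubConnector()
    return await dc.query("object", title="dog", yearmade=1990)


# asyncio.run creates an event loop, runs the coroutine, and closes the loop.
rows = asyncio.run(main())
print(rows)
```

Inside a notebook the wrapper is unnecessary, because the notebook already runs an event loop and accepts top-level `await`.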

Business

Yelp -- Collect Local Business Data
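The Yelp examples in this section pass `_count` (how many records to fetch) and `_concurrency` (how many requests may run at once). The sketch below illustrates the general idea of splitting a large request into pages and fetching them with bounded concurrency. It is not DataPrep's implementation: `page_offsets`, `fetch_all`, and `fake_fetch_page` are hypothetical names, and the page size of 50 is an assumption made for the example.

```python
import asyncio


def page_offsets(count, page_size):
    """Return (offset, limit) pairs that together cover `count` records."""
    return [
        (start, min(page_size, count - start))
        for start in range(0, count, page_size)
    ]


async def fetch_all(fetch_page, count, page_size=50, concurrency=5):
    """Fetch `count` records with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(offset, limit):
        # The semaphore caps how many page requests run at the same time.
        async with semaphore:
            return await fetch_page(offset, limit)

    pages = await asyncio.gather(
        *(guarded(offset, limit) for offset, limit in page_offsets(count, page_size))
    )
    # gather returns results in argument order, so records stay in offset order.
    return [record for page in pages for record in page]


async def fake_fetch_page(offset, limit):
    """Hypothetical stand-in for one HTTP request; returns record indices."""
    await asyncio.sleep(0)
    return list(range(offset, offset + limit))


records = asyncio.run(fetch_all(fake_fetch_page, count=120))
print(len(records))
```

With a helper like this, a larger `concurrency` shortens the total wait but sends more simultaneous requests, which matters for APIs that enforce rate limits.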

<details> <summary>What's the phone number of Capilano Suspension Bridge Park?</summary>

from dataprep.connector import connect

# You can get "yelp_access_token" by following https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token": yelp_access_token}, _concurrency=5)

df = await conn_yelp.query("businesses", term="Capilano Suspension Bridge Park", location="Vancouver", _count=1)

df[["name", "phone"]]

| id | name | phone |
| --- | --- | --- |
| 0 | Capilano Suspension Bridge Park | +1 604-985-7474 |

</details> <details> <summary>Which yoga store has the highest review count in Vancouver?</summary>

from dataprep.connector import connect

# You can get "yelp_access_token" by following https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token": yelp_access_token}, _concurrency=1)

# Check all supported categories: https://www.yelp.ca/developers/documentation/v3/all_category_list
df = await conn_yelp.query("businesses", categories="yoga", location="Vancouver", sort_by="review_count", _count=1)
df[["name", "review_count"]]

| id | name | review_count |
| --- | --- | --- |
| 0 | YYOGA Downtown Flow | 107 |

</details> <details> <summary>How many Starbucks stores are in Seattle and where are they?</summary>

from dataprep.connector import connect

# You can get "yelp_access_token" by following https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token": yelp_access_token}, _concurrency=5)
df = await conn_yelp.query("businesses", term="Starbucks", location="Seattle", _count=1000)

# Remove irrelevant data
df = df[(df['city'] == 'Seattle') & (df['name'] == 'Starbucks')]
df[['name', 'address1', 'city', 'state', 'country', 'zip_code']].reset_index(drop=True)

| id | name | addre
