APIConnectors
A curated list of example code to collect data from Web APIs using DataPrep.Connector.
Data Collection From Web APIs
<!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section --> <!-- ALL-CONTRIBUTORS-BADGE:END -->
How to Contribute?
You can contribute to this project in two ways (see the contributing guide for details):
- Add your example code on this page
- Add a new configuration file to this repo
Why Contribute?
- Your contribution will benefit ~100K DataPrep users.
- Your contribution will be recognized on Contributors.
Index
- Art
- Business
- Calendar
- Crime
- Finance
- Geocoding
- Jobs
- Lifestyle
- Music
- Networking
- News
- Science
- Shopping
- Social
- Sports
- Travel
- Video
- Weather
Art
Harvard Art Museum -- Collect Museums' Collection Data
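The examples in this section use top-level `await`, which works in Jupyter or IPython but raises a `SyntaxError` in a plain Python script. Below is a minimal sketch of running the same kind of awaitable from a script with `asyncio.run`. `fake_query` is a hypothetical stand-in for a `dc.query(...)` call so that the snippet runs without an API key or third-party packages; it is not part of DataPrep, and it returns a list of dicts rather than a DataFrame.

```python
import asyncio


async def fake_query(title, yearmade):
    """Hypothetical stand-in for `await dc.query('object', ...)`."""
    await asyncio.sleep(0)  # yield to the event loop, as a real request would
    records = [
        {"title": "Pompeii Dog", "dated": "1990"},
        {"title": "Pregnant Woman with Dog", "dated": "1990"},
        {"title": "Still Life", "dated": "1985"},
    ]
    # Case-insensitive title match plus an exact year match.
    return [
        r for r in records
        if title.lower() in r["title"].lower() and r["dated"] == str(yearmade)
    ]


async def main():
    # In a real script, build the connector here and await dc.query(...) instead.
    return await fake_query(title="dog", yearmade=1990)


# asyncio.run starts an event loop, runs main() to completion, then closes the loop.
rows = asyncio.run(main())
print([r["title"] for r in rows])
```

Inside a notebook, keep the plain `await` form: `asyncio.run` raises a `RuntimeError` when an event loop is already running.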
<details>
<summary>Find objects that have "dog" in their titles and were made in 1990.</summary>

from dataprep.connector import connect
# You can get "api_key" by filling out the form at https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})
df = await dc.query('object', title='dog', yearmade=1990)
df[['title', 'division', 'classification', 'technique', 'department', 'century', 'dated']]
| | title | division | classification | technique | department | century | dated |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Paris (black dog on street) | Modern and Contemporary Art | Photographs | Gelatin silver print | Department of Photographs | 20th century | 1990s |
| 1 | Pregnant Woman with Dog | Modern and Contemporary Art | Photographs | Gelatin silver print | Department of Photographs | 20th century | 1990 |
| 2 | Pompeii Dog | Modern and Contemporary Art | Prints | Drypoint | Department of Prints | 20th century | 1990 |
</details>
<details>
<summary>Find 10 people that are Dutch.</summary>

from dataprep.connector import connect
# You can get "api_key" by filling out the form at https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})
df = await dc.query('person', q='culture:Dutch', size=10)
df[['display name', 'gender', 'culture', 'display date', 'object count', 'birth place', 'death place']]
| | display name | gender | culture | display date | object count | birth place | death place |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Joris Abrahamsz. van der Haagen | unknown | Dutch | 1615 - 1669 | 7 | Arnhem or Dordrecht, Netherlands | The Hague, Netherlands |
| 1 | François Morellon de la Cave | unknown | Dutch | 1723 - 65 | 1 | None | None |
| 2 | Cornelis Vroom | unknown | Dutch | 1590/92 - 1661 | 3 | Haarlem(?), Netherlands | Haarlem, Netherlands |
| 3 | Constantijn Daniel van Renesse | unknown | Dutch | 1626 - 1680 | 2 | Maarssen | Eindhoven |
| 4 | Dirck Dalens, the Younger | unknown | Dutch | 1654 - 1688 | 3 | Amsterdam, Netherlands | Amsterdam, Netherlands |
</details>
<details>
<summary>Find all exhibitions that take place at a Harvard Art Museums venue after 2020-01-01.</summary>

from dataprep.connector import connect
# You can get "api_key" by filling out the form at https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})
df = await dc.query('exhibition', venue='HAM', after='2020-01-01')
df
| | title | begin date | end date | url |
| --- | --- | --- | --- | --- |
| 0 | Painting Edo: Japanese Art from the Feinberg Collection | 2020-02-14 | 2021-07-18 | https://www.harvardartmuseums.org/visit/exhibitions/5909 |
</details>
<details>
<summary>Find 5 records for publications that were published in 2013.</summary>

from dataprep.connector import connect
# You can get "api_key" by filling out the form at https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})
df = await dc.query('publication', q='publicationyear:2013', size=5)
df[['title','publication date','publication place','format']]
| | title | publication date | publication place | format |
| --- | --- | --- | --- | --- |
| 0 | 19th Century Paintings, Drawings & Watercolours | January 23, 2013 | London | Auction/Dealer Catalogue |
| 1 | "With Éclat" The Boston Athenæum and the Orig... | 2013 | Boston, MA | Book |
| 2 | "Review: Fragonard's Progress of Love at the F... | 2013 | London | Article/Essay |
| 3 | Alternative Narratives | February 2013 | None | Article/Essay |
| 4 | Victorian & British Impressionist Art | July 11, 2013 | London | Auction/Dealer Catalogue |
</details>
<details>
<summary>Find 5 galleries that are on floor (Level) 2 in the Harvard Art Museums building.</summary>

from dataprep.connector import connect
# You can get "api_key" by filling out the form at https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})
df = await dc.query('gallery', floor=2, size=5)
df[['id','name','theme','object count']]
| | id | name | theme | object count |
| --- | --- | --- | --- | --- |
| 0 | 2200 | European and American Art, 17th–19th century | The Emergence of Romanticism in Early Nineteen... | 20 |
| 1 | 2210 | West Arcade | None | 6 |
| 2 | 2340 | European and American Art, 17th–19th century | The Silver Cabinet: Art and Ritual, 1600–1850 | 73 |
| 3 | 2460 | East Arcade | None | 2 |
| 4 | 2700 | European and American Art, 19th century | Impressionism and the Late Nineteenth Century | 19 |
</details>

Business
Yelp -- Collect Local Business Data
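The Yelp examples below assume a variable named `yelp_access_token` already exists. One way to supply it without hardcoding the secret is to read it from an environment variable. This is a sketch only: the helper `load_token` and the variable name `YELP_ACCESS_TOKEN` are hypothetical choices for illustration, not something DataPrep or Yelp requires.

```python
import os


def load_token(var_name, env=None):
    """Return a non-empty credential from the environment, or raise."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token


# Demonstration with an explicit mapping; omit `env` to read os.environ.
yelp_access_token = load_token(
    "YELP_ACCESS_TOKEN", env={"YELP_ACCESS_TOKEN": " abc123 "}
)

# Same shape as the _auth argument passed to connect() in the examples below.
auth = {"access_token": yelp_access_token}
print(auth)
```

Failing early with a clear message tends to be easier to debug than an authentication error returned by the API later on.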
<details>
<summary>What's the phone number of Capilano Suspension Bridge Park?</summary>

from dataprep.connector import connect
# You can get "yelp_access_token" by following the instructions at https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token": yelp_access_token}, _concurrency=5)
df = await conn_yelp.query("businesses", term="Capilano Suspension Bridge Park", location="Vancouver", _count=1)
df[["name", "phone"]]
| id | name | phone |
| --- | --- | --- |
| 0 | Capilano Suspension Bridge Park | +1 604-985-7474 |
</details>
<details>
<summary>Which yoga store has the highest review count in Vancouver?</summary>

from dataprep.connector import connect
# You can get "yelp_access_token" by following the instructions at https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token": yelp_access_token}, _concurrency=1)
# Check all supported categories: https://www.yelp.ca/developers/documentation/v3/all_category_list
df = await conn_yelp.query("businesses", categories="yoga", location="Vancouver", sort_by="review_count", _count=1)
df[["name", "review_count"]]
| id | name | review_count |
| --- | --- | --- |
| 0 | YYOGA Downtown Flow | 107 |
</details>
<details>
<summary>How many Starbucks stores are there in Seattle, and where are they?</summary>

from dataprep.connector import connect
# You can get "yelp_access_token" by following the instructions at https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token": yelp_access_token}, _concurrency=5)
df = await conn_yelp.query("businesses", term="Starbucks", location="Seattle", _count=1000)
# Keep only rows whose city is Seattle and whose name is exactly "Starbucks"
df = df[(df['city'] == 'Seattle') & (df['name'] == 'Starbucks')]
df[['name', 'address1', 'city', 'state', 'country', 'zip_code']].reset_index(drop=True)
| id | name | addre
