Trane is a software package that automatically generates problems for temporal datasets and produces labels for supervised learning. Its goal is to streamline the machine learning problem-solving process.

Install

Install Trane using pip:

python -m pip install trane

Usage

Here's a quick demonstration of Trane in action:

import trane

data, metadata = trane.load_airbnb()
problem_generator = trane.ProblemGenerator(
  metadata=metadata,
  entity_columns=["location"]
)
problems = problem_generator.generate()

for problem in problems[:5]:
    print(problem)

A few of the generated problems:

==================================================
Generated 40 total problems
--------------------------------------------------
Classification problems: 5
Regression problems: 35
==================================================
For each <location> predict if there exists a record
For each <location> predict if there exists a record with <location> equal to <str>
For each <location> predict if there exists a record with <location> not equal to <str>
For each <location> predict if there exists a record with <rating> equal to <str>
For each <location> predict if there exists a record with <rating> not equal to <str>

With Trane's LLM add-on (pip install "trane[llm]"), we can determine the relevant problems with OpenAI:

from trane.llm import analyze

instructions = "determine 5 most relevant problems about user's booking preferences. Do not include 'predict the first/last X' problems"
context = "Airbnb data listings in major cities, including information about hosts, pricing, location, and room type, along with over 5 million historical reviews."
relevant_problems = analyze(
    problems=problems,
    instructions=instructions,
    context=context,
    model="gpt-3.5-turbo-16k"
)
for problem in relevant_problems:
    print(problem)
    print(f'Reasoning: {problem.get_reasoning()}\n')

Output

For each <location> predict if there exists a record
Reasoning: This problem can help identify locations with missing data or locations that have not been booked at all.

For each <location> predict the first <location> in all related records
Reasoning: Predicting the first location in all related records can provide insights into the most frequently booked locations for each city.

For each <location> predict the first <rating> in all related records
Reasoning: Predicting the first rating in all related records can provide insights into the average satisfaction level of guests for each location.

For each <location> predict the last <location> in all related records
Reasoning: Predicting the last location in all related records can provide insights into the most recent bookings for each city.

For each <location> predict the last <rating> in all related records
Reasoning: Predicting the last rating in all related records can provide insights into the recent satisfaction level of guests for each location.

Community

Questions or Issues? Create a GitHub issue.
Want to Chat? Join our Slack community.

Cite Trane

If you find Trane beneficial, consider citing our paper:

Ben Schreck, Kalyan Veeramachaneni. What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems. IEEE DSAA 2016, 440-451.

BibTeX entry:

@inproceedings{schreck2016would,
  title={What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems},
  author={Schreck, Benjamin and Veeramachaneni, Kalyan},
  booktitle={Data Science and Advanced Analytics (DSAA), 2016 IEEE International Conference on},
  pages={440--451},
  year={2016},
  organization={IEEE}
}

Trane

Install / Use

README

Install

Usage

Community

Cite Trane