We’re interested in supporting researchers using our products to study areas related to the responsible deployment of AI and mitigating associated risks, as well as understanding the societal impact of AI systems. If you are interested in an opportunity for subsidized access, please apply for API credits.
Note that the application link will take you to a third-party provider, SurveyMonkey Apply, where you’ll need to create an account to apply.
We encourage applications from early-stage researchers in countries supported by our API, and are especially interested in subsidizing work from researchers with limited financial and institutional resources. Please note that the expected turnaround time for accepted applicants is around 4–6 weeks.
Before applying, please take a moment to review our Research Policy.
Areas of interest include:
Alignment
How can we understand what objective, if any, a model is best understood as pursuing? How do we increase the extent to which that objective is aligned with human preferences, such as via prompt design or fine-tuning?
Fairness & representation
How should performance criteria be established for fairness and representation in language models? How can language models be improved in order to effectively support the goals of fairness and representation in specific, deployed contexts?
Societal impact
How do we create measurements for AI’s impact on society? What impact does AI have on different domains and groups of people?
Interdisciplinary research
How can AI development draw on insights from other disciplines such as philosophy, cognitive science, and sociolinguistics?
Interpretability & transparency
How do these models work, mechanistically? Can we identify what concepts they’re using, extract latent knowledge from the model, make inferences about the training procedure, or predict surprising future behavior?
Misuse potential
How can systems like the API be misused? What sorts of “red teaming” approaches can we develop to help AI developers think about responsibly deploying technologies like this?
Robustness
How robust are large generative models to “natural” perturbations in the prompt, such as phrasing the same idea in different ways or with typos? Can we predict the kinds of domains and tasks for which large generative models are more likely to be robust or not, and how does this relate to the training data? Are there techniques we can use to predict and mitigate worst-case behavior? How can robustness be measured in the context of few-shot learning (e.g., across variations in prompts)? Can we train models so that they satisfy safety properties with a very high level of reliability, even under adversarial inputs?
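As a rough illustration of the robustness question about prompt variations, the sketch below perturbs a prompt with random typos and measures how often a model's output stays the same. It is a minimal sketch under stated assumptions, not a recommended methodology: `add_typos`, `consistency`, and `toy_model` are hypothetical names introduced here, and `toy_model` is a keyword-based stand-in for a real model call. It covers typos only, not paraphrases.

```python
import random
from collections import Counter
from typing import Callable, List


def add_typos(prompt: str, rate: float, rng: random.Random) -> str:
    """Swap adjacent letters at random to imitate typing errors."""
    chars = list(prompt)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)


def consistency(model: Callable[[str], str], prompts: List[str]) -> float:
    """Fraction of prompts whose output equals the most common output."""
    outputs = [model(p) for p in prompts]
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / len(outputs)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model: keyword-based sentiment."""
    return "positive" if "good" in prompt.lower() else "negative"


rng = random.Random(0)  # fixed seed so the perturbations are reproducible
base = "The movie was good"
variants = [base] + [add_typos(base, 0.2, rng) for _ in range(20)]
score = consistency(toy_model, variants)
print(f"consistency across {len(variants)} variants: {score:.2f}")
```

A score near 1.0 means the output rarely changes under these perturbations. Replacing `toy_model` with a function that queries a real model, and `add_typos` with paraphrases or reordered few-shot examples, would adapt the same measurement to the other variations mentioned above.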
We’re initially scoping to these areas, but welcome suggestions for future focus areas. The questions under each area are illustrative, and we’d be delighted to receive research proposals that address different questions.