Download as Markdown Download JSON template

dissertation.ai onboarding handout

Generated from dissertation.yaml on 2026-06-19.

This interview captures the context AI needs to be genuinely useful to your dissertation work. It is intentionally substantial — most users complete it in 75–120 minutes. You may save your draft and continue later, or complete the printable handout offline first and paste your answers here.

How to use this handout

The dissertation.ai onboarding interview asks 23 questions across 5 sections and typically takes 75–120 minutes. This handout helps you prepare your answers in advance.

Two workflows are supported:

  1. Read + transcribe (deep). Read each question carefully, think, and draft your answers in this document or in your favorite editor. At session time, type your answers directly into the live interview UI.
  2. Fill the JSON template (efficient). Download dissertation-onboarding-template.json from the link above. Fill in your answers in the JSON file. Paste the resulting JSON into the live UI's Import JSON affordance for instant population.

Either path produces the same dissertation environment. The JSON path is faster; the markdown path supports richer thinking and AI assistance — paste this whole handout into Claude / ChatGPT and ask for help drafting your answers.

Section 2 (Your research) is the highest-leverage section. The four long-form questions there shape every AI-assisted suggestion for the next 1–3 years. Spend real time there — 30–60 minutes is appropriate.

Document uploads (Section 5) happen in the live UI directly. The handout helps you decide which documents to upload and prepares the descriptive metadata for each.

Section 1 — Identity ~2–5 min

The minimum we need to provision your dissertation repos.

Q1. What's your name?

Appears on your thesis title page.

Your answer: ___________________________________

Q2. What's your institutional email?

Your answer: ___________________________________

Q3. What's your GitHub username?

We'll create your dissertation repos under your account.

Your answer: ___________________________________

Q4. Which institution will you defend at?

Your answer: ___________________________________

Q5. What's your degree program?

e.g., Ph.D. Statistics, Ed.D. Curriculum and Instruction

Your answer: ___________________________________

Section 2 — Your research ~30–60 min

The highest-leverage section of the interview. The four narrative fields here shape every chapter draft, every literature suggestion, every AI-assisted writing decision for the next 1–3 years. Spend real time here — 30–60 minutes is appropriate.

Q6. What's your working dissertation title?

Don't worry about making it final — committees routinely refine the title in the last semester. A working title here just helps orient the project.

Your answer: ___________________________________

Q7. What is the central research question (or questions) your dissertation answers?

Be precise. A dissertation has one or two central questions; everything else is in service of answering them. State the question(s) as you would state them in your proposal defense — well-formed, with appropriate scope, identifying what's empirical / theoretical / methodological about them.

The more detail, the better. Spend 15–30 minutes here. The clearer your research question, the more useful every AI suggestion in your dissertation will be. If you have two questions, distinguish them clearly — primary vs. secondary, or two facets of one larger inquiry.

Your answer:

Q8. What brought you to this topic? What's the intellectual + personal context?

What does the AI assistant need to know about *you and this project* that it can't infer from your research question alone? Your prior work, the conversations that shaped the inquiry, the gap in the literature you noticed, the practical problem that motivates the work. Multi-paragraph is appropriate.

The more detail, the better. Two or three paragraphs. Cover both the intellectual genealogy (what work led you here, what scholarly conversations you're joining) and the personal / practical motivation (why this question, why now, why you).

Your answer:

Q9. What is the novel contribution? Why does this dissertation matter?

Three or four sentences minimum. State the contribution as a numbered list if you have multiple. Distinguish theoretical contribution, methodological contribution, empirical contribution, and practical contribution where relevant — most dissertations make 1–3 of these.

The more detail, the better. This is the elevator pitch for your dissertation. If a senior scholar in your field asked "what's new about your dissertation?", what would you say? The AI uses this to weight chapter-drafting against your contribution claims.

Your answer:

Q10. What's your methodology?

This question has 2 parts — answer each below.

Primary methodological approach

Choose one:

  • ☐ Quantitative (statistical models, surveys, experiments)
  • ☐ Qualitative (interviews, ethnography, textual analysis, archival)
  • ☐ Mixed-methods (both quantitative and qualitative)
  • ☐ Computational (simulation, algorithm design, ML, mathematical modeling)

Describe your methodology — whatever you have so far.

If you're early in your dissertation and still figuring out your methodology, that's expected — write what you know, even one or two sentences about your general approach. You can refine this later by editing your project's dataimago-spec.yaml. If you have a clearer picture: 2–4 paragraphs covering your data source(s), analytic approach, statistical / qualitative / computational techniques, unit of analysis, and key methodological challenges + how you'll address them.

The more detail, the better: detailed answers let AI assistance engage with your methodological choices. But this question is also OK to answer briefly; your methodology will sharpen as your dissertation progresses, and you can update this field anytime. At the detailed end: "I'll use regression" is too generic; "I'll use hierarchical Bayesian regression with weakly informative priors, fit via Stan, with HMC diagnostics following Vehtari et al." is the level of specificity that's most useful.

Your answer:

Section 3 — Dissertation structure ~5–12 min

How your project is organized: R package, chapters, citation style, R ecosystem preferences.

Q11. Do you already have an R package for your dissertation research?

If you have an existing R package (one with a DESCRIPTION file, in a repo you can point us at), choose "Yes". If you don't — including if you have R scripts that aren't yet structured as a package, or if you don't write R yet at all — choose "No" and we'll create a fresh R package alongside your dissertation repo, modeled on dataimago-rpkg's structure. AI assistance during your dissertation work can help you migrate any existing scripts into the new package over time; you don't need to organize them in advance.

Choose one:

  • ☐ Yes, I have an R package in a repo
  • ☐ No — please create a fresh one for me

Q12. What's the URL of your existing R package repo?

Answer this only if your earlier answer was Q11 → Yes, I have an R package in a repo.

e.g., https://github.com/jsmith/smith-thesis-rpkg

Your answer: ___________________________________

Q13. What should we name your new R package?

Answer this only if your earlier answer was Q11 → No — please create a fresh one for me.

Kebab-case. Convention: <yourname>-thesis-rpkg, e.g., smith-thesis-rpkg. The package will be modeled on dataimago-rpkg with ui/www as its Quarto root.

Your answer: ___________________________________

Q14. What chapters will your dissertation have?

We pre-fill a standard outline based on your methodological approach. Edit titles, add chapters, remove chapters, or reorder freely. You can also change this later by editing dataimago-spec.yaml in your repo.

Add as many entries as you need. Each entry has the following fields:

  • Short id (kebab-case)
  • Chapter title

Entry 1:

  • Short id (kebab-case): ___________________________________
  • Chapter title: ___________________________________

Entry 2: (add more as needed)

  • Short id (kebab-case): ___________________________________
  • Chapter title: ___________________________________
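
If you are drafting your chapter list in a script or with AI help, the kebab-case short ids can be derived from the chapter titles. The following is a minimal sketch, not the framework's own logic: the function name `to_kebab_case`, the `id` / `title` key names, and the sample chapter titles are all assumptions made for illustration.

```python
import re
import unicodedata


def to_kebab_case(title):
    """Turn a chapter title into a lowercase, hyphen-separated short id."""
    # Strip accents so that e.g. "é" becomes "e" rather than being dropped.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower())
    return slug.strip("-")


# Hypothetical chapter titles; the "id" and "title" keys are assumed names.
chapters = [
    {"id": to_kebab_case(t), "title": t}
    for t in ["Introduction", "Literature Review", "Methods & Data"]
]
for chapter in chapters:
    print(f'{chapter["id"]}: {chapter["title"]}')
```

Short, stable ids are worth choosing by hand if a derived one comes out long; the sketch only guarantees the kebab-case form.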

Q15. Citation style + thesis class file

This question has 2 parts — answer each below.

What citation style does your institution require?

Choose one:

  • ☐ APA 7th edition
  • ☐ Chicago
  • ☐ MLA
  • ☐ Harvard
  • ☐ Custom (specify in spec.yaml)

How should your thesis PDF be formatted?

How your LaTeX-typeset thesis PDF gets formatted. If unsure, pick the framework default — you can switch later.

Choose one:

  • ☐ Use the framework's default formatting (recommended if unsure)
  • ☐ I have my institution's official .cls file (I'll add it after generation)
  • ☐ Generate a custom .cls from my institution's formatting guidelines (I'll upload them in Section 5)

Q16. Which R package ecosystems do you build around?

If you're an experienced R user with strong preferences — you consistently use the tidyverse, you prefer data.table for performance, you're committed to base R — tell us. Your dissertation's R package will adopt those conventions. If you're not sure or don't have strong preferences, leave this blank. AI assistance during your dissertation work will choose sensible defaults appropriate to your methodology, and you can adjust later. For most users this is the right answer — let the AI help structure the R package around your ideas. We're learning that AI-assisted packages tend to be better-structured and better-documented than hand-rolled ones, even for experienced programmers; the same logic applies here.

This question is optional and most users should leave it blank. Only fill it in if you have established strong R-ecosystem preferences over years of work that you want carried into the dissertation package. Otherwise, trust the framework + AI to choose well; you can always adjust later by editing dataimago-spec.yaml in your repo.

Choose all that apply:

  • ☐ tidyverse (dplyr, ggplot2, tidyr, ...)
  • ☐ data.table (high-performance data manipulation)
  • ☐ collapse (fast statistical & data manipulation)
  • ☐ base R only (no extra dependencies)
  • ☐ Stan / rstan / brms / cmdstanr (Bayesian)
  • ☐ tidymodels (modeling framework)
  • ☐ Shiny (interactive web apps)
  • ☐ targets (reproducible pipelines)

Section 4 — Your committee ~3–7 min

Who's on your dissertation committee. The first entry is your chair; the chair is structurally distinct and weighted most heavily for voice/style modeling.

Q17. Who's on your dissertation committee?

First entry is your chair. The chair is structurally distinct — they sign first on the signature page, are listed first in the README, and become the default reviewer suggestion for major-decision PRs. You can add or remove committee members later.

Add as many entries as you need. Each entry has the following fields:

  • Role
  • Name (with title, e.g., Dr. Jane Doe)
  • Email (optional)
  • Institution (if different from yours) (optional)

Entry 1:

  • Role: ___________________________________
  • Name (with title, e.g., Dr. Jane Doe): ___________________________________
  • Email (optional): ___________________________________
  • Institution (if different from yours): ___________________________________

Entry 2: (add more as needed)

  • Role: ___________________________________
  • Name (with title, e.g., Dr. Jane Doe): ___________________________________
  • Email (optional): ___________________________________
  • Institution (if different from yours): ___________________________________

Section 5 — Sources & context (document bundle) ~30–60 min optional

The section where AI assistance gets its teeth. A user who invests 30–45 minutes here gets AI that knows their literature, their advisor's voice, and the writing register they aspire to. A user who skips this section gets generic AI assistance — fully functional, but less tailored.

Q18. Upload the document(s) your institution supplies describing thesis formatting requirements.

Required if you chose "Generate a custom .cls" in Q15; otherwise optional but useful documentation. Goes to your repo at `context/thesis-formatting/`.

The more detail, the better. One document is typically sufficient; some institutions split requirements across multiple documents (general formatting + discipline-specific addendum). Describe each so the AI knows the scope.

Uploads happen in the live UI — you cannot attach files via this handout.

Q19. Upload research documents you're building on, citing, critiquing, or replicating.

Papers, drafts, your own qualifying paper, foundational works. These shape every chapter-drafting suggestion. The more carefully you curate here, the better the AI's literature engagement will be. Aim for 5–20 documents (more is fine; AI weights documents by your `description` field).

The more detail, the better. For each document, write a 2–3 sentence description: what's the document about, why is it on your list, what's your relationship to it (building on / citing / critiquing / replicating). This metadata matters more than you might think — it's what tells the AI how to weight the document.

Uploads happen in the live UI — you cannot attach files via this handout.

Q20. Upload published work by your chair and committee members.

Your dissertation should engage with your committee's intellectual project. These documents let the AI suggest citations, model voice + register appropriate to your committee, and help you anticipate feedback. Aim for 2–3 representative papers per committee member.

The more detail, the better. The 'role' field is required: chair or committee. The AI weights the chair's writing most heavily, because your chair is your primary editorial relationship and AI suggestions should reflect their priorities.

Uploads happen in the live UI — you cannot attach files via this handout.

Q21. Upload writing you aspire to emulate.

Academic or non-academic. The clarity, voice, structure, or register you want your dissertation to achieve. The AI uses these to tone-match its chapter-drafting suggestions. A handful (2–5) is sufficient — depth over breadth matters here.

The more detail, the better. Examples: a well-written dissertation in your subfield; a Bayesian Data Analysis chapter; a New Yorker science article; the introduction of a book you admire. Be specific in the description about *what* you want to emulate — the precise quality you're aiming for.

Uploads happen in the live UI — you cannot attach files via this handout.

Q22. Upload other supporting documents.

Anything else the AI should know about: methodological references your chair recommended, departmental style guides, fieldwork notes, lab protocols, prior coursework that's directly relevant. Catch-all category; skip if nothing fits.

Uploads happen in the live UI — you cannot attach files via this handout.

Q23. Anything else AI should know about your project?

Free-form. Constraints (e.g., DUAs around your data), unusual project elements (multi-language work, multi-site fieldwork), personal context that affects your timeline, ethical considerations, anything you wish your AI assistant knew before drafting your first chapter.

Your answer:

When you're done

Go to dissertation-ai.dataimago.ai/onboard. There are two ways to get your answers in:

Option 1: Transcribe. Type your answers directly into the form, question by question — useful if you want to refine your phrasing while typing. The form auto-saves to your browser as you go, so you can leave and come back.

Option 2: Import the JSON template. Look for the Import JSON button at the top of the form (in the strip showing "Saved …"). Paste your filled dissertation-onboarding-template.json content and click "Import these answers". The form populates with your text answers instantly.

About file uploads (Section 5)

Whether you transcribe or import JSON, you upload your actual files through the live UI's drag-and-drop area — JSON can't carry file bytes. You can fill in everything else first, then upload files at the end.

For the AI-assisted workflow: if you draft your dissertation answers in a document and use AI to convert them to JSON, the AI can produce metadata-shaped entries for your file-upload questions even though it can't include the file bytes. Each entry should follow this shape (use the field set appropriate to each category):

"researchDocuments": [
  {
    "file": { "name": "rubin-1976-inference-and-missing-data.pdf" },
    "metadata": {
      "title": "Inference and missing data",
      "author": "Donald Rubin",
      "year": "1976",
      "description": "The foundational MAR/MNAR taxonomy I'm working within.",
      "relationship": "building-on"
    }
  }
]

On import, the form populates with these metadata-shaped entries — your titles, authors, descriptions, and relationships pre-fill the per-file metadata form. You then drag in the actual PDFs (the file descriptor's name is just a placeholder; the file bytes come from your drag-and-drop). A future iteration (D.2.4) will match the placeholder names to the dropped files automatically; for now, the metadata is preserved but you may need to manually associate dropped files with the right entry if filenames differ.
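
For the AI-assisted workflow, entries in this shape can also be built with a short script. The sketch below is one possible approach, not part of the framework. The field names are copied from the example above; the helper name `make_entry`, the local `papers/` path, and every relationship value other than "building-on" are assumptions (the latter guessed from the four relationships named in Q19), so check them against the downloaded template before relying on them.

```python
import json
from pathlib import Path

# Assumed relationship values: only "building-on" appears in the handout's
# example; the other three are guesses based on the wording of Q19.
ASSUMED_RELATIONSHIPS = {"building-on", "citing", "critiquing", "replicating"}


def make_entry(path, title, author, year, description, relationship):
    """Build one metadata-shaped entry for a file-upload question."""
    if relationship not in ASSUMED_RELATIONSHIPS:
        raise ValueError(f"unexpected relationship: {relationship!r}")
    return {
        # Only the file name is carried; the bytes come from drag-and-drop.
        "file": {"name": Path(path).name},
        "metadata": {
            "title": title,
            "author": author,
            "year": str(year),  # the handout's example stores the year as a string
            "description": description,
            "relationship": relationship,
        },
    }


entries = [
    make_entry(
        "papers/rubin-1976-inference-and-missing-data.pdf",  # hypothetical local path
        "Inference and missing data",
        "Donald Rubin",
        1976,
        "The foundational MAR/MNAR taxonomy I'm working within.",
        "building-on",
    )
]

# Fragment to merge into the filled template under the same key.
fragment = json.dumps({"researchDocuments": entries}, indent=2)
print(fragment)
```

Keeping the file name identical to the PDF you will later drag in makes the manual association step easier while automatic matching is not yet available.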

Generate

Once your answers are in and your files are uploaded, click "Review answers" to see everything in one place. From the review screen, click "Generate my platform" to provision your dissertation repositories.

Additional information (fill in over time)

Your dataimago-spec.yaml accommodates several optional fields the onboarding interview deliberately doesn't ask about. They're real dissertation concerns (they affect your title page, signature page, README, and front matter), but you shouldn't be burdened with looking them up at the start. Fill them in by editing dataimago-spec.yaml directly as they become relevant. AI assistance in your repo can help.

| Spec field | When it matters |
| --- | --- |
| institution.submissionDeadline | When your defense date solidifies; affects timeline-aware AI suggestions |
| institution.archiveUrl | The institutional dissertation library URL where you'll finally deposit |
| institution.administratorContact | The person to email for procedural questions |
| thesis.embargo | If you plan to embargo (e.g., during a journal publication window) |
| thesis.coauthors | If your dissertation is multi-authored |
| thesis.classFile.guidelinesDocuments | If your institution publishes a format guide (Q18 in this handout) |
| compliance.irb | If your work needs IRB approval; the number goes on the title page at many institutions |
| compliance.dua | If you have data-use agreements that restrict what you can commit publicly |
| funding | If you have grant or fellowship support to acknowledge |
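
To keep track of which of these optional fields are still empty, a small check over the parsed spec may help. This is a rough sketch under one loud assumption: that each dotted name in the table above corresponds to nested keys in dataimago-spec.yaml (for example, `institution.submissionDeadline` living under an `institution` mapping). The function names and the sample values are hypothetical, and the spec is shown as an already-parsed dictionary rather than read from the file.

```python
# Dotted field names taken from the table above.
OPTIONAL_FIELDS = [
    "institution.submissionDeadline",
    "institution.archiveUrl",
    "institution.administratorContact",
    "thesis.embargo",
    "thesis.coauthors",
    "thesis.classFile.guidelinesDocuments",
    "compliance.irb",
    "compliance.dua",
    "funding",
]


def is_filled(spec, dotted_path):
    """Return True if the nested key exists and holds a non-empty value."""
    node = spec
    for key in dotted_path.split("."):
        # Assumption: each dot in the field name is one level of nesting.
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return node not in (None, "", [], {})


def missing_fields(spec):
    """List the optional fields that are absent or empty in the spec."""
    return [path for path in OPTIONAL_FIELDS if not is_filled(spec, path)]


# Hypothetical spec contents, as a YAML parser would return them.
example_spec = {
    "institution": {"submissionDeadline": "2027-04-15"},
    "compliance": {"irb": None},
    "funding": ["Example Fellowship"],
}

for path in missing_fields(example_spec):
    print("still empty:", path)
```

An empty field is not an error here; the list is only a reminder of what can be added as it becomes settled.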

The principle: the spec.yaml is the complete representation of your dissertation; the interview asks only for the essentials. Add structured logistical data over time as it becomes settled — the same AI assistance you use for drafting chapters can help you fill these fields in.