# dissertation.ai onboarding handout

> Generated from `dissertation.yaml` on 2026-06-19.

This interview captures the context AI needs to be genuinely useful to your
dissertation work. It is intentionally substantial — most users complete it
in 75–120 minutes. You may save your draft and continue later, or complete
the printable handout offline first and paste your answers here.


## How to use this handout

The dissertation.ai onboarding interview asks **23 questions across 5 sections** and typically takes **75–120 minutes**. This handout helps you prepare your answers in advance.

Two workflows are supported:

1. **Read + transcribe (deep).** Read each question carefully, think, draft your answers in this document or in your favorite editor. At session time, type your answers directly into the live interview UI at <https://dissertation-ai.dataimago.ai/onboard>.
2. **Fill the JSON template (efficient).** Download `dissertation-onboarding-template.json` alongside this handout. Fill in your answers in the JSON file. Paste the resulting JSON into the live UI's **Import JSON** affordance for instant population.

Either path produces the same dissertation environment. The JSON path is faster; the markdown path supports richer thinking + AI assistance (paste this whole handout into Claude / ChatGPT and ask for help drafting your answers — the structure is designed for that).

**Section 2 (Your research) is the highest-leverage section.** The four long-form questions there shape every AI-assisted suggestion in your dissertation environment for the next 1–3 years. Spend real time there — 30–60 minutes is appropriate. The other sections are quicker.

**Document uploads (Section 5) are not in this handout.** Files have to be uploaded through the live UI directly. The handout helps you decide which documents to upload and prepares the descriptive metadata for each.

---

## Section 1 — Identity · ~2–5 min

The minimum we need to provision your dissertation repos.

### Q1. What's your name?

_Appears on your thesis title page._

**Your answer:** _____________________________________________

### Q2. What's your institutional email?

**Your answer:** _____________________________________________

### Q3. What's your GitHub username?

_We'll create your dissertation repos under your account._

**Your answer:** _____________________________________________

### Q4. Which institution will you defend at?

**Your answer:** _____________________________________________

### Q5. What's your degree program?

_e.g., Ph.D. Statistics, Ed.D. Curriculum and Instruction_

**Your answer:** _____________________________________________


---

## Section 2 — Your research · ~30–60 min

The highest-leverage section of the interview. The four narrative fields
here shape every chapter draft, every literature suggestion, every
AI-assisted writing decision for the next 1–3 years. Spend real time
here — 30–60 minutes is appropriate.

### Q6. What's your working dissertation title?

_Don't worry about making it final — committees routinely refine the
title in the last semester. A working title here just helps orient
the project._

**Your answer:** _____________________________________________

### Q7. What is the central research question (or questions) your dissertation answers?

_Be precise. A dissertation has one or two central questions; everything
else is in service of answering them. State the question(s) as you would
state them in your proposal defense — well-formed, with appropriate
scope, identifying what's empirical / theoretical / methodological
about them._

> **The more detail, the better.**
>
> Spend 15–30 minutes here. The clearer your research question, the more
> useful every AI suggestion in your dissertation will be. If you have
> two questions, distinguish them clearly — primary vs. secondary, or
> two facets of one larger inquiry.

**Your answer:**

```


```

### Q8. What brought you to this topic? What's the intellectual + personal context?

_What does the AI assistant need to know about *you and this project*
that it can't infer from your research question alone? Your prior
work, the conversations that shaped the inquiry, the gap in the
literature you noticed, the practical problem that motivates the work.
Multi-paragraph is appropriate._

> **The more detail, the better.**
>
> Two or three paragraphs. Cover both the intellectual genealogy (what
> work led you here, what scholarly conversations you're joining) and
> the personal / practical motivation (why this question, why now,
> why you).

**Your answer:**

```


```

### Q9. What is the novel contribution? Why does this dissertation matter?

_Three or four sentences minimum. State the contribution as a numbered
list if you have multiple. Distinguish theoretical contribution,
methodological contribution, empirical contribution, and practical
contribution where relevant — most dissertations make 1–3 of these._

> **The more detail, the better.**
>
> This is the elevator pitch for your dissertation. If a senior scholar
> in your field asked "what's new about your dissertation?", what would
> you say? The AI uses this to weight chapter-drafting against your
> contribution claims.

**Your answer:**

```


```

### Q10. What's your methodology?

This question has 2 parts — answer each below.

#### Primary methodological approach

**Choose one:**

- [ ] Quantitative (statistical models, surveys, experiments)
- [ ] Qualitative (interviews, ethnography, textual analysis, archival)
- [ ] Mixed-methods (both quantitative and qualitative)
- [ ] Computational (simulation, algorithm design, ML, mathematical modeling)

#### Describe your methodology — whatever you have so far.

_If you're early in your dissertation and still figuring out your
methodology, that's expected — write what you know, even one or
two sentences about your general approach. You can refine this
later by editing your project's dataimago-spec.yaml.

If you have a clearer picture: 2–4 paragraphs covering your data
source(s), analytic approach, statistical / qualitative /
computational techniques, unit of analysis, and key methodological
challenges + how you'll address them._

> **The more detail, the better.**
>
> The more detail you provide, the better AI assistance can engage
> with your methodological choices. But this question is also OK to
> answer briefly — your methodology will sharpen as your dissertation
> progresses, and you can update this field anytime.
> 
> At the detailed end: "I'll use regression" is too generic; "I'll
> use hierarchical Bayesian regression with weakly informative
> priors, fit via Stan, with HMC diagnostics following Vehtari et
> al." is the level of specificity that's most useful.

**Your answer:**

```


```


---

## Section 3 — Dissertation structure · ~5–12 min

How your project is organized: R package, chapters, citation style,
R ecosystem preferences.

### Q11. Do you already have an R package for your dissertation research?

_If you have an existing R package (one with a DESCRIPTION file, in a
repo you can point us at), choose "Yes". If you don't — including if
you have R scripts that aren't yet structured as a package, or if you
don't write R yet at all — choose "No" and we'll create a fresh R
package alongside your dissertation repo, modeled on dataimago-rpkg's
structure. AI assistance during your dissertation work can help you
migrate any existing scripts into the new package over time; you
don't need to organize them in advance._

**Choose one:**

- [ ] Yes, I have an R package in a repo
- [ ] No — please create a fresh one for me

### Q12. What's the URL of your existing R package repo?

_Answer this only if you chose "Yes" in Q11._

_e.g., https://github.com/jsmith/smith-thesis-rpkg_

**Your answer:** _____________________________________________

### Q13. What should we name your new R package?

_Answer this only if you chose "No" in Q11._

_Kebab-case. Convention: `<yourname>-thesis-rpkg`, e.g., `smith-thesis-rpkg`.
The package will be modeled on dataimago-rpkg, with `ui/www` as its
Quarto root._

**Your answer:** _____________________________________________
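
A minimal sketch of the Q13 convention, assuming "yourname" means your surname in lowercase, as in the `smith-thesis-rpkg` example. The helpers (`suggest_repo_name`, `is_kebab_case`, `is_valid_r_package_name`) are hypothetical illustrations, not part of the framework. The last one encodes R's own rule from *Writing R Extensions*: the `Package:` field of a DESCRIPTION file may contain only ASCII letters, digits, and dots, must be at least two characters long, must start with a letter, and must not end with a dot. A hyphenated name therefore works as a repository name but not as the name R itself sees; this handout doesn't say how the framework reconciles the two, so it's worth checking before you rely on it.

```python
import re
import unicodedata

# Kebab-case as used in this handout: lowercase ASCII letters/digits in
# groups joined by single hyphens, e.g. "smith-thesis-rpkg".
KEBAB_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

# R's rule for the Package: field: letters, digits, and dots only, at least
# two characters, starting with a letter and not ending with a dot.
R_PACKAGE_RE = re.compile(r"[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]")


def to_kebab(text: str) -> str:
    """Fold accents to ASCII, lowercase, and join word runs with hyphens."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def suggest_repo_name(surname: str) -> str:
    """Apply the <yourname>-thesis-rpkg convention (hypothetical helper)."""
    return f"{to_kebab(surname)}-thesis-rpkg"


def is_kebab_case(name: str) -> bool:
    return KEBAB_RE.fullmatch(name) is not None


def is_valid_r_package_name(name: str) -> bool:
    return R_PACKAGE_RE.fullmatch(name) is not None


name = suggest_repo_name("Smith")
print(name, is_kebab_case(name), is_valid_r_package_name(name))
# smith-thesis-rpkg True False
```

Accented surnames are folded to plain ASCII first, so "García" becomes `garcia-thesis-rpkg` rather than a name with a stray hyphen.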

### Q14. What chapters will your dissertation have?

_We pre-fill a standard outline based on your methodological approach.
Edit titles, add chapters, remove chapters, or reorder freely. You
can also change this later by editing dataimago-spec.yaml in your repo._

Add as many entries as you need. Each entry has the following fields:

**Per-entry fields:**
- **Short id (kebab-case)** 
- **Chapter title** 

**Entry 1:**
- Short id (kebab-case): _____________________________________________
- Chapter title: _____________________________________________

**Entry 2:** _(add more as needed)_
- Short id (kebab-case): _____________________________________________
- Chapter title: _____________________________________________
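
If you'd rather derive each short id from its chapter title than type it by hand, the sketch below shows one way: fold accents to ASCII, lowercase, and join words with hyphens, appending `-2`, `-3`, and so on when two titles collide. The function names and the `id`/`title` keys are hypothetical stand-ins for the two Q14 fields, not the framework's actual spec keys.

```python
import re
import unicodedata


def chapter_id(title: str) -> str:
    """Turn a chapter title into a kebab-case short id (hypothetical helper)."""
    # Fold accented characters to plain ASCII, e.g. "é" -> "e".
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def outline(titles: list[str]) -> list[dict[str, str]]:
    """Build one {id, title} entry per chapter, keeping every id unique."""
    used: set[str] = set()
    entries = []
    for title in titles:
        base = chapter_id(title) or "chapter"  # fallback if nothing survives
        short_id, n = base, 2
        while short_id in used:  # add -2, -3, ... until the id is unused
            short_id, n = f"{base}-{n}", n + 1
        used.add(short_id)
        entries.append({"id": short_id, "title": title})
    return entries


for entry in outline(["Introduction", "Literature Review", "Methods & Data", "Results", "Results"]):
    print(entry["id"], "|", entry["title"])
```

With that input it prints the ids `introduction`, `literature-review`, `methods-data`, `results`, and `results-2`, each followed by `|` and the original title.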


### Q15. Citation style + thesis class file

This question has 2 parts — answer each below.

#### What citation style does your institution require?

**Choose one:**

- [ ] APA 7th edition
- [ ] Chicago
- [ ] MLA
- [ ] Harvard
- [ ] Custom (specify in spec.yaml)

#### How should your thesis PDF be formatted?

_This choice controls how your LaTeX-typeset thesis PDF is formatted.
If unsure, pick the framework default; you can switch later._

**Choose one:**

- [ ] Use the framework's default formatting (recommended if unsure)
- [ ] I have my institution's official .cls file (I'll add it after generation)
- [ ] Generate a custom .cls from my institution's formatting guidelines (I'll upload them in Section 5)


### Q16. Which R package ecosystems do you build around?

_If you're an experienced R user with strong preferences — you
consistently use the tidyverse, you prefer data.table for
performance, you're committed to base R — tell us. Your
dissertation's R package will adopt those conventions.

If you're not sure or don't have strong preferences, leave this
blank. AI assistance during your dissertation work will choose
sensible defaults appropriate to your methodology, and you can
adjust later. For most users this is the right answer — let the
AI help structure the R package around your ideas. We're learning
that AI-assisted packages tend to be better-structured and
better-documented than hand-rolled ones, even for experienced
programmers; the same logic applies here._

> **Optional.**
>
> This question is optional and most users should leave it blank.
> Only fill it in if you have established strong R-ecosystem
> preferences over years of work that you want carried into the
> dissertation package. Otherwise, trust the framework + AI to
> choose well — you can always adjust later by editing
> dataimago-spec.yaml in your repo.

**Choose all that apply:**

- [ ] tidyverse (dplyr, ggplot2, tidyr, ...)
- [ ] data.table (high-performance data manipulation)
- [ ] collapse (fast statistical & data manipulation)
- [ ] base R only (no extra dependencies)
- [ ] Stan / rstan / brms / cmdstanr (Bayesian)
- [ ] tidymodels (modeling framework)
- [ ] Shiny (interactive web apps)
- [ ] targets (reproducible pipelines)


---

## Section 4 — Your committee · ~3–7 min

Who's on your dissertation committee. The first entry is your chair;
the chair is structurally distinct and weighted most heavily for
voice/style modeling.

### Q17. Who's on your dissertation committee?

_First entry is your chair. The chair is structurally distinct — they
sign first on the signature page, are listed first in the README, and
become the default reviewer suggestion for major-decision PRs. You
can add or remove committee members later._

Add as many entries as you need. Each entry has the following fields:

**Per-entry fields:**
- **Role** 
- **Name (with title, e.g., Dr. Jane Doe)** 
- **Email** (optional)
- **Institution (if different from yours)** (optional)

**Entry 1:**
- Role: _____________________________________________
- Name (with title, e.g., Dr. Jane Doe): _____________________________________________
- Email (optional): _____________________________________________
- Institution (if different from yours): _____________________________________________

**Entry 2:** _(add more as needed)_
- Role: _____________________________________________
- Name (with title, e.g., Dr. Jane Doe): _____________________________________________
- Email (optional): _____________________________________________
- Institution (if different from yours): _____________________________________________
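
The ordering rule for this question (the first entry is your chair) is easy to check before you transcribe. A minimal sketch, assuming role labels `"chair"` and `"member"`; the interview's actual role options aren't listed in this handout, and `CommitteeMember` and `check_committee` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommitteeMember:
    role: str                          # assumed labels: "chair" or "member"
    name: str                          # with title, e.g. "Dr. Jane Doe"
    email: Optional[str] = None        # optional
    institution: Optional[str] = None  # optional: only if different from yours


def check_committee(members: list[CommitteeMember]) -> list[str]:
    """Return a list of problems; an empty list means the entries look fine."""
    if not members:
        return ["add at least one entry: your chair"]
    problems = []
    if members[0].role != "chair":
        problems.append("the first entry should be your chair")
    for i, member in enumerate(members, start=1):
        if not member.name.strip():
            problems.append(f"entry {i} is missing a name")
    return problems


committee = [
    CommitteeMember("chair", "Dr. Jane Doe", email="jdoe@example.edu"),
    CommitteeMember("member", "Dr. John Roe", institution="Another University"),
]
print(check_committee(committee))  # []
```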


---

## Section 5 — Sources & context (document bundle) · ~30–60 min · _optional_

The section where AI assistance gets its teeth. A user who invests
30–45 minutes here gets AI that knows their literature, their advisor's
voice, and the writing register they aspire to. A user who skips this
section gets generic AI assistance — fully functional, but less tailored.

### Q18. Upload the document(s) your institution supplies describing thesis formatting requirements.

_Required if you chose "Generate a custom .cls" in Q15; otherwise
optional but useful documentation. Goes to your repo at
`context/thesis-formatting/`._

> **The more detail, the better.**
>
> One document is typically sufficient; some institutions split
> requirements across multiple documents (general formatting plus a
> discipline-specific addendum). Describe each so the AI knows its scope.

_(Uploads happen in the live UI — you cannot attach files via this handout.)_

**For each file you plan to upload, prepare a short description of its scope in advance** (general formatting, a discipline-specific addendum, and so on).


### Q19. Upload research documents you're building on, citing, critiquing, or replicating.

_Papers, drafts, your own qualifying paper, foundational works.
These shape every chapter-drafting suggestion. The more carefully
you curate here, the better the AI's literature engagement will be.
Aim for 5–20 documents (more is fine; AI weights documents by your
`description` field)._

> **The more detail, the better.**
>
> For each document, write a 2–3 sentence description: what's the
> document about, why is it on your list, what's your relationship to
> it (building on / citing / critiquing / replicating). This metadata
> matters more than you might think — it's what tells the AI how to
> weight the document.

_(Uploads happen in the live UI — you cannot attach files via this handout.)_

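**For each file you plan to upload, prepare the following metadata in advance:**

- **Title**
- **Author**
- **Year**
- **Description**: 2–3 sentences on what the document is about, why it's on your list, and your relationship to it
- **Relationship**: building on, citing, critiquing, or replicating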


### Q20. Upload published work by your chair and committee members.

_Your dissertation should engage with your committee's intellectual
project. These documents let the AI suggest citations, model the voice
and register appropriate to your committee, and help you anticipate
feedback. Aim for 2–3 representative papers per committee member._

> **The more detail, the better.**
>
> The `role` field is required: chair or committee. The AI weights your
> chair's writing most heavily, since your chair is your primary editorial
> relationship and AI suggestions should reflect their priorities.

_(Uploads happen in the live UI — you cannot attach files via this handout.)_

**For each file you plan to upload, prepare its metadata in advance, including the required role (chair or committee).**


### Q21. Upload writing you aspire to emulate.

_Academic or non-academic. The clarity, voice, structure, or register
you want your dissertation to achieve. The AI uses these to tone-match
its chapter-drafting suggestions. A handful (2–5) is sufficient —
depth over breadth matters here._

> **The more detail, the better.**
>
> Examples: a well-written dissertation in your subfield; a chapter of
> *Bayesian Data Analysis*; a *New Yorker* science article; the introduction
> of a book you admire. Be specific in the description about *what* you
> want to emulate — the precise quality you're aiming for.

_(Uploads happen in the live UI — you cannot attach files via this handout.)_

**For each file you plan to upload, note in advance the specific quality (clarity, voice, structure, or register) you want to emulate; that note becomes its description.**


### Q22. Upload other supporting documents.

_Anything else the AI should know about: methodological references
your chair recommended, departmental style guides, fieldwork notes,
lab protocols, prior coursework that's directly relevant. Catch-all
category; skip if nothing fits._

_(Uploads happen in the live UI — you cannot attach files via this handout.)_

**For each file you plan to upload, note in advance what it is and why the AI should know about it.**


### Q23. Anything else AI should know about your project?

_Free-form. Constraints (e.g., DUAs around your data), unusual project
elements (multi-language work, multi-site fieldwork), personal
context that affects your timeline, ethical considerations, anything
you wish your AI assistant knew before drafting your first chapter._

**Your answer:**

```


```


---

## When you're done

Go to <https://dissertation-ai.dataimago.ai/onboard>. There are two ways to get your answers in:

**Option 1: Transcribe.** Type your answers directly into the form, question by question — useful if you want to refine your phrasing while typing. The form auto-saves to your browser as you go, so you can leave and come back.

**Option 2: Import the JSON template.** Look for the **Import JSON** button at the top of the form (in the strip showing "Saved …"). Paste your filled `dissertation-onboarding-template.json` content and click "Import these answers". The form populates with your text answers instantly.

### About file uploads (Section 5)

Whether you transcribe or import JSON, you upload your **actual files** through the live UI's drag-and-drop area — JSON can't carry file bytes. You can fill in everything else first, then upload files at the end.

**For the AI-assisted workflow:** if you draft your onboarding answers in a document and use AI to convert them to JSON, the AI can produce metadata-shaped entries for your file-upload questions, even though it can't include the file bytes. Each entry should follow this shape (use the field set appropriate to each category):

```json
"researchDocuments": [
  {
    "file": { "name": "rubin-1976-inference-and-missing-data.pdf" },
    "metadata": {
      "title": "Inference and missing data",
      "author": "Donald Rubin",
      "year": "1976",
      "description": "The foundational MAR/MNAR taxonomy I'm working within.",
      "relationship": "building-on"
    }
  }
]
```

On import, the form populates with these metadata-shaped entries — your titles, authors, descriptions, and relationships pre-fill the per-file metadata form. You then drag in the actual PDFs (the file descriptor's `name` is just a placeholder; the file bytes come from your drag-and-drop). A future iteration (D.2.4) will match the placeholder names to the dropped files automatically; for now, the metadata is preserved but you may need to manually associate dropped files with the right entry if filenames differ.
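
A minimal sketch of producing such entries programmatically rather than by hand. Only `"building-on"` appears in the example above; the other three relationship values are assumed by analogy with Q19's wording (building on, citing, critiquing, replicating), so confirm them against the downloaded template. The sketch also wraps the list in a top-level object purely so the printed output is valid JSON on its own; where `researchDocuments` actually sits inside `dissertation-onboarding-template.json` is not shown in this handout. `research_document` is a hypothetical helper.

```python
import json

# "building-on" is from the handout's example; the other three are assumed.
RELATIONSHIPS = {"building-on", "citing", "critiquing", "replicating"}


def research_document(filename: str, title: str, author: str, year: int,
                      description: str, relationship: str) -> dict:
    """Build one metadata-shaped researchDocuments entry (no file bytes)."""
    if relationship not in RELATIONSHIPS:
        raise ValueError(f"unknown relationship: {relationship!r}")
    return {
        # Placeholder only: the actual PDF is dragged into the live UI.
        "file": {"name": filename},
        "metadata": {
            "title": title,
            "author": author,
            "year": str(year),  # the example stores the year as a string
            "description": description,
            "relationship": relationship,
        },
    }


entry = research_document(
    "rubin-1976-inference-and-missing-data.pdf",
    "Inference and missing data",
    "Donald Rubin",
    1976,
    "The foundational MAR/MNAR taxonomy I'm working within.",
    "building-on",
)
print(json.dumps({"researchDocuments": [entry]}, indent=2))
```

Rejecting unknown relationship values early keeps a typo like `"builds-on"` from slipping silently into the pasted JSON.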

### Generate

Once your answers are in and your files are uploaded, click **Review answers** to see everything in one place. From the review screen, **Generate my platform** provisions your dissertation repositories.

## Additional information (fill in over time)

Your dataimago-spec.yaml accommodates several optional fields the onboarding interview deliberately doesn't ask about. They're real dissertation concerns — they affect your title page, signature page, README, and front matter — but they're not what you should be initially burdened with looking up. **Fill them in by editing dataimago-spec.yaml directly as they become relevant.** AI assistance in your repo can help.

| Spec field | When it matters | Example |
|---|---|---|
| `institution.submissionDeadline` | When your defense date solidifies — affects timeline-aware AI suggestions | "2027-05-30" |
| `institution.archiveUrl` | The institutional dissertation library URL where you'll finally deposit | "https://gradworks.umi.com/..." |
| `institution.administratorContact` | The person to email for procedural questions | { name, email, role: "Dissertation Coordinator" } |
| `thesis.embargo` | If you plan to embargo (e.g., during a journal publication window) | { type: "two-year", reason: "..." } |
| `thesis.coauthors` | If your dissertation is multi-authored | [{ name, role, chaptersContributed }] |
| `thesis.classFile.guidelinesDocuments` | If your institution publishes a format guide (Q18 in this handout) | The uploaded PDF + AI-generated thesis.cls |
| `compliance.irb` | If your work needs IRB approval — number goes on the title page at many institutions | { status: "approved", number: "IRB-2024-..." } |
| `compliance.dua` | If you have data-use agreements that restrict what you can commit publicly | [{ source, restrictions }] |
| `funding` | If you have grant or fellowship support to acknowledge | [{ source, grantNumber, acknowledgmentText }] |

**The principle:** the spec.yaml is the complete representation of your dissertation; the interview asks only for the essentials. Add structured logistical data over time as it becomes settled — the same AI assistance you use for drafting chapters can help you fill these fields in.
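
The dotted names in the table read naturally as paths into nested keys, so `institution.submissionDeadline` would be a `submissionDeadline` key under `institution`. That reading is an assumption based on the notation, not something this handout spells out. The sketch below models a small, hypothetical excerpt of the spec as a Python dictionary (the shape a YAML loader would return) and uses two hypothetical helpers, `get_field` and `set_field`, to read and fill fields by dotted path; the values come from the table's examples.

```python
from typing import Any

# Hypothetical excerpt of dataimago-spec.yaml after parsing; values are the
# examples from the table above.
spec: dict[str, Any] = {
    "institution": {
        "submissionDeadline": "2027-05-30",
        "administratorContact": {"role": "Dissertation Coordinator"},
    },
    "compliance": {"irb": {"status": "approved"}},
}


def get_field(spec: dict[str, Any], path: str, default: Any = None) -> Any:
    """Follow a dotted path such as "institution.submissionDeadline"."""
    node: Any = spec
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default  # field not filled in yet
        node = node[key]
    return node


def set_field(spec: dict[str, Any], path: str, value: Any) -> None:
    """Create missing parent mappings, then set the final key."""
    *parents, leaf = path.split(".")
    node = spec
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


print(get_field(spec, "institution.submissionDeadline"))  # 2027-05-30
print(get_field(spec, "thesis.embargo"))                  # None: not filled in yet
set_field(spec, "thesis.embargo", {"type": "two-year"})
print(get_field(spec, "thesis.embargo.type"))             # two-year
```

Returning a default for missing paths mirrors the intent of these fields: absent simply means "not settled yet", not an error.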