Final Report Prompts

Author

Lindsay Poirier

Why am I asking you to do this?

Writing a final report has historically been a required component of the capstone course for a few reasons:

  1. The Statistical & Data Sciences Program is committed to ensuring that our students learn the skills necessary to become exemplary writers within our field. Because of this, SDS recently adopted a curricular model called Writing Enriched Curriculum, which has enabled us to articulate the writing skills we hope our graduates will acquire. Since all majors are required to take capstone, the SDS Program has prioritized capstone as a space where writing should be meaningfully integrated into the curriculum. In other words, capstone is a space where you have an opportunity to demonstrate how your writing skills have developed.
  2. Since you are working with a sponsor, the final report serves as an important resource for generating detailed documentation of your project, including its aims, its methods, and its outcomes.
  3. The drafting of a report gives you an opportunity to practice producing text that aligns with the genre conventions of our field.
  4. The final report can hopefully serve as important portfolio material for you.

First Draft

Please note that the first draft of the report should be submitted in the MDPI Rticles format with all citation metadata included in a separate .bib file and referenced in-line.

Introduction

Your introduction should be 300-500 words and should introduce your project to an audience unfamiliar with our capstone course. This section will be some combination of your problem definition, sponsor description, the scope of your solution, and an outline of the remaining sections of the report.

Problem Definition

Be prepared to submit 300-500 words detailing the problem that your project is addressing. Your goal in this section is to convince a stakeholder that this problem exists by describing the problem and then gathering evidence to support its existence. No solutions should be referenced at this point! Be sure to cite sources. (If you’re not sure how to sync with bibliography.bib, no worries. We will go over that soon.)

As you are proving the existence of this problem, you may decide to organize your thoughts according to different scales of the problem, to outline different causes of the problem, or to reference different stakeholders that may be impacted by the problem. While these provide good context for structuring your thinking, they shouldn’t distract from the main goal: convincing a stakeholder that this problem exists.

Sponsor Description

In a .qmd file, submit 300-500 words introducing the organization that you are working with. In your write-up, you should describe the organization and its overarching mission, along with the mission of the specific teams in the organization that you are working with. Finally, you should describe the sponsor’s “ask” and how this relates to their mission.

As you prepare your introduction, you should think about how you can synthesize some of what you wrote in these earlier sections into a piece of writing that will frame your project to an unfamiliar audience.

Note that this will require more than just copying and pasting previously written sections of the report. You should think carefully about how to integrate this writing in a meaningful way to introduce your project.

Data Description

In 300-500 words, introduce the data resources that you will be working with to a lay audience. Some teams might not yet know all of these data sources, so you need only detail the ones that you know you will be working with.

I have found in the past that detailing datasets to an audience unfamiliar with that data is really challenging. We have a tendency to want to describe the data’s technical details rather than summarizing more generally what the data represents. I’ve included an example below to help you think through how to approach this. At the very least, this section should include:

  • What the data represents
  • Who produced it
  • Its observational unit
  • Some variables that describe that observational unit (but be careful about getting too technical here)
  • A bit of its history/context
  • A few data limitations

To address this problem, we analyzed a dataset put out by the US EPA [<– who produced it] that documents the environmental compliance and enforcement history of every EPA-regulated facility in the US, including prisons [<– what it represents]. For every EPA-regulated facility in the US in a given year [<– observational unit], the dataset reports information such as the permits the facility has been awarded, enforcement actions taken against the facility, and penalties it has been assessed [<– variables]. The EPA has been integrating this data from a number of different compliance databases for major federal regulations (such as the Clean Air Act and the Safe Drinking Water Act) for over ten years [<– history/context]. The data tends to be more comprehensive for larger facilities than for smaller facilities and does not include information about compliance with all environmental laws [<– limitations].

Detailed Methodology

This section should be about 800-1000 words. This is going to look quite different for each of the projects. This section might include things like:

  1. How did you acquire the data?
  2. How did you clean and format the data?
  3. How did you integrate the data?
  4. How did you determine which metrics to use or how to calculate certain metrics?
  5. How did you analyze the data?
  6. What assumptions did you have to make when drawing conclusions from the data?
  7. How did you automate certain processes?
  8. What was the purpose of certain key functions you wrote?
  9. Why did you select certain plots to visualize the data?

As you are drafting this section, I encourage you to think about the key information a non-technical audience would need in order to understand how you plugged along towards the development of your deliverables. Remember that this is a draft, and aspects may change or be extended as you continue the project.

Findings/Outputs

This section should be about 600-800 words. This is going to look quite different for each of the projects, depending on the scope of your deliverables. This section might include things like:

  1. What were the primary findings of your statistical analyses (in relation to original research questions)? (e.g. “We found that…[give us key numbers, stats, and summaries of figures].”)
  2. What were some of the secondary findings of your analyses (perhaps beyond the scope of the original research questions)?
  3. How might we contextually interpret the project findings? (e.g. “This indicates that…”)
  4. What products did you produce (e.g. R packages, Shiny dashboards), and what were some of their key features?
  5. In what ways did your deliverables match the expectations of the project sponsor?

Conclusion

Your conclusion should be about 300-500 words and do four things:

  1. Briefly summarize what you completed in your project
  2. Discuss how your project addressed (or didn’t address) the problems outlined in your problem statement
  3. Outline some of the limitations of your approach
  4. Present some suggestions for further work (i.e. How could this project be extended with more time and resources?)

Final Draft

Be sure that the final draft includes a brief abstract summarizing the contents of the report, along with keywords.

Ethics Statement

In 400-500 words, identify ethical issues that emerged in the course of your project, along with how your team grappled with/addressed them. Issues may relate to:

  1. Data availability
  2. Reductionist data categorization or semantics
  3. Privacy and/or confidentiality concerns
  4. Potentially incorrect assumptions your team needed to make (or use of proxies)
  5. Possibility of algorithmic biases
  6. Reliance on certain data infrastructures
  7. Your team’s cultural/social identities (and what identities are not at the table)
  8. Your team’s knowledge of/relationship with the topic or organization
  9. Challenges in public data communication

I imagine that most teams will focus on 1-3 of these topics - not all of them.

For each topic you cover in this section, you should be sure to discuss four things: 1) What was the issue? 2) Why did the issue come up? 3) What social harm might emerge from the issue? 4) How did your team respond to this issue? However, I encourage you not to think of this as a checklist. The text of this section should be well-integrated into the flow of your report.