Quarto for an Academic Website

website

I continue my long search for a way to generate a nicely formatted website with publication list based on adding publication information to a single source of truth without re-remembering how all the formatting works each time.

Author
Affiliation
Published

May 11, 2022

Intro

I’ve never been good at keeping my website updated. I always go through two different phases of maintenance:

  1. Rushing around creating a new website with bells and whistles using whatever the flavor of the month is
  2. Never updating an existing website

I’m hoping to break out of this cycle, but am currently solidly within Phase 1.

A highlight from my time in Phase 2 was when I forgot to update my DNS and I totally lost control of drewdimmery.com (don’t go there, it has a squatter). I think my website at that time was some Octopress monstrosity. There are a few reasons I think Quarto might help with my vicious circle.

  • Serving static HTML pages is about as easy as it gets
  • Very little Quarto-specific syntax to recall (e.g. CLI commands or abstruse markup)
  • Lots of flexibility (Python / R) in how to generate that static content
  • Full programmability means that generation can be based on arbitrary data structures of my choosing

I previously used Hugo Academic for building my website, which was much better than just editing the content directly, but I never remembered the right way to generate a new publication definition (there was a CLI, but I never remembered the syntax). Each publication got its own file describing its details, and I found this quite clunky. I wanted something extremely lightweight: there isn’t much reason for my individual publications to get pages of their own, and I really don’t need a lot of information on each of them. I just want some basic information about each and a set of appropriate links to more details.

This post will detail how I’ve set up Quarto to accomplish this task. I’ve nearly completely separated the two main concerns around maintaining an academic website / CV, which to me are data on publications and software from the design elements of how to display them. It’s entirely possible that my particular issues are unique and this post won’t be useful to anyone else. Luckily, the marginal cost of words on the internet is essentially zero (and maybe the marginal value is, too).

Setup

Setting up Quarto was very easy, so I won’t belabor this. The combination of the Get Started guide with the Website Creation guide kept everything very straightforward. I also used Danielle Navarro’s post and her blog’s code to get everything set up.

I decided late in the setup process to add a blog, so I will mention that it’s actually very easy to do: it basically just requires adding a Listing page (i.e. the blog’s index), a folder to contain the various posts and a _metadata.yml file in that folder to describe global settings to apply to all posts. I just created these manually without too much trouble. This is one of the great things about building sites with tools like Quarto: everything is extremely transparent: just put a couple files in the right places and you’re good to go.

Site Design

To demonstrate how I’ve set things up to populate the website from data about my academic life, I’ll focus on my publications page. There are two main files undergirding this page:

papers.yaml
a data file in YAML with standardized information on each publication. I chose YAML because it’s fairly easy to write correctly formatted YAML by hand (and I’ll be updating)
research.qmd
The page which takes the data in papers.yaml and turns it into nicely formatted Markdown / HTML. This is setup as a Jupyter-backed qmd file (essentially a Jupyter notebook).

This idea of separating the data side (information about publications) from formatting is aimed at making my life easier. One of the reasons I often stop updating my website is because when I come back in 3 months with a new publication, I never remember all the details about how I formatted entries in whatever flavor of Bootstrap I happened to be using when I built the website. Moreover, because I know that there’s a barrier to understanding before I can get started, it’s extremely easy to put off (and therefore it never gets done).

By separating out the data entry from the formatting, this simplifies matters substantially.

Data

I put data about each publication in a basic YAML format:

See example data
softblock:
  title: Efficient Balanced Treatment Assignments for Experimentation
  authors:
    - David Arbour
    - me
    - Anup Rao
  year: 2021
  venue: AISTATS
  preprint: https://arxiv.org/abs/2010.11332
  published_url: https://proceedings.mlr.press/v130/arbour21a.html
  github: https://github.com/ddimmery/softblock

This is basically like a simplified bibtex entry with more URLs so I can annotate where to find replication materials for a given paper, as well as distinguish between preprints (always freely accessible) versus published versions (not always open access). A convenience that I add in the markup here is referring to myself as me in the author list (which is an ordered list). This allows me to add in extra post-processing to highlight where I sit in the author list.

Some additional things I considered adding but chose to ignore for a first version:

  • An abstract
  • A suggested bibtex entry

Both of these would be easy to add, but I chose to start simpler. I don’t love YAML for entering long blocks of text, which both of these are.

Formatting

Since I can write the generation logic for page in Python, this puts me on comfortable ground to hack something together. To knit the above publication data into HTML, I just literally bind together the programmatically generated raw HTML and print it onto the page.

I do a couple additional useful things in this process: - Separate out working papers or non-archival papers from published work (I make this distinction based on whether I include a published_url field or not). - Order and categorize papers by year - Provide nice Bootstrappy buttons for external links (e.g. to Preprints / Code / etc)

See research.qmd fragment
import yaml
from IPython.display import display, Markdown, HTML

def readable_list(_s):
  if len(_s) < 3:
    return ' and '.join(map(str, _s))
  *a, b = _s
  return f"{', '.join(map(str, a))}, and {b}"

def button(url, str, icon):
    icon_base = icon[:2]
    return f"""<a class="btn btn-outline-dark btn-sm", href="{url}" target="_blank" rel="noopener noreferrer">
        <i class="{icon_base} {icon}" role='img' aria-label='{str}'></i>
        {str}
    </a>"""

yaml_data = yaml.safe_load(open("papers.yaml"))
pub_strs = {"pubs": {}, "wps": {}}
for _, data in yaml_data.items():
    title_str = data["title"]
    authors = data.get("authors", ["me"])
    authors = [
        aut if aut != "me" else "<strong>Drew Dimmery</strong>" for aut in authors
    ]
    author_str = readable_list(authors)
    year_str = data["year"]

    buttons = []
    preprint = data.get("preprint")
    if preprint is not None:
        buttons.append(button(preprint, "Preprint", "bi-file-earmark-pdf"))

    github = data.get("github")
    if github is not None:
        buttons.append(button(github, "Github", "bi-github"))

    pub_url = data.get("published_url")
    venue = data.get("venue")
    working_paper = pub_url is None
    
    pub_str = f'{author_str}. ({year_str}) "{title_str}."'

    if venue is not None:
        pub_str += f" <em>{venue}</em>"

    if working_paper:
        if year_str not in pub_strs["wps"]:
            pub_strs["wps"][year_str] = []
        pub_strs["wps"][year_str].append(
            "<li class='list-group-item'>" + pub_str + "<br>" + " ".join(buttons) + "</li>"
        )
    else:
        if year_str not in pub_strs["pubs"]:
            pub_strs["pubs"][year_str] = []
        buttons.append(button(pub_url, "Published", "ai-archive"))
        pub_strs["pubs"][year_str].append(
            "<li class='list-group-item'>" + pub_str + "<br>" + " ".join(buttons) + "</li>"
        )

I then print this out using the display functions from the IPython module and using the asis chunk option:

See research.qmd fragment
for year in sorted(pub_strs["pubs"].keys(), reverse=True):
    display(Markdown(f"### {year}" + "{#" + f"published-{year}" + "}"))
    display(HTML(
        "<ul class='list-group list-group-flush'>" + '\n'.join(pub_strs["pubs"][year]) + "</ul>"
    ))

The full code is on GitHub.

It’s worth noting that to get the years to show up in the Table of Contents its necessary to be careful exactly how the content is stuck onto the page. If you don’t use the asis chunk option, you can still get all the right content to show up, but it won’t necessarily appear in the ToC. I also found it necessary to include section-divs: false in the header, or else the output would get wrapped in additional div tags which made it harder to get the right classes in the right divs. There are probably more elegant ways to do all of this.

I use the same basic setup to populate the Software page, albeit with simpler logic.

Additions

I debated adding an abstract that expands out on click (like the code folding above in this post). This would actually be more or less trivial to add using a <details> HTML tag if I wanted to provide the data in the YAML. I’m ignoring this for now because I want to minimize data entry for my future self (and it’s anyway just a click away at the Preprint link).

Deployment

It’s extremely easy to build a new version of the website locally (quarto render from CLI), but there’s no guarantee I’ll remember that off the top of my head in a month without Googling, so I think it’s worthwhile to setup automatic building after I push a commit to GitHub.

GitHub Actions is incredible. I adapted the example config from Quarto to the following (also on GitHub here):

GitHub Actions for Netlify
on:
  push:
    branches: main
  pull_request:
    branches: main
  # to be able to trigger a manual build
  workflow_dispatch:

name: Render and deploy website to Netlify

jobs:
  build-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v3
        with:
          python-version: '3.9'
          cache: 'pip'
      - run: pip install -r requirements.txt

      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true

      - uses: r-lib/actions/setup-renv@v2
      
      - name: Install Quarto
        uses: quarto-dev/quarto-actions/install-quarto@v1
        with:
          # To install LaTeX to build PDF book 
          tinytex: true 
          # uncomment below and fill to pin a version
          # version: 0.9.105

      - name: Render website
        # Add any command line argument needed
        run: |
          quarto render
      - name: Deploy to Netlify
        id: netlify-deploy
        uses: nwtgck/actions-netlify@v1
        with:
          # The folder the action should deploy. Adapt if you changed in Quarto config
          publish-dir: './_site'
          production-branch: main
          github-token: ${{ secrets.GITHUB_TOKEN }}
          deploy-message:
            'Deploy from GHA: ${{ github.event.pull_request.title || github.event.head_commit.message }} (${{ github.sha }})'
          enable-pull-request-comment: true #  Comment on pull request
          enable-commit-comment: true # Comment on GitHub commit
          enable-commit-status: true # GitHub commit status 
        env:
          NETLIFY_AUTH_TOKEN: ${{ secrets.NETLIFY_AUTH_TOKEN }}
          NETLIFY_SITE_ID: ${{ secrets.NETLIFY_SITE_ID }}
        timeout-minutes: 1

This Action requires two pieces of information from Netlify entered as secrets in GitHub. The NETLIFY_SITE_ID may be found in the site configuration settings, while the NETLIFY_AUTH_TOKEN may be found in personal settings (the personal access token).

One thing I have not yet done is set up an renv to ensure dependencies for blog posts are taken care of in GitHub Actions. This means that posts like the experimental design demo can’t be knit via GitHub Actions. I did this for two reasons (other than laziness). First, it’s a pain to get GIS tools working on any environment (ok, so its part laziness). I’ve actually done this before for automated R CMD checking of the regweight package, but didn’t feel like it was worthwhile here.

The reason it’s not worth it is that Quarto has a great feature called “freezing”. Essentially, it knits blog posts or pages, and only re-renders them when something about the source changes. This means that the vast majority of posts don’t need to be rendered on each build. If I’m working on a blog post, I can write it locally, render on my machine, commit that pre-rendered post and then all future builds on Actions won’t get held up by their inability to render that post.

As I type this, it becomes clear that I’ll forget how to do this pretty often (given that there’s been about an 8 year delay since my next most recent blog, I likely won’t stay in practice). But blogs aren’t my main concern on my website: keeping a software and publication list up-to-date is.

Setting up Actions means that simple updates to pages (or YAML files) can actually be done directly in the GitHub editing UI, which further lowers the barrier for my future self. I don’t even need to clone the repository to whatever computer I’m working on to add a publication!

Future dreams

I imagine my CV is similar to most academics’ in that it’s built like a house of cards (and overfull hboxs). Whenever I add something new to it, I have to copy some lines from elsewhere and modify them to fit the new entry. This always takes me way more time than I’d like. If I mashed together my current About page with the Research page, it’s like 90% of the way to a full CV. It should presumably be pretty easy to do explicitly combine them and output a reasonable-looking CV.

This is a project for another day, though. Too much of the Research page directly outputs HTML, which makes it difficult to naïvely import into a \LaTeX CV.

An almost completely naïve approach to directly importing the relevant pages creates this ugly document.

Naïve CV
---
title: "Curriculum Vitae"
format: pdf
---

{{< include about.qmd >}}

{{< include research.md >}}

It’s definitely possible to improve on this. The easiest hacky approach is to just write a whole alternative version of the HTML formatting code which resides in research.qmd to output appropriately formatted \LaTeX markup.

For now, I’m pretty pleased with the system I have, but ask me again in three months.

Reuse

Citation

BibTeX citation:
@online{dimmery2022,
  author = {Drew Dimmery},
  title = {Quarto for an {Academic} {Website}},
  date = {2022-05-11},
  url = {https://ddimmery.com/posts/quarto-website},
  langid = {en}
}
For attribution, please cite this work as:
Drew Dimmery. 2022. “Quarto for an Academic Website.” May 11, 2022. https://ddimmery.com/posts/quarto-website.