<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Drew Dimmery</title>
<link>https://ddimmery.com/blog.html</link>
<atom:link href="https://ddimmery.com/blog.xml" rel="self" type="application/rss+xml"/>
<description>Personal website of Drew Dimmery</description>
<generator>quarto-1.8.27</generator>
<lastBuildDate>Mon, 16 Feb 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>Causal ML (the class)</title>
  <dc:creator>Drew Dimmery</dc:creator>
  <link>https://ddimmery.com/posts/causal-ml-class/</link>
  <description><![CDATA[ 





<section id="teaching-causalml" class="level2">
<h2 class="anchored" data-anchor-id="teaching-causalml">Teaching CausalML</h2>
<p>The most important paper I’ve ever read was David Freedman’s <a href="https://doi.org/10.2307/270939">Statistical Models and Shoe Leather</a>. As always, when I mention this paper, I must provide the following quote:</p>
<blockquote class="blockquote">
<p>Given the limits to present knowledge, I doubt that models can be rescued by technical fixes. Arguments about the theoretical merit of regression or the asymptotic behavior of specification tests for picking one version of a model over another seem like arguments about how to build desalination plants with cold fusion as the energy source. The concept may be admirable, the technical details may be fascinating, but thirsty people should look elsewhere.</p>
</blockquote>
<p>I find this to be a critically important lens through which to view the field of Causal Machine Learning (CML). I just taught a class on it in the Fall at Hertie, and the field forms the core of a lot of my research. In many ways, this class is about me trying to square my love for design-based causal inference with the realities of modern Causal Machine Learning, which is maddeningly focused on cold-fusion-powered desalination. A large amount of the literature in this field is beset by strong super-population assumptions, asymptotic theory and, thanks to the fundamental problem of causal inference, little way to determine what will actually be effective on the data that one has in front of them.</p>
<p>As such, my overriding goal for the course was to make students skeptical and informed consumers and producers of CML. I tried to get them to think about actual experiments whenever possible, and to think about the conditions under which methods work and do not work (and how you might be able to know)<sup>1</sup>. Then I expected everyone to get their hands dirty by actually implementing these methods and testing them through simulation studies. They did this both as part of in-class demos as well as a final project meant to extend and flesh out the demo (and fix any issues surfaced during the demo).</p>
<p>I expected a lot from students. Of the nearly two-hour class, I lectured for only around 30 minutes, while students led discussion for the remainder. They gave presentations about papers (with my intervention when I wanted to add additional color / complaints) and performed high quality simulation studies set up so their colleagues could tweak things about the setting or the method to see the results.</p>
<p>The course, therefore, ran on a few principles:</p>
<ul>
<li><strong>Read papers</strong>: If you can’t read new work, the field will leave you behind.</li>
<li><strong>Present main ideas</strong>: Nothing forces you to understand a paper like presenting its main ideas to your classmates.</li>
<li><strong>Write code</strong>: Implementing and stress testing methods is a powerful way to get a deep understanding of how they work.</li>
<li><strong>Don’t fight AI</strong>: Focus on in-class performance. If students just let AI create a slide deck/demo and don’t understand it, this will be very obvious when they get up to talk about their work. In practice, there were very few moments when students had offloaded too much of their preparation to AI. I think the incentives here are good. Students can use AI however it helps them, but they cannot avoid learning enough of what they need to know to feel comfortable standing in front of the room to talk about it.</li>
</ul>
<p>The nagging problem is this: we spent the semester stress-testing methods through simulation to see how they work and where they break. This is genuinely useful. But it only tells us how methods perform in worlds we can imagine (and the world is much more complicated than that). The whole point of causal inference is that we never observe the counterfactual, so we can never confirm on our <em>actual</em> dataset that a method did what it promised. I don’t think the course resolves this—I don’t think anything really can. What I think it does is produce people who understand the machinery well enough to know exactly where the leap of faith is, and who can be honest about when they’re making it. I think that matters, even if it’s not enough. There’s still a lot of work to be done on understanding what we can actually learn from these methods and what we can’t—and frankly, a lot of the field doesn’t always seem particularly interested in that question.</p>
<p>With that, check out the syllabus I landed on:</p>
</section>
<section id="syllabus" class="level1">
<h1>Syllabus</h1>
<section id="course-overview" class="level2">
<h2 class="anchored" data-anchor-id="course-overview">Course overview</h2>
<p>The syllabus begins with the building blocks—design-based inference, covariate adjustment, and propensity scores—before moving to the doubly robust and semiparametric methods that form the core of modern CML. From there it turns to heterogeneous treatment effects (first with forests and meta-learners, then neural networks), policy learning, and experimental design. The final weeks cover topics where the standard assumptions start to break down: panel data, partial identification, adaptive experimentation, and interference. Students wrote a referee report on one paper, presented another, gave a live code demonstration of one week’s methods, and produced a final expository project extending the demonstration. All readings are papers; there is no textbook.</p>
</section>
<section id="session-by-session" class="level2">
<h2 class="anchored" data-anchor-id="session-by-session">Session-by-session</h2>
<section id="session-1-design-based-causal-inference-and-monte-carlo-simulation" class="level3">
<h3 class="anchored" data-anchor-id="session-1-design-based-causal-inference-and-monte-carlo-simulation">Session 1: Design-based Causal Inference and Monte Carlo Simulation</h3>
<p>The potential outcomes framework, what randomization buys you, and the <a href="https://doi.org/10.1002/sim.8086">ADEMP framework</a> for simulation studies that students use to stress-test every method in the course.</p>
<p><a href="https://conjugateprior.org/">Will Lowe</a> teaches a fantastic Causal course at Hertie that focuses more on DAG-world, so I lay out some strong arguments around manipulability and the implied metaphysics of potential outcomes. This comes early to set the stage for the rest of the course. My computational demo aimed at understanding the difference between inference on the SATE and the PATE (the sample and population average treatment effects) as a way to get students thinking about sources of randomization (and better prepare them for the many superpopulations to come).</p>
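<p>A minimal sketch of the kind of simulation that separates the two targets. This is not the demo used in class: the data-generating process, the sample sizes and every function name are illustrative assumptions. With a fixed sample, only the treatment assignment is random and the estimator targets the SATE. When the sample is redrawn each time, sampling adds a second source of randomness and the target becomes the PATE (1.0 in this hypothetical setup).</p>

```python
import random
import statistics


def draw_sample(n, rng):
    """Draw n units with both potential outcomes from a hypothetical superpopulation."""
    units = []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)
        tau = rng.gauss(1.0, 2.0)  # heterogeneous unit-level effect, so PATE = 1.0
        units.append((y0, y0 + tau))
    return units


def diff_in_means(units, rng):
    """Completely randomize half the units to treatment and compare arm means."""
    n = len(units)
    treated = set(rng.sample(range(n), n // 2))
    y1 = [units[i][1] for i in range(n) if i in treated]
    y0 = [units[i][0] for i in range(n) if i not in treated]
    return statistics.fmean(y1) - statistics.fmean(y0)


def simulate(n=50, reps=2000, seed=0):
    rng = random.Random(seed)
    # Fixed sample: only the assignment varies across replications (target: SATE).
    fixed = draw_sample(n, rng)
    sate = statistics.fmean(y1 - y0 for y0, y1 in fixed)
    fixed_draws = [diff_in_means(fixed, rng) for _ in range(reps)]
    # Fresh sample each time: sampling and assignment both vary (target: PATE).
    fresh_draws = [diff_in_means(draw_sample(n, rng), rng) for _ in range(reps)]
    return {
        "sate": sate,
        "mean_fixed": statistics.fmean(fixed_draws),
        "sd_fixed": statistics.stdev(fixed_draws),
        "mean_fresh": statistics.fmean(fresh_draws),
        "sd_fresh": statistics.stdev(fresh_draws),
    }


results = simulate()
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

<p>In a setup like this, each average should sit near its own target, and the spread is typically larger when the sample itself is redrawn, because effect heterogeneity then contributes sampling variance.</p>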
</section>
<section id="session-2-covariate-adjustment" class="level3">
<h3 class="anchored" data-anchor-id="session-2-covariate-adjustment">Session 2: Covariate Adjustment</h3>
<p>If randomization already gives you unbiased estimates, should you adjust for covariates at all? If not, is there any role for ML? I think there are actually pretty good answers in this setting! So we talked about the <a href="https://doi.org/10.1214/12-AOAS583">Lin-style regression</a> and AIPW as ways to be both safe <em>and</em> efficient.</p>
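<p>A minimal sketch, under stated assumptions, of why Lin-style adjustment can be both safe and efficient in an experiment. With a single covariate, the treatment coefficient in the fully interacted regression equals the gap between the two arm-specific regression lines evaluated at the overall covariate mean, which is what the hypothetical <code>lin_estimate</code> computes. The outcome model (true effect 1.0, covariate slope 2.0), the sample sizes and all function names are illustrative and not taken from the course materials.</p>

```python
import random
import statistics


def ols_line(x, y):
    """Intercept and slope of a one-covariate least-squares fit."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def lin_estimate(x, t, y):
    """Fit a separate line in each arm and compare predictions at the overall mean of x."""
    xbar = statistics.fmean(x)
    preds = {}
    for arm in (0, 1):
        xa = [xi for xi, ti in zip(x, t) if ti == arm]
        ya = [yi for yi, ti in zip(y, t) if ti == arm]
        intercept, slope = ols_line(xa, ya)
        preds[arm] = intercept + slope * xbar
    return preds[1] - preds[0]


def diff_in_means(t, y):
    """Unadjusted estimator: difference between treated and control means."""
    y1 = [yi for yi, ti in zip(y, t) if ti == 1]
    y0 = [yi for yi, ti in zip(y, t) if ti == 0]
    return statistics.fmean(y1) - statistics.fmean(y0)


def compare(n=200, reps=500, seed=1):
    rng = random.Random(seed)
    lin_draws, dim_draws = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = set(rng.sample(range(n), n // 2))
        t = [1 if i in treated else 0 for i in range(n)]
        # Hypothetical outcome model: true effect 1.0, strongly prognostic covariate.
        y = [2.0 * xi + 1.0 * ti + rng.gauss(0.0, 1.0) for xi, ti in zip(x, t)]
        lin_draws.append(lin_estimate(x, t, y))
        dim_draws.append(diff_in_means(t, y))
    return {
        "lin_mean": statistics.fmean(lin_draws),
        "lin_sd": statistics.stdev(lin_draws),
        "dim_mean": statistics.fmean(dim_draws),
        "dim_sd": statistics.stdev(dim_draws),
    }


summary = compare()
print(summary)
```

<p>Both estimators should centre on the true effect here, with the adjusted one noticeably less variable because the covariate explains much of the outcome.</p>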
</section>
<section id="session-3-balancing-weights" class="level3">
<h3 class="anchored" data-anchor-id="session-3-balancing-weights">Session 3: Balancing Weights</h3>
<p>Only in this session are we really getting to the observational world at all. What is “balance”? Why might we want it? How should we define it? We went through a variety of modern approaches to this including <a href="https://proceedings.mlr.press/v139/arbour21a.html">permutation weighting</a>, which holds a special place in my heart as it reframes the problem as classification.</p>
</section>
<section id="session-4-doubly-robust-methods-double-ml-and-tmle" class="level3">
<h3 class="anchored" data-anchor-id="session-4-doubly-robust-methods-double-ml-and-tmle">Session 4: Doubly Robust Methods, Double ML, and TMLE</h3>
<p>Sessions 2 and 3 each model one side of the problem; doubly robust methods combine both, giving you two chances to get it right. Is it actually useful to get two chances at this? We work through the error decompositions that show why it might be helpful even when you get <em>neither</em> right. We hammer back on this many times throughout the course, too.</p>
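<p>The “two chances” property can be sketched with the AIPW estimator on simulated data. This is an illustrative toy, not course code: the nuisance functions are plugged in by hand rather than estimated, and the data-generating process (binary confounder, true effect 1.0) and all names are assumptions. The estimator should land near the truth when either the outcome model or the propensity model is correct, and drift when both are wrong.</p>

```python
import random
import statistics


def simulate(n, rng):
    """Hypothetical observational data: x confounds treatment and outcome; true ATE is 1.0."""
    data = []
    for _ in range(n):
        x = 1 if 0.5 > rng.random() else 0
        e = 0.8 if x == 1 else 0.2  # true propensity score
        t = 1 if e > rng.random() else 0
        y = 1.0 * t + 2.0 * x + rng.gauss(0.0, 1.0)
        data.append((x, t, y))
    return data


def aipw(data, outcome_model, propensity_model):
    """Average the AIPW score: model contrast plus inverse-weighted residual corrections."""
    scores = []
    for x, t, y in data:
        m1, m0 = outcome_model(x, 1), outcome_model(x, 0)
        e = propensity_model(x)
        scores.append(m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e))
    return statistics.fmean(scores)


def true_outcome(x, t):
    return 1.0 * t + 2.0 * x


def wrong_outcome(x, t):
    return 0.0  # ignores both treatment and confounder


def true_propensity(x):
    return 0.8 if x == 1 else 0.2


def wrong_propensity(x):
    return 0.5  # pretends treatment was a fair coin flip


data = simulate(20000, random.Random(3))
estimates = {
    "both_right": aipw(data, true_outcome, true_propensity),
    "outcome_only": aipw(data, true_outcome, wrong_propensity),
    "propensity_only": aipw(data, wrong_outcome, true_propensity),
    "both_wrong": aipw(data, wrong_outcome, wrong_propensity),
}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

<p>In this particular setup the “both wrong” case converges to 2.2 rather than 1.0, which is the confounded difference in means.</p>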
<p>A recent paper we read was <a href="https://arxiv.org/abs/2203.06469">Kennedy (2022)</a>.</p>
</section>
<section id="session-5-heterogeneous-treatment-effects-i" class="level3">
<h3 class="anchored" data-anchor-id="session-5-heterogeneous-treatment-effects-i">Session 5: Heterogeneous Treatment Effects I</h3>
<p>How do effects vary across individuals? This is categorically harder than estimating averages—there is no observed target, no natural loss function, and no straightforward way to validate predictions, which is why my lecture focuses on the <a href="https://doi.org/10.1080/01621459.1986.10478354">Fundamental Problem of Causal Inference</a>. I’m just categorically unable to let people have fun.</p>
</section>
<section id="session-6-heterogeneous-treatment-effects-ii-neural-networks" class="level3">
<h3 class="anchored" data-anchor-id="session-6-heterogeneous-treatment-effects-ii-neural-networks">Session 6: Heterogeneous Treatment Effects II — Neural Networks</h3>
<p>This session focuses on using neural networks. My lecture focused on representation learning and why this is actually really tricky in the causal setting: what do you mean we require an invertible representation? In <strong>this</strong> economy?</p>
<p>Some recent papers we read were <a href="https://doi.org/10.1093/biomet/asaa068">Nie &amp; Wager (2021)</a> and <a href="https://arxiv.org/abs/2506.10914">Ma et al.&nbsp;(2025)</a>.</p>
</section>
<section id="session-7-off-policy-evaluation-and-optimization" class="level3">
<h3 class="anchored" data-anchor-id="session-7-off-policy-evaluation-and-optimization">Session 7: Off-Policy Evaluation and Optimization</h3>
<p>A pivot from “how large is the effect?” to “who should we treat?” I pull in the discussion over causal decision making (i.e.&nbsp;ignore causal effects, <a href="https://doi.org/10.1287/ijds.2021.0006">Learning the Sign is All You Need</a>), as I think it’s worth problematizing what the actual task is. Is there actually an important role for understanding HTEs for decision-making? For many reasons, I still think the answer is yes.</p>
<p>Some recent papers we read were <a href="https://doi.org/10.3982/ECTA15732">Athey &amp; Wager (2021)</a> and <a href="https://doi.org/10.1038/s43588-025-00814-9">Kern et al.&nbsp;(2025)</a>.</p>
</section>
<section id="session-8-experimental-design" class="level3">
<h3 class="anchored" data-anchor-id="session-8-experimental-design">Session 8: Experimental Design</h3>
<p>Probably the session most near and dear to my heart. Always take the opportunity to give ’em the Fisher quote:</p>
<blockquote class="blockquote">
<p>To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.</p>
</blockquote>
<p>Some recent papers we read were <a href="https://arxiv.org/abs/1911.03071">Harshaw et al.&nbsp;(2019)</a> and <a href="https://proceedings.mlr.press/v162/arbour22a.html">Arbour et al.&nbsp;(2022)</a>.</p>
</section>
<section id="session-9-panel-data-and-modern-difference-in-differences" class="level3">
<h3 class="anchored" data-anchor-id="session-9-panel-data-and-modern-difference-in-differences">Session 9: Panel Data and Modern Difference-in-Differences</h3>
<p>The course turns to settings without randomization (but, of course, problematizes this framing). We go over the “New Diff-in-diff” and pivot quickly to <a href="https://doi.org/10.1198/jasa.2009.ap08746">synthetic control</a>, because I want to keep as much attention on the actual <em>identification</em> problems of the setting (which I think are often elided because they do not admit simple technical fixes: they require thinking about the actual problem setting).</p>
<p>Some recent papers we read were <a href="https://doi.org/10.1080/01621459.2021.1929245">Ben-Michael et al.&nbsp;(2021)</a> and <a href="https://doi.org/10.1214/22-AOAS1654">Ben-Michael et al.&nbsp;(2023)</a>.</p>
</section>
<section id="session-10-partial-identification" class="level3">
<h3 class="anchored" data-anchor-id="session-10-partial-identification">Session 10: Partial Identification</h3>
<p>We cover <a href="https://www.jstor.org/stable/2006592">Manski’s no-assumptions bounds</a> and look at a variety of ways that ML can be used to support tighter intervals. I find this really important as a way to tie together the class, but students don’t get that excited by the idea of estimating an interval rather than getting a single holy point estimate.</p>
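<p>A minimal sketch of the no-assumptions bounds for an outcome known to lie in a bounded range. Each unobserved potential outcome is filled in with its worst and best possible value, so the resulting interval for the average treatment effect always has width equal to the outcome range. The function name <code>manski_bounds</code> and the ten-unit toy dataset are made up for illustration.</p>

```python
def manski_bounds(t, y, y_min=0.0, y_max=1.0):
    """No-assumptions bounds on the ATE for an outcome bounded in [y_min, y_max]."""
    n = len(y)
    p = sum(t) / n  # share of treated units
    obs1 = sum(yi for ti, yi in zip(t, y) if ti == 1) / n  # estimate of E[T * Y]
    obs0 = sum(yi for ti, yi in zip(t, y) if ti == 0) / n  # estimate of E[(1 - T) * Y]
    # Missing Y(1) values belong to control units, missing Y(0) values to treated units.
    y1_lo, y1_hi = obs1 + (1 - p) * y_min, obs1 + (1 - p) * y_max
    y0_lo, y0_hi = obs0 + p * y_min, obs0 + p * y_max
    return y1_lo - y0_hi, y1_hi - y0_lo


# Toy data: 4 treated units (3 successes), 6 control units (2 successes).
t = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1]
lower, upper = manski_bounds(t, y)
print(f"ATE is between {lower:.2f} and {upper:.2f}")
```

<p>For the toy data the interval is roughly (-0.3, 0.7): it has width one and cannot rule out a zero effect, which is exactly why further assumptions (or ML-assisted tightening) are needed to say more.</p>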
<p>Some recent papers we read were <a href="https://proceedings.mlr.press/v235/khan24b.html">Khan, Saveski &amp; Ugander (2024)</a> and <a href="https://arxiv.org/abs/2309.08985">Samii, Wang &amp; Zhou (2023)</a>.</p>
</section>
<section id="session-11-adaptive-experimentation-and-reinforcement-learning" class="level3">
<h3 class="anchored" data-anchor-id="session-11-adaptive-experimentation-and-reinforcement-learning">Session 11: Adaptive Experimentation and Reinforcement Learning</h3>
<p>I always tell students that explore-exploit is a natural framework that can be helpful to them generally as they live their lives (do we go to the same sushi spot as always or try somewhere new?). I spend a lot of time talking about how simple effect estimators fail. We don’t get much into the solutions for this, unfortunately (I am not fully happy with how this has been solved thus far in the literature).</p>
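<p>One way a simple estimator fails under adaptive data collection can be sketched in a few lines. In this hypothetical two-armed setup both arms have the same true mean of zero, and a greedy rule always pulls the current leader. An arm that starts with an unlucky draw is rarely revisited, so its plain sample mean ends up biased downward. The greedy rule, horizon and function names are illustrative assumptions, not the design discussed in class.</p>

```python
import random
import statistics


def greedy_run(rng, horizon=20):
    """Two arms with identical N(0, 1) rewards, so every true mean is 0."""
    sums = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # one initial pull per arm
    counts = [1, 1]
    for _ in range(horizon):
        means = [sums[0] / counts[0], sums[1] / counts[1]]
        arm = 0 if means[0] >= means[1] else 1  # always exploit the current leader
        sums[arm] += rng.gauss(0.0, 1.0)
        counts[arm] += 1
    return sums[0] / counts[0]  # naive sample mean of arm 0


def average_naive_estimate(reps=5000, seed=2):
    rng = random.Random(seed)
    return statistics.fmean(greedy_run(rng) for _ in range(reps))


# The true mean is 0, so the average estimate is itself the bias.
bias_arm0 = average_naive_estimate()
print(f"average sample mean of arm 0: {bias_arm0:.3f}")
```

<p>Under uniform random assignment the same sample mean would be unbiased; the negative value here comes purely from letting past outcomes decide which arm gets sampled next.</p>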
<p>Some recent papers we read were <a href="https://arxiv.org/abs/2203.02155">Ouyang et al.&nbsp;(2022)</a> and <a href="https://doi.org/10.1073/pnas.2014602118">Hadad et al.&nbsp;(2021)</a>.</p>
</section>
<section id="session-12-interference" class="level3">
<h3 class="anchored" data-anchor-id="session-12-interference">Session 12: Interference</h3>
<p>We end by blowing up the possibility of causal inference via exponential explosion of potential outcomes under interference. Is it possible to do anything about it? I devote a lot of attention to <a href="https://doi.org/10.1145/2487575.2487695">design-based solutions</a> which (alas, don’t they always?) require understanding things about the world.</p>
<p>A recent paper we read was <a href="https://doi.org/10.1073/pnas.2322232121">Shirani &amp; Bayati (2024)</a>.</p>
</section>
</section>
<section id="closing-thoughts" class="level2">
<h2 class="anchored" data-anchor-id="closing-thoughts">Closing thoughts</h2>
<p>That was my attempt to square the circle. I think it went pretty well all things considered. There’s such a huge amount to cover (jamming DR + DML + TMLE into one session is crazy, as is combining CB + RL into one session). Alas, there was a lot I wanted to cover. I think more than half the classes had some kind of important design-based connection, which I think is good. The machinery is all genuinely useful and extremely interesting, but it doesn’t replace knowing where your identification comes from. I would much rather send students out the door a little paranoid than too comfortable that technical fixes will solve their problems.</p>


</section>
</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>One of many frustrations I have is with the term “testing assumptions”. If you can test it, then it isn’t an assumption!</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{dimmery2026,
  author = {Dimmery, Drew},
  title = {Causal {ML} (the Class)},
  date = {2026-02-16},
  url = {https://ddimmery.com/posts/causal-ml-class/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-dimmery2026" class="csl-entry quarto-appendix-citeas">
Dimmery, Drew. 2026. <span>“Causal ML (the Class).”</span> February 16,
2026. <a href="https://ddimmery.com/posts/causal-ml-class/">https://ddimmery.com/posts/causal-ml-class/</a>.
</div></div></section></div> ]]></description>
  <category>methodology</category>
  <category>teaching</category>
  <guid>https://ddimmery.com/posts/causal-ml-class/</guid>
  <pubDate>Mon, 16 Feb 2026 00:00:00 GMT</pubDate>
  <media:content url="https://ddimmery.com/posts/causal-ml-class/main-image.png" medium="image" type="image/png" height="78" width="144"/>
</item>
<item>
  <title>Website Refresh</title>
  <dc:creator>Drew Dimmery</dc:creator>
  <link>https://ddimmery.com/posts/website-refresh/</link>
  <description><![CDATA[ 





<p>I’ve made a few changes and updates to this website. None of them are particularly earth-shattering, but I think building one’s own tools—rather than accepting the constraints of platforms—is worth the effort.</p>
<p>Here’s what I’ve been working on:</p>
<ol type="1">
<li>Semantic Scholar integration for updating publication records</li>
<li>Listmonk tools for sharing blog posts as a newsletter</li>
<li>Improvements to the design and performance, particularly around fonts</li>
</ol>
<p>I think these changes are consistent with <a href="../../posts/back-to-basics/">the philosophy I introduced in the last post</a>. I’ll introduce each briefly and discuss how they work—perhaps they’ll be useful to others building their own corners of the web.</p>
<section id="semantic-scholar-integration" class="level2">
<h2 class="anchored" data-anchor-id="semantic-scholar-integration">Semantic Scholar integration</h2>
<p>I’ve long wanted to avoid tedious data entry when updating my website. This was part of my intention when I <a href="../../posts/quarto-website/">moved it to Quarto in the first place</a>: Quarto makes it easy to programmatically generate a static website. I can mix in rich documents that combine code (R and Python in particular) with easily formatted Markdown text. This is all great. It let me hack together a nice way to display my publications in a consistent format based on data stored in a hand-curated YAML file. The next step was to automate the process of updating the publication records. I used the <a href="https://api.semanticscholar.org/api-docs/">Semantic Scholar API</a> to fetch the latest information about my publications and then used <a href="https://quarto.org/docs/websites/">Quarto’s templating capabilities</a> to generate the updated publication list.</p>
<p>Most of the work of this system is in two functions which, because I vibe-coded them, I’ve never really even looked at.</p>
<details>
<summary>
See the code
</summary>
<div id="83c49da4" class="cell" data-execution_count="1">
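<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span><span class="im" style="color: #00769E; background-color: null; font-style: inherit;">import</span> requests</span>
<span></span>
<span id="cb1-1"><span class="kw" style="color: #003B4F;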
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> fetch_author_papers(author_id):</span>
<span id="cb1-2">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""Fetch papers from Semantic Scholar API"""</span></span>
<span id="cb1-3">    base_url <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://api.semanticscholar.org/graph/v1"</span></span>
<span id="cb1-4">    fields <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"paperId,title,authors,year,venue,publicationDate,externalIds,openAccessPdf,url"</span></span>
<span id="cb1-5"></span>
<span id="cb1-6">    url <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>base_url<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">/author/</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>author_id<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">/papers"</span></span>
<span id="cb1-7">    params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'fields'</span>: fields, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'limit'</span>: <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>}</span>
<span id="cb1-8"></span>
<span id="cb1-9">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb1-10">        response <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> requests.get(url, params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>params)</span>
<span id="cb1-11">        response.raise_for_status()</span>
<span id="cb1-12">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> response.json()</span>
<span id="cb1-13">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> requests.RequestException <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> e:</span>
<span id="cb1-14">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Error fetching data from Semantic Scholar: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>e<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb1-15">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span>
<span id="cb1-16"></span>
<span id="cb1-17"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> create_yaml_entry_from_ss_paper(paper):</span>
<span id="cb1-18">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""Convert Semantic Scholar paper to YAML entry format"""</span></span>
<span id="cb1-19">    paper_id <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> paper.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'paperId'</span>)</span>
<span id="cb1-20">    title <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> paper.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'title'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Untitled'</span>)</span>
<span id="cb1-21">    year <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> paper.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'year'</span>)</span>
<span id="cb1-22">    venue <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> paper.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'venue'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">''</span>)</span>
<span id="cb1-23"></span>
<span id="cb1-24">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Process authors</span></span>
<span id="cb1-25">    authors <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb1-26">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> author <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> paper.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'authors'</span>, []):</span>
<span id="cb1-27">        author_name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> author.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'name'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">''</span>)</span>
<span id="cb1-28">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Dimmery'</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> author_name:</span>
<span id="cb1-29">            authors.append(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"me"</span>)</span>
<span id="cb1-30">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb1-31">            authors.append(author_name)</span>
<span id="cb1-32"></span>
<span id="cb1-33">    entry <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {</span>
<span id="cb1-34">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'title'</span>: title,</span>
<span id="cb1-35">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'authors'</span>: authors,</span>
<span id="cb1-36">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'year'</span>: year,</span>
<span id="cb1-37">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'venue'</span>: venue <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> venue <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>,</span>
<span id="cb1-38">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'visible'</span>: <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Default to not visible</span></span>
<span id="cb1-39">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ssid'</span>: paper_id,</span>
<span id="cb1-40">    }</span>
<span id="cb1-41"></span>
<span id="cb1-42">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add external links if available</span></span>
<span id="cb1-43">    external_ids <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> paper.get(<span class="st" style="color: #20794D;
background-color: null;
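font-style: inherit;">'externalIds'</span>) <span class="kw" style="color: #003B4F; background-color: null; font-weight: bold; font-style: inherit;">or</span> {}  <span class="co" style="color: #5E5E5E; background-color: null; font-style: inherit;"># guard against a null externalIds value</span></span>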
<span id="cb1-44">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> external_ids.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'DOI'</span>):</span>
<span id="cb1-45">        entry[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'published_url'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"https://doi.org/</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>external_ids[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'DOI'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb1-46"></span>
<span id="cb1-47">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> external_ids.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ArXiv'</span>):</span>
<span id="cb1-48">        entry[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'preprint'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"https://arxiv.org/abs/</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>external_ids[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ArXiv'</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb1-49"></span>
<span id="cb1-50">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add open access PDF if available</span></span>
<span id="cb1-51">    open_access <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> paper.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'openAccessPdf'</span>)</span>
<span id="cb1-52">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> open_access <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> open_access.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'url'</span>):</span>
<span id="cb1-53">        entry[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdf_url'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> open_access[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'url'</span>]</span>
<span id="cb1-54"></span>
<span id="cb1-55">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> entry</span></code></pre></div></div>
</div>
</details>
<p>Critically, this is all pretty trivial, but it’s also <em>boring</em>, so I’m glad I didn’t need to think particularly hard about it. It also means I didn’t need to personally dig through the Semantic Scholar API documentation. The code quality is not great, but neither was my own code before!</p>
<p>So now when I build the site, it automatically updates the YAML with my latest papers. I can just browse through them and flip the <code>visible</code> flag to show them once I’m satisfied they’re right.</p>
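<p>To make the review step concrete, here is a minimal sketch of the merge logic, not the actual script from the site. It assumes each entry is a dictionary with a hypothetical <code>paper_id</code> key for spotting duplicates; the function name <code>merge_new_papers</code> and the sample data are also made up for illustration.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def merge_new_papers(existing, fetched):
    """Append fetched entries that are not yet listed, hidden by default."""
    # 'paper_id' is a hypothetical key used here to detect duplicates.
    known_ids = {entry["paper_id"] for entry in existing}
    merged = list(existing)
    for entry in fetched:
        if entry["paper_id"] in known_ids:
            continue  # never touch an entry that is already in the file
        new_entry = dict(entry)       # copy so the fetched data is not mutated
        new_entry["visible"] = False  # stays hidden until flipped by hand
        merged.append(new_entry)
        known_ids.add(entry["paper_id"])
    return merged


# Illustrative data only.
existing = [{"paper_id": "a1", "title": "Old paper", "visible": True}]
fetched = [
    {"paper_id": "a1", "title": "Old paper"},
    {"paper_id": "b2", "title": "New paper"},
]
papers = merge_new_papers(existing, fetched)
</code></pre></div>
<p>Existing entries are left untouched, so a paper that has already been reviewed and made visible is never reset by a later build.</p>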
</section>
<section id="listmonk-tools" class="level2">
<h2 class="anchored" data-anchor-id="listmonk-tools">Listmonk tools</h2>
<p>The second main improvement I made was to add an email distribution list to my website’s blog. This was pretty easy because of the great open source email distribution software, <a href="https://listmonk.app/">Listmonk</a>, and the super easy deployment option at <a href="https://www.pikapods.com/">Pikapods</a>. The system I’ve come up with is basically the following:</p>
<ol type="1">
<li>Write a blog post in a Quarto document.</li>
<li>Push it to a branch if I want to have sensitivity readers.</li>
<li>Once I’m satisfied, push it to main, which automatically triggers a Quarto build of the website and makes the blog post “live” on the website (and RSS feed)</li>
<li><strong>Manually trigger a Github Actions workflow to schedule a Listmonk email campaign.</strong></li>
</ol>
<p>It’s this last part that I think is particularly cool. Basically the way it works is I vibecoded an over-engineered system that pulls the necessary metadata from a given blog post (in the <code>.qmd</code> file), renders the post to an HTML version, extracts the HTML from that, and sends all of this through Listmonk’s API to schedule an email campaign with nicely formatted HTML content. I’m not going to embed all of the scripts for this here, but you can take a look at them in my website’s Github: <a href="https://github.com/ddimmery/quarto-website/tree/main/scripts"><code>scripts/</code></a>.</p>
<p>To actually trigger this monstrosity, I used a nice feature that Github Actions provides: <code>workflow_dispatch</code> event triggers. Basically, what this means is that <a href="https://docs.github.com/en/actions/how-tos/manage-workflow-runs/manually-run-a-workflow">you can configure a Github Action to trigger manually through the Github UI</a>. It looks something like this:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ddimmery.com/posts/website-refresh/workflow-dispatch.png" class="img-fluid figure-img" style="width:20.0%"></p>
<figcaption>Workflow Dispatch example</figcaption>
</figure>
</div>
<p>So basically all I need to do is tell the workflow which post to schedule (the <code>post_slug</code> part) and when to schedule it (the <code>send_at</code> part). There are also a few optional things I can plug in, such as which list to send to (Listmonk lets me manage multiple lists easily) or whether to send test emails.</p>
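<p>As a rough sketch of what happens with those two inputs, the snippet below assembles a JSON body for a scheduled HTML campaign. It is not the code from <code>scripts/</code>: the function name, the <code>blog-</code> naming scheme and the sample values are hypothetical, and the field names (<code>name</code>, <code>subject</code>, <code>lists</code>, <code>type</code>, <code>content_type</code>, <code>body</code>, <code>send_at</code>) are my assumption about Listmonk’s campaign API, so check them against its documentation before relying on them.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import json
from datetime import datetime, timezone


def build_campaign_payload(post_slug, title, html_body, send_at, list_ids):
    """Assemble the JSON body for a scheduled HTML campaign (assumed fields)."""
    if send_at.tzinfo is None:
        # A naive datetime would be ambiguous about when the email goes out.
        raise ValueError("send_at must be timezone-aware")
    return {
        "name": f"blog-{post_slug}",  # hypothetical naming scheme
        "subject": title,
        "lists": list(list_ids),
        "type": "regular",
        "content_type": "html",
        "body": html_body,
        "send_at": send_at.astimezone(timezone.utc).isoformat(),
    }


# Illustrative values only.
payload = build_campaign_payload(
    post_slug="website-refresh",
    title="Website Refresh",
    html_body="Hello, subscribers.",
    send_at=datetime(2025, 11, 20, 9, 0, tzinfo=timezone.utc),
    list_ids=[1],
)
request_body = json.dumps(payload)  # what would be sent to the API
</code></pre></div>
<p>Normalizing <code>send_at</code> to UTC keeps the scheduled time unambiguous regardless of where the workflow runner happens to be.</p>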
<p>This is a solution I’m mostly happy with, though some of the practicalities still need a bit of tweaking. There are two things I’m trying to figure out:</p>
<ol type="1">
<li>Whether I should maintain separate lists for different categories of post. It wouldn’t be hard to maintain category-specific lists so that people could easily opt-out of posts about (e.g.) how I built things on my website.</li>
<li>How exactly to manage the sending of emails. When I sent out the last email, it only went to about two-thirds of my subscribers because my email provider is Protonmail, set up with a custom <code>blog@</code> email address for sending/receiving. Unfortunately, there were some rate limits that I didn’t adequately respect the first time.</li>
</ol>
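<p>One way to think about the rate-limit problem in the second point is to split the subscriber list into batches and give each batch its own send time. The sketch below does only that arithmetic. The limit of three messages per hour is a made-up number for illustration, not Protonmail’s actual limit, and <code>plan_batches</code> is a hypothetical helper rather than anything Listmonk provides.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from datetime import datetime, timedelta, timezone


def plan_batches(subscribers, max_per_window, window, start):
    """Split subscribers into batches, one batch per rate-limit window."""
    plan = []
    offsets = range(0, len(subscribers), max_per_window)
    for index, offset in enumerate(offsets):
        send_time = start + index * window  # each batch waits one more window
        batch = subscribers[offset:offset + max_per_window]
        plan.append((send_time, batch))
    return plan


# Illustrative values only: 7 subscribers, at most 3 messages per hour.
subscribers = [f"reader{i}@example.com" for i in range(7)]
start = datetime(2025, 11, 20, 9, 0, tzinfo=timezone.utc)
plan = plan_batches(
    subscribers, max_per_window=3, window=timedelta(hours=1), start=start
)
</code></pre></div>
<p>With these numbers the plan has three batches of sizes 3, 3 and 1, sent one hour apart, so every subscriber is reached without exceeding the assumed limit.</p>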
<p>I’m overall pretty pleased with this system. You’re welcome to <a href="https://list.ddimmery.com/subscription/form">subscribe here!</a></p>
</section>
<section id="improvements-in-design-and-performance" class="level2">
<h2 class="anchored" data-anchor-id="improvements-in-design-and-performance">Improvements in design and performance</h2>
<p>Within me are two wolves: Sometimes I miss the days of HTML-only websites. Other days, I long for the ability to have nicely reactive and responsive design elements that require richer web frameworks. On some personal projects over the last few months, I’ve learned a bit more about modern webdev tools. One thing I’m particularly taken by is the way that modern webdev tools like <a href="https://vitejs.dev/">Vite</a> manage javascript dependencies.</p>
<p>In short, the problem with a lot of these big web frameworks is that they’re really big (duh)! They have a lot of stuff that you never use in a small personal website like mine, but they still get bundled into the library that is sent down to the website viewer. We don’t need to do that: we can be smarter. That’s basically what tools like Vite do. They only bundle together the parts of code that are necessary for the particular site being built. They also can help with lazy loading (so they’re only loaded when needed).</p>
<p>That’s all a bit of a tangent, but it’s gotten me thinking about why I actually need to load so many Javascript libraries. Quarto ultimately doesn’t seem to be particularly optimized on this front. This is kind of the tradeoff you get: sure, it’s easy to throw something together with Quarto that has a lot of the bells and whistles a data scientist craves, but that doesn’t mean it’s going to be super efficient. There’s a <a href="https://emilhvitfeldt.com/post/quarto-performance/">really nice post by Emil Hvitfeldt discussing performance considerations in Quarto</a>.</p>
<section id="purge-unused-resources" class="level3">
<h3 class="anchored" data-anchor-id="purge-unused-resources">Purge unused resources</h3>
<p>I inlined many icons as embedded SVGs, but, alas, Quarto still loads the <code>bootstrap-icons.css</code> that it does not, as far as I can tell, use anywhere. The solution to this was described in <a href="https://github.com/orgs/quarto-dev/discussions/9330">a discussion about Quarto performance by Charles Nepote</a>. Quarto makes it possible to add a post-rendering step to builds, and I use one to purge unused CSS and minify JS and CSS (<a href="https://github.com/ddimmery/quarto-website/blob/main/scripts/purge-css.sh">see here</a>). I found that this did a really nice job of reducing the size of resources that needed to be fetched from my site, and it brought the critical path latency down a good bit. After trimming as many of these resources as I could, <a href="https://pagespeed.web.dev/analysis/https-ddimmery-com/2zf25s9mbr?form_factor=mobile">the critical path is currently loading <code>quarto.js</code></a>. I think this is probably about as good as it gets<sup>1</sup>. Or, at least, as good as it gets without me just basically giving up on letting Quarto manage this at all.</p>
</section>
<section id="fonts" class="level3">
<h3 class="anchored" data-anchor-id="fonts">Fonts</h3>
<p>Here’s some fun Drew-lore: I’m weirdly into fonts. In the summer before my PhD I read like 3 books about fonts that I marked up more than almost anything else I’ve read.</p>
<section id="my-font-choices" class="level4">
<h4 class="anchored" data-anchor-id="my-font-choices">My font choices</h4>
<p>Long story short, I’ve currently landed on the following font styling that I intend to use pretty generally across the web, in documents and on presentations<sup>2</sup>:</p>
<ul>
<li>Headers in <a href="https://en.wikipedia.org/wiki/Optima#Clones_and_derivatives"><span class="sans">URW Classico</span></a>. This is a font based largely on <a href="https://en.wikipedia.org/wiki/Optima">Optima</a>.</li>
<li>Body text in <a href="https://en.wikipedia.org/wiki/Palatino#Derivatives">Domitian</a>. This is a font based largely on the venerable <a href="https://en.wikipedia.org/wiki/Palatino">Palatino</a>.</li>
<li>Code in <a href="https://monaspace.githubnext.com/"><code>Monaspace Argon</code></a>. This is part of the suite of fonts developed by Github Next. It’s also the primary font I use locally in my IDE. It’s a Humanist Sans font (I have tried to use Serifed fonts for coding, but it just looks utterly wrong to me).</li>
<li>Math in <a href="https://en.wikipedia.org/wiki/AMS_Euler"><img src="https://latex.codecogs.com/png.latex?%5Ctext%7BNeo-Euler%7D"></a>. When I use math, I want it to stand out as distinct from the surrounding text. I think Neo-Euler does a good job of walking the line between being a little weird while remaining easy to read.</li>
</ul>
</section>
<section id="why-humanist-fonts-matter" class="level4">
<h4 class="anchored" data-anchor-id="why-humanist-fonts-matter">Why humanist fonts matter</h4>
<p>These fonts are all strongly humanist in orientation and <a href="https://www.gnu.org/philosophy/free-sw.en.html">free</a><sup>3</sup>. Despite a brief flirtation with <a href="https://en.wikipedia.org/wiki/Futura_(typeface)">Futura</a>, I have come around to the importance of fonts which embrace more complexity than such geometric renderings. Ultimately, I think the omni-presence of sans-serifed fonts on the web is a reflection of its inhumanity. Good text should have character in style as well as in substance. I think the ubiquity of sans-serifs on the web is indicative of the fact that it isn’t really a place for <em>reading</em>. The medium corresponding to sans-serifs is the microblog, not the book. Bringing back subtle variations in stroke width, text-oriented serifs and other organic qualities of humanist fonts is a (very) small step towards reclaiming the web.</p>
</section>
<section id="loading-fonts-performantly" class="level4">
<h4 class="anchored" data-anchor-id="loading-fonts-performantly">Loading fonts performantly</h4>
<p>To use these custom fonts on the web without screwing with performance too much, I used <a href="https://css-tricks.com/how-to-load-fonts-in-a-way-that-fights-fout-and-makes-lighthouse-happy/">the solution detailed here</a> to make sure fonts can be preloaded and also that they aren’t part of the critical path for rendering. When you load a page here, you might see a fallback font for a split second before the custom font faces are loaded into the CSS.</p>
</section>
</section>
</section>
<section id="remaining-concerns" class="level2">
<h2 class="anchored" data-anchor-id="remaining-concerns">Remaining concerns</h2>
<p>I’m happy with a lot of what has come together. There are a few issues that I think can be improved. I’m considering whether it might make sense to switch to a hybrid system which would use Quarto to build minimal HTML or Markdown documents which are then knit together using a modern web development framework like <a href="https://docs.astro.build/en/tutorial/0-introduction/">Astro</a>. This would allow much more granular control over the website and how it loads resources, but could still get the benefits of Quarto where it is most useful (for pre-rendering rich documents full of Python and R code). This is probably a bigger project than I imagine it to be, so I don’t intend to commit to it soon<sup>4</sup>.</p>
<p>I’m also considering whether it would make more sense to switch to a different hosting solution. I’m not particularly pleased with the quality of Netlify’s CDN, which seems to be a bit slow in Europe (I suspect they just have fewer edge locations)<sup>5</sup>. I’m considering whether it might make sense to switch to a system which is self-hosted in a VPS like DigitalOcean/Hetzner (possibly with a tool like Coolify to manage deployment) and then add a CDN like Cloudflare over the top. The benefit to this is that it would combine control (from the self-hosted VPS) with the performance benefits of a reliable CDN through Cloudflare, but it would remain quite cheap (albeit not free).</p>
</section>
<section id="tending-the-garden" class="level2">
<h2 class="anchored" data-anchor-id="tending-the-garden">Tending the garden</h2>
<p>There’s something satisfying about all of this tinkering that I find hard to articulate. It’s not that any individual piece is particularly impressive or novel. Rather, it’s the act of construction itself—of shaping raw materials into something that reflects intention and care. McLuhan observed that we shape our tools, and thereafter our tools shape us. The platforms we’ve ceded our digital lives to have shaped us into consumers of content, optimized for something outside of our control. Building my own corner of the web is a small act of resistance against this.</p>
<p>I don’t think everyone needs to hand-roll their entire tech-stack or obsess over font choices (I will tell you fonts are a huge net-negative on my measurable productivity). But I do think there’s value in the process of understanding and improving the systems we inhabit, even if only to understand what we’ve given up by outsourcing them. The web was supposed to be a place where anyone could plant a flag and build something. Somewhere along the way, we traded that for the convenience of renting space in someone else’s domain.</p>
<p>These updates—the automation, the newsletters, the performance tweaks, the typography—are all just ways of making this place more my own. The work is never finished. There’s always another edge to smooth, another inefficiency to address. But that’s the point, isn’t it?</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>It seems like there are also some other problems, as Firefox and Chrome both intermittently show several seconds to establish a SSL connection to the website (i.e.&nbsp;before any resources are actually loaded). Unclear what’s going on with this…↩︎</p></li>
<li id="fn2"><p>The ease of writing Markdown documents which are rendered into <a href="https://revealjs.com/">reveal.js</a> HTML presentations is really nice, and yet another way that I’m aiming at convergence between these media. I ask myself whether I should also move away from directly writing LaTeX, too. Ensuring visual consistency by using literally identical styling directives is very appealing.↩︎</p></li>
<li id="fn3"><p>We’re <a href="https://en.wikipedia.org/wiki/Hermann_Zapf">Hermann Zapf</a> fans over here if you couldn’t tell.↩︎</p></li>
<li id="fn4"><p>Truly famous last words.↩︎</p></li>
<li id="fn5"><p>Making a connection via HTTPS occasionally takes 3s or more, which is just absurd and is by far the slowest part of page-load, although it is also intermittent which makes me suspect something with CDNs.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{dimmery2025,
  author = {Dimmery, Drew},
  title = {Website {Refresh}},
  date = {2025-11-19},
  url = {https://ddimmery.com/posts/website-refresh/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-dimmery2025" class="csl-entry quarto-appendix-citeas">
Dimmery, Drew. 2025. <span>“Website Refresh.”</span> November 19, 2025.
<a href="https://ddimmery.com/posts/website-refresh/">https://ddimmery.com/posts/website-refresh/</a>.
</div></div></section></div> ]]></description>
  <category>technology</category>
  <category>website</category>
  <guid>https://ddimmery.com/posts/website-refresh/</guid>
  <pubDate>Wed, 19 Nov 2025 00:00:00 GMT</pubDate>
  <media:content url="https://ddimmery.com/posts/website-refresh/main-image.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>Back to Basics</title>
  <dc:creator>Drew Dimmery</dc:creator>
  <link>https://ddimmery.com/posts/back-to-basics/</link>
  <description><![CDATA[ 





<p>I started a Substack not that long ago, because I wanted to get back into blogging. I think blogging is great and <a href="https://ddimmery.com/posts/stop-looking-for-the-next-twitter/">microblogs are bad</a>. I’ve had a bit of time to think over the past year about my usage of technology. I’ve come to believe that most platforms are (at best) not really for me, or (at worst) broadly <a href="https://en.wikipedia.org/wiki/Considered_harmful">considered harmful</a>. If you previously subscribed to my Substack, I’ve migrated you to a subscription service of my own. If this isn’t what you signed up for, please unsubscribe<sup>1</sup>.</p>
<p>I think there are a few dimensions to my thinking that I’m aiming to work out in this post:</p>
<ol type="1">
<li>Platforms constrain freedom</li>
<li>Platforms enshittify</li>
<li>AI coding tools make smaller projects feasible</li>
</ol>
<p>I’ll then talk about why I think we should fight against the ease of platforms. Let’s get into it.</p>
<section id="platforms-constrain-freedom" class="level2">
<h2 class="anchored" data-anchor-id="platforms-constrain-freedom">Platforms constrain freedom</h2>
<p>Platforms exist to make it easier to do things by creating one streamlined avenue to accomplish a goal. A canonical example is Facebook winning over other social media sites like MySpace. It might be hard to recall now, but one of the main reasons for this was that, in contrast to MySpace, Facebook provided a simple, clean, consistent user experience. On MySpace you would arrive on a page, have music randomly play at you and be barraged by weird fonts, colors and images. Facebook constrained how users could express themselves quite substantially, and this was very popular. It let people ignore all this wacky design stuff created by thirteen-year-olds and focus on what they wanted to do with social media: poke each other, write weird status updates and check out relationship statuses<sup>2</sup>.</p>
<p>I think this is a pretty common pattern for new platforms: They do, in fact, generally make <em>something</em> that people want to do easier! Of course, given the nature of this post, I need to address the Substack in the room. Substack is generally a good platform. It makes it easy to write (certain kinds of) blogs and it makes (certain kinds of) monetization easier. The parentheticals, however, are what I find frustrating about platforms. By their nature, they constrain the user to a particular form of the thing they provide.</p>
<p>Scale means that Substack can’t and won’t cater to all of the small, weird features that people want. If I were designing my own blog system, I would want (a) control over exactly the fonts and styles my content is displayed with, (b) easy integration of math via the great tools like MathJax, (c) the ability to have code-rich documents like Quarto in which code runs in-line and displays output in-line, (d) the ability to embed any kinds of content I want wherever I want it. Ultimately, I think this is the dream of the web: I can have a little homestead on the internet that is exactly the way I want it: <a href="https://ddimmery.com/posts/is-bluesky-convivial/">a <em>convivial</em> web</a><sup>3</sup>. Platforms cannot provide this: their <em>raison d’être</em> is reducing complexity as they make human behavior digitally legible down a particular prescribed path.</p>
<p>I dislike this constraint: I want to use tools that work the way I choose for them to work.</p>
</section>
<section id="platforms-enshittify" class="level2">
<h2 class="anchored" data-anchor-id="platforms-enshittify">Platforms enshittify</h2>
<p>Even if you feel well-served by a platform now, this is no guarantee that the platform will continue to serve you well in the future once you’re stuck there. I won’t belabor this point, because it’s been <a href="https://www.versobooks.com/products/3341-enshittification">talked about ad nauseam</a> by others<sup>4</sup>. Platforms, as they mature, need to monetize more aggressively as well as build moats against competitors. This often leads to more constraining of user freedom. I think the real “break-glass” moment for Substack would be losing the ability to collect and export all subscribers’ email addresses. This is, right now, a defining feature of Substack that makes it safe for people to use. It drastically reduces their moat, however. At the moment, switching to <a href="https://ghost.org/">Ghost</a> is (technically, at least) pretty trivial. Switching to a Quarto blog deployed through Github Actions and which sends a newsletter through <a href="https://listmonk.app/">Listmonk</a> (deployed via <a href="https://www.pikapods.com/">Pikapods</a>) when triggered through Github Actions is not trivial, but it also wasn’t exactly <em>hard</em> to set up<sup>5</sup>.</p>
<p>The main way that Substack appears to be aiming at constructing a moat is in their push for discovery. That is, they want to make it easy to find <em>Substack</em> writers on Substack. They have no interest in doing this with writers hosted elsewhere. This provides an incentive for people to stick around, even if the actual writing and publishing experience is worse. I don’t think Substack is made up of evil people; quite the opposite, I think they truly wish to help writers make a living (while taking their cut). The problem is that they have few incentives to make it possible for writers to truly experiment. Why would they? Writers will want to stick around because they can get more subscribers there, not because the CMS is uniquely good or they can uniquely realize their vision there. The one exception is that many writers’ visions are simply “to be able to feed themselves through their writing”. I think this goal would be better served by increased experimentation<sup>6</sup>, but I recognize beggars feel that they can’t be choosers. I think this is a false dilemma, because:</p>
</section>
<section id="ai-coding-tools-make-smaller-projects-feasible" class="level2">
<h2 class="anchored" data-anchor-id="ai-coding-tools-make-smaller-projects-feasible">AI coding tools make smaller projects feasible</h2>
<p>Platforms work by making a particular workflow easy enough that we accept lock-in. An alternative path exists, however, which is to build our own tools to support that workflow. A critical fact of contemporary society is that this is becoming increasingly easy. I think a lot of people have, thus far, only interacted with LLM systems through the lens of the chat interface. While this is impressive technology, I think it blinds a lot of people to what is now possible. Let me briefly detail an example.</p>
<p>For <a href="https://ddimmery.com/posts/quarto-website/">my website</a>, I decided I didn’t want to manually write out the metadata for my publications anymore. So in one afternoon, I vibecoded <a href="https://github.com/ddimmery/quarto-website/blob/main/research.qmd">a system</a> which connects to the <a href="https://www.semanticscholar.org/product/api">Semantic Scholar API</a><sup>7</sup>, checks for new papers on my profile and if it finds them, sets up the metadata in <a href="https://github.com/ddimmery/quarto-website/blob/main/papers.yaml"><code>papers.yaml</code></a>. It doesn’t display until I flip a switch from <code>visible: false</code> (the default) to <code>visible: true</code>. Thus, I’m not too worried about the system going haywire: nothing is posted without me getting the chance to review it, but it reduces tedious data entry. The technology is there to vastly reduce the time between an idea and a prototype of that idea. For this Semantic Scholar system, it took me just a few minutes to set Claude up with instructions, and I had a complete and functional prototype no more than one hour later, most of which did not require me to be thinking about the implementation (verified via <a href="https://github.com/ddimmery/quarto-website/commits/main/?since=2025-08-05&amp;until=2025-08-05">commit log</a>).</p>
<p>It’s substantially easier to build things now. As the effort to build novel computing systems shrinks, it’s easier to build exactly the systems you want for yourself (or for yourself and a few others). We should do more of this, on the margin. Individuals’ capability to personalize their computing environment has literally never been higher than it is now. Choosing not to take advantage of this is a choice to let other people (who <em>do not have your best interests at heart</em>) choose your tools for you. The point of platforms is to capture you by making it easy for you to achieve particular goals (but restrict you to those goals they deem worthy), but the thing that’s different <em>now</em> is that it’s also easier than ever for you to build what you need for yourself.</p>
</section>
<section id="technological-vanguardism" class="level2">
<h2 class="anchored" data-anchor-id="technological-vanguardism">Technological vanguardism</h2>
<p>Open source software is a critical part of this story, as it serves as the backbone for building any kind of new software in the current era. This is recognized in a variety of initiatives as part of <a href="https://eurostack.eu/">EuroStack</a><sup>8</sup>. That is, the idea is that Europe can’t or shouldn’t try to compete with American and Chinese hyperscalers by trying to incubate European hyperscalers as competitors. Rather, Europe should commit to smaller alternatives based on a robust shared digital infrastructure of open-source software. AI-supported individuals able to mix-and-match together the computing systems they want to support their lives is a rare hopeful vision of the future.</p>
<p><a href="https://www.bbc.com/news/articles/cly7n2jm5m5o">A lot of the maintainers of critical open source projects are getting older</a><sup>9</sup>. I wonder whether the next generation will have the same commitment to the principles that underlie the open source movement. In particular, we’ve all discussed how “the youths” increasingly don’t understand file-systems or whatever your preferred concern is. My worry is that they have become so used to an Internet made up entirely of platforms that they can’t even imagine anything different: an Internet where when you want to create a space to chat with friends or colleagues the only option is Discord or Slack controlled by someone else, never hosting <a href="https://en.wikipedia.org/wiki/Ventrilo">a Ventrilo server</a>, a <a href="https://en.wikipedia.org/wiki/PhpBB">phpBB forum</a>, an IRC server or, in modern terms, software like <a href="https://zulip.com/">Zulip</a> or <a href="https://matrix.org/">Matrix</a>.</p>
<p>Extensible tools like this often have a higher entry cost than platforms do: they may require standing up servers, writing some code or doing some non-obvious configuration. This has been a major problem with Open Source Software for ages: it’s built by developers for developers. Open source developers and maintainers of open source projects are a sort of technological vanguard who take some of these costs on themselves. They build tools, often <a href="https://archive.org/details/justforfun00linu">just for fun</a>, because they <em>can</em> and it scratches a weird itch they have in their own lives. Sometimes, these projects fit the needs of the moment and take off, but many times they remain personal or niche. It’s important that the people who can build such tools do so to pave a (code)path for others to follow, but these projects are not usually about making that path highly accessible.</p>
<p>The costs of building one’s own systems are lower than they’ve ever been because of AI code assistance. This provides a critical link between the Open Source Software ecosystem (which is the fundamental infrastructure underlying modern software of all kinds) and personal use. That big open source project which was <em>almost</em> right for you? It’s not hard to make minor modifications to it now to make it <em>just right</em>.</p>
<p>The problem, however, is that every person who accepts Platforms into their hearts does create a social cost as well. If everyone only goes to Substack to get blogs, then blogs hosted elsewhere (no matter their quality) will be read less. This is what Substack is hoping to achieve, and I think it’s important for those of us comfortable with forging a different path to try.</p>
<p>I’m a technological determinist: I believe that the technologies we create fundamentally transform our lives and societies. I also believe strongly in the power of human agency: we can reshape the technostructure to harness these technologies the way that we want. We can <a href="https://blog.citp.princeton.edu/2024/04/29/a-syllabus-of-actions-for-building-the-society-we-want/">build the society we want</a>, but that does mean that we need to actually do the building. I, for one, am planning to become much more serious about putting in the work on this. Moving this blog back to my own website is a first step in this direction, but it is not the last.</p>
<hr>
<p><em>Image source</em>: <a href="https://brooklynrail.org/2017/10/criticspage/Hidden-Figures/">Papilla Estelar: Remedios Varo</a></p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>The link is at the bottom of the email, or let me know at <a href="mailto:blog@ddimmery.com">blog@ddimmery.com</a>↩︎</p></li>
<li id="fn2"><p>Or “connect the world” depending on how you feel about Facebook↩︎</p></li>
<li id="fn3"><p>Note also the importance of open-source software in what I’m looking for.↩︎</p></li>
<li id="fn4"><p>Apoorva Lal has a nice <a href="https://apoorvalal.github.io/lalgorithms/eternalizing_septembers">formal model of this</a>↩︎</p></li>
<li id="fn5"><p>I’ll write a rundown of how I did this in a later post.↩︎</p></li>
<li id="fn6"><p>crypto! micropayments! easy ways to bundle or unbundle author’s writings! different models of payment beyond mere “subscriptions”!↩︎</p></li>
<li id="fn7"><p>I am increasingly a fan of the work Semantic Scholar is doing as a more palatable alternative to Google Scholar. Unfortunately, they are probably doomed to failure because Google Scholar uses an expansive definition of citation that means their numbers are bigger. Even as a fan of Semantic Scholar, I definitely provide the Google Scholar citation count and h-index for annual reviews.↩︎</p></li>
<li id="fn8"><p>Of course, EuroStack is a bit of a classic European idea in which everyone who talks about it means something slightly different. Nevertheless!↩︎</p></li>
<li id="fn9"><p>In some sense, we all are.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{dimmery2025,
  author = {Dimmery, Drew},
  title = {Back to {Basics}},
  date = {2025-10-27},
  url = {https://ddimmery.com/posts/back-to-basics/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-dimmery2025" class="csl-entry quarto-appendix-citeas">
Dimmery, Drew. 2025. <span>“Back to Basics.”</span> October 27, 2025. <a href="https://ddimmery.com/posts/back-to-basics/">https://ddimmery.com/posts/back-to-basics/</a>.
</div></div></section></div> ]]></description>
  <category>technology</category>
  <category>website</category>
  <guid>https://ddimmery.com/posts/back-to-basics/</guid>
  <pubDate>Mon, 27 Oct 2025 00:00:00 GMT</pubDate>
  <media:content url="https://ddimmery.com/posts/back-to-basics/main-image.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>What’s the point of RCTs?</title>
  <dc:creator>Drew Dimmery</dc:creator>
  <link>https://ddimmery.com/posts/whats-the-point-of-rcts/</link>
  <description><![CDATA[ 





<p>Kevin Munger and I recently had the joy to write a <a href="https://arxiv.org/abs/2501.12161">response</a> to “<a href="https://arxiv.org/abs/2108.11342">Nonparametric Identification is not enough, but randomized controlled trials are</a>” by the all-star team of Aronow, Robins, Saarinen, Sävje and Sekhon (ARSSS). You can tell Kevin and I like this paper because as of this moment, we make up about <a href="https://scholar.google.com/scholar?start=0&amp;hl=en&amp;as_sdt=2005&amp;sciodt=0,5&amp;cites=1083203567583178174&amp;scipsc=">one quarter of its total citations on Google Scholar</a>. I believe a number of other responses will be coming out soon; I’m very excited to read them.</p>
<p>I’m going to riff a bit more here about <a href="https://arxiv.org/abs/2501.12161">our response, “Enough?”</a>. In particular, I want to dig into what I see as the most important perspective we articulate in the paper and some of its further implications. Namely, we argue that the biggest value of experiments is ontological<sup>1</sup>.</p>
<section id="how-to-respond-to-a-great-paper" class="level1">
<h1>How to respond to a great paper</h1>
<p>This was my first reaction to thinking about how to do this. I love the paper, and I refer to it a lot. But a response isn’t that interesting if you’re just saying “ditto”. We started out with Kevin wanting to talk more about how uniform consistency of the SATE was still insufficient for science (plus various other big picture philosophy of science points) and me wanting to discuss different reasons why RCTs <em>are</em> special (e.g.&nbsp;correct specification through shoe-leather)<sup>2</sup>. Regardless, we quickly latched onto the most provocative word in the title: “enough”. What does it mean for a method to be “enough”? Musing on this formed the core of our response.</p>
<section id="enough-for-what" class="level2">
<h2 class="anchored" data-anchor-id="enough-for-what">Enough for what?</h2>
<p>We started off by trying to think through what it would, conceivably, take for an estimate to be generalizable knowledge. We identified 4 things that we certainly want to be able to generalize over: (1) sample, (2) site, (3) realization of a theory, and (4) time. The first two are fairly well-trodden. Convenience samples may make it hard to generalize to populations of interest, and <a href="https://academic.oup.com/qje/article-abstract/130/3/1117/1933105">site-selection bias</a> may make it difficult to generalize to the locations we’d like to know for future policy choices. Tal Yarkoni has <a href="https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/article/abs/generalizability-crisis/AD386115BA539A759ACB3093760F4824">already talked through the multiple challenges of generalizations</a> across various dimensions in the language of random effects in Psychology.</p>
<p>The third element is kind of weird. What exactly is the set of treatments we could have defined based on a particular theory we’re trying to test? I have a bit of a pre-occupation with this question because it was my experience running message-tests at Facebook that while psychological mechanisms may be important, so are the specifics of wording that are largely unrelated to the mechanisms. If you write text that sounds like a robot, it’s gonna suck even if you’re pulling on the right mechanism. If we want to generalize the results to the <em>theory</em> being good or bad, then you have to be able to abstract away from this.</p>
<p>The final point, on time, is a long-time hobby horse of Kevin’s. One of the main criticisms of this focus is that it isn’t actually any different than other forms of generalizability. We identify one extremely important way that time is different: there’s no ‘shoe-leather’ based solution for it. You can randomly sample from a population, from a set of sites or from potential treatment definitions. You cannot randomly sample time. This is an extraordinarily important difference, because it means, ultimately, that you must rely on modeling assumptions that may not be necessary in these other cases (for exactly the reasons ARSSS spell out: randomization is an extremely powerful tool to eliminate such assumptions).</p>
<p>I strongly believe in the power of shoe-leather<sup>3</sup>. The fact that it is insufficient for temporal generalization should be concerning<sup>4</sup>. Regardless, our point is just that randomization is good, sure, but it doesn’t get us to where we want to go scientifically. What <em>does</em> make it good?</p>
</section>
<section id="control-of-what" class="level2">
<h2 class="anchored" data-anchor-id="control-of-what">Control of what?</h2>
<p>I think the best part of the paper focuses on the ontological value of experimentation. That is, experiments create novel states of the world, and this is a powerful way to imagine alternatives to the social reality we currently inhabit. It creates incentives for people to create cooler and more ambitious changes to the world while measuring what those changes do. But I think this gets at another reason that RCTs are particularly useful.</p>
<p>Observational methods <em>assume</em> a complicated ontology in the course of creating a DAG: how to chunk up the social world into nodes and how to assign values for those nodes to each unit. Put simply, the social world is extremely complex, and this process is subject to extraordinary error. RCTs circumvent this difficulty entirely: their accuracy does not depend on such ontological assumptions. Instead, they <em>impose</em> their ontology. In the paper, we focused on the ontology of <em>the treatment</em>, but the argument is much broader and deeper.</p>
<p>For example, suppose that we run some field experiment that changes substantial aspects of people’s media diet and we measure effects on various constructs that we think are interesting: ideology, perhaps, beliefs about facts, etc. It isn’t just that the RCT imposes the ontology of treatment and control, the RCT provides information on all of these constructs that we create (i.e.&nbsp;<em>causal effects</em> on these constructs). In survey research a common recommendation is to avoid regressing one survey construct on another<sup>5</sup>. But the truth is, we do this with all observational research, as we make assumptions about constructs everywhere. With the RCTs, we only do this on one side of the equation. The treatment ontology is imposed, but this allows us to weaken the assumptions of ontology in the outcomes. We will still have valid causal effects on whatever ontology we choose.</p>
<p>Or maybe in our field experiment, we measure something collected administratively about behavior (e.g.&nbsp;turnout in an election). In this case, both the treatment and the outcome have some ontological precision in their meaning<sup>6</sup>. What about the heterogeneity we may observe? Well, we don’t rely on the ontology the same way we do in observational settings: to take an ontology that is currently the subject of political contention, consider measuring heterogeneity by gender identity. There are different ways to construct this ontology, but when we choose such a construction, we can simply measure the heterogeneity <em>according to</em> this ontology. If we’re instead in an observational world, we would need to care much more about getting this ontology “correct” in some way as it pertains to selection into treatment<sup>7</sup>. In an RCT, we do not <em>rely</em> on this correctness. We can explore variation with respect to whatever ontology we choose and, perhaps, even make judgments about which ontology better reflects the variation in response.</p>
<p>RCTs give us a solid foundation from which to explore these questions, while observational work cannot do this so easily.</p>
</section>
<section id="for-whom" class="level2">
<h2 class="anchored" data-anchor-id="for-whom">For whom?</h2>
<p>Critically, however, the ontological power of experiments is premised on control of the world. It is power not just in the statistical sense of charts and numbers, but in the sense of guns and laws. Wielding this form of power requires responsibility and humility.</p>
<p>Experimentation can be a tool for <a href="https://en.wikipedia.org/wiki/Achieving_Our_Country">achieving our country</a>. By trying out new social possibilities, we can provide the foundation on which a better society can be built. Doing this requires venturing past <em>is</em> and into <em>ought</em>. This is fraught territory for scientists, so it must be part of a <a href="https://scholarship.law.columbia.edu/faculty_scholarship/2038/">larger democratic process</a>. As the fabric of society is increasingly constructed of bits and code, we can increasingly implement creative new interventions that are meaningful to people’s lives. I think, in fact, we’re ethically <em><a href="https://medium.com/mit-media-lab/the-obligation-to-experiment-83092256c3e9">obligated</a></em> <a href="https://medium.com/mit-media-lab/the-obligation-to-experiment-83092256c3e9">to do so</a>.</p>
<p>By making our focus broader in this framing, we do something that ARSSS could not: draw distinctions between <em>types</em> of experiments and demonstrate the value of experimentation within a larger <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1541-0072.1973.tb00128.x">experimenting society</a>. All experiments enjoy the statistical benefits that ARSSS describe. Not all experiments are similarly powerful ontological tools for considering new social worlds. The experiments that measure up best under this perspective are the ones that attempt ambitious changes to social reality. The experiments that do poorly are, mostly, what we have: survey experiments. These are, surely, not <em>enough</em>.</p>
<p><a href="https://arxiv.org/abs/2501.12161">Read the paper</a> and let us know what you think!</p>


</section>
</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>You may have read some about <a href="https://kevinmunger.substack.com/p/the-ontological-case-for-rcts">this perspective on Kevin’s blog</a>.↩︎</p></li>
<li id="fn2"><p>Chris Harshaw <a href="http://www.chrisharshaw.com/wp-content/uploads/2025/01/ine-comment.pdf">has a response to ARSSS</a> that goes more in this direction focused on the importance of the design-based frame of thought (and its rhetorical power).↩︎</p></li>
<li id="fn3"><p>If you’re reading this blog, I hope you’re already familiar with the beautiful David Freedman paper on this subject, <a href="https://www.jstor.org/stable/270939">Statistical Models and Shoe Leather</a>. If you aren’t, read it immediately.↩︎</p></li>
<li id="fn4"><p><a href="https://upload.wikimedia.org/wikipedia/commons/0/03/David_Hume_Ramsay.jpg">Shoutout to the GOAT.</a>↩︎</p></li>
<li id="fn5"><p>I’ve done a little searching, but I’d be interested in some citation archeology / intellectual history. Where does this recommendation originate?↩︎</p></li>
<li id="fn6"><p>Or at least something like phenomenological precision.↩︎</p></li>
<li id="fn7"><p>If you’ve ever read into the literature on measurement error of covariates for observational causal inference (it’s a nearly intractable problem), this should make you sweat.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{dimmery2025,
  author = {Dimmery, Drew},
  title = {What’s the Point of {RCTs?}},
  date = {2025-02-13},
  url = {https://ddimmery.com/posts/whats-the-point-of-rcts/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-dimmery2025" class="csl-entry quarto-appendix-citeas">
Dimmery, Drew. 2025. <span>“What’s the Point of RCTs?”</span> February
13, 2025. <a href="https://ddimmery.com/posts/whats-the-point-of-rcts/">https://ddimmery.com/posts/whats-the-point-of-rcts/</a>.
</div></div></section></div> ]]></description>
  <category>methodology</category>
  <category>experiments</category>
  <guid>https://ddimmery.com/posts/whats-the-point-of-rcts/</guid>
  <pubDate>Thu, 13 Feb 2025 00:00:00 GMT</pubDate>
  <media:content url="https://ddimmery.com/posts/whats-the-point-of-rcts/main-image.png" medium="image" type="image/png" height="143" width="144"/>
</item>
<item>
  <title>What was US2020?</title>
  <dc:creator>Drew Dimmery</dc:creator>
  <link>https://ddimmery.com/posts/what-was-us2020/</link>
  <description><![CDATA[ 





<p>The US 2020 Facebook and Instagram Election Project (US2020) was a sprawling project with dozens of people involved at various levels, like little ol’ me (on the Meta side). Meta spent more than $20 million on it and committed the time of numerous experienced data scientists and engineers. It was huge. But it was also very small compared to the size of Meta.<sup>1</sup> In this post, I want to reflect on what, exactly, it was, particularly in light of the eLetters to <a href="https://www.science.org/doi/10.1126/science.abp9364">one of its marquee studies</a> and <a href="https://www.tandfonline.com/eprint/YGEGF9UQYJTMEAGSUFRF/full?target=10.1080/10584609.2024.2446351">Kevin Munger’s nice article asking what we learned from the project</a>. The fairest reading of the project is as the result of a dialogue between internal and external researchers, so that’s the direction I’m going to work towards.</p>
<p>The idea here is that I want to try to develop a theory of what this project was<sup>2</sup>. That is, why does the collaboration exist? Should it exist? If and when we want to do Big Social Science like this, what will we be doing? What <em>should</em> we be doing?</p>
<p>To make the argument that US2020 is best understood as a dialogue, I need to start by talking through what and how large Facebook (the company) was and how experimentation worked in that context, which was defined by the scope of the collective endeavor that was Facebook (the platform). After that I can address what I see as fundamental misperceptions about the nature of US2020: the idea that the project was either industrial research or independent research. It wasn’t, and trying to fit it into one of these molds leads to many confused discussions about the research.</p>
<p>Independence in future projects should not be the goal, but this does not mean that we need to sacrifice reliability or trustworthiness. Quite the opposite. The very human trustworthiness of the project is a result of the combination of the non-independence (of the internal researchers) and independence (of the academic researchers). The <em>tension</em> between the two was what made this project valuable. We can accept that this tension will exist, but resolve much of it through the use of pre-analysis plans. This will better allow us to express the agency and expertise of everyone involved in the project and, hopefully, make future projects more likely to exist and be better.</p>
<p>So, where to start?</p>
<section id="what-was-facebook" class="level2">
<h2 class="anchored" data-anchor-id="what-was-facebook">What was Facebook?</h2>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ddimmery.com/posts/what-was-us2020/image_1.webp" class="img-fluid figure-img"></p>
<figcaption>Image</figcaption>
</figure>
</div>
<p>Facebook was an internet technology company from 2004 to 2021. It no longer exists, having been succeeded by “Meta”. At its peak in 2021, it had <a href="https://www.statista.com/statistics/273563/number-of-facebook-employees/">70,000 employees</a> and <a href="https://www.statista.com/statistics/268604/annual-revenue-of-facebook/">$118 billion in revenue</a>, truly staggering numbers that are difficult to comprehend. Around half of those employees were in technical roles<sup>3</sup>: they directly worked on changing and improving the functioning of the services that Facebook offered. Can you imagine working on a project with 30,000 other people? It’s critical to understand that this is part of what it means to work at Facebook. And it really felt like this, sometimes! Internal tooling changed often (because people were actively working on it!), the use of a <a href="https://en.wikipedia.org/wiki/Monorepo">Monorepo</a> meant that all engineers pushed code to the same repository and there were meaningful connections <em>in code</em> between projects. To land a commit at Facebook famously only required you to get one other engineer to sign off on it<sup>4</sup>. The number of commits to the repository was so large that you couldn’t expect to have the same ground truth from one second to the next<sup>5</sup>. For a sense of the speed at which Facebook changed, read <a href="https://engineering.fb.com/2017/08/31/web/rapid-release-at-massive-scale/">this post detailing some of the deployment systems Facebook used</a> written by Chuck Rossi, a famous<sup>6</sup> release engineering guru at the company.</p>
</section>
<section id="experimentation-at-scale" class="level2">
<h2 class="anchored" data-anchor-id="experimentation-at-scale">Experimentation at scale</h2>
<p>Facebook changed quickly. Temporal validity is a real problem when things change at this pace. But the problem was actually even worse than the sheer scale would suggest! Many of the biggest changes to how Facebook worked were gated not by commits of code, but by A/B testing<sup>7</sup>. The number of active experiments at any one time was on the order of <em>thousands</em>. Even assuming each test had only two groups, this number implies that no two people were likely to experience the same Facebook<sup>8</sup>. I wouldn’t be surprised if, given the increase in employees relative to when we roughed out this number around 2016, that number was an order of magnitude higher by 2021<sup>9</sup>.</p>
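<p>To make the “no two people” claim concrete, here is a rough back-of-the-envelope sketch. It rests on simplifying assumptions that are mine and not a description of Facebook’s actual systems: every user is eligible for every test, assignment is independent and uniform across arms, and the counts (<code>NUM_TESTS</code>, <code>NUM_USERS</code>) are hypothetical round numbers. Everything is done in log space because the raw numbers are far too large to be readable.</p>

```python
import math


def log10_configurations(num_tests: int, arms_per_test: int = 2) -> float:
    """log10 of the number of distinct joint assignments across independent tests."""
    return num_tests * math.log10(arms_per_test)


def log10_expected_matching_pairs(
    num_users: int, num_tests: int, arms_per_test: int = 2
) -> float:
    """log10 of the expected number of user pairs sharing the same arm in every test.

    Assumes independent, uniform assignment and universal eligibility.
    """
    pairs = num_users * (num_users - 1) / 2
    return math.log10(pairs) - log10_configurations(num_tests, arms_per_test)


# Hypothetical round numbers, chosen only for illustration.
NUM_TESTS = 1_000
NUM_USERS = 3_000_000_000

print(f"log10(configurations): {log10_configurations(NUM_TESTS):.1f}")
print(
    "log10(expected identical pairs): "
    f"{log10_expected_matching_pairs(NUM_USERS, NUM_TESTS):.1f}"
)
```

<p>Under these assumptions there are roughly 10<sup>301</sup> possible versions of the platform, and the expected number of user pairs seeing exactly the same one is on the order of 10<sup>-282</sup>. Real tests target subsets of users and are not all independent, so the true figure is less extreme, but the qualitative point survives by a wide margin.</p>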
<p>It’s worth thinking a little bit about how these experiments worked. One thing the handful of people implementing a single test absolutely could not do was to control everything about how the other 30k engineers ran <em>their</em> tests. This is simply the reality of working on a project with this many other people.<sup>10</sup> This led to difficulties occasionally, in which stuff about the platform changed and rendered a prior test uninformative or incorrect about the current state of the world. On my team, this is why we encouraged two practices: (1) long-run holdouts (where for important features, the rollout of the feature would be to ~99% of users rather than 100%) and (2) rerunning important tests to verify that results continued to hold.<sup>11</sup></p>
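<p>A long-run holdout of this kind can be sketched in a few lines. This is a minimal illustration and not the system we used internally: the function names, the feature name <code>new_feed_ranker</code> and the choice of salted SHA-256 bucketing are all assumptions of mine. The idea is simply that a stable ~1% of users is deterministically kept on the old behavior, so the feature’s effect can be re-measured long after launch.</p>

```python
import hashlib

NUM_BUCKETS = 10_000


def bucket(user_id: str, salt: str) -> int:
    """Map a user to a stable pseudo-random bucket; the salt decorrelates features."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


def in_holdout(user_id: str, feature: str, holdout_fraction: float = 0.01) -> bool:
    """True if the user stays on the pre-launch behavior for this feature."""
    threshold = round(holdout_fraction * NUM_BUCKETS)
    return threshold > bucket(user_id, salt=feature)


# Hypothetical user ids and feature name, for illustration only.
users = [f"user-{i}" for i in range(100_000)]
share = sum(in_holdout(u, "new_feed_ranker") for u in users) / len(users)
print(f"holdout share: {share:.4f}")
```

<p>Because assignment depends only on the user id and the feature name, the same users stay in the holdout across sessions and deploys, which is what makes the long-run comparison meaningful. Salting by feature name keeps the holdout for one feature from coinciding with the holdout for another.</p>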
<p>The reality of massive simultaneous experimentation and change resulted in internal experiments using the term “status quo” condition rather than “control” condition. Everything was evaluated relative to the sum total of other <em>stuff</em> Facebook was doing on the platform. As an example, <a href="https://ax.dev/tutorials/human_in_the_loop/human_in_the_loop.html">check out the software my former team developed for doing complicated experimentation at platform scale</a>. There is no “control” group; there’s just the “status quo”: what the platform is <em>currently</em> doing. The other groups are potential modifications of that behavior. The reason we evaluate relative to this group is that <em>experimentation is a means of optimization</em><sup>12</sup>. Experimentation was a tool to improve how the system worked, so we wanted to evaluate our <em>one change</em> relative to the otherwise current state of the system. We never cared about comparing to the state of the system 1 day or even 5 minutes ago.</p>
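<p>As a minimal sketch of what “evaluating relative to the status quo” means in practice (plain Python, not the Ax API; the arm names and outcome values are made up for illustration), each candidate arm is compared against whatever the platform is doing right now, using a difference in means with an unpooled standard error:</p>

```python
import math
from statistics import mean, variance


def effect_vs_status_quo(outcomes: dict, baseline: str = "status_quo") -> dict:
    """Difference in means and its standard error for each arm against the baseline arm."""
    base = outcomes[baseline]
    results = {}
    for arm, values in outcomes.items():
        if arm == baseline:
            continue
        diff = mean(values) - mean(base)
        # Unpooled (Welch-style) standard error of the difference in means.
        se = math.sqrt(variance(values) / len(values) + variance(base) / len(base))
        results[arm] = (diff, se)
    return results


# Made-up outcome data for three hypothetical arms.
observed = {
    "status_quo": [1.0, 2.0, 3.0, 2.0],
    "variant_a": [2.0, 3.0, 4.0, 3.0],
    "variant_b": [1.0, 2.0, 2.0, 3.0],
}

for arm, (diff, se) in effect_vs_status_quo(observed).items():
    print(f"{arm}: diff={diff:+.2f}, se={se:.2f}")
```

<p>Note what is absent: nothing here refers to a fixed, historical baseline. The comparison group is defined as whatever everyone else’s concurrent changes add up to at the time the test runs.</p>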
<p>This is one reason why I welcomed the <a href="https://www.science.org/doi/10.1126/science.abp9364#elettersSection">eLetter by Bagchi et al at Science</a>: it’s important to remember that “Facebook” (the platform) is a moving target<sup>13</sup>. In short, during the course of US2020, the company implemented a number of other measures aimed at combatting bad behavior during the election (“break-glass” measures). A lot of people forget how large and diverse an organization Facebook (the company) is. The “break-glass” measures discussed in the eLetter are a great example of this. One hand certainly can’t control what the other hand is doing, but it often isn’t even aware of what that hand is doing<sup>14</sup>. Unfortunately, I think this important message is undercut by another one in the eLetter:</p>
<blockquote class="blockquote">
<p>This can lead to situations where social media companies could conceivably change their algorithms to improve their public image if they know they are being studied.</p>
</blockquote>
<p>This implies a single-minded agency of the daily practice of Facebook engineers that is unreasonable (and largely impossible). As I tried to emphasize above, the company is huge, diverse and fast-moving<sup>15</sup>. The vision of a monolithic company exercising its will is simply inaccurate, especially when considering a project as small as US2020. Lots of things were changing during the course of this study; that’s (a) entirely normal at this scale and (b) why it’s important for us to be humble about its temporal validity (which I think we were)! The thing to be angry about here is not that the company tried to tweak the results of US2020 to make them come out hunky-dory: the thing to be mad about is that we have no truly reliable measures of the effects of these break-glass measures!<sup>16</sup> We need <em>more</em> testing, not less.</p>
</section>
<section id="was-us2020-independent-or-industrial-research" class="level2">
<h2 class="anchored" data-anchor-id="was-us2020-independent-or-industrial-research">Was US2020 independent or industrial research?</h2>
<p>This leads to what I think is a more important question about the nature of this project. Was US2020 industrial research, guided primarily by the needs of Facebook (the company)? The answer is clearly no. The academics proposed the designs, they had final control rights on the resulting papers and their salaries were in no way dependent on the results of these papers<sup>17</sup>. This is all critical, as there is an important difference between the Facebook and academic researchers in the project: academic researchers have academic freedom, while industry researchers do not. Does that mean that these papers are actually independent, then? Of course not. Seeking independence is a category error for this project. The internal researchers have incentives, <em>as do the academics</em>.</p>
<p>Fundamentally, these parties brought different things to the project: internal researchers knew how the platform worked in detailed ways that no external person could and they had the capacity to actually work with internal platform data. They had been studying the ins-and-outs of this platform full-time for years. External researchers, on the other hand, were not burdened by corporate incentives and they had the academic freedom to speak freely about the project and their beliefs<sup>18</sup>. All of these perspectives are necessary for a successful collaboration, but it would be incorrect to suggest that they were equal partners. They cannot be equal, because they contribute in radically different ways.</p>
<p>The best one can hope for is to reconcile competing interests by writing down what everyone wants from the project, having an argument about which things can be agreed upon and then choosing to do those things or not. We already have a great tool for this kind of collaboration<sup>19</sup>.</p>
</section>
<section id="pre-registration-as-a-contract" class="level2">
<h2 class="anchored" data-anchor-id="pre-registration-as-a-contract">Pre-registration as a contract</h2>
<p>The solution, I think, is setting out a clear contract between the parties at the start of the project. We actually already have such contracts: pre-registrations. The pre-registration serves as a contract about exactly what you plan to study and how you plan to do that<sup>20</sup>. Filing it publicly in advance can then tie your hands and commit you to doing what you said. While problems inevitably arise post-data, it’s easy to file amendments as necessary. <a href="https://cyrussamii.com/?p=3154">Cyrus Samii wrote about how he (and EGAP) sees pre-registrations</a> in a way that I think relates very nicely to this view, as he talks about the value of pre-registration as a conversation about the research you plan to do. It very much strengthens the quality of the work you will end up producing!<sup>21</sup> When parties are collaborating on research and the parties have different incentives, a pre-registration serves as a Memorandum of Understanding. It gives you a specific document on which you can have an explicit negotiation about what you will do in a transparent way. For example, if the parties can’t agree on a definition of the ‘control’ group, then it must be redefined to reflect what can be agreed on. If in the course of executing the project, one side makes a unilateral decision about something substantively important that would not have been agreed to by the other side, then this is an error in the contract for lacking specificity over something important. Take the lesson and apply it to the next project.</p>
<p>This is the most critical part of the design of US2020. Take expert internal researchers who have a clear conflict of interest (but know more about the actual workings of the platform than anyone outside it) and combine that with independent researchers who are not beholden to the company and are only interested in learning something about how it works. Let these two sides negotiate on what would make a mutually beneficial research project, write it down, do that project<sup>22</sup>. The appropriate reading, then, of these studies is as the result of a dialogue<sup>23</sup>. And to the point of the eLetter which sparked this meditation, the pre-registration documents we put together make clear our choice of comparison group <em>which is exactly why these documents are so valuable</em>.<sup>24</sup></p>
<p>Early on, it was decided that the project should speak with a single voice. This makes sense given how an academic paper typically works with only a couple authors: consensus works, and every author can speak their mind about the paper. In my opinion, it doesn’t work for a collaboration as large and diverse as US2020. In US2020, only the academics have the freedom to say what they think, and the design of the project means that they are always the ones who have the final determination over what gets written. In practice, this almost always resulted in decision by consensus. There was really very little disagreement because everyone sought the best scientific result. That said, disagreements existed. For example, from <a href="https://www.science.org/doi/10.1126/science.adi2430">Michael Wagner’s report</a> as the rapporteur of US2020:</p>
<blockquote class="blockquote">
<p>Specifically, Meta researchers wanted to be able to expressly disagree with lead author interpretations of findings or other matters in articles they coauthored. Stroud and Tucker opposed this effort, noting that collaborators can remove their name from a paper if they have a fundamental disagreement.</p>
</blockquote>
<p>From the way the project was designed, this was the appropriate decision (downstream from choosing to speak with only a single voice), but for the design of future projects, I vehemently disagree with this structure. All members of US2020 are expert scientists, many of whom have different perspectives. The remediation for disagreeing with anything that appears in one of these papers is, for Meta researchers, to entirely take their name off of a project which they have poured, likely, hundreds of hours into: the nuclear option. Much better to have some forum (an appendix?) for authors to include a personal statement clarifying their <em>personal</em> perspective on the paper and how that diverges from the main text<sup>25</sup>. This acknowledges that no written document can perfectly express the views of dozens of scientists, and provides far more information about what the project shows and what may be debatable matters of interpretation. Anyone who has talked with a scientist knows that if you put two together in a room you’ll have <em>at least</em> two perspectives on any topic under the sun. It also better expresses the reality that everyone involved in the project is a scientific expert: not merely those in academia. This sets up what I think is a common misapprehension about the project.</p>
</section>
<section id="facebook-researchers-were-not-robots" class="level2">
<h2 class="anchored" data-anchor-id="facebook-researchers-were-not-robots">Facebook researchers were not robots</h2>
<p>In the editorial published along with the eLetters, <a href="https://www.science.org/doi/10.1126/science.adt2983">Holden Thorp and Valda Vinson wrote</a>:</p>
<blockquote class="blockquote">
<p>In a statement for this editorial, Chad Kiewiet de Jonge, a research director at the company, Meta, insisted (to H.H.T.) that it had been forthcoming about the emergency measures to the researchers.</p>
</blockquote>
<p>Reading this, it might surprise you to learn that not only is Chad an author of the published paper in question, he’s in fact one of the lead authors. That is, he has been in all of these meetings where such things would be discussed! To quote from the detailed list of contributions:</p>
<blockquote class="blockquote">
<p>N.J.S. and J.A.T. were joint principal investigators for the academic involvement on this project, responsible for management and coordination. <strong>C.K.d.J.</strong>, A.F., and W.M. <strong>led Meta’s involvement on this project</strong> and were responsible for management and coordination. [emphasis mine]</p>
</blockquote>
<p>Frankly, this has been a problem from the beginning. As I said immediately on publication, “<a href="https://x.com/DrewDim/status/1684660868133351426">This project was a deep collaboration, and generally, I think, not an adversarial one, either. Everyone wanted to get this right!</a>” Nevertheless, it hasn’t always been presented like this. If you read the project as a dialogue between internal and external researchers (the accurate view, I believe), the only way that this could possibly be <a href="https://www.science.org/doi/epdf/10.1126/science.adi2430">“independent” as in the standard set by Michael Wagner</a> is if the Meta coauthors were robots who merely carried out the precise will of the academics<sup>26</sup>. This is clearly bad for the scientific output, and it isn’t what actually happened, as Wagner notes:</p>
<blockquote class="blockquote">
<p>On occasion, this, and advice they received from Meta researchers, led the outside academics to reshape what it is they sought to study, or how they sought to study it, as they learned details about what data were collected and structured by Meta for analysis. For example, the outside academics were not precisely aware of how Facebook groups were joined and, in some cases, left, by users. Coming to understand this led to changes in the working paper examining behavioral polarization.</p>
</blockquote>
<p>And yet, there is an implication that Meta researchers were trying to hide things, as in the quote he shares from a former employee saying,</p>
<blockquote class="blockquote">
<p>Facebook researchers will answer every question they get from the professors honestly; they are ethical professionals. However, they are also corporate employees. If the professors don’t ask the exact right question, Facebook staff won’t volunteer that they know how to get what you (outside academics) are really asking for.</p>
</blockquote>
<p>It’s very weird that this is attributed to a former employee rather than either the academics or to anyone involved in the project. From everything I saw, Meta researchers didn’t just narrowly answer direct questions, but bent over backward to try and re-interpret requests to get at what the academics’ questions intended. For my part, on one paper I spent a lot of time on, I wrote dozens of pages trying to redefine an unworkable analysis an external academic suggested into one that would be workable, while preserving the research questions. Other Meta employees did similar work on basically every single paper. That work is, in effect, an act of persuasion aimed at getting the academics, who have the control over the content, to adopt the change or redefinition based on their view of the scientific merit. To what extent was it based on corporate incentives and to what extent was it based on scientific judgment? Likely a combination: I believe more of the latter than the former. But the negotiation of the pre-analysis plan is the way that this is adjudicated, and in US2020, the external academics were the only ones with the power to decide.</p>
<p>I think the problem is that people can’t help but see Meta as a monolithic organization which exhibits a single will. This is seriously incorrect in almost obvious ways: employees disagree about, well, basically everything! <a href="https://profilebooks.com/work/the-unaccountability-machine/">Dan Davies’ book</a> on the flows of accountability through the cybernetic lens is a useful correction to this vision. The company “Meta” is distinct from the sum of its individual parts: the procedures and processes that it uses to determine how to operate are largely not under the control of any singular person. In short, it is an organization and this imposes constraints on employees. And yet, journalists and editors of Science seem to have trouble understanding this. Internal researchers must seek permission to do this kind of research (a heavy constraint), but conditional on permission to do so, there is no instrument of Facebook’s will hanging over every discussion and decision they make. This is simply not how information is transmitted within the company<sup>27</sup>.</p>
<p>Specifically on US2020, employees fought for years to enable this kind of research. There was extreme skepticism among executives precisely due to a belief that there’s no way that the project could be “good” for Facebook. The project exists despite the fact that Facebook (the company) was not particularly in favor of it. Rather, the researchers you see listed as coauthors are on there specifically because <em>they</em> thought it was important to do rigorous, transparent social science about the effects of social media. As a personal anecdote, I was advised against participating in the project if I wanted to optimize my career at Facebook: it’s much more professionally rewarding to make the system work better than it is to do this kind of evaluative work<sup>28</sup>. That said, I don’t mean to paint all internal researchers as exactly like people who choose not to work for Facebook. The internal researchers are unlikely to have seriously negative beliefs about the platform, so they may demand stronger standards of evidence for harm than might be acceptable to people more willing to see the inherent evil of the company.<sup>29</sup> That’s why you have the argument before you see the evidence: in a pre-analysis plan.</p>
</section>
<section id="what-does-the-future-hold" class="level1">
<h1>What does the future hold?</h1>
<p>I think there’s a real danger in misrepresenting what this project was, but that danger isn’t about falling prey to Facebook’s manipulation. Instead, the danger is that there may never be something else like it, even though it’s the single best way we’ve managed to study the crucial question of how social media affects societal outcomes. I think that collapsing all of the internal researchers into a single “Facebook” blob risks poisoning the well. This project exists only because of individual people—many working within Facebook—working hard to make it exist. Mark Zuckerberg, or any other embodiment of Facebook (the company), did not ask for this project. He did not push for it to exist, and he certainly did not ask for the rigorous structure imposed on how the collaboration would work. The right way to think about this project is that Mark Zuckerberg was willing to humor the desires of people within his organization for rigorous social science. After all, <strong>how much worse could the results be than what people already believed?</strong></p>
<p>For my part, future research collaborations I have with industry researchers will not pretend to independence. Where possible, I will manage conflicts of interest with pre-analysis plans (advocating for what I see as scientific merit), but I think aiming at subordination of these researchers is ultimately self-defeating. When interpretations of results differ, I will resolve disagreements through an appendix (like a “deviations from PAP” section) identifying the different camps of belief: thus bringing it out into the open of scientific discussion. I will bring the dialogue to the forefront, not hide it out of a misguided (and impossible) search for independence.</p>
<p>By far the most important lesson for social science when faced with a complex entity like the Facebook platform is that we should not be content with a single snapshot about its impact. Acknowledging the size and scale of the engineering effort behind this company must mean acknowledging the necessity to run and rerun tests about important potential interventions.</p>
<p>Would the results of the chronological feed experiment hold in a different election? Without the break-glass measures?<sup>30</sup> In a different country? The only way we can ever know is if we run more tests. The only way we can run those tests is if we understand what US2020 was and try to improve it so that the next version is better. Either way, when I was at Facebook, we would never leave something un-tested and un-optimized if we thought it was important. <a href="https://drewdimmery.substack.com/p/a-blueprint-for-the-regulation-of">Maybe the only way we’ll get such tests is if governments require them</a>.</p>
<p>On October 29, <a href="https://thehill.com/policy/technology/4960052-meta-study-impact-2024-election/">Senator Markey sent a letter to Zuckerberg urging further independent research on the societal effects of Meta in the vein of US2020</a>. A Meta spokesperson just pointed back at the work from 2020 (much of which is still yet to be published), as if this is the final word on the subject. There won’t be a US2024, and it looks like there won’t be a US2028, either.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Kevin and I were noodling on what kind of commitment US2020 was on the scale of Meta, and he came up with the following: “Millions of dollars and thousands of hours by highly-paid employees — let’s go nuts and say it cost them $25 million. This is compared against $134 billion in revenue for 2023. That’s .019% of their annual revenue devoted to this 6-year project.” <a href="https://kevinmunger.substack.com/p/seeing-like-a-newsfeed">[source]</a> This is a huge commitment to social science research, but this is simply not a large-scale project by Facebook standards. Reality Labs burns like $15 billion per year. In contrast, financial institutions <a href="https://www.economist.com/finance-and-economics/2019/05/02/the-past-decade-has-brought-a-compliance-boom-in-banking">commit around 15% of their employees to compliance</a>.↩︎</p></li>
<li id="fn2"><p>See <a href="https://kevinmunger.substack.com/p/the-theory-of-the-academic-firm">Kevin’s post on this</a>, I think it’s a critical standpoint for understanding modern scientific production.↩︎</p></li>
<li id="fn3"><p>I can’t find a definitive answer online, but a <a href="https://www.quora.com/How-many-Facebook-employees-are-engineers-or-work-in-technical-positions-and-how-many-non-engineers-are-there-working-on-Facebook-products-vs-infrastructure-backend-etc#:~:text=Of%20these%20employees%2C%20around%2058,%2C%20human%20resources%2C%20and%20finance.">Quora answer quotes 58%</a> and <a href="https://newsletter.pragmaticengineer.com/p/facebook">Gergely Orosz cites 43%</a>. 50% seems like it roughly jibes with my intuition.↩︎</p></li>
<li id="fn4"><p>The number of major SEVs (read as “major bad thing”) caused by a commit reviewed by a colleague with only the four letters “LGTM” as the review is… a lot. Note: if you do this in “someone else’s code” they are liable to get extremely mad at you.↩︎</p></li>
<li id="fn5"><p>A lot of the blame for this falls on bots that do “automatic” commits.↩︎</p></li>
<li id="fn6"><p>Infamous? Certainly had some of the best war stories of deployment at early Facebook I received in bootcamp.↩︎</p></li>
<li id="fn7"><p>This also led to plenty of bugs that only popped up in particular combinations of features gated by A/B tests, code, and a variety of other technical means.↩︎</p></li>
<li id="fn8"><p>(2 treatments per test) ^ (number of independent tests) is much larger than 8 billion for any reasonable number of independent tests. There are a lot of independent tests at Facebook due to the “universe” concept (the same <a href="https://dl.acm.org/doi/abs/10.1145/1835804.1835810">basic idea described by Google here</a>).↩︎</p></li>
<li id="fn9"><p>If A/B tests scale linearly with employees. There were 17k employees in 2016 and 72k in 2021, an increase of 4x.↩︎</p></li>
<li id="fn10"><p>There are, of course, some exceptions: global holdouts that all employees have to respect, or particular features which interfere with each other strongly. These are both handled with technical solutions. The former by allowing global restrictions on who can be put in experiments and the latter through the use of “universes” (for more, see <a href="https://hci.stanford.edu/publications/2014/planout/planout-www2014.pdf">the paper on PlanOut</a>, for instance) that ensure that users cannot be put in multiple conflicting tests.↩︎</p></li>
<li id="fn11"><p>If this post wasn’t already way too long, I would provide some examples. One notable one was when, between iterations of text optimization, the translation systems changed, vastly affecting our results.↩︎</p></li>
<li id="fn12"><p>This perspective, more than maybe any other, was key to what our team was good at. Figuring out what metric to optimize (metric definition) and what levers to pull (treatment definition) was often more essential than the ML toolkit we used for actually doing the optimization, which was fancy and useful, but also ultimately subservient to the problem definition.↩︎</p></li>
<li id="fn13"><p>They frame this as an issue of “validity”. In our response, we note that they are only referring to external validity, but I find the term misleading in general as Kevin and I discuss in <a href="http://osf.io/wmhc4">our working paper on partial identification and temporal validity</a>: This word has the unfortunate implication of being binary; computer login passwords and driver’s licenses are either valid or invalid. To say that a driver’s license is “mostly valid” is to say that it is “not valid.” Scientific knowledge is not binary, and while most practitioners can successfully keep this reality in mind when discussing “external validity,” the term introduces unnecessary confusion. And as far as temporal validity goes, as I quoted in my <a href="https://drewdimmery.substack.com/p/a-blueprint-for-the-regulation-of">first post on this newsletter</a>: <a href="https://investor.fb.com/investor-events/event-details/2022/Q2-2022-Earnings/default.aspx">Right now, about 15% of content in a person’s Facebook feed and a little more than that of their Instagram feed is recommended by our AI from people, groups, or accounts that you don’t follow. We expect these numbers to more than double by the end of next year.</a> The current News Feed on Facebook is simply not the same thing we evaluated: the company moves too fast for that. If the only research we deem valid is research which is perfectly temporally valid 4 years on from its setting, we will have no evidence-base on which to make policy at all. This is an important problem we must contend with as a question of meta-science.↩︎</p></li>
<li id="fn14"><p>It’s hard for me to remember, but despite being an employee and working on US2020 I think I only heard about these break-glass measures through public reporting, rather than from internal communications, but I generally worked on methods stuff rather than specifically on applied civic stuff.↩︎</p></li>
<li id="fn15"><p>It is not without reason that employees have been described as <a href="https://en.wikipedia.org/wiki/Chaos_Monkeys">Chaos Monkeys</a>.↩︎</p></li>
<li id="fn16"><p>As far as I know, these measures weren’t even implemented in a way to test things internally, but I could be wrong about that; I have no special vision into this.↩︎</p></li>
<li id="fn17"><p>I suppose you could say that the results would change the amount of prestige afforded to them. If this is the case, would the largely null results we found have been the best for their careers?↩︎</p></li>
<li id="fn18"><p>Unlike other participants in US2020 from the Meta side, I actually now have the benefit of being able to speak freely, so I’ve got that going for me.↩︎</p></li>
<li id="fn19"><p>I’m specifically not referring to this as adversarial collaboration, because I don’t think that’s an accurate reading of the reality. It really wasn’t that adversarial, because the internal researchers also wanted to know the answers to these questions. We didn’t collect information on things like race or politics as a matter of course.↩︎</p></li>
<li id="fn20"><p>Note that my argument here is completely divorced from the statistical arguments in favor of pre-analysis plans.↩︎</p></li>
<li id="fn21"><p>Note that this justification isn’t a narrow one about getting a hypothesis test to cover at an appropriate rate, it’s about actual substantive concerns of what to study and how to study it.↩︎</p></li>
<li id="fn22"><p>Of course, no plan survives contact with the enemy and every data scientist knows that the only true enemy is data.↩︎</p></li>
<li id="fn23"><p>A dialectic?↩︎</p></li>
<li id="fn24"><p>Some selections from the pre-registrations: “The News Feed of users who are assigned to the control condition will not change as a result of the experiment.” (from the Likeminded exposure PAP) “Users assigned to the control condition will see their normal News Feed under existing Facebook policies and procedures.” (from an as-yet unpublished study’s PAP) “Users who have consented are randomized into three treatment conditions (or a control condition shared with other studies).” (from the Chronofeed PAP, emphasis mine) The second statement, I think, makes this the most clear. Facebook has existing policies and procedures, and the scope of this study will not be changing that. Given the size of Facebook (which I have repeatedly emphasized), I don’t think this should be surprising!↩︎</p></li>
<li id="fn25"><p>Since the papers were not reviewed by Meta for content, this would give internal researchers some very small measure of the academic freedom that they do not, in general, enjoy.↩︎</p></li>
<li id="fn26"><p>Academics, mind you, who simply don’t know how internal systems, logging and metrics actually work. There’s no reason they would know this, in fact.↩︎</p></li>
<li id="fn27"><p>To fully think through the question would require a discussion about how performance is judged at Meta. I think the best way to think about this is that Facebook uses a form of <a href="https://factorialhr.com/blog/stack-ranking/">stack ranking</a>, where your manager has about 20 seconds to advocate on your behalf before putting you in a ranked list of your colleagues in the same organization and at the same level. What will be most persuasive is large impacts of your work on important corporate KPIs.↩︎</p></li>
<li id="fn28"><p>Because you can measure the value of making a system work better: you run an A/B test and see what your change does.↩︎</p></li>
<li id="fn29"><p>This is another argument for RCTs, as I think this debate pops up much more clearly in observational causal inference where you have to determine your willingness to balance the potential for bias against the desire to get some answer to a thorny question no matter how flawed. To be explicit, I suspect internal researchers are much less willing to accept any bias, partially due to a strong internal culture in favor of RCTs. <a href="https://www.tomleavitt.com/s/Leavitt_Fisher_Meets_Bayes.pdf">Thomas Leavitt</a> has a related argument on this point about the epistemic value of randomization.↩︎</p></li>
<li id="fn30"><p>Let’s leave a holdout of 0.5% of the Facebook population from receiving these measures next election to find out!↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{dimmery2025,
  author = {Dimmery, Drew},
  title = {What Was {US2020?}},
  date = {2025-01-29},
  url = {https://ddimmery.com/posts/what-was-us2020/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-dimmery2025" class="csl-entry quarto-appendix-citeas">
Dimmery, Drew. 2025. <span>“What Was US2020?”</span> January 29, 2025.
<a href="https://ddimmery.com/posts/what-was-us2020/">https://ddimmery.com/posts/what-was-us2020/</a>.
</div></div></section></div> ]]></description>
  <category>social-media</category>
  <category>metascience</category>
  <guid>https://ddimmery.com/posts/what-was-us2020/</guid>
  <pubDate>Wed, 29 Jan 2025 00:00:00 GMT</pubDate>
  <media:content url="https://ddimmery.com/posts/what-was-us2020/main-image.png" medium="image" type="image/png" height="54" width="144"/>
</item>
<item>
  <title>Is Bluesky convivial?</title>
  <dc:creator>Drew Dimmery</dc:creator>
  <link>https://ddimmery.com/posts/is-bluesky-convivial/</link>
  <description><![CDATA[ 





<p>Back in November of 2023, I was lucky enough to participate in a “Morality in Tech” workshop hosted by Princeton’s CITP and organized by Kevin Munger. It was a fascinating interdisciplinary group of people trying to wrap our heads around how to tie moral and ethical frameworks to technological change. It culminated in the “<a href="https://kmunger.github.io/pdfs/syllabus_citp.pdf">Building the Society We Want</a>” syllabus.</p>
<p>This post is a modified version of a talk I gave on the historical development of <a href="https://en.wikipedia.org/wiki/Smalltalk">Smalltalk</a>, and how I see it as a fundamentally different way of thinking about technology than we’re used to today. Unfortunately, I ultimately don’t see it as a way out of the mess we’re in on the social web. Bluesky cannot resolve the fundamental problems with microblogging, which are inherent to the technology.</p>
<hr>
<section id="what-is-smalltalk" class="level2">
<h2 class="anchored" data-anchor-id="what-is-smalltalk">What is Smalltalk?</h2>
<p>Smalltalk was developed by Alan Kay, Dan Ingalls and Adele Goldberg at Xerox PARC in the 1970s. What it was is a little hard to frame in terms of our contemporary categories around computing. It was more than a programming language and not exactly an operating system as we’ve come to understand them. It was the software part of Alan Kay’s overarching personal computing vision (from back before computing was really <em>personal</em> computing). Smalltalk had a lot of features that are now very common, but were rare when it came out. It was an early iteration of object-oriented programming<sup>1</sup>, it had an early graphical interface, all code was just-in-time compiled (and could be recompiled on the fly), and it integrated a development environment into how the user interacted with the system. But Smalltalk was also a <em>philosophy</em> of computing:</p>
<blockquote class="blockquote">
<p>look for the distinction between a programming language and a programming system, and consider the difference in providing a system in which the user can <em>feel individual mastery over complexity</em></p>
</blockquote>
<p>(emphasis mine) The intention was to make the vast complexity of a computer system something that end-users could <em>control</em>. The system should make its own complexity comprehensible to the user, and allow them to adapt and reorient it at will.</p>
<p>A lot of what I’m drawing on here is from <a href="https://archive.org/details/byte-magazine-1981-08/">the August 1981 issue of BYTE Magazine</a> in which PARC chose to write a public introduction to the newest iteration of the Smalltalk system. It’s available online and is a truly fascinating read that I can’t recommend enough.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ddimmery.com/posts/is-bluesky-convivial/image_1.png" class="img-fluid figure-img"></p>
<figcaption>Image</figcaption>
</figure>
</div>
<p>Smalltalk was intended to give the User the power of God over the computing system. This isn’t really subtext. In this introduction to Smalltalk, they provide the following illustration:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ddimmery.com/posts/is-bluesky-convivial/image_2.png" class="img-fluid figure-img"></p>
<figcaption>Image</figcaption>
</figure>
</div>
<p>On the far left, the System Programmer (God) creates a system (the Taj Mahal). The Programmer creates things (a bridge) for the end-user using the things provided by the System Programmer. Smalltalk was intended to democratize the power of creation, moving into the second figure from the left, in which the Programmer creates kits that the User can use to construct what they want. In Figure 3 this goes even further, as Users can create the kits themselves that they can assemble into what they want. On the far right, however, the idea is that Programmers / Users should be able to climb up above God and impose their own will over that of the System Programmer: to become God.</p>
<p>It’s a vision of a radically empowered user who can manipulate every aspect of their computing environment. In my mind, this vision is exactly what it means to have a “convivial” tool in the sense of Ivan Illich: tools which empower human liberty rather than constrain.</p>
<p>There are costs, however, as Smalltalk asks much more of users, as Kay and Goldberg write in 1977:</p>
<blockquote class="blockquote">
<p>The burden of system design and specification is transferred to the user. This approach will only work if we […] allow ordinary users to casually and easily describe their desires for a specific tool.</p>
</blockquote>
<p>But the benefit is a fully “reactive” system<sup>2</sup>. What did this look like?</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ddimmery.com/posts/is-bluesky-convivial/image_3.png" class="img-fluid figure-img"></p>
<figcaption>Image</figcaption>
</figure>
</div>
<p>Essentially, you had a browser like the above, through which you could navigate your system; the panel at the bottom allowed you to edit the code defining how that part of your system worked. You could then choose to re-compile that part of your system and it would automagically update to follow your new instructions. Voilà, complete control over the system, manipulable at runtime.</p>
<p>But there’s one last quote, from Daniel Ingalls, in his article about the design principles of Smalltalk:</p>
<blockquote class="blockquote">
<p><strong>Natural Selection</strong>: Languages and systems that are of sound design will persist, to be supplanted only by better ones.</p>
</blockquote>
<p>Our computing is not based around Smalltalk these days, nor does it look very much like this: it died out in favor of opaque systems with a lack of user-control. It seems that users were, in the end, not willing to take the burden of full responsibility for the functioning of their own computing environment.</p>
</section>
<section id="empowerment-is-still-about-power" class="level2">
<h2 class="anchored" data-anchor-id="empowerment-is-still-about-power">Empowerment is still about power</h2>
<p>I think the question of inevitability is what I really struggle with here. The dream of a system that radically empowers the user is beautiful. But are users willing to actually take this power? Can they? In grappling with this question, I tried to think about affordances of social media and the extent to which they align with Smalltalky principles.</p>
<ul>
<li><a href="https://blueskyweb.xyz/blog/7-27-2023-custom-feeds">Bluesky lets users choose their own Feed algorithm</a>.</li>
<li><a href="https://web.archive.org/web/20140109112039/https://twitter.com/chrismessina/status/223115412">The hashtag was created</a> as a tool for users to take control of discovery and categorization themselves.</li>
<li>Distributed social media services (e.g.&nbsp;Mastodon) seek to allow individuals to choose the community (and, therefore, community norms) they wish to belong to.</li>
<li>Individual choice about network links to create (e.g.&nbsp;friending on Facebook): <a href="https://www.science.org/doi/10.1126/science.ade7138">this is the source of a lot of ideological polarization</a>, rather than something done “to” users by algorithms.</li>
</ul>
<p>In each of these, the user gets a pretty profound ability to shape what their social media <em>is</em>. But it’s still not enough! My choice of a feed algorithm doesn’t give me the experience of using Bluesky that I would have if <em>everyone</em> were to use my preferred feed. In order for me to “choose” my experience on the platform, I need the power to choose how <em>everyone</em> experiences the platform. Empowering users doesn’t avoid the critical question of who holds the power to make such changes.</p>
<p>In a previous post, I <a href="https://drewdimmery.substack.com/p/stop-looking-for-the-next-twitter?r=bjnt&amp;triedRedirect=true">claimed that there were four main definitional characteristics of microblogging</a>: (1) short texts, (2) virality, (3) simple engagement possibilities and (4) an infinite feed. Users can’t alter these characteristics on Bluesky or on any other platform. <a href="https://www.science.org/doi/abs/10.1126/science.add8424">Removing reshares</a> from your <em>own</em> screen doesn’t change whether you experience virality, the associated <a href="https://www.tandfonline.com/doi/abs/10.1080/10584609.2019.1687626">credibility cascades</a> and everything else that comes with it. To kill virality, you’d need to make aggressive changes like removing the visibility of likes from everyone’s screen, removing ranking based on likes, views and clickthroughs, killing reshares <em>globally</em>, and the list of changes goes on. After making all of those changes for <em>everyone</em>, then maybe you’ve done a good job of breaking virality. Can you give virality back to a single user, at that point?</p>
<p>It is fundamentally impossible for <em>everyone</em> to alter <em>everything</em> about their social media experience to their whims, while continuing to make it meaningfully <em>social</em>. Even in a world where everyone gets to choose everything about how <em>they</em> interact with social media, they don’t get to choose how <em>everyone else</em> interacts with social media<sup>3</sup>. Our experiences online aren’t just shaped by our own choices, but by the choices of those around us<sup>4</sup>. We can’t “solve” our experience by changing only our own affordances.<sup>5</sup> Bluesky can only ever give the illusion of autonomy; it can never provide true control over the environment as envisioned by Smalltalk. That autonomy only works for a computing platform that is fundamentally a-social: it interacts <em>only</em> with you and responds <em>only</em> to your preferences.</p>
<p>Convivial, anarchist empowerment to shape one’s own technical reality doesn’t offer a way out of the shared reality we create with technologies. At best, it can offer a patchwork of small fixes. We cannot fundamentally alter the medium, because the medium is defined by what is shared. The critical question is who has the power to choose the shape of the overall technology, and <a href="https://drewdimmery.substack.com/p/a-blueprint-for-the-regulation-of">who gets to study how the world is changed by those choices.</a> Right now, only the platforms have these powers.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>In fact, Alan Kay doesn’t actually think modern OOP is at all related to what “OOP” was for Smalltalk. If I had to condense down what I understand he sees as the critical distinction, I’d say that it would be something about autonomy: his vision saw individual “objects” being more-or-less autonomous computing units, which would communicate with one another and this would allow the programmer to create complex systems based on these interactions. Crucially, he saw this as a lot more than a paradigm for writing and organizing a particular piece of software.↩︎</p></li>
<li id="fn2"><p>“The Smalltalk programming environment is reactive: the user tells it what to do and it reacts, instead of the other way around.” Tesler 1981↩︎</p></li>
<li id="fn3"><p>And, in particular, a world in which collaborative filtering continues to play a strong role in what content your network (or other people more broadly) sees.↩︎</p></li>
<li id="fn4"><p>The most banal and obvious platitude of social science: “other people matter for your experience of life”↩︎</p></li>
<li id="fn5"><p>Well, you can just sign off completely. And maybe you should! But that’s a different question altogether. For what it’s worth, I’ve basically signed off. I don’t think the bargain is a positive one in my life.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{dimmery2024,
  author = {Dimmery, Drew},
  title = {Is {Bluesky} Convivial?},
  date = {2024-11-21},
  url = {https://ddimmery.com/posts/is-bluesky-convivial/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-dimmery2024" class="csl-entry quarto-appendix-citeas">
Dimmery, Drew. 2024. <span>“Is Bluesky Convivial?”</span> November 21,
2024. <a href="https://ddimmery.com/posts/is-bluesky-convivial/">https://ddimmery.com/posts/is-bluesky-convivial/</a>.
</div></div></section></div> ]]></description>
  <category>technology</category>
  <category>social-media</category>
  <guid>https://ddimmery.com/posts/is-bluesky-convivial/</guid>
  <pubDate>Thu, 21 Nov 2024 00:00:00 GMT</pubDate>
  <media:content url="https://ddimmery.com/posts/is-bluesky-convivial/main-image.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>Examining the Apparatus</title>
  <dc:creator>Drew Dimmery</dc:creator>
  <link>https://ddimmery.com/posts/examining-the-apparatus/</link>
  <description><![CDATA[ 





<p>Kevin Munger has <a href="https://www.motherjones.com/politics/2024/01/the-algorithm-social-media-facebook-technology-the-apparatus/">a great article out problematizing the idea of “The Algorithm”</a>. You should read the Mother Jones article and then you should read <a href="https://kevinmunger.substack.com/p/the-algorithm-is-the-only-critique">his Substack</a><sup>1</sup>. They’re both great! And they touch on an important theme I’ll return to a lot, which is about the right ways to conceive of and examine complex systems.</p>
<p>The argument, in brief, is that “the Algorithm” is a misleading frame that elides our participation in the system. He proposes (based on the hottest philosopher of 2024, Vilém Flusser) “the Apparatus” as a better term that acknowledges each of our presence within this greater system which defines much of what we actually mean by “the Algorithm”.</p>
<p>In this blog, I’m going to talk about the implications of re-formulating the source of our present weirdness as “the Apparatus”. How is the apparatus constructed? What are the nuts and bolts of how exactly we are implicated in its construction? Shockingly, I’ll argue we need more experiments.</p>
<hr>
<section id="the-apparatus-explained" class="level2">
<h2 class="anchored" data-anchor-id="the-apparatus-explained">The Apparatus, explained</h2>
<p>I think it’s worth talking through how change happens in Tech. I’ll focus on Meta, because it’s the place I know best. This is basically the guts of how the Apparatus gets created.</p>
<p>The problem faced by Tech companies is, put simply, “how should we decide what to do?” There are a number of approaches to this, but I’m going to contrast two:</p>
<ul>
<li>Build stuff and then ask people how/whether they like it.</li>
<li>Build stuff and then see how/whether people use it.</li>
</ul>
<p>We have to keep in mind, however, that we’re fundamentally talking about making decisions for a product used by (e.g.) 2 billion people. We have to think about how these two approaches would work in that setting, i.e.&nbsp;asking the question “Will it scale?”</p>
<p>The first approach would rely on large-scale surveys. There’s no real way to build up qualitative knowledge about 2 billion people, so we need to be quantitative, and surveys are pretty much the only game in town if you want to get folks’ opinions like this. There are hard limits here, though. At the limit, you can’t (for instance) ask every user how their experience of the site is every five minutes. At Facebook, the standard “cool-down” period after taking a survey was six months when I was there. This imposes hard limits on the precision with which you can actually measure the opinions of users<sup>2</sup>.</p>
<p>The second approach would rely on directly observing behavior and encoding this into meaningful measures about what people are doing. Most of this is just counting things that people are doing<sup>3</sup>. Because you don’t ask for any of your users’ attention to do this, there aren’t really any practical limits to your measurement of on-platform behaviors<sup>4</sup>.</p>
<p>Given these practicalities, it’s unsurprising that the result is an almost ideological belief in <em>behavioralism</em>: that the only way to understand what users want is to observe what they do<sup>5</sup>. My understanding is that this was part of an inspiring introductory speech that Chris Cox would give to new hires<sup>6</sup>.</p>
</section>
<section id="the-apparatus-sustained" class="level2">
<h2 class="anchored" data-anchor-id="the-apparatus-sustained">The Apparatus, sustained</h2>
<p>This process results in a cycle of iterative improvement. <a href="https://press.princeton.edu/books/hardcover/9780691159263/the-internet-trap">Matt Hindman’s book</a> does a good job of describing the power of this cycle. It’s also the basis of the Agile software development philosophy, expressed at Facebook as “ship early, ship often” or, more provocatively, “move fast and break things”.</p>
<p>This saying has, I think, always been misinterpreted. The point is not that you’re okay with breaking things. The point is that you’re okay with <em>trying</em> new things, because you know that you have systems in place for identifying when things break. Many of the neurotic high-performers that get hired in tech are perfectionists, so you need to encourage them to not spend years working on something before launching it and seeing how it works. This should really be interpreted through <a href="https://agilemanifesto.org/">a mantra of the Agile Manifesto</a>:</p>
<blockquote class="blockquote">
<p>Our highest priority is to <strong>satisfy the customer</strong> through early and continuous delivery of valuable software.</p>
</blockquote>
<p>Over time, trying a bunch of stuff (some of which might <em>seem</em> crazy ex ante) and watching what users do in response to it is a great way to build a successful product. I think this is one of the defining features of the secret sauce of modern Big Tech.</p>
<p>This perspective flows into incentives throughout the company. Software engineers are, in large part, promoted and compensated based on how well their changes result in user behavior that is deemed “good”. The number went up? Congratulations, you get a bonus. You can break apart this monolith, but this isn’t a bad first order approximation<sup>7</sup>.</p>
</section>
<section id="the-apparatus-examined" class="level2">
<h2 class="anchored" data-anchor-id="the-apparatus-examined">The Apparatus, examined</h2>
<p>This discussion should make it clear that there’s no sense in which any person understands “the Algorithm”, as Kevin argues. Wouldn’t that be comforting? Stafford Beer had a famous saying, “<a href="https://en.wikipedia.org/wiki/The_purpose_of_a_system_is_what_it_does">The purpose of a system is what it does</a>”. This isn’t a reassuring sentiment. We have to work extremely hard to see and measure what a system does. There’s no easy shortcut through “intentions” to understand a system. Even with a truth-serum, we can’t just ask Zuck what he intended to build and then decide if that’s good or bad<sup>8</sup>. No engineer fully understands the entirety of the complex system they’re working on, so we can’t judge the worth of what they’ve built solely on what they intended to build.</p>
<p>So what <em>is</em> the purpose of the apparatus? Well, in one sense it’s essentially the aggregate of the incentives for all the people <em>building</em> the apparatus: making the numbers go up. If we want to understand the purpose of the apparatus, we need to encode the things it does into numbers, and measure how changes to the apparatus result in changes to those numbers. There’s only one consistent way to do that: experimentation<sup>9</sup>. And we need to find the right <em>new</em> numbers to measure, because there will undoubtedly be unintended consequences.</p>
<p>But we cannot forget that the essence of the behavioralism in Tech is that <em>we are those numbers</em>. We are as much a part of the apparatus as the engineers who build the system: every post, every click, every message, every linger over a story is part of the apparatus. This isn’t just true in the sense of our being complicit in the system. It’s true mechanically in the way engineers <em>construct</em> the system. The builders of the system <em>want</em> to give us what we want.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>The editorial line of this blog is generally pro-Munger.↩︎</p></li>
<li id="fn2"><p>It would be interesting to write down, under reasonable assumptions, exactly what this would impose as the minimal detectable effect. It would probably be small (for social science), but implausibly large (for the practical decisions about product made 1000 times a day in tech).↩︎</p></li>
<li id="fn3"><p>There was a joke at Facebook that this was the main skill of data scientists: we invented a whole job just to do advanced counting. Move fast and count things!↩︎</p></li>
<li id="fn4"><p>Technical limits exist. Tracking and storing all mouse movement for all 2 billion users would be an absurdly large amount of data collection. More commonly, the practice would be to radically downsample any of this kind of data that is collected.↩︎</p></li>
<li id="fn5"><p>Of course, there’s also a “revealed preferences” element to this, but even if you ignore that, there’s still a practical matter of scaling.↩︎</p></li>
<li id="fn6"><p>Somehow, I didn’t get this either as an intern or when I started full-time. This is sad, as I’ve heard it was a fascinating romp through the history of technology in a very McLuhan way. Oh well.↩︎</p></li>
<li id="fn7"><p>I think you can break a lot of work in Tech into three groups (excluding support staff that doesn’t work directly on product): engineers (make numbers go up), UX researchers (make sure we choose the right numbers to go up), internal measurement tool builders (make sure when we say number goes up, it actually went up).↩︎</p></li>
<li id="fn8"><p>I think many critics would be surprised at the prevalence of good intentions here.↩︎</p></li>
<li id="fn9"><p>There’s an argument here that this is granting too much to behavioralism: that framing the problem in these terms is already giving up the game. I don’t think this is true. A strong argument for behavioralism is that it grants a common epistemological playing field that leads to productive arguments about what should be measured and what should be valued: a regulatory framework. Those are much more productive, I think, than arguing about more basic epistemological points on which there can never be agreement.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{dimmery2024,
  author = {Dimmery, Drew},
  title = {Examining the {Apparatus}},
  date = {2024-01-30},
  url = {https://ddimmery.com/posts/examining-the-apparatus/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-dimmery2024" class="csl-entry quarto-appendix-citeas">
Dimmery, Drew. 2024. <span>“Examining the Apparatus.”</span> January 30,
2024. <a href="https://ddimmery.com/posts/examining-the-apparatus/">https://ddimmery.com/posts/examining-the-apparatus/</a>.
</div></div></section></div> ]]></description>
  <category>social-media</category>
  <category>technology</category>
  <guid>https://ddimmery.com/posts/examining-the-apparatus/</guid>
  <pubDate>Tue, 30 Jan 2024 00:00:00 GMT</pubDate>
  <media:content url="https://ddimmery.com/posts/examining-the-apparatus/main-image.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>Calibration as an HTE diagnostic</title>
  <dc:creator>Drew Dimmery</dc:creator>
  <link>https://ddimmery.com/posts/calibration-as-an-hte-diagnostic/</link>
  <description><![CDATA[ 





<p>In <a href="https://open.substack.com/pub/drewdimmery/p/stop-looking-for-the-next-twitter?r=bjnt&amp;utm_campaign=post&amp;utm_medium=web">an earlier post</a>, I promised that rather than writing threads about papers, I’d talk about them here. That’s what this post is!</p>
<p>I’ve just had <a href="https://doi.org/10.1287/isre.2021.0343">a paper (with Yan Leng)</a> accepted to Information Systems Research titled “Calibration of Heterogeneous Treatment Effects in Randomized Experiments”. The entire motivation of the paper can be summed up in the following chart, showing that the HTEs are not alright:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ddimmery.com/posts/calibration-as-an-hte-diagnostic/image_1.png" class="img-fluid figure-img"></p>
<figcaption>Image</figcaption>
</figure>
</div>
<hr>
<p>The data undergirding this chart came from a big (and important) experiment at Facebook which was run around 2019 or so. It is sufficiently sensitive that I (still) can’t talk about what the actual intervention was, but one of the big questions was about how much heterogeneity there was in the intervention. It would have been a big deal if a small fraction of people exhibited very large effects, so we needed to look into whether this was the case. We did this by applying some of the rapidly developing literature on ML estimation of heterogeneous effects.</p>
<p>We threw machine learning at the problem using a bunch of covariates we collected about users’ behavior. We started with the simplest possible model, the venerable T-learner<sup>1</sup>. When I was at Facebook, I did a bunch of interviews in which I asked candidates to walk me through their model-building and model-testing process (starting with very simple gut-checks). One of the open-ended questions I would always ask was “how would you know if this particular model was good enough (e.g.&nbsp;to launch to production)?” With ML, you can often give pretty sensible answers to this question depending on what you’re trying to do and thinking carefully about what errors mean<sup>2</sup>. We found this a surprisingly hard question to answer in the causal setting.</p>
<p>How can we have a sense as to whether our HTEs are reasonable? In a standard machine learning setting, you have true labels which you can compare to your estimates to see how well you do at prediction. Even in the case of non-standard prediction models where you only learn about the truth long after you predict, you can use the truth to get a sense of how well you do. A great example of this was when <a href="https://projects.fivethirtyeight.com/checking-our-work/">FiveThirtyEight evaluated all of the predictive models they’d ever built</a>. Of course, as you learn in the first lecture of any causal inference class, you never actually observe the true treatment effect: you only observe exactly one of the treatment and control potential outcomes for a unit.</p>
<p>The standard answer in the causal setting is that while you cannot observe <em>individual</em> effects, you can observe <em>average</em> effects. To bring things back to the motivating graph at the top, we started with the very simple exercise of splitting up our data based on quintiles of the estimated HTEs (from the T-learner) and comparing the average model-based estimates against the standard difference-in-means estimates in each quintile.</p>
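<p>The quintile comparison described above can be sketched roughly as follows. This is a minimal illustration rather than the code behind the chart: the simulated data, the OLS-based T-learner, and the helper names (<code>fit_ols</code>, <code>predict_ols</code>, <code>quintile_check</code>) are all assumptions made for the example, and the bins are formed in-sample for brevity.</p>

```python
import numpy as np

def fit_ols(X, y):
    # Least-squares coefficients with an intercept prepended.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def quintile_check(tau_hat, y, w, n_bins=5):
    # Bin units by estimated effect, then compare the mean model-based
    # estimate with the difference-in-means inside each bin.
    edges = np.quantile(tau_hat, np.linspace(0, 1, n_bins + 1))
    bins = np.digitize(tau_hat, edges[1:-1])  # labels 0 .. n_bins - 1
    rows = []
    for b in range(n_bins):
        in_bin = bins == b
        treated = np.logical_and(in_bin, w == 1)
        control = np.logical_and(in_bin, w == 0)
        dim = y[treated].mean() - y[control].mean()
        rows.append((b + 1, tau_hat[in_bin].mean(), dim))
    return rows

# Hypothetical simulated RCT: the true effect is 1 + x0.
rng = np.random.default_rng(0)
n = 20_000
X = rng.normal(size=(n, 3))
w = rng.integers(0, 2, size=n)
y = X @ np.array([1.0, 0.5, -0.5]) + w * (1.0 + X[:, 0]) + rng.normal(size=n)

# T-learner: one outcome model per arm, effect = difference of predictions.
coef_t = fit_ols(X[w == 1], y[w == 1])
coef_c = fit_ols(X[w == 0], y[w == 0])
tau_hat = predict_ols(coef_t, X) - predict_ols(coef_c, X)

rows = quintile_check(tau_hat, y, w)
for q, model_mean, dim in rows:
    print(f"quintile {q}: model {model_mean:.2f}, diff-in-means {dim:.2f}")
```

<p>In this toy setup the outcome models are correctly specified, so the two columns should roughly agree. With a regularized or misspecified learner, a gap between them is the kind of warning sign the chart at the top shows.</p>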
<p>The results were pretty startling, as the ML-based HTEs were radically smaller in magnitude than the <em>known unbiased</em> estimates from a difference-in-means. In short, this plot instantly alerted us to the fact that something was going very wrong. As you can see in the plot, these differences cannot simply be explained away as sampling variation in the difference-in-means. Rather, we found the issue to be that the plugin estimators make a fundamentally incorrect bias-variance tradeoff because they’re modeling <em>response-surfaces</em> rather than <em>treatment effects</em>. If you were to have infinite data, this problem might go away, but we’re ultimately never in asymptopia, and we need some heuristics and diagnostics to help us navigate this tradeoff.</p>
<p>The core of our paper is thus to motivate people to <em>look at their data</em>. Have your models actually done a good job of replicating the effects you know you can have some faith in? If not, you need to try and do something else. You can fit a new model (e.g.&nbsp;<a href="https://ddimmery.github.io/tidyhte/">a DR-learner with tidyhte</a>), or you can use our method to do something basically like <a href="https://en.wikipedia.org/wiki/Platt_scaling">Platt scaling</a>, which is what we talk through in the paper: take the aggregated data and find a linear transformation which makes the resulting estimates line up as best they can with the subgroup effects<sup>3</sup>.</p>
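<p>As a rough sketch of the linear-rescaling idea, assuming simulated data and a deliberately over-shrunk HTE model (both hypothetical), one could regress the subgroup difference-in-means on the subgroup mean predictions and apply the fitted line to every unit. The procedure in the paper may differ in its details, for example in how subgroups are formed or weighted.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
w = rng.integers(0, 2, size=n)
tau = 1.0 + x                        # true effect, known only because simulated
y = x + w * tau + rng.normal(size=n)
tau_hat = 0.5 * tau                  # stand-in for an over-shrunk HTE model

# Aggregate within deciles of the estimated effect.
edges = np.quantile(tau_hat, np.linspace(0, 1, 11))
bins = np.digitize(tau_hat, edges[1:-1])
pred = np.array([tau_hat[bins == b].mean() for b in range(10)])
dim = np.array([
    y[np.logical_and(bins == b, w == 1)].mean()
    - y[np.logical_and(bins == b, w == 0)].mean()
    for b in range(10)
])

# Linear map from mean predictions to subgroup effects (slope first).
slope, intercept = np.polyfit(pred, dim, deg=1)
tau_cal = intercept + slope * tau_hat

print(f"slope {slope:.2f}, intercept {intercept:.2f}")
```

<p>Because the stand-in model halves every effect, the fitted slope should land near 2. On real data the slope and intercept are themselves noisy, which is one reason to read the exercise as a diagnostic first.</p>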
<p>For the linear rescaling approach to work well, you have to make a lot of assumptions. The review process was very focused on demonstrating that this calibration procedure improves the underlying HTE estimates<sup>4</sup>, but I think the best way to think about this is as, fundamentally, a diagnostic method. What we’re doing here is showing you how to look at your data and your model and see whether they align in any way whatsoever<sup>5</sup>. The ultimate plot we landed on (and recommend) looks like the following, which is read basically like a QQ plot:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ddimmery.com/posts/calibration-as-an-hte-diagnostic/image_2.png" class="img-fluid figure-img"></p>
<figcaption>Image</figcaption>
</figure>
</div>
<p>Note that the axes are scaled by the arcsin. In our settings, at least, this was very important, as estimated HTEs tended to have somewhat long tails (you can see that the vast majority of units are estimated to have effects with magnitude less than 1 or so, but there are still some pretty large estimated effects). To be clear, at the tails of the data, the HTE model understates true effects by around 50% (the difference in the blue dashed line defined by the difference-in-means and the red one from the estimated HTEs). When you care about what your intervention is actually doing, that’s an unacceptable error that you must do something to correct.</p>
<p>In general, I think finite-sample performance for HTEs is vastly underrated in import. There are some very weird asymptotic shenanigans that happen in some of these HTE papers that I think should be pushed on more. For instance, did you know that causal forests acquire a lot of their nice properties because asymptotically, each terminal node has a constant treatment effect? How often does our data have remotely constant treatment effects within estimated leaves when heterogeneity actually matters? A lot of how well CF will do on your particular data is going to come down to what your true CATE function looks like (just like a lot of other approaches). Luckily, random forests are generally pretty good models, so I don’t think this invalidates much, but it does suggest you need to think hard about model validation. You shouldn’t just rely on asymptotic results.</p>
<p>Another result from our paper I like is that you can exactly characterize the (conditional) bias that you’ll get from fitting a T-learner with ridge regression: it’s purely a function of the regularization parameter: more regularization means more bias<sup>6</sup>. With a T-learner, you typically manage the bias-variance tradeoff on the response surfaces <em>individually</em>, so if you have a very noisy underlying response function, you’ll get a lot of bias <em>even if your true CATE function is very well behaved</em>! If we want to be serious about estimating HTEs, we should take these kinds of finite sample properties seriously. It’s 100% a finite sample issue, as if your data gets big enough, you won’t need to regularize, so you won’t have any bias.</p>
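<p>To make the ridge claim concrete, here is a minimal reconstruction of the argument, which is not necessarily the exact expression in the paper. It assumes (these are my simplifications for the sketch) a linear model in each arm, covariates scaled so that the Gram matrix is the identity in each arm, the same λ in both arms, and a CATE of the form τ(x) = x'β with β = β<sub>1</sub> − β<sub>0</sub>.</p>

```latex
\[
\hat\beta_w^{\mathrm{ridge}}
  = (X_w^\top X_w + \lambda I)^{-1} X_w^\top y_w
  = \frac{1}{1+\lambda}\,\hat\beta_w^{\mathrm{OLS}},
  \qquad w \in \{0, 1\}
\]
\[
\mathbb{E}\big[\hat\tau(x) \mid X\big]
  = x^\top\,\mathbb{E}\big[\hat\beta_1^{\mathrm{ridge}} - \hat\beta_0^{\mathrm{ridge}} \mid X\big]
  = \frac{1}{1+\lambda}\,x^\top(\beta_1 - \beta_0)
  = \frac{1}{1+\lambda}\,x^\top\beta
\]
\[
\mathbb{E}\big[\hat\tau(x) \mid X\big] - \tau(x)
  = -\frac{\lambda}{1+\lambda}\,x^\top\beta
\]
```

<p>Under these assumptions the estimated effects are shrunk toward zero by a factor of 1/(1+λ): with λ = 1 they are halved, and the bias vanishes as λ goes to zero.</p>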
<p>In <a href="https://arxiv.org/abs/2010.11332">one of my papers</a> (with David Arbour and Anup Rao), we come at this from a whole different angle, motivating an experimental design based on doing a good job for a particular (knn-based) HTE estimator. We show that one particular view of good design is explicitly trying to minimize the MSE of HTE estimators<sup>7</sup>. I really like this paper, although it’s admittedly pretty weird<sup>8</sup>. In particular, I think the connection between experimental design and the <a href="https://en.wikipedia.org/wiki/Maximum_cut">Maxcut problem</a> is a fun insight that makes <a href="https://arxiv.org/abs/1312.0531">Kallus’s great design paper</a> a little easier to understand.</p>
<p>Anyway, try talking through all this in a Twitter thread!</p>




<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>There are some reasons to expect that in an RCT a T-learner shouldn’t be too bad: you don’t need to correct for propensity scores (they’re all the same), so you really just want a model to help you extrapolate the tiny amount from nearby factuals to counterfactuals. <a href="https://www.pnas.org/doi/10.1073/pnas.1804597116">Soren’s paper</a> on X-learner has some simulations that bear this out. I want to be very clear that I am not recommending that you just stick with T-learners in RCTs, though! I’m only making the very weak argument that it isn’t crazy to think they might do a decent job as a first-cut.↩︎</p></li>
<li id="fn2"><p>A lot of very good computer scientists completely balked at the question, though. It requires actually thinking about the specifics of the application rather than just looking at an AUC number and making an evaluation.↩︎</p></li>
<li id="fn3"><p>An alternative approach that may be even better is to use the <a href="https://arxiv.org/abs/1712.04802v7">Chernozhukov et al.&nbsp;procedure</a>. We don’t directly compare to this, largely because our motivation starts from the position of diagnostics rather than estimation.↩︎</p></li>
<li id="fn4"><p>This was definitely reasonable from their perspective: there’s no sense in which the procedure we’re proposing is a panacea to the problems of HTE estimation (I’m increasingly negative on whether there is such a “correct” procedure).↩︎</p></li>
<li id="fn5"><p>The <a href="https://grf-labs.github.io/grf/reference/test_calibration.html">GRF package</a> has something they refer to as a “test” of calibration, but I think it’s an important distinction to provide these kinds of tests in a visual way. You will almost certainly turn up more problems when you are looking at more than just a single number (e.g.&nbsp;criticisms of the NHST paradigm <a href="https://www.johnmyleswhite.com/notebook/2012/05/14/criticism-3-of-nhst-essential-information-is-lost-when-transforming-2d-data-into-a-1d-measure/">compressing all information into a single p-value</a>).↩︎</p></li>
<li id="fn6"><p>Basically, what you get is the following, using the fact that ridge regression is basically just a linear rescaling of OLS (assuming the covariates form an orthonormal basis). The coefficient vector, β, is the set of “true” linear coefficients of the best linear predictor to the CATE function, λ is the regularization parameter: <a href="https://substackcdn.com/image/fetch/$s_!Ncf0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6353e5dc-354d-4382-876e-ad3b895efc25_628x108.png">equation image</a>.↩︎</p></li>
<li id="fn7"><p>Technically we’re minimizing a bound on MSE (albeit a tight bound). We have a whole section (4.3 and 4.4) problematizing the idea of optimization here, because ultimately you just can’t know enough about your data a priori to make “optimization” something that fully makes sense. In particular, I like the counterexample we define in Section 4.4 which I think makes the dependence on unknown properties (the conditional variance) very clear.↩︎</p></li>
<li id="fn8"><p>The assignment process is basically not randomized, except up to a relabelling (i.e.&nbsp;swapping treatment and control). There are randomized extensions to this that would probably work quite well and efficiently, but which we haven’t spent the time to put together — feel free to reach out if you want to talk about them.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{dimmery2024,
  author = {Dimmery, Drew},
  title = {Calibration as an {HTE} Diagnostic},
  date = {2024-01-16},
  url = {https://ddimmery.com/posts/calibration-as-an-hte-diagnostic/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-dimmery2024" class="csl-entry quarto-appendix-citeas">
Dimmery, Drew. 2024. <span>“Calibration as an HTE Diagnostic.”</span>
January 16, 2024. <a href="https://ddimmery.com/posts/calibration-as-an-hte-diagnostic/">https://ddimmery.com/posts/calibration-as-an-hte-diagnostic/</a>.
</div></div></section></div> ]]></description>
  <category>methodology</category>
  <category>experiments</category>
  <guid>https://ddimmery.com/posts/calibration-as-an-hte-diagnostic/</guid>
  <pubDate>Tue, 16 Jan 2024 00:00:00 GMT</pubDate>
  <media:content url="https://ddimmery.com/posts/calibration-as-an-hte-diagnostic/main-image.png" medium="image" type="image/png" height="89" width="144"/>
</item>
<item>
  <title>Plagiarism is bad</title>
  <dc:creator>Drew Dimmery</dc:creator>
  <link>https://ddimmery.com/posts/plagiarism-is-bad/</link>
  <description><![CDATA[ 





<p>I think it’s important to make the simple point that <strong>plagiarism is bad<sup>1</sup></strong>. I won’t be drawn into specifics of current events which have precipitated this discussion, but there are a lot of academics saying absurd things that I want to publicly disagree with in order to uphold worthwhile norms. This post is less about questions of academic integrity than straightforward questions of what the academic enterprise is.</p>
<hr>
<section id="what-are-we-doing" class="level2">
<h2 class="anchored" data-anchor-id="what-are-we-doing">What are we doing?</h2>
<p>First, a digression. Latour and Woolgar (1979) is often credited with the idea that the main thing academics do is publish papers. Our incentive systems are fundamentally built around this idea: if you publish more papers that appear in good journals, you will be more successful. It will cause you to be paid more, have more prestige and, generally, wield more power. I used to think that this meant that what many academics were aiming to do was to simply <em>write more papers</em>. I am increasingly feeling that this is incorrect. Academics want to <em>produce</em> more papers, but they have no interest whatsoever in writing them<sup>2</sup>.</p>
<p>In the course of events, a major perspective has arisen, probably expressed most forcefully like this:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ddimmery.com/posts/plagiarism-is-bad/image_1.png" class="img-fluid figure-img"></p>
<figcaption>Image</figcaption>
</figure>
</div>
<p>Andrew is a trained economist who does good empirical work in the way trained economists do it. And, as he is telling you here, he does not care that much about his lit review section. To be a bit more blunt, I think the argument is basically that empirical social scientists aren’t <em>writing papers</em>, we’re <em>producing estimates</em>. We produce a couple of these estimates per paper, and they are the only actual information in the paper. We’re forced to surround these numbers with words because of arcane medieval institutions, but we all recognize that this is absurd and nobody cares about the <em>words</em>, only the <em>numbers</em>. The right way to read a paper is to skip all the textual mumbo-jumbo, look at the charts, see the numbers, then maybe read the methods section if their approach isn’t obvious. This is reflected by the way we do meta-analysis: each paper is one dot on a single forest chart.</p>
<p>To be clear, I think this is a heinous misrepresentation of what we should think of ourselves as doing. If what we were doing actually mattered, I might even say that I think this view is evil.</p>
</section>
<section id="ok-but-what-are-we-doing" class="level2">
<h2 class="anchored" data-anchor-id="ok-but-what-are-we-doing">Ok, but what <em>are</em> we doing?</h2>
<p>The work of quantitative social science is to take an irreducibly complex system and reduce some aspect of it down to a 20-page PDF. If we have even the slightest amount of humility, we should instantly understand that this is an impossible task. It is an even more impossible task to reduce that same complex system down to a few quantitative estimates: &lt;20 bits of information<sup>3</sup>. We surround those numbers with 19.9 pages of text for a few reasons:</p>
<ul>
<li>because we’re trying to be humble by acknowledging all of the uncertainty and context around this impossible task</li>
<li>because we’re trying to acknowledge all of the other human effort that has been put into understanding this aspect of the world before (“standing on the shoulders of giants”)<sup>4</sup></li>
</ul>
<p>The idea that either objective is well served by copying from the abstract of another person’s work is starkly offensive. It brings to mind a conversation I was having about Ian Hacking the other day. One of the truly remarkable things he does in his writing is create fluent and meaningful reviews of complex work by other people. That it is so easy to read is because he takes the time and effort to translate that other work to the specific questions and topics he’s aiming to consider. This is a monumental effort! It requires a detailed understanding of both the task he has set before himself (e.g.&nbsp;“what do we know about what science is?”) and each of the distinct efforts before that touch on this question — including all the myriad ways that those prior efforts failed to answer the fundamental question, answered a slightly different question or otherwise went astray.</p>
<p>Is what we’re doing different than this? Fundamentally, I argue it is not. Unless we are completely faithfully replicating an analysis that has been done before<sup>5</sup>, we ought to have some fresh perspective we are bringing to the table: the relationship between the current work being performed and what has been done in the past will not be the same as pre-existing work. I think the discussion about copying summaries of papers back and forth completely misses the task. A much better term than “literature review” is, I think, “related work”. It emphasizes that the job of the section is to draw <em>connections</em> between works. Why on earth would you just provide a <em>generic</em> summary of someone else’s work in your paper? A good summary of related work should be opinionated and it should deal in comparisons: it should be providing a contribution in its own right as a way of synthesizing existing work to come to conclusions about what is known, what isn’t known, and what the strengths and weaknesses are of the existing body of work.</p>
<p>This is obviously not easy, and for shape-rotators like myself, this isn’t exactly work that is always <em>pleasant</em>. But, like, it’s part of the job, man.</p>
<p>I suppose the counterargument is that time spent working on related work is time <em>not</em> spent working on the quantitative estimates. That’s true, but there are two major responses: (i) The thing we’re producing is the <em>paper</em>, and the related work section is a meaningful fraction of that. Spend the time to make it better. (ii) Specialization means not everyone has to spread their time among all sections in the paper. If you’re doing a shitty job on one part of a paper, add a coauthor who can do a better job on that part<sup>6</sup>.</p>
</section>
<section id="in-defense-of-thinking" class="level2">
<h2 class="anchored" data-anchor-id="in-defense-of-thinking">In defense of thinking</h2>
<p>My point is simple. For historically contingent reasons (not necessarily <em>good</em> reasons), academia has settled on the academic paper as the primary way we communicate. Given this choice, I think it’s pretty important that we actually care about what we can communicate through this medium, which has the capacity to share much more than a few quantitative estimates.</p>
<p>An example given by Matt Blackwell<sup>7</sup> is illustrative:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ddimmery.com/posts/plagiarism-is-bad/image_2.png" class="img-fluid figure-img"></p>
<figcaption>Image</figcaption>
</figure>
</div>
<p>This was particularly funny to me because I just wrote a paper where this common boilerplate sentence was probably my most highly edited one through the entire revision process. I will soon publish a broader blogpost around <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3875850">this (now forthcoming) paper</a>, but first I will quote the relevant section:</p>
<blockquote class="blockquote">
<p>This relationship closely accords with the definition of calibration in classification and regression problems (Kuleshov et al.&nbsp;2018), with <strong>the added challenge resulting from the fundamental problem of causal inference: Labels (i.e., ITEs) are never observed (Holland 1986).</strong></p>
</blockquote>
<p>The bolded sentence roughly accords with Matt’s example of how people usually cite Holland (1986). Crucially, the boilerplate version is not useful for the specific point I wanted to make in this selection! I specifically wanted to indicate (by my use of the very machine-learning term “label” when referring to individual treatment effects) the challenge this well-trodden conceptual problem poses for simple supervised learning. This required <em>different words</em>. It’s my contention that this is almost always the case. If you find yourself repeating something that has been said many times before, maybe you don’t actually need to say it!</p>
<p>We should therefore not normalize taking other people’s words (suited to one particular context, audience and purpose) and reusing them for our own (different!!) context, audience and purpose. I think it is both true that we have sleep-walked into reusing text pretty commonly, as in Matt’s example, and also that <em>we should not do that</em>.</p>
</section>
<section id="climbing-off-the-high-horse" class="level2">
<h2 class="anchored" data-anchor-id="climbing-off-the-high-horse">Climbing off the high horse</h2>
<p>I recognize that it isn’t always possible to achieve the kind of writing that I’m talking about here. We live in a fallen world. I don’t particularly consider myself a good writer or editor, and everyone faces constraints that lead us to behave in ways that we don’t think are fully consistent with our values. But that doesn’t mean that it’s good or neutral when we make those compromises. It is bad. Plagiarism is bad.</p>
<p>The real problem here is, of course, systemic. We are rewarded for producing as many papers as possible, not particularly for having a well-crafted discussion of related work. I do, however, want to at least reinforce the norm here: you <em>should</em> write your own papers, even the parts you don’t like (or get someone else to write them). You <em>should</em> care about writing those sections well, even if they aren’t your favorite part of the paper. They are all part of the work you are signing your name to.</p>
<p>We should not make light of the actual work we do as academics by implying that it is fine to reuse text that is not our own. It is not.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Yes, this is also surprising to me.↩︎</p></li>
<li id="fn2"><p>I’m very consciously not talking about GPT here, but, like, also that.↩︎</p></li>
<li id="fn3"><p>A single number with four decimal digits of precision carries log₂(10^4) ≈ 13.3 bits; assume there are a couple of estimates.↩︎</p></li>
<li id="fn4"><p>Citation also lets us get around hard page limits: citations let us gesture at much more complicated ideas than we would be able to fully explain in our particular pdf.↩︎</p></li>
<li id="fn5"><p>Even in this case, we are different people than the authors of the replicated work and probably have different opinions about how it relates to the world. At the very least, time has passed and the literature is in a different place!↩︎</p></li>
<li id="fn6"><p>Harder to do this on a dissertation which may not be able to be coauthored. But honestly, to a first approximation I’ve only cared about one person’s dissertation (<a href="https://github.com/fhuszar/thesis/blob/master/submitted/thesis.pdf">Ferenc Huszár’s</a>).↩︎</p></li>
<li id="fn7"><p>I’m taking this example as a jumping-off point. I don’t think it’s, like, “academic misconduct” to use boilerplate language, and I think his example is a very good one, because tons of people have copied that kind of language.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{dimmery2024,
  author = {Dimmery, Drew},
  title = {Plagiarism Is Bad},
  date = {2024-01-04},
  url = {https://ddimmery.com/posts/plagiarism-is-bad/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-dimmery2024" class="csl-entry quarto-appendix-citeas">
Dimmery, Drew. 2024. <span>“Plagiarism Is Bad.”</span> January 4, 2024.
<a href="https://ddimmery.com/posts/plagiarism-is-bad/">https://ddimmery.com/posts/plagiarism-is-bad/</a>.
</div></div></section></div> ]]></description>
  <category>metascience</category>
  <guid>https://ddimmery.com/posts/plagiarism-is-bad/</guid>
  <pubDate>Thu, 04 Jan 2024 00:00:00 GMT</pubDate>
  <media:content url="https://ddimmery.com/posts/plagiarism-is-bad/main-image.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>Stop looking for the next Twitter</title>
  <dc:creator>Drew Dimmery</dc:creator>
  <link>https://ddimmery.com/posts/stop-looking-for-the-next-twitter/</link>
  <description><![CDATA[ 





<p>Everyone seems to agree now that <a href="https://kevinmunger.substack.com/p/ultima-ratio-twitterum">Twitter/X is bad</a>. They’re right. But the desire to recapture some idyllic ahistorical version of Twitter is wrongheaded. The nature of the platform has serious problems that cannot be solved by a different owner, CEO or by changes in content moderation/ranking. The medium is the message, and the medium is, overall, pretty bad for a lot of what we use it for.</p>
<p>I’m going to trace through my argument for why the design of microblogging platforms is bad and then sketch out how I’m personally choosing to engage with them in light of this badness.</p>
<hr>
<section id="all-microblogging-is-fundamentally-the-same" class="level2">
<h2 class="anchored" data-anchor-id="all-microblogging-is-fundamentally-the-same">All microblogging is fundamentally the same</h2>
<p>There are differences between microblogging sites (I’m primarily thinking about Twitter/X, BlueSky, Mastodon and Threads, but I imagine anything else basically fits the same mold). Nevertheless, I think the core properties are basically the following:</p>
<ul>
<li><strong>Short text snippets</strong>. I think this is definitional. If it weren’t based around this, it wouldn’t be microblogging.</li>
<li><strong>Virality</strong>. Content gets out of the small community in which it’s posted. This is fun (at first) for the poster and fun for the viewer. The nature of a viral platform means when you see stuff from outside your community, it has already been vetted, so it is almost definitionally ‘highly engaging’ for some highly specific definition of those words.</li>
<li><strong>Simple engagement possibilities</strong>. There are very simple, easy ways to engage with the content you see. A “like” or a “reskeet” is just a click of the mouse away. You don’t have to think very hard or construct a complete and cogent thought in order to “engage” with what you see<sup>1</sup>.</li>
<li><strong>A “feed” of posts</strong>. You are deluged with an infinitely scrolling progression of more content (making you <a href="https://www.youtube.com/watch?v=i2qx5P0kQSM">infinitely content</a>, right?).</li>
</ul>
<p>There are a lot of other features that matter on the margin. Exactly how long is a post? Can you Quote-tweet / Retweet? What content is allowed / promoted? This stuff matters, but I don’t think it really gets at the core of what microblogging is. (As an aside, I think we as social scientists are much better at testing the differences between possible affordances <em>within</em> a single platform than we are at testing the big meaningful differences <em>between</em> platforms).</p>
<p>The thing that I find really clear about this is that these affordances are not built to enable deep and thoughtful exchange of ideas. They’re built to enable surface-level engagement that you’ll come back for again and again.</p>
</section>
<section id="some-microblogging-is-different-but-not-better" class="level2">
<h2 class="anchored" data-anchor-id="some-microblogging-is-different-but-not-better">Some microblogging is different (but not better)</h2>
<p>I think Jack may have learned exactly the wrong lessons from Twitter. It seems that the AT protocol which undergirds BlueSky is <a href="https://atproto.com/guides/overview#speech-reach-and-moderation">fundamentally built</a> around the idea that <em>Jack shouldn’t have to be the person that everyone gets mad at</em>. You don’t like how your feed is ranked? Not his fault, that’s a different layer than he controls. Don’t like how content moderation works? Well, you should be on a <a href="https://blueskyweb.xyz/blog/4-13-2023-moderation">different application with different rules</a>. This goes to extremes with platforms like Mastodon, where some servers won’t do any moderation whatsoever, while others will ban you aggressively for not giving adequate content warnings. This is <em>different</em> than other microblogging services, but I don’t think there’s much of a way to say it’s <em>better</em>. For instance, <a href="https://www.theverge.com/2023/7/24/23806093/mastodon-csam-study-decentralized-network">the massive CSAM problems there</a>.</p>
<p>How this has worked in practice with AT/BlueSky is kind of different. BlueSky is the only real application built on AT, which means that Jack doesn’t just get to sit around dealing with AT issues, he has to deal with BlueSky issues, too. All of a sudden (and despite his original intentions), he’s having to manage an actual platform rather than just a protocol. This means that BlueSky is (slowly) developing content moderation standards, some of which are actually implemented at the protocol layer (i.e.&nbsp;in AT rather than in BlueSky). As I understand it, CSAM, for instance, is a protocol-level concern for AT; it is not federated out to individual applications. I can’t see how this won’t continue to be a giant wellspring of conflicts.</p>
</section>
<section id="what-are-we-doing-on-here" class="level2">
<h2 class="anchored" data-anchor-id="what-are-we-doing-on-here">What are we doing On Here?</h2>
<p>There are two main things I think happen on microblogs:</p>
<ul>
<li><strong>Discovery. </strong>There’s nothing I’ve seen that’s better at delivering interesting papers and news items to the front of my face (i.e.&nbsp;“pointing at things”). Microblogging is extremely effective at making serendipitous connections. Getting pointers to things of interest from people within a hop or two of you in a network which you form according to your whims is extremely powerful!</li>
<li><strong>Jokes/entertainment</strong>. There’s a reason that dril is the greatest poster on microblogs. One-liners are a classic joke format, and they’re enjoyable to consume on a feed (perhaps especially a feed that also has a lot of self-important academics on it).</li>
</ul>
<p>What I think <em>doesn’t</em> happen is anything very much like “discussion”. Discussion requires grace between interlocutors, which is hard to extend given the threat of virality inherent in the platform. Discussion requires context, which is destroyed with virality and its associated <a href="https://journals.sagepub.com/doi/10.1177/1461444810365313">context collapse</a>. And important discussions require depth, which is not really possible in short snippets of text. I also think that <em>privacy</em> is often crucial to good discussion<sup>2</sup>. Real privacy is anathema to virality: you can’t really have both. When discussions are ephemeral and low-cost it’s much easier to try out ideas to see if they seem compelling and convincing.</p>
<p>The easiest discussions to have on microblogs are ones in which the interlocutors already agree with one another and share full context (or which are just pointing to something and saying, essentially “good” or “bad”). In this case, a few words are all it takes to explain what one means and make one’s position clear. As discussions require more nuance and rely on context that is <em>not</em> shared (i.e.&nbsp;on more complex topics), the goal should not be to write in chunks of a few hundred characters.</p>
<p>I think that a lot of the appreciation of tweet-length communication is a reflection of a failure of academic writing. For a lot of research, the first time someone takes the time to make it broadly accessible is when a tweet-thread is written out<sup>3</sup>. I’ll bite the bullet, and freely admit that was true for me with our ICML paper about online balanced experimental design. The version we submitted (and which was accepted) was essentially just math. The paper’s structure was, essentially, introduction → math → simulations → the end. Reviewers didn’t <em>love</em> it, but they were <em>intimidated</em> by it<sup>4</sup>. We <a href="https://arxiv.org/abs/2203.02025">cleaned it up</a> for the camera-ready version (yes, that’s the easy-to-digest version), but the first time we really thought about how to communicate our results to a broader audience was <a href="https://twitter.com/DrewDim/status/1556648259787149312">when I wrote about it for Twitter</a>. That’s bad!<sup>5</sup> While one solution to this problem is to just write papers that are more accessible, the incentives are currently entirely against this. Being rigorous but difficult to understand is, often, a <em>very</em> <em>good</em> move if your primary incentive is publication.</p>
<p>Twitter, however, has incentives which at least favor some amount of accessibility. If you want your work to be shared and consumed, you have to hook people and encourage them to “engage.” In contrast to typical academicese, I think this is a nice change of pace, and is one of the reasons I sometimes appreciate tweet threads. “Engaging,” however, is quite different from “accessible.” To be accessible, I think you want things like easy hyperlinks/citation, formatting, figures, quotations and, yes, more than a couple hundred characters. Maybe you also want to throw in some code or math as a treat! We can obviously do better than tweets for this.</p>
</section>
<section id="what-am-i-doing" class="level2">
<h2 class="anchored" data-anchor-id="what-am-i-doing">What am I doing?</h2>
<p>I’m treating microblogging for what it is. I’m essentially indifferent between services, as I don’t think the differences are that meaningful relative to what’s the same. I enjoy finding things on these platforms, so I will peruse them, but I will not expect discussion, and I won’t use them as if I do. Since I think threads about new work are not a particularly good way to provide digestible versions of papers, I will avoid making them. I don’t want to push for “engagement” with my work; I want people to understand it and its context. Instead, I will share new work, and (if appropriate) will link to a longer-form accessible introduction for that work in this space (or somewhere like it)<sup>6</sup>. In short, I will try to make it easy to <em>discover</em> my work on microblogs, but I will not attempt to <em>explain</em> my work in that setting. The platform is not conducive to such explanation. I may also make jokes, but, well, most of those are for the groupchats where I can be <em>spicy</em>.</p>
<p>I think this is ultimately the kind of recommendation Neil Postman makes about television in <em>Amusing Ourselves to Death</em>, too: TV is a fantastic entertainment device, but <em>don’t mistake it for something that it’s not</em>. Don’t treat it as a medium of explanation or of news or of education. What it <em>does</em> is entertain.</p>
<p>Microblogging platforms are not built for good discussion. I will avoid a poor facsimile of conversation there, and have conversations in places where those conversations can be at their best. For me, this means that my <em>conversation</em> happens outside of the “clearnet”. I have a variety of WhatsApp / Slack / Discord groups for talking with collaborators or just randomers with shared interests. This is, I believe, a much more productive approach than trying to make microblogging into something that it can’t ever really do well. In these environments, you can set up affordances to enable the kind of discussion you want: you can make it easy to share LaTeX, or code. You can make spaces for highly moderated discussions or free-flowing sharing of memes. But most fundamentally, you can retain so much more control and privacy<sup>7</sup>.</p>
</section>
<section id="build-alternative-communities" class="level2">
<h2 class="anchored" data-anchor-id="build-alternative-communities">Build alternative communities!</h2>
<p>About a year ago I started up a Discord for discussion about experimentation which ultimately hasn’t seen much discussion in it. I’m continuing to hang out there, so if anyone wants somewhere outside of the clearnet to talk about experiments, <a href="https://discord.com/invite/MrxjbHc3jD">feel free to join</a>! Or come there to tell me about why this post is bad.</p>
<p>But also, use microblogs to create your own semi-private communities and make them easy to discover! The magic of the internet is that it’s all code and you can create whatever you want within those (weak) bounds. There are a ton of tools out there to create whatever kind of community you want, and you don’t need to troll around for “the next Twitter.” <strong>Twitter wasn’t ever all that good</strong>, and you can just go ahead and make the <em>better</em> community that you’d rather see! Many of those communities won’t work<sup>8</sup>, and that’s totally fine; the process of figuring out what works is how we find a space for online discussion that’s <em>better</em> than Twitter, which is what we should aspire to, anyway.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ddimmery.com/posts/stop-looking-for-the-next-twitter/image_1.png" class="img-fluid figure-img"></p>
<figcaption>Image</figcaption>
</figure>
</div>
<p>What AI thinks “a twitter-like website” looks like. It isn’t <em>wrong</em>.</p>
<p>Thank you for reading Drew’s News. Now please microblog this in a long thread.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Note: this is bad. My hot take is that thinking is good and we should try to do more of it and improve how we do it. The simple binary “engagement” actions constrain human behavior into radically low dimensional signals. This is basically what Kevin means when he talks about <a href="https://kevinmunger.substack.com/p/the-discourse-is-the-cybernetic-event">online platforms making us behave like machines</a>. This is not good! There are no posts for which my reaction can truly be summed up by a binary “Like” action! A RT sometimes is an endorsement, but sometimes it isn’t! None of this is expressible in the platforms!↩︎</p></li>
<li id="fn2"><p>By privacy, I mean the ability to pretty strongly control how wide the distribution of your content is (such as by sending something in a group chat where only your wife and brother-in-law can read it). Encryption, cybersecurity and data protection are important components of privacy, but not what I’m talking about.↩︎</p></li>
<li id="fn3"><p>An analogy I like is that threads are in the style of a poetry slam (derogatory). It’s not inherently “bad”, but it is a very distinct style which focuses on having meaningful “beats” every couple dozen words or so. Good tweets in a thread often are basically glorified figures (with caption). That’s a very constraining style!↩︎</p></li>
<li id="fn4"><p>Comments were essentially “the problem seems important and there’s a lot of math which seems correct”. Some reviewers even called it easy to read, which we all had a good laugh over. The math was abstruse and we literally introduced a parameter without ever explaining why it was important!↩︎</p></li>
<li id="fn5"><p>It also hasn’t always been this way. A good example is Norbert Wiener’s Cybernetics, which is a ludicrous book in a lot of ways, but it was also not targeted solely at an in-group of very technical readers. Only some chapters are.↩︎</p></li>
<li id="fn6"><p>I think an Oped length of about 1000 words is a good one to shoot for (don’t check how long this post is, sorry), e.g.&nbsp;<a href="https://www.nowpublishers.com/article/Details/QJPS-16112">Alex’s nice paper</a>.↩︎</p></li>
<li id="fn7"><p>Control and privacy seem to cut against the democratizing impulse, but they don’t have to. You can, for example, be gracious in admitting people and harsh in throwing them out when they don’t abide by community norms. The key democratizing element isn’t in making everything public, but in giving people a chance. Being public can actually be much worse in some cases: Imagine saying something kind of dumb and going viral as a grad student (when we are all very silly). Not good!↩︎</p></li>
<li id="fn8"><p>I don’t even know what “work” means here. Just think about this community formation as expanding the groupchats you’re already having good discussions in. Is it a “failed” groupchat if you stop talking in it at some point? Obviously not! Is a coffee meeting a failure if it doesn’t result in a permanent collaboration? Spin something up and see how it goes!↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{dimmery2023,
  author = {Dimmery, Drew},
  title = {Stop Looking for the Next {Twitter}},
  date = {2023-10-03},
  url = {https://ddimmery.com/posts/stop-looking-for-the-next-twitter/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-dimmery2023" class="csl-entry quarto-appendix-citeas">
Dimmery, Drew. 2023. <span>“Stop Looking for the Next Twitter.”</span>
October 3, 2023. <a href="https://ddimmery.com/posts/stop-looking-for-the-next-twitter/">https://ddimmery.com/posts/stop-looking-for-the-next-twitter/</a>.
</div></div></section></div> ]]></description>
  <category>social-media</category>
  <category>technology</category>
  <guid>https://ddimmery.com/posts/stop-looking-for-the-next-twitter/</guid>
  <pubDate>Tue, 03 Oct 2023 00:00:00 GMT</pubDate>
  <media:content url="https://ddimmery.com/posts/stop-looking-for-the-next-twitter/main-image.png" medium="image" type="image/png" height="100" width="144"/>
</item>
<item>
  <title>A Blueprint for the Regulation of Tech</title>
  <dc:creator>Drew Dimmery</dc:creator>
  <link>https://ddimmery.com/posts/a-blueprint-for-the-regulation-of-tech/</link>
  <description><![CDATA[ 





<p>The US2020 Facebook and Instagram Election Project (US2020 from here out), the first four papers of which were recently published in Science and Nature, can be the blueprint for meaningful regulation of large online platforms. It’s crucial not to be satisfied with access to data alone, but to demand access to rigorous tests of how changes on platforms affect society.</p>
<p>The key is to avoid examining the project through a purely academic lens, as if it were only about academic research and the knowledge gained through the studies.</p>
<p>The <a href="https://www.science.org/content/article/does-social-media-polarize-voters-unprecedented-experiments-facebook-users-reveal">news post at Science by Kai Kupferschmidt</a> focuses on this dimension, with input from Joe Bak-Coleman, who says that “This is not how research on the potential dangers of social media should be conducted”. He focuses on the research and the project as a model for gaining academic knowledge.</p>
<p>The <a href="https://www.science.org/doi/10.1126/science.adi2430">commentary by project rapporteur Michael Wagner</a> focused, too, on the project as largely academic in nature. Wagner gets one element of the project exactly correct from this dimension: It requires the goodwill of Meta to work correctly, which makes it a difficult model for future scholarly work.</p>
<p>Judging by Nick Clegg’s news post, Meta <a href="https://about.fb.com/news/2023/07/research-social-media-impact-elections/">wants the project interpreted solely as the results of academic research</a>: “Its findings will be hugely valuable to us, and we hope they will also help policymakers as they shape the rules of the road for the internet”. On this view, the <em>findings</em> of these studies should shape discourse on regulation, and not the <em>structure</em> of the collaboration itself. These findings are (overall) pretty positive for Meta. They show that algorithms are <em>powerful</em> (they change a lot about on-platform behavior), but they aren’t <em>scary</em> (they don’t swing elections). There are subtleties here, and I don’t mean to get into a detailed accounting of the exact substantive results, but I think this is the reading of the project that Meta wants to push to advertisers (the former story) and to regulators (the latter one).</p>
<p><a href="https://twitter.com/brandonsilverm/status/1684976994449272832?s=20">Brandon Silverman gets nearly to the quick</a>: “That’s why ultimately <em>we need regulation</em> if we want more of this sort of thing and for me, that represents one of the biggest promises of the Digital Services Act[…]” We should think about regulation as a way to free this kind of information from tech companies. But he stops short. A model of data-sharing (proposed in the next tweet) does not go far enough, and it sets up perverse incentives for companies to neglect rigorous measurements of changes that might place them in a negative light.</p>
<p>My argument in this piece is that the structure of collaboration of US2020—sophisticated experts telling online platforms what they must measure about societal impact—should be the future of the regulation of the internet. Don’t simply accept the framing that Nick Clegg pushes, that we should take these findings and make policies based solely on them (implicitly assuming they are the only evidence we might get). The way platforms work is constantly changing: we should not seek perfect <a href="https://twitter.com/aecoppock/status/1684911539390652416?s=20">generalizability of the findings</a>, we should instead seek systems which allow us to continue to probe and measure how these platforms work flexibly, as they change.</p>
<section id="what-do-we-want-to-measure" class="level2">
<h2 class="anchored" data-anchor-id="what-do-we-want-to-measure">What do we want to measure?</h2>
<p>Many of the most important questions we have about tech’s relationship to society are counterfactuals: If Facebook were otherwise identical but ranked Feed chronologically rather than by an algorithm, would there be more or less hate? If YouTube made different choices in recommendations, would it make people less extreme? A counterfactual, fundamentally, asks what would happen were the world just a little bit different. When thinking about the regulation of technology, we’re asking questions like these: if Facebook were to operate differently in some way, would society be better or would it be worse? This embeds two questions: One is a scientific question: what would society be like if Facebook were different? The second is a question of values: Would that counterfactual society be better or worse?&nbsp;</p>
<p>The scientific question can be answered through randomized control trials (RCTs). Such RCTs, however, can only truly be implemented by and within these large tech platforms, so under the status quo the question of values can only be answered by people within those walls. Engineers, data scientists, managers and executives of tech companies: mostly good hearted people, but people who work within institutions that require they think about the good of the platform rather than the good of society.</p>
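<p>To make the scientific half concrete, here is a minimal sketch of what an RCT readout boils down to: a difference in average outcomes between randomly assigned groups. The outcome name and all of the numbers are made up purely for illustration and are not taken from any study.</p>

```python
import math
import statistics

def difference_in_means(treated, control):
    """Estimate the average treatment effect and its standard error."""
    effect = statistics.mean(treated) - statistics.mean(control)
    # Conservative variance estimate: sum of the per-arm variances of the mean
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return effect, se

# Hypothetical outcome: minutes per day on the platform (illustrative values only)
control = [30.0, 42.0, 35.0, 38.0, 40.0]  # status quo, e.g. algorithmically ranked feed
treated = [25.0, 33.0, 29.0, 31.0, 32.0]  # counterfactual, e.g. chronological feed

effect, se = difference_in_means(treated, control)
print(f"estimated effect: {effect:.1f} (SE {se:.2f})")
```

<p>The arithmetic is the easy part. Only the platform can randomize who gets which version, and someone has to decide which outcomes are worth measuring.</p>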
<p>I worked within Meta for years on improving the process through which the company evaluated potential counterfactual versions of itself – in other words, testing changes through A/B tests. The entire business of software development is to try lots of stuff out but only keep the things that work towards the business’s goals. These goals are expressed in numbers (“KPIs” or “metrics”). <a href="https://ax.dev/">My team built machine learning tools</a> to realize these business goals. When business values change, so too, can the platform. Look no further than <a href="https://about.fb.com/news/2018/01/news-feed-fyi-bringing-people-closer-together/">Facebook’s 2018 pivot to “meaningful social interactions”</a>. Once new numbers are chosen as the target, machine learning gears turn and the platform (in many ways a black box to everyone) pivots on a dime to optimize these new metrics.</p>
<p>Many of these business values are widely held societal norms: nobody in tech wants more spam or child sexual abuse material, for instance. As such, companies develop sophisticated (albeit <a href="https://www.washingtonpost.com/technology/2023/06/07/meta-instagram-child-porn/">imperfect</a>) systems for detecting and removing such content and ensuring that it is not distributed widely. Other metrics, like the time spent on a platform, are less clear cut: essentially all internet platforms prize this as a KPI, but it does not necessarily align with what is best for society.</p>
<p>Democratic societies have systems for answering questions of values: we elect representatives who enact policies so that society embodies the values we care about. When such values are contentious, there is conflict, of course, but there is a peaceful process for resolving that conflict through our institutions. Tech lacks the mechanism to ensure that it embodies societal values: even market pressure isn’t straightforward, since usage is free thanks to advertising.</p>
</section>
<section id="who-gets-to-measure-them" class="level2">
<h2 class="anchored" data-anchor-id="who-gets-to-measure-them">Who gets to measure them?</h2>
<p>The only real counterfactual evidence about tech’s effects is controlled by tech itself. This is the promise of US2020, a project I worked on while at Meta. The unique power of this project is that it reveals the counterfactuals which—but for the press of a button—Facebook can decide to make reality. Asking ‘Should Facebook remove algorithmic ranking?’ contains both a question of science and a question of values – and this project answers the question of science, <a href="https://doi.org/10.1126/science.abp9364">showing precisely what would happen if Facebook did no algorithmic ranking on Feed</a>. As a result, the question of values is finally exposed to democratic scrutiny. Do we measure the right things to effectively answer that question? The way forward can only be an iterative refinement of asking the question and trying to improve the answers we get.</p>
<p>In short, the Election Project is a blueprint for the effective regulation of technology companies. The approach is straightforward and consistent with the language of <a href="https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32022R2065&amp;qid=1666857835014#d1e3513-1-1">Chapter III, Section 5 of the Digital Services Act</a>. Regulators can require that for important, societally relevant changes, Very Large Online Platforms must run and report the results of RCTs. These are far more informative than audits of the <em>processes</em> used in these platforms or providing data access without counterfactuals, or trying to explain the precise details of how underlying machine learning systems work (which are mostly black boxes to everyone internally, too). To reiterate: in general, many of the most important systems at VLOPs are not truly “understood” by those who create them. Rather, they are refined by measuring what they do, and launching the changes that push those measurements in a direction that those measurements deem positive.</p>
<p>Philipp Lorenz-Spreen has already compellingly argued that <a href="https://reclaimingautonomyonline.notion.site/reclaimingautonomyonline/Researcher-access-to-platform-data-under-the-DSA-Questions-and-answers-8f7390f3ae6b4aa7ad53d53158ed257c#82d252b2543f43cbb35489209884a230">Article 40 of the DSA is a way to free the results of internal A/B tests to the public</a>. Tech companies are already running lots of A/B tests, so maybe we can just require them to share those results! This is a very positive step, but it isn’t enough. The problem with limiting ourselves to these tests is that platforms can simply stop doing them on issues they feel present a regulation risk. We’re back in the same place we started where our ability to understand counterfactuals is dependent on the platforms’ goodwill.</p>
<p>Platforms already limit what is measured internally for this reason. A direct example of this is race in the US. Put simply, Meta works hard to limit what it knows about race in order to preclude legal risks like <a href="https://www.nytimes.com/2019/03/28/us/politics/facebook-housing-discrimination.html">the ones it faced around housing discrimination in 2019</a>. When I left Meta, the primary way concerns about disparate impact were examined was through zip-code level demographics. This process ensures that individual users’ race is not <em>explicitly</em> inferred, so race cannot be <em>explicitly</em> taken into account in decisions. To be clear, this does not mean that systems do not have <a href="https://www.justice.gov/crt/fcs/T6Manual7">disparate racial impacts</a>. It does, however, make it very hard to measure when those disparate impacts might exist.</p>
<p>There are also precedents for requirements to run RCTs, such as with the <a href="https://assets.publishing.service.gov.uk/media/6363b00de90e0705a8c3544d/CMA_Experiments_note.pdf">UK’s Competition and Markets Authority requirement that Google test changes to third-party cookies</a>. The logic here was that Google’s change to how cookies work might lead to big changes in how the advertising market works. Without understanding that impact, the regulator can not make an informed decision about whether it was anti-competitive or not. Hence, require Google to measure the impact, since they’re the only people who can.</p>
<p>More traditionally, this is key to the FDA’s regulation of <a href="https://www.fda.gov/patients/drug-development-process/step-3-clinical-research">food and medicine</a>. The regulator is not generally performing the clinical trials to demonstrate safety itself: it instead requires that the would-be producer carry out the required tests, and ensures that those tests meet the standards of scientific rigor necessary to ensure safety. We think it’s important to verify that drugs meet at least a minimal standard before pushing them out broadly to all citizens. The same should be true for large changes to internet platforms. Note that the structure of clinical trials under the FDA would be subject to exactly the same critiques that are being made of US2020.</p>
<p>The point is not who is actually running the test, but who is defining what particular tests must be conducted and how resulting outcomes must be measured. As an insider to this process, let me be very clear: US2020 took <em>incredible</em> care that external academics understood in detail the process by which concepts were operationalized and measured. A lot of the back-and-forth in the collaboration was specifically about this: academics didn’t understand how data is collected and stored internally, and internal researchers needed clear operationalizations to measure the concepts requested. I think I saw someone throw out that there were around 1000 variables to figure this out for.</p>
<p>The DSA provides an avenue for this kind of accountability through counterfactual evidence, but it isn’t through Article 40. Instead, it’s through Article 34. Allow me to quote:</p>
<blockquote class="blockquote">
<p>Providers of very large online platforms and of very large online search engines shall diligently identify, analyse and assess any systemic risks in the Union stemming from the design or functioning of their service and its related systems, including algorithmic systems, or from the use made of their services.</p>
</blockquote>
<p>By suggesting that these risks arise from the design or functioning of the service, this sentence invokes counterfactuals. The question at play is whether platform design <em>causes</em> these risks. The only valid way to assess this is through counterfactuals as measured through randomized control trials. The counterfactual risk is what VLOPs should demonstrate in their risk assessments. I recognize that this plain-English reading is not formal legal analysis (and I am not a lawyer). Given the latitude of regulators to choose how this law should be interpreted, I hope that they do so in the way that will give them actual answers to the societally important questions of counterfactual risk. US2020 shows them how they can do that. Unfortunately, based on the <a href="https://op.europa.eu/en/publication-detail/-/publication/c1d645d0-42f5-11ee-a8b8-01aa75ed71a1/language-de">Commission’s current example of how to apply the risk management framework</a>, this is not likely to be the approach taken to interpreting the DSA.</p>
</section>
<section id="when-are-they-measured" class="level2">
<h2 class="anchored" data-anchor-id="when-are-they-measured">When are they measured?</h2>
<p>Time is a crucial element of this story. The number one complaint by everyone involved in US2020 is the time it took to make it happen. As <a href="https://kevinmunger.substack.com/p/mark-zuckerberg-wants-you-to-think">Kevin Munger emphasizes, it was 32 months after the US2020 studies were actually performed that they were published</a>. Part of that delay was because of peer review (note that the chronological feed paper was originally received by Science on March 7, 2022, but was only published on July 27, 2023). The paper was definitely improved in that time, but it goes without saying that nothing about the intervention or outcome measurement could have been improved or changed as a result of this peer review. It had already happened.</p>
<p>Surely nothing changed about Facebook in that time, right? Well, shortly after submission of chrono-feed, on July 27, 2022, Mark Zuckerberg made an announcement on the Q2 earnings call: “<a href="https://investor.fb.com/investor-events/event-details/2022/Q2-2022-Earnings/default.aspx">Right now, about 15% of content in a person’s Facebook feed and a little more than that of their Instagram feed is recommended by our AI from people, groups, or accounts that you don’t follow. We expect these numbers to more than double by the end of next year.</a>” Shit. It’s at this point that I’ll note we found clear heterogeneity along the dimension of inventory size in the effects of chronological feed on variables like how much uncivil content and slurs show up on Feed, as well as on on-platform political behavior. When there’s more content Feed could show you, there’s more potential for Feed algorithms to change what you see.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://ddimmery.com/posts/a-blueprint-for-the-regulation-of-tech/image_1.png" class="img-fluid figure-img"></p>
<figcaption>Larger inventory is associated with larger effects on on-platform political behavior.</figcaption>
</figure>
</div>
<p>Kevin and I have a <a href="https://osf.io/w38ye">new working paper</a> showing exactly how poorly agnostic approaches to generalizability fare when reality refuses to stay fixed. When reality is changing at internet speed, then it’s necessary that our knowledge generation process matches that speed. US2020 could do better on this. A lot of work had to be done to mitigate the risks that were inherent in the project as the first of its kind. There was a whole software infrastructure that had to be built from the ground up for protecting privacy, which was done at a pretty incredible speed. There was also a lot of work to build up the internal processes to make everything work both within the collaboration and within Meta (e.g.&nbsp;legal and privacy reviews to make sure user data was appropriately protected). And as stated above, there was a whole arduous process of defining meaningful operationalizations of everything: to a real extent, the scientific language of internal and external researchers wasn’t exactly the same, so translation was required.</p>
<p>If we’re thinking about an ongoing regulatory process, however, a lot of these concerns go away: an infrastructure and set of systems will be set up (by necessity) to streamline processes. Variable definitions will be negotiated over time, so that’s largely just a fixed cost per variable. If the output is regulatory rather than academic, the results wouldn’t need to be gated by the fickle peer review process (just government bureaucracy!).</p>
<p>Defining appropriate implementation of the regulations, too, must be a negotiated and long-term process. Meta has spent years defining specific internal metrics to understand the best dimensions of platform behavior to measure for business purposes. We cannot expect US2020 to have gotten the right societally important dimensions correct on the first try. DSA requires platforms to measure risks “on civic discourse and electoral processes, and public security”. How should these be operationalized? This can only be an iterative process as we see what kinds of measurements are both informative about what we care about in the world and sensitive to the kinds of changes VLOPs make. Iteration on the US2020 model is how to begin this process of refinement.</p>
</section>
<section id="wrapping-it-up" class="level2">
<h2 class="anchored" data-anchor-id="wrapping-it-up">Wrapping it up</h2>
<p>If we want to truly understand a world where tech is different in some way, we must change it in that way and measure what happens as a result. This is how Meta ensures Feed aligns with its business values, and this is the only way to ensure Feed – alongside all other aspects of online platforms – aligns with our societal values, as well. This may require platforms to run RCTs they would not have already run: we may imagine changes that they have not previously tested. At the very least, it will almost certainly be necessary to require that they measure outcomes they might not otherwise measure. In the US, for instance, this might mean requiring that they collect data on race that they are otherwise reluctant to hold.</p>
<p>US2020 has many of the features that are necessary for this kind of counterfactual accountability. It provides rigorous evidence of how Meta’s products would work if we were to change them. The RCTs were designed by external academics with full control rights, whose incentive is not to improve Meta’s bottom line, but to expose answers to societally relevant questions. Meta had no substantial freedom to say no to such requests. It was not fast enough, however, nor was it run at a large enough scale (e.g.&nbsp;it was only in the US). Regulators don’t currently have the expertise in-house to do the work that these academics did in asking good questions and co-designing studies to answer them. This is an important problem to solve, but other domains (like food and drug regulation under the FDA) have shown that such expertise can be acquired by regulatory agencies.</p>
<p>The biggest problem with US2020 is that it <a href="https://www.science.org/doi/full/10.1126/science.adi2430">only exists out of Meta’s goodwill</a>. And given how expensive it was for Meta, it’s not clear how much of that there is left. Look through the author list for the Meta employees – most worked primarily on this project for the last 3 years, and lots of other engineers and data scientists contributed substantial time as well. These people are all superstars and just their direct compensation over that time is a huge commitment from Meta: easily millions of dollars, likely tens of millions. Given recent belt-tightening, I wouldn’t count on Meta independently finding this a good tradeoff solely in order to be transparent. If we want to guarantee this kind of transparency from internet platforms, then we shouldn’t just hope that they will keep choosing to bring us rigorous counterfactual evaluation. Government should require it.</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{dimmery2023,
  author = {Dimmery, Drew},
  title = {A {Blueprint} for the {Regulation} of {Tech}},
  date = {2023-09-11},
  url = {https://ddimmery.com/posts/a-blueprint-for-the-regulation-of-tech/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-dimmery2023" class="csl-entry quarto-appendix-citeas">
Dimmery, Drew. 2023. <span>“A Blueprint for the Regulation of
Tech.”</span> September 11, 2023. <a href="https://ddimmery.com/posts/a-blueprint-for-the-regulation-of-tech/">https://ddimmery.com/posts/a-blueprint-for-the-regulation-of-tech/</a>.
</div></div></section></div> ]]></description>
  <category>technology</category>
  <category>metascience</category>
  <guid>https://ddimmery.com/posts/a-blueprint-for-the-regulation-of-tech/</guid>
  <pubDate>Mon, 11 Sep 2023 00:00:00 GMT</pubDate>
  <media:content url="https://ddimmery.com/posts/a-blueprint-for-the-regulation-of-tech/main-image.png" medium="image" type="image/png" height="95" width="144"/>
</item>
<item>
  <title>Quarto for an Academic Website</title>
  <dc:creator>Drew Dimmery</dc:creator>
  <link>https://ddimmery.com/posts/quarto-website/</link>
  <description><![CDATA[ 





<section id="intro" class="level1">
<h1>Intro</h1>
<p>I’ve never been good at keeping my website updated. I always go through two different phases of maintenance:</p>
<ol type="1">
<li>Rushing around creating a new website with bells and whistles using whatever the flavor of the month is</li>
<li>Never updating an existing website</li>
</ol>
<p>I’m hoping to break out of this cycle, but am currently solidly within Phase 1.</p>
<p><img src="https://ddimmery.com/posts/quarto-website/tobias-meme.jpg" class="image-fluid mx-auto d-block img-fluid"></p>
<p>A highlight from my time in Phase 2 was when I forgot to update my DNS and I totally lost control of <code>drewdimmery.com</code> (don’t go there, it has a squatter). I think my website at that time was some Octopress monstrosity. There are a few reasons I think <a href="https://quarto.org/">Quarto</a> might help with my vicious circle.</p>
<ul>
<li>Serving static HTML pages is about as easy as it gets</li>
<li>Very little Quarto-specific syntax to recall (e.g.&nbsp;CLI commands or abstruse markup)</li>
<li>Lots of flexibility (Python / R) in how to generate that static content</li>
<li>Full programmability means that generation can be based on arbitrary data structures of my choosing</li>
</ul>
<p>I previously used Hugo Academic for building my website, which was much better than just editing the content directly, but I never remembered the right way to generate a new publication definition (there was a CLI, but I never remembered the syntax). Each publication got its own file describing its details, and I found this quite clunky. I wanted something extremely lightweight: there isn’t much reason for my individual publications to get pages of their own, and I really don’t need a lot of information on each of them. I just want some basic information about each and a set of appropriate links to more details.</p>
<p>This post will detail how I’ve set up Quarto to accomplish this task. I’ve almost completely separated the two main concerns of maintaining an academic website / CV: the data on <em>publications</em> and <em>software</em>, and the design elements that determine how to display them. It’s entirely possible that my particular issues are unique and this post won’t be useful to anyone else. Luckily, the marginal cost of words on the internet is essentially zero (and maybe the marginal value is, too).</p>
</section>
<section id="setup" class="level1">
<h1>Setup</h1>
<p>Setting up Quarto was very easy, so I won’t belabor this. The combination of the <a href="https://quarto.org/docs/get-started/">Get Started guide</a> with the <a href="https://quarto.org/docs/websites/">Website Creation guide</a> kept everything very straightforward. I also used <a href="https://blog.djnavarro.net/posts/2022-04-20_porting-to-quarto/">Danielle Navarro’s post</a> and <a href="https://github.com/djnavarro/quarto-blog">her blog’s code</a> to get everything set up.</p>
<p>I decided late in the setup process to add a blog, so I will mention that it’s actually very easy to do: it basically just requires adding a <a href="https://quarto.org/docs/websites/website-listings.html">Listing page</a> (i.e.&nbsp;the blog’s index), a folder to contain the various posts and a <code>_metadata.yml</code> file in that folder to describe global settings to apply to all posts. I just created these manually without too much trouble. This is one of the great things about building sites with tools like Quarto: everything is extremely transparent, so you just put a couple of files in the right places and you’re good to go.</p>
</section>
<section id="site-design" class="level1">
<h1>Site Design</h1>
<p>To demonstrate how I’ve set things up to populate the website from data about my academic life, I’ll focus on my <a href="../../research.html">publications</a> page. There are two main files undergirding this page:</p>
<dl>
<dt><code>papers.yaml</code></dt>
<dd>
A data file in YAML with standardized information on each publication. I chose YAML because it’s fairly easy to write correctly formatted YAML by hand (which is how I’ll be updating it).
</dd>
<dt><code>research.qmd</code></dt>
<dd>
The page which takes the data in <code>papers.yaml</code> and turns it into nicely formatted Markdown / HTML. This is set up as a Jupyter-backed <code>qmd</code> file (essentially a Jupyter notebook).
</dd>
</dl>
<p>This idea of separating the data side (information about publications) from formatting is aimed at making my life easier. One of the reasons I often stop updating my website is because when I come back in 3 months with a new publication, I never remember all the details about how I formatted entries in whatever flavor of Bootstrap I happened to be using when I built the website. Moreover, because I know that there’s a barrier to understanding before I can get started, it’s extremely easy to put off (and therefore it never gets done).</p>
<p>Separating the data entry from the formatting simplifies matters substantially.</p>
<section id="data" class="level2">
<h2 class="anchored" data-anchor-id="data">Data</h2>
<p>I put data about each publication in a basic YAML format:</p>
<details>
<summary>
See example data
</summary>
<pre class="{yaml}"><code>softblock:
  title: Efficient Balanced Treatment Assignments for Experimentation
  authors:
    - David Arbour
    - me
    - Anup Rao
  year: 2021
  venue: AISTATS
  preprint: https://arxiv.org/abs/2010.11332
  published_url: https://proceedings.mlr.press/v130/arbour21a.html
  github: https://github.com/ddimmery/softblock</code></pre>
</details>
<p>This is basically like a simplified bibtex entry with more URLs so I can annotate where to find replication materials for a given paper, as well as distinguish between preprints (always freely accessible) versus published versions (not always open access). A convenience that I add in the markup here is referring to myself as <code>me</code> in the author list (which is an ordered list). This allows me to add in extra post-processing to highlight where I sit in the author list.</p>
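<p>Because the entries are typed by hand, a small sanity check before rendering can catch slips early. The following is only a sketch: <code>check_entry</code> and <code>REQUIRED_FIELDS</code> are hypothetical names rather than part of the site’s code, and the input is a plain dictionary shaped like the parsed YAML above.</p>

```python
# Assumption: title and year are the fields the page cannot do without
REQUIRED_FIELDS = ("title", "year")

def check_entry(key, entry):
    """Return a list of human-readable problems for one publication entry."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in entry:
            problems.append(f"{key}: missing '{field}'")
    # An omitted author list is treated as a solo-authored paper
    authors = entry.get("authors", ["me"])
    if "me" not in authors:
        problems.append(f"{key}: 'me' does not appear in the author list")
    return problems

# Same shape as the parsed YAML example above
papers = {
    "softblock": {
        "title": "Efficient Balanced Treatment Assignments for Experimentation",
        "authors": ["David Arbour", "me", "Anup Rao"],
        "year": 2021,
        "venue": "AISTATS",
    },
    # Deliberately broken entry (hypothetical) to show what gets reported
    "draft": {"authors": ["Someone Else"], "year": 2022},
}

for key, entry in papers.items():
    for problem in check_entry(key, entry):
        print(problem)
```

<p>Whether a missing <code>me</code> should count as an error is a judgment call. Here it is flagged because the highlighting step relies on it.</p>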
<p>Some additional things I considered adding but chose to ignore for a first version:</p>
<ul>
<li>An abstract</li>
<li>A suggested bibtex entry</li>
</ul>
<p>Both of these would be easy to add, but I chose to start simpler. I don’t love YAML for entering long blocks of text, which both of these are.</p>
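<p>Of the two, the suggested bibtex entry could plausibly be generated from the fields already present instead of being typed in. Here is a rough sketch of that idea. The function name <code>bibtex_entry</code> is hypothetical, and treating every venue as a conference (<code>@inproceedings</code> with <code>booktitle</code>) is an assumption that would not hold for journal articles.</p>

```python
def bibtex_entry(key, entry, my_name="Drew Dimmery"):
    """Build a minimal BibTeX entry from one publication record."""
    # Swap the "me" shorthand back to a full name
    authors = [my_name if a == "me" else a for a in entry.get("authors", ["me"])]
    fields = {
        "title": entry["title"],
        "author": " and ".join(authors),
        "year": entry["year"],
    }
    # Assumption: the venue is a conference; a journal would need 'journal' instead
    if "venue" in entry:
        fields["booktitle"] = entry["venue"]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@inproceedings{{{key},\n{body}\n}}"

example = {
    "title": "Efficient Balanced Treatment Assignments for Experimentation",
    "authors": ["David Arbour", "me", "Anup Rao"],
    "year": 2021,
    "venue": "AISTATS",
}
print(bibtex_entry("softblock", example))
```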
</section>
<section id="formatting" class="level2">
<h2 class="anchored" data-anchor-id="formatting">Formatting</h2>
<p>Since I can write the generation logic for the page in Python, this puts me on comfortable ground to hack something together. To knit the above publication data into HTML, I just literally bind together the programmatically generated raw HTML and print it onto the page.</p>
<p>I do a couple of additional useful things in this process:</p>
<ul>
<li>Separate out working papers or non-archival papers from published work (I make this distinction based on whether I include a <code>published_url</code> field or not)</li>
<li>Order and categorize papers by year</li>
<li>Provide nice Bootstrappy buttons for external links (e.g.&nbsp;to Preprints / Code / etc)</li>
</ul>
<details>
<summary>
See <code>research.qmd</code> fragment
</summary>
<pre class="{python}"><code>import yaml
from IPython.display import display, Markdown, HTML

def readable_list(_s):
    # Join names as "A and B" or "A, B, and C"
    if len(_s) &lt; 3:
        return ' and '.join(map(str, _s))
    *a, b = _s
    return f"{', '.join(map(str, a))}, and {b}"

def button(url, label, icon):
    # The icon prefix ("bi" or "ai") doubles as the icon font's base class
    icon_base = icon[:2]
    return f"""&lt;a class="btn btn-outline-dark btn-sm" href="{url}" target="_blank" rel="noopener noreferrer"&gt;
        &lt;i class="{icon_base} {icon}" role='img' aria-label='{label}'&gt;&lt;/i&gt;
        {label}
    &lt;/a&gt;"""

with open("papers.yaml") as f:
    yaml_data = yaml.safe_load(f)
pub_strs = {"pubs": {}, "wps": {}}
for _, data in yaml_data.items():
    title_str = data["title"]
    authors = data.get("authors", ["me"])
    authors = [
        aut if aut != "me" else "&lt;strong&gt;Drew Dimmery&lt;/strong&gt;" for aut in authors
    ]
    author_str = readable_list(authors)
    year_str = data["year"]

    buttons = []
    preprint = data.get("preprint")
    if preprint is not None:
        buttons.append(button(preprint, "Preprint", "bi-file-earmark-pdf"))

    github = data.get("github")
    if github is not None:
        buttons.append(button(github, "Github", "bi-github"))

    pub_url = data.get("published_url")
    venue = data.get("venue")
    working_paper = pub_url is None
    
    pub_str = f'{author_str}. ({year_str}) "{title_str}."'

    if venue is not None:
        pub_str += f" &lt;em&gt;{venue}&lt;/em&gt;"

    if working_paper:
        if year_str not in pub_strs["wps"]:
            pub_strs["wps"][year_str] = []
        pub_strs["wps"][year_str].append(
            "&lt;li class='list-group-item'&gt;" + pub_str + "&lt;br&gt;" + " ".join(buttons) + "&lt;/li&gt;"
        )
    else:
        if year_str not in pub_strs["pubs"]:
            pub_strs["pubs"][year_str] = []
        buttons.append(button(pub_url, "Published", "ai-archive"))
        pub_strs["pubs"][year_str].append(
            "&lt;li class='list-group-item'&gt;" + pub_str + "&lt;br&gt;" + " ".join(buttons) + "&lt;/li&gt;"
        )</code></pre>
</details>
<p>I then print this out using the <code>display</code> functions from the IPython module and using the <code>asis</code> chunk option:</p>
<details>
<summary>
See <code>research.qmd</code> fragment
</summary>
<pre class="{python}"><code>for year in sorted(pub_strs["pubs"].keys(), reverse=True):
    display(Markdown(f"### {year}" + "{#" + f"published-{year}" + "}"))
    display(HTML(
        "&lt;ul class='list-group list-group-flush'&gt;" + '\n'.join(pub_strs["pubs"][year]) + "&lt;/ul&gt;"
    ))</code></pre>
</details>
<p>The <a href="https://github.com/ddimmery/quarto-website/blob/main/research.qmd">full code is on GitHub</a>.</p>
<p>It’s worth noting that to get the years to show up in the Table of Contents it’s necessary to be careful about exactly how the content is stuck onto the page. If you don’t use the <code>asis</code> chunk option, you can still get all the right content to show up, but it won’t necessarily appear in the ToC. I also found it necessary to include <code>section-divs: false</code> in the header, or else the output would get wrapped in additional <code>div</code> tags which made it harder to get the right classes in the right divs. There are probably more elegant ways to do all of this.</p>
<p>I use the same basic setup to populate the <a href="../../software.html">Software page</a>, albeit with simpler logic.</p>
<section id="additions" class="level3">
<h3 class="anchored" data-anchor-id="additions">Additions</h3>
<p>I debated adding an abstract that expands out on click (like the code folding above in this post). This would actually be more or less trivial to add using a <code>&lt;details&gt;</code> HTML tag if I wanted to provide the data in the YAML. I’m ignoring this for now because I want to minimize data entry for my future self (and it’s anyway just a click away at the Preprint link).</p>
</section>
</section>
</section>
<section id="deployment" class="level1">
<h1>Deployment</h1>
<p>It’s extremely easy to build a new version of the website locally (<code>quarto render</code> from CLI), but there’s no guarantee I’ll remember that off the top of my head in a month without Googling, so I think it’s worthwhile to set up automatic building after I push a commit to GitHub.</p>
<p>GitHub Actions is incredible. I adapted the <a href="https://github.com/quarto-dev/quarto-actions/blob/main/examples/quarto-book-netlify.yaml">example config from Quarto</a> to the following (also <a href="https://github.com/ddimmery/quarto-website/blob/main/.github/workflows/build.yml">on GitHub here</a>):</p>
<details>
<summary>
GitHub Actions for Netlify
</summary>
<pre class="{yaml}"><code>on:
  push:
    branches: main
  pull_request:
    branches: main
  # to be able to trigger a manual build
  workflow_dispatch:

name: Render and deploy website to Netlify

jobs:
  build-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v3
        with:
          python-version: '3.9'
          cache: 'pip'
      - run: pip install -r requirements.txt

      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true

      - uses: r-lib/actions/setup-renv@v2
      
      - name: Install Quarto
        uses: quarto-dev/quarto-actions/install-quarto@v1
        with:
          # To install LaTeX to build PDF book 
          tinytex: true 
          # uncomment below and fill to pin a version
          # version: 0.9.105

      - name: Render website
        # Add any command line argument needed
        run: |
          quarto render
      - name: Deploy to Netlify
        id: netlify-deploy
        uses: nwtgck/actions-netlify@v1
        with:
          # The folder the action should deploy. Adapt if you changed in Quarto config
          publish-dir: './_site'
          production-branch: main
          github-token: ${{ secrets.GITHUB_TOKEN }}
          deploy-message:
            'Deploy from GHA: ${{ github.event.pull_request.title || github.event.head_commit.message }} (${{ github.sha }})'
          enable-pull-request-comment: true #  Comment on pull request
          enable-commit-comment: true # Comment on GitHub commit
          enable-commit-status: true # GitHub commit status 
        env:
          NETLIFY_AUTH_TOKEN: ${{ secrets.NETLIFY_AUTH_TOKEN }}
          NETLIFY_SITE_ID: ${{ secrets.NETLIFY_SITE_ID }}
        timeout-minutes: 1</code></pre>
</details>
<p>This Action requires two pieces of information from Netlify entered as secrets in GitHub. The <code>NETLIFY_SITE_ID</code> may be found in the site configuration settings, while the <code>NETLIFY_AUTH_TOKEN</code> may be found in personal settings (the personal access token).</p>
<p>One thing I have not yet done is set up an <a href="https://rstudio.github.io/renv/index.html">renv</a> to ensure dependencies for blog posts are taken care of in GitHub Actions. This means that posts like the <a href="../../posts/softblock-demo/">experimental design demo</a> can’t be knit via GitHub Actions. I skipped this for two reasons (other than laziness). First, it’s a pain to get GIS tools working in any environment (ok, so it’s <em>part</em> laziness). I’ve actually done this before for automated <code>R CMD check</code>ing of the <a href="https://github.com/ddimmery/regweight/blob/main/.github/workflows/check-full.yaml"><code>regweight</code> package</a>, but didn’t feel like it was worthwhile here.</p>
<p>The reason it’s not worth it is that Quarto has a <a href="https://quarto.org/docs/projects/code-execution.html#freeze">great feature called “freezing”</a>. Essentially, it knits blog posts or pages, and only re-renders them when something about the source changes. This means that the vast majority of posts don’t need to be rendered on each build. If I’m working on a blog post, I can write it locally, render on my machine, commit that pre-rendered post and then all future builds on Actions won’t get held up by their inability to render that post.</p>
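<p>As a minimal sketch of what enabling this looks like (an assumption based on the Quarto documentation linked above, not a copy of this site’s actual configuration), the <code>freeze</code> option sits under <code>execute</code> in the project’s <code>_quarto.yml</code> or in a directory-level <code>_metadata.yml</code>:</p>
<pre><code>execute:
  freeze: auto  # re-render a document only when its source file changes
</code></pre>
<p>With this setting, the results of executed code are stored in the project’s <code>_freeze</code> directory, so that directory has to be committed for builds on Actions to reuse them.</p>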
<p>As I type this, it becomes clear that I’ll forget how to do this pretty often (given that there’s been about an 8 year delay since my next most recent blog, I likely won’t stay in practice). But blogs aren’t my main concern on my website: keeping a software and publication list up-to-date is.</p>
<p>Setting up Actions means that simple updates to pages (or YAML files) can actually be done directly in the GitHub editing UI, which further lowers the barrier for my future self. I don’t even need to clone the repository to whatever computer I’m working on to add a publication!</p>
</section>
<section id="future-dreams" class="level1">
<h1>Future dreams</h1>
<p>I imagine my CV is similar to most academics’ in that it’s built like a house of cards (and overfull hboxes). Whenever I add something new to it, I have to copy some lines from elsewhere and modify them to fit the new entry. This always takes me way more time than I’d like. If I mashed together my current <a href="../../about.html">About page</a> with the <a href="../../research.html">Research page</a>, it’s like 90% of the way to a full CV. It should presumably be pretty easy to explicitly combine them and output a reasonable-looking CV.</p>
<p>This is a project for another day, though. Too much of the Research page directly outputs HTML, which makes it difficult to naïvely import into a <img src="https://latex.codecogs.com/png.latex?%5CLaTeX"> CV.</p>
<p>An almost completely naïve approach to directly importing the relevant pages creates <a href="cv.pdf">this ugly document</a>.</p>
<details>
<summary>
Naïve CV
</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode markdown code-with-copy"><code class="sourceCode markdown"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">---</span></span>
<span id="cb5-2"><span class="an" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">title:</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"> "Curriculum Vitae"</span></span>
<span id="cb5-3"><span class="an" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">format:</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"> pdf</span></span>
<span id="cb5-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">---</span></span>
<span id="cb5-5"></span>
<span id="cb5-6">{{&lt; include about.qmd &gt;}}</span>
<span id="cb5-7"></span>
<span id="cb5-8">{{&lt; include research.md &gt;}}</span></code></pre></div></div>
</details>
<p>It’s definitely possible to improve on this. The easiest hacky approach is to just write a whole alternative version of the HTML formatting code which resides in <code>research.qmd</code> to output appropriately formatted <img src="https://latex.codecogs.com/png.latex?%5CLaTeX"> markup.</p>
<p>For now, I’m pretty pleased with the system I have, but ask me again in three months.</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{dimmery2022,
  author = {Dimmery, Drew},
  title = {Quarto for an {Academic} {Website}},
  date = {2022-05-11},
  url = {https://ddimmery.com/posts/quarto-website/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-dimmery2022" class="csl-entry quarto-appendix-citeas">
Dimmery, Drew. 2022. <span>“Quarto for an Academic Website.”</span> May
11, 2022. <a href="https://ddimmery.com/posts/quarto-website/">https://ddimmery.com/posts/quarto-website/</a>.
</div></div></section></div> ]]></description>
  <category>website</category>
  <guid>https://ddimmery.com/posts/quarto-website/</guid>
  <pubDate>Wed, 11 May 2022 00:00:00 GMT</pubDate>
  <media:content url="https://ddimmery.com/posts/quarto-website/main-image.png" medium="image" type="image/png" height="216" width="144"/>
</item>
<item>
  <title>Using SoftBlock to Design an Experiment</title>
  <dc:creator>Drew Dimmery</dc:creator>
  <link>https://ddimmery.com/posts/softblock-demo/</link>
  <description><![CDATA[ 





<section id="introduction" class="level1">
<h1>Introduction</h1>
<p>In this demo, I’m going to imagine that I’m designing an experiment in which I assign different treatments to particular precincts in North Carolina. In order to optimize power, of course, we want to make sure that our two test groups look as similar as possible in terms of prior voting patterns.</p>
<p>Thus, the steps in this design will be:</p>
<ol type="1">
<li>Collect relevant historical data.</li>
<li>Define variables on which we wish to balance.</li>
<li>Allocate treatment assignment using new methods.</li>
<li>Simulate the power of hypothesis tests under the proposed design.</li>
<li>Fake some outcome data and analyze it for average and heterogeneous treatment effects.</li>
</ol>
</section>
<section id="implementation-of-methods" class="level1">
<h1>Implementation of methods</h1>
<section id="description" class="level2">
<h2 class="anchored" data-anchor-id="description">Description</h2>
<p>The relevant API consists of two functions with <code>tidyverse</code> semantics, <code>assign_softblock</code> and <code>assign_greedy_neighbors</code>. Each accepts a vector of columns to be used in the design. The SoftBlock version additionally accepts two arguments: <code>.s2</code>, the bandwidth of the RBF kernel used to construct the similarity matrix, and <code>.neighbors</code>, the number of nearest neighbors to include in the graph on which the spanning tree is constructed. These parameters don’t generally need to be modified.</p>
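<p>To make the roles of the two tuning parameters concrete, here is a rough base-R sketch of the similarity graph they control. It is an illustration on toy data, not the package implementation: the names <code>X</code>, <code>s2</code>, <code>k</code> and <code>W</code> are hypothetical stand-ins, and the dense loop replaces the sparse nearest-neighbor search used in the source code shown in the next section.</p>
<pre><code># Illustrative sketch only: toy data and hypothetical variable names.
set.seed(42)
X = scale(cbind(x1 = runif(8), x2 = runif(8)))  # standardize covariates
N = nrow(X)
s2 = 2   # plays the role of .s2 (kernel bandwidth)
k = 3    # plays the role of .neighbors

D = as.matrix(dist(X))  # pairwise Euclidean distances
diag(D) = Inf           # a unit is never its own neighbor

W = matrix(0, N, N)
for (i in 1:N) {
    nn = order(D[i, ])[1:k]         # the k nearest neighbors of unit i
    W[i, nn] = exp(-D[i, nn] / s2)  # closer units get larger weights
}
W = W + t(W)  # symmetrize by summing both directions
</code></pre>
<p>A larger <code>s2</code> flattens the weights so that near and far neighbors are treated more alike, while a larger <code>k</code> adds more candidate edges for the spanning tree to choose from.</p>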
</section>
<section id="source-code" class="level2">
<h2 class="anchored" data-anchor-id="source-code">Source Code</h2>
<details>
<summary>
See source code
</summary>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">writeLines</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">readLines</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://raw.githubusercontent.com/ddimmery/softblock/master/r_implementation.R"</span>))</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>library(Matrix)
library(igraph)
library(FNN)
library(hash)

assign_greedily &lt;- function(graph) {
    adj_mat = igraph::as_adjacency_matrix(graph, type="both", sparse=TRUE) != 0
    N = nrow(adj_mat)
    root_id = make.keys(sample(N, 1))

    a = rbinom(1, 1, 0.5)
    visited = hash()

    random_order = make.keys(sample(N))
    unvisited = hash(random_order, random_order)

    colors = hash()
    stack = hash()

    stack[[root_id]] &lt;- a
    tentative_color = rbinom(N, 1, 0.5)
    while ((!is.empty(unvisited)) || (!is.empty(stack))) {
        if (is.empty(stack)) {
            cur_node = keys(unvisited)[1]
            del(cur_node, unvisited)
            color = tentative_color[as.integer(cur_node)]
        } else {
            cur_node = keys(stack)[1]
            color = stack[[cur_node]]
            del(cur_node, stack)
            del(cur_node, unvisited)
        }
        visited[[cur_node]] = cur_node
        colors[[cur_node]] = color
        children = make.keys(which(adj_mat[as.integer(cur_node), ]))
        for (child in children) {
            if(has.key(child, unvisited)) {
                stack[[child]] = 1 - color
            }
        }
    }
    values(colors, keys=1:N)
}


assign_softblock &lt;- function(.data, cols, .s2=2, .neighbors=6) {
    expr &lt;- rlang::enquo(cols)
    pos &lt;- tidyselect::eval_select(expr, data = .data)
    df_cov &lt;- rlang::set_names(.data[pos], names(pos))
    cov_mat = scale(model.matrix(~.+0, df_cov))
    N = nrow(cov_mat)
    st = lubridate::now()
    knn = FNN::get.knn(cov_mat, k=.neighbors)
    st = lubridate::now()
    knn.adj = Matrix::sparseMatrix(i=rep(1:N, .neighbors), j=c(knn$nn.index), x=exp(-c(knn$nn.dist) / .s2))
    knn.graph &lt;- graph_from_adjacency_matrix(knn.adj, mode="plus", weighted=TRUE, diag=FALSE)
    E(knn.graph)$weight &lt;- (-1 * E(knn.graph)$weight)
    st = lubridate::now()
    mst.graph = igraph::mst(knn.graph)
    E(mst.graph)$weight &lt;- (-1 * E(mst.graph)$weight)
    st = lubridate::now()
    assignments &lt;- assign_greedily(mst.graph)
    .data$treatment &lt;- assignments
    attr(.data, "laplacian") &lt;- igraph::laplacian_matrix(mst.graph, normalize=TRUE, sparse=TRUE)
    .data
}

assign_greedy_neighbors &lt;- function(.data, cols) {
    expr &lt;- rlang::enquo(cols)
    pos &lt;- tidyselect::eval_select(expr, data = .data)
    df_cov &lt;- rlang::set_names(.data[pos], names(pos))
    cov_mat = scale(model.matrix(~.+0, df_cov))
    N = nrow(cov_mat)
    knn = FNN::get.knn(cov_mat, k=1)
    knn.adj = Matrix::sparseMatrix(i=1:N, j=c(knn$nn.index), x=c(knn$nn.dist))
    knn.graph &lt;- graph_from_adjacency_matrix(knn.adj, mode="plus", weighted=TRUE, diag=FALSE)
    assignments &lt;- assign_greedily(knn.graph)
    .data$treatment &lt;- assignments
    attr(.data, "laplacian") &lt;- igraph::laplacian_matrix(knn.graph, normalize=TRUE, sparse=TRUE)
    .data
}

assign_matched_pairs &lt;- function(.data, cols, .s2=2, .neighbors=6) {
    expr &lt;- rlang::enquo(cols)
    pos &lt;- tidyselect::eval_select(expr, data = .data)
    df_cov &lt;- rlang::set_names(.data[pos], names(pos))
    cov_mat = scale(model.matrix(~.+0, df_cov))
    N = nrow(cov_mat)
    knn = FNN::get.knn(cov_mat, k=.neighbors)
    knn.adj = Matrix::sparseMatrix(i=rep(1:N, .neighbors), j=c(knn$nn.index), x=exp(-c(knn$nn.dist) / .s2))
    knn.graph &lt;- graph_from_adjacency_matrix(knn.adj, mode="plus", weighted=TRUE, diag=FALSE)
    E(knn.graph)$weight &lt;- (-1 * E(knn.graph)$weight)
    mwm.graph = igraph::max_bipartite_match(knn.graph)
    E(mwm.graph)$weight &lt;- (-1 * E(mwm.graph)$weight)
    assignments &lt;- assign_greedily(mwm.graph)
    .data$treatment &lt;- assignments
    attr(.data, "laplacian") &lt;- igraph::laplacian_matrix(mwm.graph, normalize=TRUE, sparse=TRUE)
    .data
}

# library(tibble)
# data = tibble(
#     x1=runif(10),
#     x2=runif(10),
#     x3=rbinom(10, 1, 0.5)
# )
# library(dplyr)
# library(tidyr)
# library(ggplot2)
# data %&gt;% assign_softblock(c(x1, x2)) -&gt; newdata

# ggplot(newdata, aes(x=x1, y=x2, color=factor(treatment), shape=factor(x3))) + geom_point() + theme_minimal()

# newdata %&gt;%
#     attr("laplacian") %&gt;%
#     ifelse(lower.tri(.), ., 0) %&gt;%
#     as_tibble() -&gt; adj_df
# names(adj_df) &lt;- paste0(1:ncol(adj_df))

# adj_df %&gt;%
#     group_by(id_1=as.character(row_number())) %&gt;%
#     gather(id_2, weight, -id_1) %&gt;%
#     filter(weight != 0)  %&gt;%
#     mutate(id_2=as.character(id_2), id=paste(id_1, id_2, sep='-')) -&gt; adj_df

# locs = newdata %&gt;% mutate(id=as.character(row_number())) %&gt;% select(id, x1, x2, x3)

# edges=bind_rows(
# adj_df  %&gt;% inner_join(locs, by=c('id_1'='id')),
# adj_df  %&gt;% inner_join(locs, by=c('id_2'='id'))
# ) %&gt;% arrange(id) %&gt;% ungroup()

# pp = ggplot(newdata, aes(x=x1, y=x2)) +
# geom_line(aes(group=id, size=1), data=edges, color='grey') +
# geom_point(aes(color=factor(treatment), shape=factor(x3), size=2), alpha=.9) +
# scale_size_continuous(range=c(1, 3)) +
# theme_minimal() + theme(legend.position='none')

# print(pp)</code></pre>
</div>
</div>
</details>
</section>
</section>
<section id="data-preparation" class="level1">
<h1>Data Preparation</h1>
<p>This demo will be based on North Carolina data because their Board of Elections makes it very easy to get precinct-level data. I’m also only going to use historical data from the most recent election to avoid needing to match precincts across elections.</p>
<p>With access to a full voter file, this section could be drastically improved by incorporating other important elements into the design like demographics.</p>
<section id="get-precinct-data" class="level2 tabset">
<h2 class="tabset anchored" data-anchor-id="get-precinct-data">Get Precinct data</h2>
<section id="results-data" class="level3">
<h3 class="anchored" data-anchor-id="results-data">Results Data</h3>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"http://dl.ncsbe.gov/ENRS/2020_11_03/results_pct_20201103.zip"</span></span>
<span id="cb3-2">zip_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tempfile</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fileext =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".zip"</span>)</span>
<span id="cb3-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">download.file</span>(url, zip_file, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"wb"</span>)</span>
<span id="cb3-4">spec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cols</span>(</span>
<span id="cb3-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">County =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_character</span>(),</span>
<span id="cb3-6">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Election Date</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_character</span>(),</span>
<span id="cb3-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Precinct =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_character</span>(),</span>
<span id="cb3-8">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Contest Group ID</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_double</span>(),</span>
<span id="cb3-9">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Contest Type</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_character</span>(),</span>
<span id="cb3-10">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Contest Name</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_character</span>(),</span>
<span id="cb3-11">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Choice =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_character</span>(),</span>
<span id="cb3-12">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Choice Party</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_character</span>(),</span>
<span id="cb3-13">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Vote For</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_double</span>(),</span>
<span id="cb3-14">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Election Day</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_double</span>(),</span>
<span id="cb3-15">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">One Stop</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_double</span>(),</span>
<span id="cb3-16">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Absentee by Mail</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_double</span>(),</span>
<span id="cb3-17">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Provisional =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_double</span>(),</span>
<span id="cb3-18">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Votes</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_double</span>(),</span>
<span id="cb3-19">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Real Precinct</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_character</span>(),</span>
<span id="cb3-20">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">X16 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_skip</span>()</span>
<span id="cb3-21">)</span>
<span id="cb3-22">results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> readr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_tsv</span>(zip_file, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col_types=</span>spec)</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stderr">
<pre><code>New names:
• `` -&gt; `...16`</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: The following named parsers don't match the column names: X16</code></pre>
</div>
</div>
</section>
<section id="shapefiles" class="level3">
<h3 class="anchored" data-anchor-id="shapefiles">Shapefiles</h3>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://s3.amazonaws.com/dl.ncsbe.gov/ShapeFiles/Precinct/SBE_PRECINCTS_20201018.zip"</span></span>
<span id="cb6-2">temp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tempfile</span>()</span>
<span id="cb6-3">temp2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tempfile</span>()</span>
<span id="cb6-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">download.file</span>(url, temp)</span>
<span id="cb6-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unzip</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">zipfile =</span> temp, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">exdir =</span> temp2)</span>
<span id="cb6-6">nc_SHP_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list.files</span>(temp2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pattern =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".shp$"</span>,<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">full.names=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb6-7">shapes <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> sf<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_sf</span>(nc_SHP_file)</span></code></pre></div></div>
</details>
</div>
</section>
</section>
<section id="aggregate-data-and-join" class="level2">
<h2 class="anchored" data-anchor-id="aggregate-data-and-join">Aggregate Data and Join</h2>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">results <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb7-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Real Precinct</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Y'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb7-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(County, Precinct) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb7-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb7-5">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">total_vote_pres=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Votes</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Contest Name</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'US PRESIDENT'</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb7-6">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dem_share_pres=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Votes</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Contest Name</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'US PRESIDENT'</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Choice Party</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'DEM'</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>total_vote_pres,</span>
<span id="cb7-7">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gop_share_pres=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Votes</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Contest Name</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'US PRESIDENT'</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Choice Party</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'REP'</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>total_vote_pres,</span>
<span id="cb7-8">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">total_vote_senate=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Votes</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Contest Name</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'US SENATE'</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb7-9">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dem_share_senate=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Votes</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Contest Name</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'US SENATE'</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Choice Party</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'DEM'</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>total_vote_senate,</span>
<span id="cb7-10">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gop_share_senate=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Votes</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Contest Name</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'US SENATE'</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Choice Party</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'REP'</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>total_vote_senate,</span>
<span id="cb7-11">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">total_vote_gov=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Votes</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Contest Name</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'NC GOVERNOR'</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb7-12">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dem_share_gov=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Votes</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Contest Name</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'NC GOVERNOR'</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Choice Party</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'DEM'</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>total_vote_gov,</span>
<span id="cb7-13">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gop_share_gov=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Votes</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Contest Name</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'NC GOVERNOR'</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Choice Party</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'REP'</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>total_vote_gov,</span>
<span id="cb7-14">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">total_vote_house=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Votes</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">grepl</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'US HOUSE OF REPRESENTATIVES DISTRICT'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Contest Name</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>)], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb7-15">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dem_share_house=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Votes</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">grepl</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'US HOUSE OF REPRESENTATIVES DISTRICT'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Contest Name</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Choice Party</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'DEM'</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>total_vote_house,</span>
<span id="cb7-16">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gop_share_house=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Total Votes</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">grepl</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'US HOUSE OF REPRESENTATIVES DISTRICT'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Contest Name</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Choice Party</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'REP'</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>total_vote_house</span>
<span id="cb7-17">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> results_agg</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stderr">
<pre><code>`summarise()` has grouped output by 'County'. You can override using the
`.groups` argument.</code></pre>
</div>
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">inner_join</span>(results_agg, shapes, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'County'</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'county_nam'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Precinct'</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'prec_id'</span>)) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> df_joined</span>
<span id="cb9-2">DT<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">datatable</span>(df_joined <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample_n</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>geometry), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rownames =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">options=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scrollX=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">autoWidth =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>))</span></code></pre></div></div>
</details>
<div class="cell-output-display">
<div class="datatables html-widget html-fill-item" id="htmlwidget-1770d66a86dcce969965" style="width:100%;height:auto;"></div>
<script type="application/json" data-for="htmlwidget-1770d66a86dcce969965">{"x":{"filter":"none","vertical":false,"data":[["WAKE","WAKE","SWAIN","CRAVEN","BUNCOMBE","ORANGE","GUILFORD","HENDERSON","HOKE","GUILFORD","DAVIDSON","MECKLENBURG","GUILFORD","GUILFORD","GASTON","CARTERET","GUILFORD","RICHMOND","LEE","ROWAN","WAYNE","MADISON","MECKLENBURG","NEW HANOVER","BUNCOMBE","MECKLENBURG","CUMBERLAND","PENDER","CASWELL","HALIFAX","WILKES","WATAUGA","NORTHAMPTON","FRANKLIN","ORANGE","GUILFORD","CABARRUS","HENDERSON","BRUNSWICK","BUNCOMBE","IREDELL","RUTHERFORD","CUMBERLAND","MECKLENBURG","BLADEN","DURHAM","BRUNSWICK","CABARRUS","BURKE","MECKLENBURG","ALEXANDER","WAKE","SAMPSON","CABARRUS","WAYNE","WAKE","HARNETT","DURHAM","CASWELL","YANCEY","ROBESON","ROWAN","COLUMBUS","ALAMANCE","RUTHERFORD","ROBESON","PERQUIMANS","DURHAM","WAKE","JOHNSTON","GUILFORD","GUILFORD","WILKES","WAKE","MADISON","DAVIDSON","STANLY","GUILFORD","GUILFORD","MECKLENBURG","PITT","MECKLENBURG","CLEVELAND","RUTHERFORD","PERQUIMANS","ORANGE","GUILFORD","BRUNSWICK","GUILFORD","LENOIR","UNION","CUMBERLAND","WAKE","BERTIE","UNION","DAVIE","RANDOLPH","GASTON","UNION","FORSYTH"],["16-07","20-11","WHCH","20","21.1","CS1","OR2","SM","01","G32","52","141","G05","H26","40","ASCI","G11","13","C1","04","11","MARS H","235","W25","34.1","128","CC21","CF11","ANDE","HPR","108","02","CREEKS","11","SJ","G40A2","07-00","PV","17","39.2","ST4","14","G4A","113","P502","06","05","02-05","0051","001","E","18-01","HARR","12-04","21","20-04","PR20","35.3","LEAS","06 JAC","17","13","P25B","06E","10A","22","PARKVI","38","16-01","PR04","H24","FR5B","101","01-29","HOT 
SP","64","0018","NCGR1","H01","021","0200B","015","LAWNDL","06A","BELVID","WC","H20B","16","G65","FC","029C","MB62","04-10","W1","012","03","AS","18","036","304"],[462,626,2005,2615,919,176,291,2513,274,270,3109,1788,192,197,3788,621,209,213,551,3340,196,2992,2103,1014,385,1858,2377,2618,699,237,204,297,134,1265,169,235,348,1289,5288,359,2375,1827,1099,2109,141,125,3280,463,2568,992,1962,493,1295,366,183,406,485,399,531,1197,252,1792,993,1780,1021,1799,1412,383,442,404,272,161,276,368,648,760,918,428,330,1410,399,2578,1362,2057,555,370,144,4433,280,3250,2139,1980,370,294,2155,1923,3108,4649,1669,1432],[0.3181818181818182,0.3370607028753994,0.571072319201995,0.3560229445506692,0.7366702937976061,0.3409090909090909,0.2061855670103093,0.2475129327497015,0.3722627737226277,0.3481481481481482,0.1366999035059505,0.8092841163310962,0.9427083333333334,0.350253807106599,0.3286694825765575,0.2045088566827697,0.6028708133971292,0.3145539906103286,0.2903811252268603,0.1772455089820359,0.7397959183673469,0.4154411764705883,0.5387541607227769,0.814595660749507,0.6285714285714286,0.6776103336921421,0.6848969289019773,0.2822765469824293,0.2532188841201717,0.4050632911392405,0.25,0.101010101010101,0.2985074626865671,0.5446640316205533,0.5266272189349113,0.5617021276595745,0.09770114942528736,0.339022498060512,0.367624810892587,0.5543175487465181,0.4509473684210527,0.3475643130815544,0.5632393084622384,0.4513987671882408,0.4326241134751773,0.776,0.3152439024390244,0.3520518358531318,0.3169781931464175,0.5766129032258065,0.2038735983690112,0.5699797160243407,0.6525096525096525,0.3825136612021858,0.5573770491803278,0.3842364532019704,0.4412371134020618,0.4987468671679198,0.4519774011299435,0.3057644110275689,0.4325396825396826,0.140625,0.07955689828801611,0.5168539325842697,0.188050930460333,0.4130072262367982,0.3803116147308782,0.7049608355091384,0.4027149321266968,0.2103960396039604,0.3198529411764706,0.4037267080745341,0.1014492753623188,0.358695652173913,0.3503086419753086,0.701315
7894736842,0.2189542483660131,0.2336448598130841,0.6818181818181818,0.6468085106382979,0.5338345864661654,0.7540729247478666,0.4133627019089574,0.3947496353913466,0.2594594594594595,0.3891891891891892,0.3055555555555556,0.3584480036092939,0.5142857142857142,0.3076923076923077,0.3501636278634876,0.7772727272727272,0.3918918918918919,0.6360544217687075,0.1271461716937355,0.1606864274570983,0.2638352638352638,0.4902129490212949,0.2570401437986818,0.9594972067039106],[0.6320346320346321,0.6309904153354633,0.4049875311720698,0.6332695984703632,0.2448313384113167,0.625,0.7628865979381443,0.7373656983684839,0.6094890510948905,0.6222222222222222,0.8481826954004503,0.1616331096196868,0.04166666666666666,0.6243654822335025,0.6615628299894404,0.7938808373590982,0.3684210526315789,0.6713615023474179,0.6969147005444646,0.8077844311377246,0.2397959183673469,0.5711898395721925,0.442225392296719,0.1518737672583826,0.3402597402597403,0.3067814854682454,0.2965923432898612,0.6990068754774638,0.7396280400572246,0.5864978902953587,0.7303921568627451,0.8855218855218855,0.6940298507462687,0.4442687747035573,0.4437869822485207,0.4212765957446808,0.8908045977011494,0.6431342125678821,0.6263237518910741,0.4178272980501393,0.5322105263157895,0.6360153256704981,0.4158325750682438,0.5320056899004267,0.5531914893617021,0.192,0.6673780487804878,0.6285097192224622,0.6748442367601246,0.4122983870967742,0.7854230377166157,0.3610547667342799,0.3366795366795367,0.5792349726775956,0.4262295081967213,0.5862068965517241,0.5257731958762887,0.4461152882205514,0.5329566854990584,0.6791979949874687,0.5634920634920635,0.8448660714285714,0.9164149043303121,0.4646067415730337,0.8001958863858962,0.5764313507504168,0.6090651558073654,0.2637075718015666,0.5610859728506787,0.7821782178217822,0.6470588235294118,0.5652173913043478,0.8804347826086957,0.6086956521739131,0.6296296296296297,0.2868421052631579,0.7668845315904139,0.7383177570093458,0.306060606060606,0.3347517730496454,0.4511278195488722,0.2238169123351435,
0.580029368575624,0.5887214389888187,0.7261261261261261,0.5864864864864865,0.6458333333333334,0.6336566659147305,0.4357142857142857,0.6843076923076923,0.6358111266947172,0.2065656565656566,0.5783783783783784,0.3639455782312925,0.8607888631090487,0.8231929277171087,0.722972972972973,0.492794149279415,0.7297783103654883,0.03072625698324022],[455,621,1976,2598,911,175,292,2491,270,271,3096,1761,189,198,3743,602,209,212,544,3318,193,2975,2082,1011,383,1851,2356,2589,688,236,202,294,133,1254,171,228,345,1280,5236,350,2351,1801,1089,2101,137,126,3235,463,2545,987,1943,485,1286,354,181,403,482,395,526,1201,247,1778,984,1768,1021,1743,1401,379,441,405,268,161,270,369,634,752,909,425,321,1400,389,2559,1351,2043,549,368,143,4382,278,3211,2119,1956,364,290,2129,1912,3068,4598,1651,1420],[0.3098901098901099,0.3075684380032206,0.5273279352226721,0.3498845265588915,0.7178924259055982,0.3371428571428571,0.1849315068493151,0.2316338819751104,0.3888888888888889,0.3284132841328413,0.1579457364341085,0.7705848949460534,0.9312169312169312,0.3131313131313131,0.3224686080683943,0.2408637873754153,0.5071770334928229,0.330188679245283,0.2941176470588235,0.1720916214587101,0.6943005181347151,0.4094117647058824,0.5081652257444764,0.76162215628091,0.6240208877284595,0.6477579686655862,0.6735993208828522,0.2877558903051371,0.2601744186046512,0.4110169491525424,0.2821782178217822,0.119047619047619,0.3007518796992481,0.5661881977671451,0.5087719298245614,0.5482456140350878,0.1217391304347826,0.32109375,0.3609625668449198,0.5171428571428571,0.4325818800510421,0.3486951693503609,0.5436179981634527,0.4036173250832937,0.4598540145985401,0.7698412698412699,0.3174652241112829,0.3239740820734341,0.3115913555992141,0.4721377912867274,0.215645908389089,0.534020618556701,0.6423017107309487,0.3644067796610169,0.5303867403314917,0.3399503722084367,0.454356846473029,0.4481012658227848,0.4733840304182509,0.3089092422980849,0.4574898785425101,0.1428571428571428,0.100609756097561,0.5118778280542986,0.1802154750
244858,0.4239816408491107,0.3668807994289793,0.6517150395778364,0.4081632653061225,0.2320987654320988,0.3283582089552239,0.3850931677018634,0.1592592592592593,0.2655826558265583,0.3880126182965299,0.6795212765957447,0.2189218921892189,0.2,0.6728971962616822,0.5842857142857143,0.5347043701799485,0.7018366549433372,0.4167283493708364,0.3798335780714635,0.2622950819672131,0.3804347826086957,0.3356643356643357,0.3461889548151529,0.4568345323741007,0.3086265960759888,0.3289287399716848,0.7443762781186094,0.3324175824175824,0.6068965517241379,0.1202442461249413,0.1610878661087866,0.2672750977835723,0.4734667246628969,0.2465172622652938,0.9464788732394366],[0.6175824175824176,0.6264090177133655,0.3917004048582996,0.6158583525789069,0.2447859495060373,0.6285714285714286,0.75,0.7205941389000401,0.5666666666666667,0.6457564575645757,0.8000645994832042,0.1584327086882453,0.02116402116402116,0.6161616161616161,0.6321132781191557,0.7225913621262459,0.4066985645933014,0.6320754716981132,0.6213235294117647,0.7802893309222423,0.2538860103626943,0.547563025210084,0.4404418828049952,0.1552917903066271,0.3394255874673629,0.3079416531604538,0.2775891341256367,0.6431054461181923,0.7063953488372093,0.5550847457627118,0.6534653465346535,0.826530612244898,0.6616541353383458,0.39792663476874,0.456140350877193,0.4473684210526316,0.855072463768116,0.6375,0.6044690603514133,0.4171428571428571,0.5133985538068907,0.6007773459189339,0.3829201101928374,0.5678248453117563,0.4817518248175183,0.1904761904761905,0.61854714064915,0.6241900647948164,0.6243614931237721,0.5055724417426545,0.7323726196603191,0.3649484536082474,0.3234836702954899,0.5310734463276836,0.4143646408839779,0.5880893300248139,0.5,0.4708860759493671,0.4866920152091255,0.6486261448792673,0.4696356275303644,0.8093363329583803,0.8658536585365854,0.4389140271493213,0.7825661116552399,0.5306942053930006,0.6052819414703783,0.287598944591029,0.5034013605442177,0.7308641975308642,0.6007462686567164,0.577639751552795,0.7925925925925926,0.69
64769647696477,0.555205047318612,0.2659574468085106,0.7337733773377337,0.7576470588235295,0.2897196261682243,0.3692857142857143,0.4138817480719794,0.255177803829621,0.53960029607698,0.5849241311796378,0.7158469945355191,0.5842391304347826,0.5734265734265734,0.6168416248288453,0.4064748201438849,0.6568047337278107,0.6314299197734781,0.1932515337423313,0.6071428571428571,0.3517241379310345,0.8496946923438234,0.7923640167364017,0.6942633637548892,0.4826011309264898,0.7135069654754694,0.02676056338028169],[460,622,1988,2610,911,176,294,2504,273,270,3106,1761,192,196,3757,622,210,212,546,3322,194,2984,2083,1017,387,1853,2365,2605,695,237,203,298,134,1264,172,226,345,1280,5261,353,2369,1808,1095,2110,139,125,3259,458,2562,991,1957,485,1289,360,182,402,484,401,526,1208,251,1791,987,1778,1022,1771,1410,381,441,405,269,158,273,370,640,756,918,428,323,1405,399,2569,1360,2052,554,370,142,4404,278,3236,2136,1972,365,290,2146,1923,3094,4633,1657,1429],[0.3978260869565217,0.3922829581993569,0.5829979879275654,0.3781609195402299,0.7683863885839737,0.3977272727272727,0.2448979591836735,0.2759584664536741,0.4505494505494506,0.4111111111111111,0.1838377334191887,0.817717206132879,0.9270833333333334,0.3826530612244898,0.3540058557359596,0.2556270096463023,0.580952380952381,0.3584905660377358,0.3534798534798535,0.2155328115593016,0.7268041237113402,0.4547587131367292,0.556409025444071,0.816125860373648,0.6795865633074936,0.6961683756071236,0.7128964059196617,0.3201535508637236,0.2964028776978417,0.4388185654008439,0.3054187192118227,0.151006711409396,0.3283582089552239,0.5886075949367089,0.5465116279069767,0.6858407079646017,0.1333333333333333,0.35859375,0.4092377874928721,0.5410764872521246,0.4774166314900802,0.3838495575221239,0.5945205479452055,0.4568720379146919,0.4676258992805755,0.824,0.3562442467014422,0.3602620087336245,0.3645589383294301,0.5721493440968718,0.2519161982626469,0.5793814432989691,0.6757176105508146,0.4138888888888889,0.5824175824175825,0.4278606965174129,0.487603
305785124,0.5561097256857855,0.4904942965779467,0.3352649006622517,0.5976095617529881,0.1697375767727526,0.1094224924012158,0.5556805399325084,0.2113502935420744,0.5143986448334275,0.4014184397163121,0.7165354330708661,0.4693877551020408,0.2814814814814815,0.3903345724907063,0.4620253164556962,0.1648351648351648,0.4081081081081081,0.428125,0.7129629629629629,0.2668845315904139,0.2803738317757009,0.7337461300309598,0.6505338078291815,0.5714285714285714,0.7481510315297781,0.4433823529411764,0.4249512670565302,0.2888086642599278,0.4243243243243243,0.3802816901408451,0.4003178928247048,0.5215827338129496,0.334672435105068,0.3609550561797753,0.7925963488843814,0.4246575342465753,0.6344827586206897,0.1444547996272134,0.2043681747269891,0.3225597931480285,0.5180228793438377,0.2776101388050694,0.960111966410077],[0.5804347826086956,0.5932475884244373,0.3888329979879276,0.614176245210728,0.2151481888035126,0.5511363636363636,0.7278911564625851,0.7064696485623003,0.5384615384615384,0.5703703703703704,0.8068254990341275,0.1567291311754685,0.04166666666666666,0.5816326530612245,0.631354804365185,0.7363344051446945,0.3809523809523809,0.6367924528301887,0.6263736263736264,0.7730282962071041,0.2268041237113402,0.5315013404825737,0.4229476716274604,0.1514257620452311,0.3074935400516796,0.2908796546141392,0.2630021141649049,0.6579654510556622,0.6935251798561151,0.5527426160337553,0.6600985221674877,0.8322147651006712,0.6716417910447762,0.3995253164556962,0.4302325581395349,0.3097345132743363,0.8579710144927536,0.621875,0.5803079262497624,0.4164305949008499,0.5031658927817645,0.5995575221238938,0.3799086757990868,0.5303317535545023,0.5035971223021583,0.152,0.6213562442467014,0.6222707423580786,0.6272443403590945,0.4228052472250252,0.7393970362800204,0.3670103092783505,0.3134212567882079,0.5277777777777778,0.4065934065934066,0.5348258706467661,0.4958677685950413,0.4164588528678304,0.5,0.6473509933774835,0.3904382470119522,0.8179787828029034,0.8794326241134752,0.4274465691788527,0.7798
434442270059,0.4743083003952569,0.5886524822695035,0.2519685039370079,0.4807256235827664,0.7111111111111111,0.5799256505576208,0.5316455696202531,0.8278388278388278,0.581081081081081,0.5515625,0.2619047619047619,0.7167755991285403,0.6985981308411215,0.2414860681114551,0.3359430604982206,0.4160401002506265,0.2331646555079798,0.5433823529411764,0.5653021442495126,0.7003610108303249,0.5648648648648649,0.5774647887323944,0.5899182561307902,0.4064748201438849,0.6551297898640297,0.6217228464419475,0.1754563894523327,0.547945205479452,0.3482758620689655,0.8466915191053123,0.7852314092563702,0.6690368455074337,0.4644938484783078,0.7109233554616777,0.02589223233030091],[457,605,1960,2559,904,173,289,2483,269,261,3096,1646,185,190,3589,578,207,211,533,3271,190,2969,1390,1000,382,1447,2329,2564,683,231,201,294,134,1234,169,224,342,1276,5206,352,2326,1764,1086,2100,134,122,3189,458,2534,674,1936,482,1276,352,177,393,472,391,511,1193,241,1755,972,1750,1006,1759,1396,370,431,405,259,156,271,364,631,749,901,415,315,1087,391,2176,1337,2030,551,363,142,4357,270,3188,2106,1950,362,288,2120,1908,2974,4438,1642,1394],[0.3413566739606127,0.3355371900826447,0.548469387755102,0.3368503321610004,0.7334070796460177,0.3815028901734104,0.2214532871972318,0.2384212645992751,0.3903345724907063,0.3601532567049808,0.1401808785529716,1,0.9621621621621622,0.4157894736842105,0.3304541655057119,0.2024221453287197,0.6183574879227053,0.3554502369668247,0.3095684803001876,0.1742586365025986,0.7368421052631579,0.4078814415628157,1,0.8179999999999999,0.6151832460732984,1,0.712322885358523,0.2905616224648986,0.2547584187408492,0.4025974025974026,0.2338308457711443,0.1292517006802721,0.2985074626865671,0.5818476499189628,0.5976331360946746,0.6428571428571429,0.1023391812865497,0.3221003134796238,0.3520937379946216,0.5255681818181818,0.4591573516766982,0.3424036281179138,0.5874769797421732,0.4138095238095238,0.4552238805970149,0.819672131147541,0.3116964565694575,0.3427947598253275,0.3208366219415943,1,0.207
1280991735537,0.5643153526970954,0.646551724137931,0.4147727272727273,0.5819209039548022,0.361323155216285,0.4682203389830508,0.5242966751918159,0.4696673189823874,0.3093042749371333,0.4522821576763486,0.1504273504273504,0.09259259259259259,0.5222857142857142,0.1858846918489065,0.4360432063672541,0.3775071633237823,0.7081081081081081,0.4454756380510441,0.1802469135802469,0.3359073359073359,0.4294871794871795,0.1365313653136531,0.3104395604395604,0.3518225039619651,0.7022696929238985,0.2375138734739179,0.2240963855421687,0.6984126984126984,1,0.5294117647058824,1,0.4255796559461481,0.3788177339901478,0.2595281306715064,0.3966942148760331,0.3309859154929577,0.3426669726876291,0.5703703703703704,0.2882685069008783,0.335707502374169,0.7979487179487179,0.3701657458563536,0.6354166666666666,0.1245283018867925,0.1540880503144654,0.2673167451244116,0.4864803965750338,0.2673568818514007,0.9691535150645624],[0.6017505470459519,0.6264462809917355,0.3908163265306123,0.6631496678389996,0.2367256637168142,0.6184971098265896,0.7785467128027682,0.7333870318163512,0.6096654275092936,0.6398467432950191,0.8598191214470284,0,0.03783783783783784,0.5842105263157895,0.6536639732516021,0.7975778546712803,0.3816425120772947,0.6445497630331753,0.6904315196998124,0.8257413634974015,0.2631578947368421,0.5581003704951162,0,0.174,0.3429319371727749,0,0.287677114641477,0.7086583463338534,0.7452415812591509,0.5974025974025974,0.7213930348258707,0.826530612244898,0.7014925373134329,0.4181523500810373,0.4023668639053254,0.3571428571428572,0.8976608187134503,0.6496865203761756,0.6473300038417211,0.4176136363636364,0.5408426483233018,0.6383219954648526,0.4125230202578269,0.5861904761904762,0.5447761194029851,0.180327868852459,0.6857949200376293,0.6572052401746725,0.6586424625098658,0,0.7732438016528925,0.3630705394190871,0.3526645768025078,0.5852272727272727,0.4180790960451977,0.5928753180661578,0.5317796610169492,0.4757033248081842,0.5303326810176126,0.6647108130762783,0.5477178423236515,0.84957264957
26496,0.9074074074074074,0.4777142857142857,0.8021868787276342,0.5639567936327459,0.6224928366762178,0.2918918918918919,0.4965197215777262,0.817283950617284,0.6640926640926641,0.5705128205128205,0.8228782287822878,0.6703296703296703,0.5974643423137876,0.2977303070761015,0.7624861265260822,0.7759036144578313,0.3015873015873016,0,0.4705882352941176,0,0.5482423335826477,0.5911330049261084,0.7404718693284936,0.6033057851239669,0.6690140845070423,0.6559559329814092,0.4296296296296296,0.7117314930991218,0.664292497625831,0.2020512820512821,0.585635359116022,0.3645833333333333,0.8754716981132076,0.8459119496855346,0.7326832548755884,0.5013519603424966,0.7326431181485993,0.03084648493543759],[2319,2363,2460,1498,1808,178,966,793,736,919,1361,466,885,964,1101,1652,890,9,638,2582,2127,556,515,237,1813,456,1461,104,1643,843,2079,2130,208,1123,171,1001,1738,788,1868,1818,712,2564,1430,447,1895,1254,1857,1720,1763,337,1978,2335,2548,1731,2118,2358,831,1304,1644,2030,2646,2588,1529,2013,2577,2653,101,1300,2313,651,1007,1018,2072,2177,555,1367,2519,1003,925,375,60,364,1548,2575,97,175,985,1863,918,617,2430,1468,2247,1904,2399,1327,20,1077,2415,1186],["16-07","20-11","WHCH","FAIRFIELD HARBOUR","HAW CREEK ELEMENTARY SCHOOL","COLES STORE","OR2","SOUTH MILLS RIVER","ALLENDALE","G32","SILVER HILL 52","141","G05","H26","DALLAS  1","Atlantic/Sea Level/Cedar Island","G11","MINERAL SPRINGS  2","C1","BRADSHAW","11","MARS HILL","235","W25","BLACK MOUNTAIN 3 - LAKE TOMAHAWK","128","CROSS CREEK 21","CAPE FEAR","ANDERSON","HOBGOOD","FAIRPLAINS","BEAVER DAM","CREEKSVILLE","SANDY CREEK","ST JOHN","G40A2","07-00","PISGAH VIEW","LONGWOOD","FAIRVIEW VOLUNTEER FIRE DEPT","STATESVILLE  4","FOREST CITY  2","CROSS CREEK 30-G4","113","ELIZABETHTOWN  2","LAKEWOOD SCHOOL","TOWNCREEK","02-05","SILVER CREEK 01","001","ELLENDALE","18-01","HARRELLS","12-04","21","20-04","STEWARTS CREEK","0035.3","LEASBURG","JACKS CREEK","LUMBERTON  7","ROCK GROVE","WILLIAMS 2","EAST GRAHAM","DUNCAN CREEK-GOLDEN VALLEY","NORTH 
PEMBROKE","PARKVILLE","HOPE VALLEY BAPTIST","16-01","BENTONVILLE","H24","FR5B","ANTIOCH","01-29","HOT SPRINGS","THOMASVILLE 3 64","RICHFIELD","NCGR1","H01","021","AYDEN B","015","LAWNDL","CHIMNEY ROCK","BELVIDERE","WHITE CROSS","H20B","SHINGLETREE 1","G65","FALLING CREEK","STALLINGS VFD","MONTIBELLO","04-10","WINDSOR 1","BETHLEHEM PRESBYTERIAN CHURCH","CLARKSVILLE","ASHEBORO SOUTH","ASHBROOK","CROSSROADS AME ZION CHURCH","FORSYTH TECH CC MAZIE WOODRUFF CTR"],[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null],[92,92,87,25,11,68,41,45,47,41,29,60,41,41,36,16,41,77,53,80,96,57,60,65,11,60,26,71,17,42,97,95,66,35,68,41,13,45,10,11,49,81,26,60,9,32,10,13,12,60,2,92,82,13,96,92,43,32,17,100,78,80,24,1,81,78,72,32,92,51,41,41,97,92,57,29,84,41,41,60,74,60,23,81,72,68,41,10,41,54,90,26,92,8,90,30,76,36,90,34],[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th>County<\/th>\n      <th>Precinct<\/th>\n      <th>total_vote_pres<\/th>\n      <th>dem_share_pres<\/th>\n      <th>gop_share_pres<\/th>\n      
<th>total_vote_senate<\/th>\n      <th>dem_share_senate<\/th>\n      <th>gop_share_senate<\/th>\n      <th>total_vote_gov<\/th>\n      <th>dem_share_gov<\/th>\n      <th>gop_share_gov<\/th>\n      <th>total_vote_house<\/th>\n      <th>dem_share_house<\/th>\n      <th>gop_share_house<\/th>\n      <th>id<\/th>\n      <th>enr_desc<\/th>\n      <th>of_prec_id<\/th>\n      <th>county_id<\/th>\n      <th>blockid<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"scrollX":true,"autoWidth":true,"columnDefs":[{"className":"dt-right","targets":[2,3,4,5,6,7,8,9,10,11,12,13,14,17]},{"name":"County","targets":0},{"name":"Precinct","targets":1},{"name":"total_vote_pres","targets":2},{"name":"dem_share_pres","targets":3},{"name":"gop_share_pres","targets":4},{"name":"total_vote_senate","targets":5},{"name":"dem_share_senate","targets":6},{"name":"gop_share_senate","targets":7},{"name":"total_vote_gov","targets":8},{"name":"dem_share_gov","targets":9},{"name":"gop_share_gov","targets":10},{"name":"total_vote_house","targets":11},{"name":"dem_share_house","targets":12},{"name":"gop_share_house","targets":13},{"name":"id","targets":14},{"name":"enr_desc","targets":15},{"name":"of_prec_id","targets":16},{"name":"county_id","targets":17},{"name":"blockid","targets":18}],"order":[],"orderClasses":false}},"evals":[],"jsHooks":[]}</script>
</div>
</div>
</section>
<section id="add-some-geographic-features" class="level2">
<h2 class="anchored" data-anchor-id="add-some-geographic-features">Add some geographic features</h2>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>geometry <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb10-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_centroid</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb10-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_transform</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"+init=epsg:4326"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb10-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_coordinates</span>() <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> latlong</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stderr">
<pre><code>Warning in CPL_crs_from_input(x): GDAL Message 1: +init=epsg:XXXX syntax is
deprecated. It might return a CRS with a non-EPSG compliant axis order. Further
messages of this type will be suppressed.</code></pre>
</div>
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>longitude <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> latlong[, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'X'</span>]</span>
<span id="cb12-2">df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>latitude <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> latlong[, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Y'</span>]</span>
<span id="cb12-3">area <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>geometry <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_transform</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"+init=epsg:4326"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">st_area</span>()</span>
<span id="cb12-4">df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>area_km2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> units<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop_units</span>(area) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1e6</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># convert m^2 to km^2</span></span>
<span id="cb12-5">df_joined <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> df_joined <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">vote_density_pres =</span> total_vote_pres <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> area_km2)</span></code></pre></div></div>
</details>
</div>
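<p>The deprecation warning above comes from the <code>"+init=epsg:4326"</code> string. As a minimal sketch, assuming a recent version of <code>sf</code>, the same steps can be written with an <code>"EPSG:4326"</code> string instead. The square polygon and the projected CRS (<code>EPSG:32119</code>, NAD83 / North Carolina in metres) are illustrative assumptions and are not taken from the data used in this post.</p>

```r
library(sf)

# A hypothetical square "precinct" near Raleigh, NC, in longitude/latitude.
square = st_polygon(list(rbind(
    c(-78.70, 35.70), c(-78.60, 35.70), c(-78.60, 35.80),
    c(-78.70, 35.80), c(-78.70, 35.70)
)))
geom = st_sfc(square, crs = "EPSG:4326")

# Project to a planar CRS (assumed here: NAD83 / North Carolina, metres).
geom_planar = st_transform(geom, "EPSG:32119")

# Centroid in the planar CRS, then back to longitude/latitude.
latlong = st_coordinates(st_transform(st_centroid(geom_planar), "EPSG:4326"))

# st_area() returns square metres for this CRS; convert to km^2.
area_km2 = units::drop_units(st_area(geom_planar)) / 1e6

print(latlong)
print(area_km2)
```

<p>Writing the CRS as an authority string (or as the bare integer <code>4326</code>) avoids the legacy <code>+init</code> syntax that GDAL flags in the warning.</p>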
</section>
</section>
<section id="assign-treatment-four-ways" class="level1">
<h1>Assign treatment four ways</h1>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true" href="">Softblock</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false" href="">Greedy nearest neighbors</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-3" aria-controls="tabset-1-3" aria-selected="false" href="">QuickBlock</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-4-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-4" aria-controls="tabset-1-4" aria-selected="false" href="">Matched Pairs</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">start_time <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> lubridate<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">now</span>()</span>
<span id="cb13-2">df_joined <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">assign_softblock</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb13-3">    longitude, latitude, area_km2, vote_density_pres, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># geographic</span></span>
<span id="cb13-4">    total_vote_pres, dem_share_pres, gop_share_pres, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 presidential</span></span>
<span id="cb13-5">    total_vote_senate, dem_share_senate, gop_share_senate, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 senate</span></span>
<span id="cb13-6">    total_vote_gov, dem_share_gov, gop_share_gov, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 governor</span></span>
<span id="cb13-7">    total_vote_house, dem_share_house, gop_share_house <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 house</span></span>
<span id="cb13-8">)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb13-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">treatment_sb =</span> treatment) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> df_joined</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: The `normalized` argument of `make_lattice()` provide normalization instead as
of igraph 2.0.3.
ℹ `normalized` is now deprecated, use `normalization` instead.</code></pre>
</div>
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">softblock_weights <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">attr</span>(df_joined, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"laplacian"</span>)</span>
<span id="cb15-2">end_time <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> lubridate<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">now</span>()</span>
<span id="cb15-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(end_time <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> start_time)</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>Time difference of 0.7473469 secs</code></pre>
</div>
</div>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">start_time <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> lubridate<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">now</span>()</span>
<span id="cb17-2">df_joined <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">assign_greedy_neighbors</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb17-3">    longitude, latitude, area_km2, vote_density_pres, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># geographic</span></span>
<span id="cb17-4">    total_vote_pres, dem_share_pres, gop_share_pres, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 presidential</span></span>
<span id="cb17-5">    total_vote_senate, dem_share_senate, gop_share_senate, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 senate</span></span>
<span id="cb17-6">    total_vote_gov, dem_share_gov, gop_share_gov, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 governor</span></span>
<span id="cb17-7">    total_vote_house, dem_share_house, gop_share_house <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 house</span></span>
<span id="cb17-8">)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb17-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">treatment_nn =</span> treatment) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> df_joined</span>
<span id="cb17-10">nn_weights <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">attr</span>(df_joined, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"laplacian"</span>)</span>
<span id="cb17-11">end_time <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> lubridate<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">now</span>()</span>
<span id="cb17-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(end_time <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> start_time)</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>Time difference of 1.482411 secs</code></pre>
</div>
</div>
</div>
<div id="tabset-1-3" class="tab-pane" aria-labelledby="tabset-1-3-tab">
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1">start_time <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> lubridate<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">now</span>()</span>
<span id="cb19-2">df_joined <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> dplyr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(</span>
<span id="cb19-3">    longitude, latitude, area_km2, vote_density_pres, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># geographic</span></span>
<span id="cb19-4">    total_vote_pres, dem_share_pres, gop_share_pres, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 presidential</span></span>
<span id="cb19-5">    total_vote_senate, dem_share_senate, gop_share_senate, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 senate</span></span>
<span id="cb19-6">    total_vote_gov, dem_share_gov, gop_share_gov, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 governor</span></span>
<span id="cb19-7">    total_vote_house, dem_share_house, gop_share_house <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 house</span></span>
<span id="cb19-8">) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> df_qb</span>
<span id="cb19-9">qb_blocks <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">quickblock</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.data.frame</span>(df_qb), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size_constraint =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>L)</span>
<span id="cb19-10">df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>treatment_qb <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.integer</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.character</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">assign_treatment</span>(qb_blocks, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">treatments=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'0'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'1'</span>))))</span>
<span id="cb19-11">end_time <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> lubridate<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">now</span>()</span>
<span id="cb19-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(end_time <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> start_time)</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>Time difference of 0.03046703 secs</code></pre>
</div>
</div>
</div>
<div id="tabset-1-4" class="tab-pane" aria-labelledby="tabset-1-4-tab">
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># This is extremely slow, so it is left commented out.</span></span>
<span id="cb21-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># start_time = lubridate::now()</span></span>
<span id="cb21-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># df_joined %&gt;% assign_matched_pairs(c(</span></span>
<span id="cb21-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#     longitude, latitude, area_km2, vote_density_pres, # geographic</span></span>
<span id="cb21-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#     total_vote_pres, dem_share_pres, gop_share_pres, # 2020 presidential</span></span>
<span id="cb21-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#     total_vote_senate, dem_share_senate, gop_share_senate, # 2020 senate</span></span>
<span id="cb21-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#     total_vote_gov, dem_share_gov, gop_share_gov, # 2020 governor</span></span>
<span id="cb21-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#     total_vote_house, dem_share_house, gop_share_house # 2020 house</span></span>
<span id="cb21-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># )) %&gt;%</span></span>
<span id="cb21-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># rename(treatment_mp=treatment)-&gt; df_joined</span></span>
<span id="cb21-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># mp_weights &lt;- attr(df_joined, "laplacian")</span></span>
<span id="cb21-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># end_time = lubridate::now()</span></span>
<span id="cb21-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># print(end_time - start_time)</span></span></code></pre></div></div>
</details>
</div>
</div>
</div>
</div>
</section>
<section id="power-simulation" class="level1">
<h1>Power simulation</h1>
<p>To compute power consistently across methods, I do the following:</p>
<ol type="1">
<li>Calculate the implied regression adjustment for each design, using 2020 vote share as the outcome.</li>
<li>Compute the residual for each unit.</li>
<li>Permute the residuals across units.</li>
<li>Estimate power for a given effect size by drawing additional permutations.</li>
<li>Sweep over a range of effect sizes.</li>
</ol>
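<p>As a rough illustration of the steps above, here is a minimal sketch on toy data. The simulated covariate, the treatment assignment, and the plain <code>lm</code> adjustment are all assumptions made for the example. They stand in for the design-specific estimators used in the real code below.</p>

```r
# Minimal sketch of residual-permutation power simulation.
# The toy data and the lm-based test are assumptions for illustration,
# not the design-specific estimators.
set.seed(1)
n = 200
x = rnorm(n)                               # hypothetical covariate
A = rbinom(n, 1, 0.5)                      # hypothetical treatment assignment
Y = 0.5 + 0.1 * x + rnorm(n, sd = 0.05)    # stand-in for 2020 vote share

# Steps 1-2: fitted values and residuals from an adjustment model
fit = lm(Y ~ x)
fitted_vals = fitted(fit)
resids = residuals(fit)

# Steps 3-4: permute residuals, add an effect, and check for detection
detect_effect = function(effect) {
    sim_outcome = fitted_vals + sample(resids) + A * effect
    est = coef(summary(lm(sim_outcome ~ A + x)))["A", ]
    lwr = est["Estimate"] + qnorm(0.025) * est["Std. Error"]
    unname(lwr > 0)
}
estimate_power = function(effect, n_sims = 200) {
    mean(replicate(n_sims, detect_effect(effect)))
}

# Step 5: sweep over a range of effect sizes
effects = seq(0, 0.05, length.out = 6)
power = sapply(effects, estimate_power)
print(data.frame(effect = effects, power = power))
```

<p>An effect counts as detected when the lower end of the 95% interval is above zero, which is the same rule the functions below use.</p>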
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1">create_power_simulator <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(W, A, Y) {</span>
<span id="cb22-2">    dL <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">diag</span>(W)</span>
<span id="cb22-3">    Dinv <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> dL</span>
<span id="cb22-4">    estimates <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop</span>((Dinv <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> W) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> Y)</span>
<span id="cb22-5">    residuals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> Y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> estimates</span>
<span id="cb22-6">    x_mat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>(A, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb22-7">    xlx <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(x_mat) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> W <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> x_mat</span>
<span id="cb22-8">    bread <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> MASS<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ginv</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.matrix</span>(xlx))</span>
<span id="cb22-9">    detect_effect <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">effect=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) {</span>
<span id="cb22-10">        sim_outcome <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> estimates <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(residuals) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> effect</span>
<span id="cb22-11">        coefs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> bread <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(x_mat) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> W <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> sim_outcome</span>
<span id="cb22-12">        r <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">diag</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop</span>((sim_outcome <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> (x_mat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> coefs)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span>
<span id="cb22-13">        meat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(x_mat) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> (W <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> r <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> W) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> x_mat</span>
<span id="cb22-14">        vcv <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> bread <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> meat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> bread</span>
<span id="cb22-15">        upr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> coefs[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">qnorm</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.975</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(vcv[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb22-16">        lwr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> coefs[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">qnorm</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.025</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(vcv[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb22-17">        lwr <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb22-18">    }</span>
<span id="cb22-19">    detect_effect</span>
<span id="cb22-20">}</span>
<span id="cb22-21">create_power_simulator_qb <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(blocks, A, Y) {</span>
<span id="cb22-22">    estimate_by_block <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tapply</span>(Y[A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], blocks[A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], mean) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tapply</span>(Y[A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>], blocks[A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>], mean)</span>
<span id="cb22-23">    estimates <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(blocks, <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(b) estimate_by_block[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.character</span>(b)]))</span>
<span id="cb22-24">    residuals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> Y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> estimates</span>
<span id="cb22-25">    detect_effect <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">effect=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) {</span>
<span id="cb22-26">        sim_outcome <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> estimates <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(residuals) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> effect</span>
<span id="cb22-27">        result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> quickblock<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">blocking_estimator</span>(sim_outcome, blocks, A)</span>
<span id="cb22-28">        upr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>effects[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">qnorm</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.975</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>effect_variances[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb22-29">        lwr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>effects[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">qnorm</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.025</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>effect_variances[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb22-30">        lwr <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb22-31">    }</span>
<span id="cb22-32">    detect_effect</span>
<span id="cb22-33">}</span></code></pre></div></div>
</details>
</div>
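<p>The <code>bread</code> and <code>meat</code> terms in <code>create_power_simulator</code> follow the usual sandwich pattern for a robust variance. The sketch below checks that pattern on toy data. It assumes <code>W</code> is the identity matrix, whereas the actual designs pass their own weight matrices. Under that assumption, the sandwich should match the familiar HC0 variance for a difference in means.</p>

```r
# Sketch: with W = identity (an assumption made only for this example),
# the bread/meat sandwich reduces to the HC0 robust variance for OLS.
set.seed(2)
n = 100
A = rbinom(n, 1, 0.5)                                # hypothetical assignment
Y = 0.5 + 0.02 * A + rnorm(n, sd = 0.1 * (1 + A))    # heteroskedastic toy outcome
W = diag(n)                                          # identity weights (assumption)
x_mat = cbind(A, 1)

bread = solve(t(x_mat) %*% W %*% x_mat)
coefs = bread %*% t(x_mat) %*% W %*% Y
res = drop(Y - x_mat %*% coefs)
r = diag(res^2)                                      # squared residuals on the diagonal
meat = t(x_mat) %*% (W %*% r %*% W) %*% x_mat
vcv = bread %*% meat %*% bread

# With a binary regressor plus an intercept, HC0 for the treatment
# coefficient has a closed form: per-arm sums of squared residuals
# divided by the squared arm sizes.
hc0_manual = sum(res[A == 1]^2) / sum(A == 1)^2 +
    sum(res[A == 0]^2) / sum(A == 0)^2
print(c(sandwich = vcv[1, 1], manual = hc0_manual))
```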
<section id="simulate-each-design" class="level2">
<h2 class="anchored" data-anchor-id="simulate-each-design">Simulate each design</h2>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true" href="">SoftBlock</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false" href="">Greedy Neighbors</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-3" aria-controls="tabset-2-3" aria-selected="false" href="">QuickBlock</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1">simulate_power <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_power_simulator</span>(softblock_weights, df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>treatment_sb, df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>dem_share_pres)</span>
<span id="cb23-2">estimate_power_for_effect <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(effect) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replicate</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">simulate_power</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">effect=</span>effect)))</span>
<span id="cb23-3">effects <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span>)</span>
<span id="cb23-4">power_sb <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(effects, estimate_power_for_effect))</span></code></pre></div></div>
</details>
</div>
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1">simulate_power <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_power_simulator</span>(nn_weights, df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>treatment_nn, df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>dem_share_pres)</span>
<span id="cb24-2">estimate_power_for_effect <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(effect) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replicate</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">simulate_power</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">effect=</span>effect)))</span>
<span id="cb24-3">power_nn <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(effects, estimate_power_for_effect))</span></code></pre></div></div>
</details>
</div>
</div>
<div id="tabset-2-3" class="tab-pane" aria-labelledby="tabset-2-3-tab">
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1">simulate_power <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_power_simulator_qb</span>(qb_blocks, df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>treatment_qb, df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>dem_share_pres)</span>
<span id="cb25-2">estimate_power_for_effect <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(effect) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">replicate</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">simulate_power</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">effect=</span>effect)))</span>
<span id="cb25-3">effects <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25</span>)</span>
<span id="cb25-4">power_qb <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(purrr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(effects, estimate_power_for_effect))</span></code></pre></div></div>
</details>
</div>
</div>
</div>
</div>
</section>
<section id="power-comparison" class="level2">
<h2 class="anchored" data-anchor-id="power-comparison">Power Comparison</h2>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(</span>
<span id="cb26-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb26-3">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">effects=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(effects, effects, effects),</span>
<span id="cb26-4">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">power=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(power_sb, power_nn, power_qb),</span>
<span id="cb26-5">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">design=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'SoftBlock'</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(power_sb)), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Greedy Neighbors'</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(power_nn)), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'QuickBlock'</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(power_qb)))</span>
<span id="cb26-6">    ), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(effects, power, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color=</span>design)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb26-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_x_continuous</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Effect (pp)'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels=</span>scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>percent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb26-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_continuous</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Power"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels=</span>scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>percent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb26-9">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_discrete</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Design"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb26-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://ddimmery.com/posts/softblock-demo/index_files/figure-html/power_plot-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</section>
</section>
<section id="analysis" class="level1">
<h1>Analysis</h1>
<section id="average-effects" class="level2">
<h2 class="anchored" data-anchor-id="average-effects">Average Effects</h2>
<p>The effect estimates here use the appropriate design-based estimators for each design.</p>
<p>First, I’m going to generate a fake outcome to use. I’ll keep the effects small: every unit gets a baseline effect of 0.5pp, plus a heterogeneous component of up to another 0.5pp that increases with Democratic vote share in 2020 (i.e.&nbsp;larger effects in more Democratic precincts and smaller effects in less Democratic ones).</p>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1">df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>outcome <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>dem_share_pres</span>
<span id="cb27-2">df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>ite <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">with</span>(df_joined, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.005</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.005</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plogis</span>((dem_share_pres <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">median</span>(dem_share_pres)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sd</span>(dem_share_pres))))</span>
<span id="cb27-3">df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>outcome_sb <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">with</span>(df_joined, outcome <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> treatment_sb <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> ite)</span>
<span id="cb27-4">df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>outcome_nn <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">with</span>(df_joined, outcome <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> treatment_nn <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> ite)</span>
<span id="cb27-5">df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>outcome_qb <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">with</span>(df_joined, outcome <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> treatment_qb <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> ite)</span></code></pre></div></div>
</details>
</div>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-3-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-1" aria-controls="tabset-3-1" aria-selected="true" href="">Softblock</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-2" aria-controls="tabset-3-2" aria-selected="false" href="">Greedy Neighbors</a></li><li class="nav-item"><a class="nav-link" id="tabset-3-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-3-3" aria-controls="tabset-3-3" aria-selected="false" href="">QuickBlock</a></li></ul>
<div class="tab-content">
<div id="tabset-3-1" class="tab-pane active" aria-labelledby="tabset-3-1-tab">
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1">estimate_effect <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(W, A, Y) {</span>
<span id="cb28-2">    dL <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">diag</span>(W)</span>
<span id="cb28-3">    Dinv <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> dL</span>
<span id="cb28-4">    x_mat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cbind</span>(A, <span class="dv" style="color: #AD0000;
background-color: null;
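font-style: inherit;">1</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># design matrix: treatment indicator, then intercept</span></span>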
<span id="cb28-5">    xlx <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(x_mat) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> W <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> x_mat</span>
<span id="cb28-6">    bread <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> MASS<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ginv</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.matrix</span>(xlx))</span>
<span id="cb28-7">    coefs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> bread <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(x_mat) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> W <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> Y</span>
<span id="cb28-8">    r <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">diag</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop</span>((Y <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> (x_mat <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> coefs)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span>
<span id="cb28-9">    meat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(x_mat) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> (W <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> r <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> W) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> x_mat</span>
<span id="cb28-10">    vcv <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> bread <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> meat <span class="sc" style="color: #5E5E5E;
background-color: null;
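font-style: inherit;">%*%</span> bread <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># sandwich variance estimate</span></span>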
<span id="cb28-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">estimate=</span>coefs[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">std.error=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(vcv[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
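font-style: inherit;">1</span>])) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># first coefficient is the one on A</span></span>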
<span id="cb28-12">}</span>
<span id="cb28-13">sb_est <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">estimate_effect</span>(softblock_weights, df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>treatment_sb, df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>outcome_sb)</span></code></pre></div></div>
</details>
</div>
</div>
<div id="tabset-3-2" class="tab-pane" aria-labelledby="tabset-3-2-tab">
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1">nn_est <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">estimate_effect</span>(nn_weights, df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>treatment_nn, df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>outcome_nn)</span></code></pre></div></div>
</details>
</div>
</div>
<div id="tabset-3-3" class="tab-pane" aria-labelledby="tabset-3-3-tab">
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> quickblock<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">blocking_estimator</span>(df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>outcome_qb, qb_blocks, df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>treatment_qb)</span>
<span id="cb30-2">qb_est <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">estimate=</span>result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>effects[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">std.error=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>effect_variances[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]))</span></code></pre></div></div>
</details>
</div>
</div>
</div>
</div>
</section>
<section id="plot-average-effects" class="level2">
<h2 class="anchored" data-anchor-id="plot-average-effects">Plot Average Effects</h2>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb31-2">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">estimate=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(sb_est<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>estimate, nn_est<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>estimate, qb_est<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>estimate),</span>
<span id="cb31-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">std.error=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(sb_est<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>std.error, nn_est<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>std.error, qb_est<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>std.error),</span>
<span id="cb31-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">design=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"SoftBlock"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Greedy Neighbors"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"QuickBlock"</span>)</span>
<span id="cb31-5">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb31-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>design, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y=</span>estimate, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ymin=</span>estimate<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">-1.96</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>std.error, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ymax=</span>estimate<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">+1.96</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>std.error)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb31-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_pointrange</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb31-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_x_discrete</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Design"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb31-9">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_continuous</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ATE (pp)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels=</span>scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>percent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb31-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_flip</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb31-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://ddimmery.com/posts/softblock-demo/index_files/figure-html/plot_fx-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="heterogeneous-effects" class="level2">
<h2 class="anchored" data-anchor-id="heterogeneous-effects">Heterogeneous Effects</h2>
<p>These effects will be estimated using the DR-learner of <a href="https://arxiv.org/abs/2004.14497">Kennedy (2020)</a>. For simplicity, I will estimate the nuisance functions (the propensity score and the two outcome regressions) using <code>glmnet</code>.</p>
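<p>As a rough sketch of what the code in this section computes, written in notation that is assumed here for illustration rather than taken from the paper, the DR-learner first forms a doubly robust pseudo-outcome for each unit from the estimated propensity score \(\hat{\pi}\) and the estimated outcome regressions \(\hat{\mu}_0\) and \(\hat{\mu}_1\):</p>
<p><span class="math display">\[
\hat{\varphi}_i = \frac{A_i - \hat{\pi}(X_i)}{\hat{\pi}(X_i)\,\{1 - \hat{\pi}(X_i)\}} \bigl\{ Y_i - A_i\,\hat{\mu}_1(X_i) - (1 - A_i)\,\hat{\mu}_0(X_i) \bigr\} + \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
\]</span></p>
<p>Regressing \(\hat{\varphi}_i\) on \(X_i\) then gives the estimate of the conditional effect. The implementation here uses four folds: one fits the propensity model, one fits the two outcome models, one fits the final regression of the pseudo-outcome on the covariates, and the remaining fold receives the predictions. The roles rotate so that every unit gets an out-of-fold prediction.</p>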
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1">predict.hte.split <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(x, a, y, s, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">predict.s=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>) {</span>
<span id="cb32-2">    s.pi <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> (predict.s) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%%</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
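font-style: inherit;">1</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># fold used to fit the propensity model</span></span>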
<span id="cb32-3">    s.mu <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> (predict.s <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%%</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
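font-style: inherit;">1</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># fold used to fit the outcome models</span></span>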
<span id="cb32-4">    s.dr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> (predict.s <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%%</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
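font-style: inherit;">1</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># fold used to fit the final regression on the pseudo-outcome</span></span>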
<span id="cb32-5">    pihat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.glmnet</span>(x[s<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>s.pi,],a[s<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>s.pi], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"binomial"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nfolds=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">newx=</span>x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"response"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda.min"</span>)</span>
<span id="cb32-6">    mu0hat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.glmnet</span>(x[a<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> s<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>s.mu,],y[a<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> s<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>s.mu], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nfolds=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">newx=</span>x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"response"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda.min"</span>)</span>
<span id="cb32-7">    mu1hat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.glmnet</span>(x[a<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> s<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>s.mu,],y[a<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> s<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>s.mu], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nfolds=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>),<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">newx=</span>x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"response"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda.min"</span>)</span>
<span id="cb32-8">    pseudo <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ((a<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>pihat)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>(pihat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>pihat)))<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(y<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>a<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>mu1hat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>a)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>mu0hat) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> mu1hat <span class="sc" style="color: #5E5E5E;
background-color: null;
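font-style: inherit;">-</span> mu0hat <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># doubly robust pseudo-outcome</span></span>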
<span id="cb32-9">    drl <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cv.glmnet</span>(x[s<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>s.dr,],pseudo[s<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>s.dr]),<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">newx=</span>x[s<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>predict.s, ], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lambda.min"</span>)</span>
<span id="cb32-10">    drl</span>
<span id="cb32-11">}</span>
<span id="cb32-12">predict.hte.crossfit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(x, a, y) {</span>
<span id="cb32-13">    N <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(a)</span>
<span id="cb32-14">    s <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, N, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb32-15">    hte <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA_real_</span>, N)</span>
<span id="cb32-16">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> (split <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>) {</span>
<span id="cb32-17">        hte[s<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>split] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict.hte.split</span>(x, a, y, s, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">predict.s=</span>split)</span>
<span id="cb32-18">    }</span>
<span id="cb32-19">    hte</span>
<span id="cb32-20">}</span>
<span id="cb32-21">calculate_hte <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(.data, cols, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.treatment=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'treatment'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.outcome=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'outcome'</span>) {</span>
<span id="cb32-22">    expr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> rlang<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">enquo</span>(cols)</span>
<span id="cb32-23">    pos <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> tidyselect<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">eval_select</span>(expr, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> .data)</span>
<span id="cb32-24">    df_cov <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> rlang<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_names</span>(.data[pos], <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(pos))</span>
<span id="cb32-25">    cov_mat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">model.matrix</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>.<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, df_cov))</span>
<span id="cb32-26">    .data<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>hte <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">predict.hte.crossfit</span>(cov_mat, .data[[.treatment]], .data[[.outcome]])</span>
<span id="cb32-27">    .data</span>
<span id="cb32-28">}</span></code></pre></div></div>
</details>
</div>
</section>
<section id="estimate-htes" class="level2">
<h2 class="anchored" data-anchor-id="estimate-htes">Estimate HTEs</h2>
<div class="tabset-margin-container"></div><div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-4-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-1" aria-controls="tabset-4-1" aria-selected="true" href="">SoftBlock</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-2" aria-controls="tabset-4-2" aria-selected="false" href="">Greedy Neighbors</a></li><li class="nav-item"><a class="nav-link" id="tabset-4-3-tab" data-bs-toggle="tab" data-bs-target="#tabset-4-3" aria-controls="tabset-4-3" aria-selected="false" href="">QuickBlock</a></li></ul>
<div class="tab-content">
<div id="tabset-4-1" class="tab-pane active" aria-labelledby="tabset-4-1-tab">
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1">df_joined <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">calculate_hte</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb33-2">    longitude, latitude, area_km2, vote_density_pres, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># geographic</span></span>
<span id="cb33-3">    total_vote_pres, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 presidential</span></span>
<span id="cb33-4">    total_vote_senate, dem_share_senate, gop_share_senate, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 senate</span></span>
<span id="cb33-5">    total_vote_gov, dem_share_gov, gop_share_gov, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 governor</span></span>
<span id="cb33-6">    total_vote_house, dem_share_house, gop_share_house <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 house</span></span>
<span id="cb33-7">), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.treatment=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'treatment_sb'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.outcome=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'outcome_sb'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hte_sb=</span>hte) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> df_joined</span>
<span id="cb33-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>hte_sb)</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.010306  0.004135  0.007536  0.007148  0.009029  0.033580 </code></pre>
</div>
</div>
</div>
<div id="tabset-4-2" class="tab-pane" aria-labelledby="tabset-4-2-tab">
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1">df_joined <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">calculate_hte</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb35-2">    longitude, latitude, area_km2, vote_density_pres, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># geographic</span></span>
<span id="cb35-3">    total_vote_pres, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 presidential</span></span>
<span id="cb35-4">    total_vote_senate, dem_share_senate, gop_share_senate, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 senate</span></span>
<span id="cb35-5">    total_vote_gov, dem_share_gov, gop_share_gov, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 governor</span></span>
<span id="cb35-6">    total_vote_house, dem_share_house, gop_share_house <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 house</span></span>
<span id="cb35-7">), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.treatment=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'treatment_nn'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.outcome=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'outcome_nn'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hte_nn=</span>hte) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> df_joined</span>
<span id="cb35-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>hte_nn)</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.005742 0.006825 0.007388 0.007670 0.008596 0.010863 </code></pre>
</div>
</div>
</div>
<div id="tabset-4-3" class="tab-pane" aria-labelledby="tabset-4-3-tab">
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1">df_joined <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">calculate_hte</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb37-2">    longitude, latitude, area_km2, vote_density_pres, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># geographic</span></span>
<span id="cb37-3">    total_vote_pres, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 presidential</span></span>
<span id="cb37-4">    total_vote_senate, dem_share_senate, gop_share_senate, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 senate</span></span>
<span id="cb37-5">    total_vote_gov, dem_share_gov, gop_share_gov, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 governor</span></span>
<span id="cb37-6">    total_vote_house, dem_share_house, gop_share_house <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2020 house</span></span>
<span id="cb37-7">), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.treatment=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'treatment_qb'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.outcome=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'outcome_qb'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hte_qb=</span>hte) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> df_joined</span>
<span id="cb37-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(df_joined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>hte_qb)</span></code></pre></div></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.0007862  0.0070832  0.0083612  0.0082249  0.0094074  0.0398552 </code></pre>
</div>
</div>
</div>
</div>
</div>
</section>
<section id="plot-hte-distributions" class="level2">
<h2 class="anchored" data-anchor-id="plot-hte-distributions">Plot HTE distributions</h2>
<div class="cell">
<details class="code-fold">
<summary>Code</summary>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb39-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(df_joined, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb39-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>hte_sb, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'SoftBlock'</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bins=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.4</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb39-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>hte_nn, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Greedy Neighbors'</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bins=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.4</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb39-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>hte_qb, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'QuickBlock'</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bins=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.4</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb39-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_x_continuous</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Effect (pp)"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels=</span>scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>percent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb39-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div></div>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://ddimmery.com/posts/softblock-demo/index_files/figure-html/hte_plot-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>


</section>
</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{dimmery2022,
  author = {Dimmery, Drew},
  title = {Using {SoftBlock} to {Design} an {Experiment}},
  date = {2022-05-10},
  url = {https://ddimmery.com/posts/softblock-demo/},
  langid = {en},
  abstract = {This is a demo of using SoftBlock for experimental design
    for a notional randomization of precincts in North Carolina. To
    optimize power, we focus on ensuring very similar prior voting
    patterns in treatment and in control. This demo walks through
    all the necessary steps for this process and shows how to perform
    estimation of average and heterogeneous effects.}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-dimmery2022" class="csl-entry quarto-appendix-citeas">
Dimmery, Drew. 2022. <span>“Using SoftBlock to Design an
Experiment.”</span> May 10, 2022. <a href="https://ddimmery.com/posts/softblock-demo/">https://ddimmery.com/posts/softblock-demo/</a>.
</div></div></section></div> ]]></description>
  <category>experiments</category>
  <category>demo</category>
  <guid>https://ddimmery.com/posts/softblock-demo/</guid>
  <pubDate>Tue, 10 May 2022 00:00:00 GMT</pubDate>
  <media:content url="https://ddimmery.com/posts/softblock-demo/main-image.png" medium="image" type="image/png" height="76" width="144"/>
</item>
</channel>
</rss>
