<rss xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:media="http://search.yahoo.com/mrss/"
     version="2.0">
  <channel>
    <title><![CDATA[ Junk Charts ]]></title>
    <description><![CDATA[ Data, graphs and AI - by the bestselling author of Numbers Rule Your World and industry veteran, blogging since 2006 ]]></description>
    <link>https://www.junkcharts.com</link>
    <image>
      <url>https://www.junkcharts.com/favicon.png</url>
      <title>Junk Charts</title>
      <link>https://www.junkcharts.com</link>
    </image>
    <lastBuildDate>Thu, 09 Apr 2026 12:49:12 -0400</lastBuildDate>
    <atom:link href="https://www.junkcharts.com" rel="self" type="application/rss+xml"/>
    <ttl>60</ttl>
        <item>
          <title><![CDATA[ Know your data 48: selling faces ]]></title>
          <link>https://www.junkcharts.com/know-your-data-48-selling-faces/</link>
          <guid isPermaLink="false">69d2b71d1f40e50001cf1286</guid>
          <category><![CDATA[ data sharing ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Thu, 09 Apr 2026 09:21:21 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>In the last <a href="https://www.junkcharts.com/know-your-data-47-ai-solves-missing-connections-or-not/">post</a> about Clearview AI's face recognition database, I asked the rhetorical question:</p><blockquote>Have you heard a peep from Facebook/Instagram, Twitter, Google, etc. about Clearview illicitly taking images from their properties?</blockquote><p>I didn't expect an answer to appear in the same week! </p><p>The FTC has "settled" with OKCupid, a dating app owned by the Match Group, over its giving user data to a face recognition company (Clarifai, a competitor of Clearview) without properly notifying its users (<a href="https://arstechnica.com/tech-policy/2026/03/okcupid-match-pay-no-fine-for-sharing-user-photos-with-facial-recognition-firm/?ref=junkcharts.com">link</a>).</p><p>I put "settled" in quotes because throughout the story of the data belonging to Americans, the meaning of many words has been warped beyond recognition. This is, as Ars Technica pointed out in its headline, a "settlement" without any financial penalty. The company just swore on its pinky that it would behave in the future.</p><p>We can learn several bits of interesting information from this news though.</p><p>Firstly, despite being allowed to hide questionable behavior behind obnoxiously long, pugnaciously vague, and presumably unread privacy policies, these tech companies still engage in activities that they simply could not afford to put down in writing. </p><p>Secondly, the word "sharing" has also lost any meaning. Does anyone – except the FTC enforcers – seriously believe that OKCupid gave 3 million photos to Clarifai <em>for free</em>? And, <em>as an act of charity</em>, would you also take our users' location and their demographic data?</p><p>The FTC claimed it knew what happened: based on the passage quoted by Ars Technica, it appears that the agency merely repeated the story told by OKCupid, Match, and Clarifai. 
They claimed that no formal agreement existed but disclosed that the founder of OKCupid and the CEO of Match were both "financially invested" in Clarifai. These parties somehow believed that this cover story gave them a get-out-of-jail card, a rationale to support their use of the word "sharing". In fact, this is even more troubling than if a straightforward commercial agreement had existed.</p><p>For one, this story proves that user data at tech companies are in the hands of <em>individuals</em>. (We already sort of knew this from some past actions, e.g. by Elon Musk.) It also shows that these individuals – none of whom face any kind of sanctions – will sell out their users for personal financial gain. When no agreement exists, it's harder to trace what data have left the building, and when and where they went.</p><p>Thirdly, this settlement stemmed from actions that took place in 2014, so it took 12 years for regulators to uphold the law by a friendly handshake with the offenders. </p><p>Fourthly, let's read the terms of the settlement carefully, shall we? Fasten your seat belts, this is truly scary:</p><blockquote>OKCupid and Match... agreed to a permanent prohibition barring them from misrepresenting how they use and share personal data.</blockquote><p>Wait, so until now businesses were not prohibited from "misrepresenting how they use and share personal data"? It takes 12 years and a negotiated settlement to confirm to Americans that our businesses are in fact free to misrepresent how they use and share personal data – unless the FTC imposes a specific "permanent" ban on lying?</p><p>Fifthly, apparently Clarifai has done absolutely nothing wrong in this matter, even though it is the entity that sought out the people's data, and exploited the data to build a product that sometimes might send them to jail for months because of errors (see the prior <a href="https://www.junkcharts.com/know-your-data-47-ai-solves-missing-connections-or-not/">post</a>). 
For good measure, they even told us that they sell to "foreign governments, military operations and police departments".</p><p>Last but not least, this news resolves the mystery of how companies like Clearview AI and Clarifai build out their enormous databases of people's images associated with their personally-identifiable data. They may not even need to "scrape" the data; they simply get them "for free" via secretive "data sharing."</p>
          ]]></content:encoded>
          <description><![CDATA[ The FTC does not have our backs, that much is clear ]]></description>
        </item>
        <item>
          <title><![CDATA[ The paradox of circles ]]></title>
          <link>https://www.junkcharts.com/the-paradox-of-circles/</link>
          <guid isPermaLink="false">69d5e3a318b3440001a703e4</guid>
          <category><![CDATA[ distortion ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 08 Apr 2026 09:13:45 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>While I was writing the posts about making paired circle charts (<a href="https://www.junkcharts.com/getting-metrics-right-is-half-the-battle/">here</a>, <a href="https://www.junkcharts.com/guide-to-using-pairs-of-circles/">here</a>), I came across a paradox. </p><p>The oft-repeated guidance when making circles or bubbles is that the data should be encoded in the areas. Since circular areas involve squaring the radii, we should square-root the data if we're putting the data in the radius (or diameter). This action happens frequently because in most drawing programs, we can control the radius (or diameter) easily, but we can't directly calibrate the area. </p><p>Now, take a look at the following chart:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/04/example_racetrack_chart.png" class="kg-image" alt="" loading="lazy" width="512" height="496"></figure><p>I dislike these racetrack charts (<a href="https://www.junkcharts.com/close-races/">link</a>). Anyone who has run on a track knows the outer lanes are longer than the inner lanes. These charts draw our attention to the circumferences of the circles. </p><p>Therein lies the rub. The circumference is proportional to the radius (or diameter). Therefore, if we make an emptied-out circle, we should <strong>not</strong> square-root the data.</p><hr><p>The trouble is that in every circle, the area and the circumference (or border) are simultaneously present. Hence, the paradox of circles. If the data are attached to the area, then the circumference distorts them; if the data are tied to the circumference (or radius or diameter or angle), then the area distorts them. One can't have both!</p><p>I asked Andrew about this paradox, and of course, he has written something related to it. 
It's in this <a href="https://statmodeling.stat.columbia.edu/2025/05/30/statistical-graphics-when-does-it-make-sense-to-introduce-deliberate-distortion-to-counteract-an-expected-perceptual-illusion/?ref=junkcharts.com">post</a>, which was a response to an earlier post of mine (<a href="https://www.junkcharts.com/the-line-angle-illusion">link</a>). </p><p>Andrew featured the following chart he made about the "social penumbra" of different groups of people:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/04/agelman_quartercircles.png" class="kg-image" alt="" loading="lazy" width="1536" height="1002" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/04/agelman_quartercircles.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/04/agelman_quartercircles.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/04/agelman_quartercircles.png 1536w" sizes="(min-width: 720px) 720px"></figure><p>He ran into the neither here nor there problem, saying:</p><!--members-only--><blockquote>So by displaying the data as areas, we’re knowingly handing people a distortion. For example, if a certain group represents 1% of the population, then the core group (the yellow circle in the graph) will take up 1% of the area of the full circle and thus will be 10% in linear dimension.</blockquote><p>Instead of full circles, he made quarter-circles. I think it's a brilliant move. The other three-quarters are just wasted space, so to speak. However, because the right-angled edges are present, the readers may be more likely to pay attention to the radius of the quarter-circle, rather than its area. 
</p><p>For what it's worth, this is the <a href="https://www.junkcharts.com/tag/legend/" rel="noreferrer">legend</a>:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/04/agelman_quartercircle_legend.png" class="kg-image" alt="" loading="lazy" width="1024" height="787" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/04/agelman_quartercircle_legend.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/04/agelman_quartercircle_legend.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/04/agelman_quartercircle_legend.png 1024w" sizes="(min-width: 720px) 720px"></figure><p>Let's zoom in on the "gun owner" category at the bottom left corner:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/04/agelman_socialpenumbra_gunowner.png" class="kg-image" alt="" loading="lazy" width="502" height="498"></figure><p>The first gray circle ("family members") has a radius that is about 1.5 times that of the yellow "core". If the data are encoded in the circular areas, then the gray circle's area is \((1.5)^2 = 2.25\) times that of the yellow "core". So, the size of family members but not core is 1.25 times the size of the core. </p><p>On the other hand, if the data are encoded in the radii, then the gray circle (think its circumference) is 1.5 times that of the yellow circle, so that the set of family members (not core) is about half the size of the core. </p><p>Thus, the "distorted" quantity is quite severely distorted. As a designer, you're hoping that your readers interpret the chart the way you intended (one of area or circumference, but not both). </p><p>In this case, I would be surprised if readers are focused on the circumferences. 
They might try to measure the radii since that's much easier to compare than the areas. (This is still true if the full concentric circles are shown.) On balance, I still think these quarter-circles have a place in our toolbox.</p><hr><p>In the older <a href="https://www.junkcharts.com/the-line-angle-illusion" rel="noreferrer">post</a>, I asked whether designers should (be forgiven for?) deliberately distort data in order to correct known visual illusions. </p><p>The quarter-circle example is related but not quite what I had in mind. This paradox of circles is such that we are forced to distort one quantity no matter what; so we aren't really doing a double-negative to undo an illusion. </p><p>The log chart is also related but not quite what I had in mind. In a log chart, we deliberately introduce a severe distortion, and it's not because readers apply an illusion to undo its effect.</p>
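            <p>To put numbers on the two readings of the gun-owner quarter-circles discussed earlier, here is a minimal Python sketch. It assumes, as above, that the gray circle's radius is about 1.5 times that of the yellow core; the function name <code>ring_to_core_ratio</code> and its arguments are hypothetical, made up for this illustration, and are not taken from Andrew's code or any plotting library.</p>
<pre>
```python
def ring_to_core_ratio(radius_ratio, encoding):
    """Size of the ring (outer circle minus core) relative to the core.

    radius_ratio: outer radius divided by core radius (an eyeballed 1.5 here).
    encoding: "area" if the data sit in the areas, "radius" if they sit in
              the radii (equivalently, the circumferences).
    """
    if encoding == "area":
        # Area grows with the square of the radius.
        return radius_ratio ** 2 - 1
    if encoding == "radius":
        # Radius and circumference grow linearly.
        return radius_ratio - 1
    raise ValueError("encoding must be 'area' or 'radius'")


k = 1.5  # assumption: approximate radius ratio read off the chart
by_area = ring_to_core_ratio(k, "area")
by_radius = ring_to_core_ratio(k, "radius")
print(by_area, by_radius, by_area / by_radius)
```
</pre>
            <p>For any radius ratio k, the area reading gives k squared minus 1 and the linear reading gives k minus 1, so the two readings differ by a factor of k + 1. The larger the outer circle relative to the core, the more the two interpretations disagree.</p>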
          ]]></content:encoded>
          <description><![CDATA[ You can&#39;t have it both ways, no you can&#39;t ]]></description>
        </item>
        <item>
          <title><![CDATA[ Know your data 47: AI solves missing connections (or not) ]]></title>
          <link>https://www.junkcharts.com/know-your-data-47-ai-solves-missing-connections-or-not/</link>
          <guid isPermaLink="false">69c9d37f6cac520001da9c5f</guid>
          <category><![CDATA[ know your data ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 01 Apr 2026 08:46:48 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>A sob story (<a href="https://www.yahoo.com/news/articles/police-used-ai-facial-recognition-100054218.html?ref=junkcharts.com">link</a>) about a grandmother in Tennessee raises a number of issues related to the use of AI tools by law enforcement.</p><p>As reported, Angela Lipps was falsely accused of bank fraud near Fargo, North Dakota, a crime she did not commit. She was charged in ND, arrested and sat in jail in Tennessee, extradited to ND, and ultimately the charges were dismissed, and she was released after five months 🥲.</p><p>Lipps’s suffering was due to "misidentification". Her lawyer was able to produce bank records that proved that she was in Tennessee at the time of the crime. Tennessee is over 1,000 miles (1,600 km) away from North Dakota. In fact, according to Lipps, she doesn’t travel, and has never even boarded an airplane, let alone been to North Dakota!</p><p>The situation ultimately got traced to the notorious face surveillance company, <a href="https://en.wikipedia.org/wiki/Clearview_AI?ref=junkcharts.com">Clearview AI</a>. This private company makes a business out of scraping images from social media and online sources, building a gigantic database that is used to “doxx” people.</p><p>Let’s dissect that bit by bit.</p><p>I deliberately use the word “doxx”. Doxxing is usually associated with someone publishing someone else’s personally identifiable information (PII, such as names and addresses) without that person’s consent, usually for the purpose of shaming, revenge, etc. According to Wiki (<a href="https://en.wikipedia.org/wiki/Doxing?ref=junkcharts.com">link</a>), the U.S. has weak regulations in this area; only a few states consider doxxing illegal.</p><p>The contempt for doxxing appears to stop at the corporate door. Clearview’s raison d'être is doxxing on steroids. Its entire business is tagging images with people’s names (which naturally leads to other PII data, given the motivation of its customers). 
Instead of putting it up on social media, like a political activist might do, Clearview sells the information to someone willing to pay, also without that person’s consent, and worse, behind that person’s back. Government agencies are its primary customers.</p><p>According to the Wiki <a href="https://en.wikipedia.org/wiki/Clearview_AI?ref=junkcharts.com">page</a>, Clearview settled a lawsuit in 2022, agreeing not to sell to “private individuals and businesses.” But the linked CNN <a href="https://www.cnn.com/2022/05/09/tech/clearview-ai-aclu-settlement/index.html?ref=junkcharts.com">article</a> used the qualified phrase, “most companies in the United States”. The Lipps case obviously shows that government use of such a tool is not harmless.</p><p>The other key word is “scraping,” about which I wrote recently (<a href="https://www.junkcharts.com/the-emerging-ai-agents-war/">here</a>). Clearview is engaged in large-scale harvesting of images across multiple platforms; that is their central value proposition. For this business, they need as many images as possible, as recent as possible. Have you heard a peep from Facebook/Instagram, Twitter, Google, etc. about Clearview illicitly taking images from their properties? Neither have I. Years ago when the <em>New York Times</em> covered this company, they made some noise but there have been no lawsuits or enforcement actions that I’m aware of.</p><hr><p>The Lipps case is highly instructive, showing us how surveillance data can sometimes harm people. </p><!--members-only--><p>According to the police department that used Clearview to identify Lipps as the criminal, the true criminal had used Lipps’s picture on a fake ID. How did the true fraudster have access to Lipps’s picture? Most likely from social-media scraping!</p><p>Misidentification is evidently a misnomer. If we believe the police’s story, then Clearview correctly identified Lipps from the fake ID photo. 
The problem appears to be that they ran with that, without looking for corroborating evidence. A surveillance image actually existed that would have exonerated Lipps if it had been inspected.</p><hr><p>The Lipps case also shines a light on the gray legal area in which these law enforcement agencies work. The Fargo jurisdiction did not have any AI for facial recognition; it then asked neighboring West Fargo police to help out, because they have Clearview. What’s to stop Clearview users from doxxing someone for non-official matters?</p><p>(By the way, Clearview’s sales team is probably knocking on the doors of the two police departments – because they have learned that users are sharing their Netflix accounts, so to speak.)</p><p>The proposed solution by the Fargo police is to route such identification requests to the North Dakota State and Local Intelligence Center, which has specific expertise in AI tools. </p><p>When I read that, I said to myself, I bet NDSLIC also uses Clearview, which would not have made a difference in the Lipps case. A quick search confirmed it. The West Fargo police chief defended his department (<a href="https://www.am1100theflag.com/episode/west-fargo-police-chief-defends-department-after-fargo-chiefs-ai-comments-14-mins-03-25-26/?ref=junkcharts.com">link</a>), saying exactly that: “[we] did send it to NDSLIC, which returned the exact same results using the identical Clearview AI software.”</p><p>In the U.S., there is so much involvement by private entities in these aspects of law enforcement that it becomes very hard to figure out if proper and legal process has been applied.</p><p>Undoubtedly, facial recognition technology has solved, and will continue to help solve, crimes. But just as surely, such technologies, under various, sometimes unexpected, circumstances, will result in innocent people being “harassed,” and in Lipps’s case, thrown in jail for months. Where do we draw the line?</p><p>P.S. 
[4/1/2026] Other posts in the Know Your Data series are <a href="https://www.junkcharts.com/tag/know-your-data/" rel="noreferrer">here</a>.</p>
          ]]></content:encoded>
          <description><![CDATA[ AI will connect you to someone far far away ]]></description>
        </item>
        <item>
          <title><![CDATA[ Why only me? ]]></title>
          <link>https://www.junkcharts.com/why-only-me/</link>
          <guid isPermaLink="false">69caa4d86cac520001da9c72</guid>
          <category><![CDATA[ doping ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 31 Mar 2026 09:01:50 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>The doping story won't stop. </p><p>A past winner of the New York Marathon has recently failed doping tests, and is being banned from competing for five years (<a href="https://apnews.com/article/new-york-marathon-kenya-korir-doping-banned-efa7761bd8cf34257833b14e39bf87ae?ref=junkcharts.com">link</a>). He tested positive for a new version of EPO, which was an emerging drug at the time I wrote <strong>Numbers Rule Your World (</strong><a href="https://amzn.to/2zssjj4?ref=junkcharts.com"><strong>link</strong></a><strong>)</strong>. In Chapter 4, I discussed what it really meant when Lance Armstrong, at the time the GOAT cyclist, said he passed hundreds of doping tests in his career. Embarrassingly, statistics instructors at the time were comparing doping tests to mammograms, which are notorious for the number of false positives they generate. I showed why false negatives are the real problem – this all happened before Armstrong's downfall.</p><p>Albert Korir, from Kenya, is a star. In addition to winning the New York Marathon in 2021, he placed second in 2019 and 2023, and third in 2024 and 2025. </p><p>Given that he passed all tests (a la Lance Armstrong) in those past years, he is stripped only of honors earned since October 2025. Sadly, his third-place finish in 2025 no longer stands. </p><p>The only surprise is that he admitted the offence, and received a one-year reduction in penalty. He didn't say he ate any contaminated beef, or used his sick father's spoon, or drank from someone else's water bottle. </p><p>In drug testing, for every athlete caught doping, there are many more who elude detection. Indeed, for every athlete caught doping, plenty of prior tests of the same athlete had come back negative. </p><p>Or, you can be the person who believes that the first time these athletes crossed the doping line, they got caught red-handed.</p>
          ]]></content:encoded>
          <description><![CDATA[ At least he didn&#39;t make up a story. ]]></description>
        </item>
        <item>
          <title><![CDATA[ Guide to using pairs of circles ]]></title>
          <link>https://www.junkcharts.com/guide-to-using-pairs-of-circles/</link>
          <guid isPermaLink="false">69b5d14f22e3d400014705ed</guid>
          <category><![CDATA[ Bubble chart ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 24 Mar 2026 08:52:50 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>In my last <a href="https://www.junkcharts.com/getting-metrics-right-is-half-the-battle/">post</a>, I featured the following student project that uses nested circles to compare pairs of data. </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/rayvellanyu_alexisduhaney_2-2.png" class="kg-image" alt="" loading="lazy" width="1450" height="1446" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/rayvellanyu_alexisduhaney_2-2.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/03/rayvellanyu_alexisduhaney_2-2.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/rayvellanyu_alexisduhaney_2-2.png 1450w" sizes="(min-width: 720px) 720px"></figure><p>The underlying data are measures of change in wealth over time, specifically, a 15-year period (2000-2015). In each pair, one circle represents the "rich" and the other circle represents the "poor".  So, for each country, there are two numbers being compared. 
For most countries, since the rich are getting richer, the "rich" circle is the larger one.</p><hr><p>I find it useful to start by looking at the "boring" way of presenting the same concept, using side-by-side <a href="https://www.junkcharts.com/tag/bar-chart/" rel="noreferrer">bar charts</a>.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/kfung_junkcharts_redo_economist_wealthgap.png" class="kg-image" alt="" loading="lazy" width="988" height="1084" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/kfung_junkcharts_redo_economist_wealthgap.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/kfung_junkcharts_redo_economist_wealthgap.png 988w" sizes="(min-width: 720px) 720px"></figure><p>This dataset contains certain identifying features, due to how the <em>Economist</em> chose to define wealth disparity. Each number is the wealth of the rich (or poor), indexed to the national average (=100), in each year. Because of the skewness of the wealth distribution, the numbers for the rich are usually quite a bit larger than those for the poor; it follows that the changes in wealth for the rich are also larger in magnitude than those for the poor. In fact, the change in wealth for the poor is typically negative: if the rich are running higher, the poor should be falling lower, because in each year the average is pinned at 100, so its change is always zero. 
An exception arises when the middle class lags while both ends of the distribution gain.</p><p>The circular version separates the direction and magnitude of the data: the circular areas encode the absolute values of wealth changes, while the <a href="https://www.junkcharts.com/tag/color/" rel="noreferrer">colors</a> show the direction of change (up or down).</p><hr><p>In this post, I explore a few design decisions when making such circular charts:</p><ul><li>sizing individual circles,</li><li>handling direction and magnitude,</li><li>determining relative sizes of circle pairs.</li></ul><p>The basics first. Since the data are encoded in the areas of the circles, and the area of a circle is proportional to the square of its radius, we usually have to feed the square root of the data to the plotting software, so that it serves as the radius.</p><hr><p>Let's take a generic pair (A, B). There are three possible relationships between A and B: A&gt;B, A&lt;B, and A=B.</p><p>The strict inequalities can be simply accommodated:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/kfung_junkcharts_pairedcircles_unequal.png" class="kg-image" alt="" loading="lazy" width="700" height="460" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/kfung_junkcharts_pairedcircles_unequal.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/kfung_junkcharts_pairedcircles_unequal.png 700w"></figure><p>The case of equality disturbs the peace. When A=B, the two circles have the same area; they completely overlap, so only one of them is visible. 
</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/kfung_junkcharts_pairedcircles_equal.png" class="kg-image" alt="" loading="lazy" width="494" height="474"></figure><p>One way out of this problem is to assert that the case of A=B is sufficiently rare as to be ignorable. I'd be willing to accept such an assumption in the case of the wealth inequality dataset.</p><p>If such an assertion is not supported, then a more creative solution is needed. For example, put them side by side. </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/kfung_junkcharts_pairedcircles_sidebyside.png" class="kg-image" alt="" loading="lazy" width="1148" height="500" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/kfung_junkcharts_pairedcircles_sidebyside.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/03/kfung_junkcharts_pairedcircles_sidebyside.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/kfung_junkcharts_pairedcircles_sidebyside.png 1148w" sizes="(min-width: 720px) 720px"></figure><hr><p>An additional complication arises when the data contain both positive and negative values, which is the situation with the change in wealth data. 
</p><p>As shown below, we have six feasible configurations, defined by two colors, two tinges, and which circle is larger.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/kfung_junkcharts_pairedcircles_colorconfig6.png" class="kg-image" alt="" loading="lazy" width="1076" height="597" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/kfung_junkcharts_pairedcircles_colorconfig6.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/03/kfung_junkcharts_pairedcircles_colorconfig6.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/kfung_junkcharts_pairedcircles_colorconfig6.png 1076w" sizes="(min-width: 720px) 720px"></figure><!--members-only--><p>In each configuration, which circle is larger is immediately apparent. Then, the tinge signals whether the individual element (A or B) has a positive or negative sign. In our dataset, the tinge signals either a gain or a loss in the relative index over time. </p><p>Alexis, the student who made the featured chart, simplifies the situation by applying <a href="https://www.junkcharts.com/tag/color/" rel="noreferrer">color</a> to the gap in wealth changes (i.e., \(A - B\)), rather than the wealth changes themselves (A, B). 
Thus, there is only one value, and one corresponding color, per pair of circles.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/kfung_junkcharts_pairedcircles_colorconfi2.png" class="kg-image" alt="" loading="lazy" width="1079" height="589" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/kfung_junkcharts_pairedcircles_colorconfi2.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/03/kfung_junkcharts_pairedcircles_colorconfi2.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/kfung_junkcharts_pairedcircles_colorconfi2.png 1079w" sizes="(min-width: 720px) 720px"></figure><p>The larger circle is given a fixed color (blue here). The color of the smaller circle is the direction of the difference in wealth changes between the rich and the poor – in other words, the direction of the wealth gap. </p><p>The simplicity is achieved by giving up the ability to distinguish between the various cases shown above. We go from six possibilities to two. </p><hr><p>In Alexis's chart, all circles conform to an unspoken single <a href="https://www.junkcharts.com/tag/scale/" rel="noreferrer">scale</a>, aligned to the 2015 relative wealth index for the rich. </p><p>This represents a third dimension. The pair of circles shows the wealth <em>changes</em> of the rich and the poor. The designer has freedom to choose what to use for this third dimension. This is not a decision available for the standard bar chart presentation. </p><p>The following illustrates the effect of introducing a third dimension. The top set of circles does not utilize the third dimension while the bottom set of circles does. 
</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/kfung_junkcharts_pairedcircles_thirddimension.png" class="kg-image" alt="" loading="lazy" width="1486" height="898" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/kfung_junkcharts_pairedcircles_thirddimension.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/03/kfung_junkcharts_pairedcircles_thirddimension.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/kfung_junkcharts_pairedcircles_thirddimension.png 1486w" sizes="(min-width: 720px) 720px"></figure><p>In the top row, the focus is within-country variation. In Japan as well as Spain, both the rich and the poor shifted in the same direction between 2000 and 2015, and the magnitude of the shift of the poor was roughly half that of the rich. In the United States, the rich got richer while the poor got poorer. The wealth change for rich Americans was roughly 20 times that for the poor. </p><p>In the bottom row, the sizes of the circles for Japan and Spain are all aligned with those for the USA. Both within-country and between-country variations are present. </p><p>It's up to the designer to figure out whether, and how, to utilize this third dimension.</p><p> </p>
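            <p>To make the basics concrete, here is a minimal Python sketch of the sizing rule discussed earlier in this post: the radius is proportional to the square root of the absolute value, so that the area carries the magnitude, and the sign is kept separately for the color or tinge. The function names, the <code>reference</code> parameter and the sample values are my own illustrative assumptions; they are not taken from the Economist data or from Alexis's chart.</p>

```python
import math


def circle_radius(value, reference, max_radius=1.0):
    """Return a radius whose circle area is proportional to abs(value).

    `reference` is the absolute value that maps to `max_radius`.
    """
    if reference == 0:
        return 0.0
    return max_radius * math.sqrt(abs(value) / reference)


def direction(value):
    """Label the sign of a change; a color or tinge would be keyed to it."""
    if value == 0:
        return "flat"
    return "up" if math.copysign(1.0, value) == 1.0 else "down"


def paired_circles(a, b, reference=None, max_radius=1.0):
    """Describe one pair of circles.

    With reference=None, the pair is scaled to its own larger member
    (within-pair comparison only). Passing a shared reference puts
    several pairs on one common scale.
    """
    if reference is None:
        reference = max(abs(a), abs(b))
    return {
        name: {
            "radius": circle_radius(value, reference, max_radius),
            "direction": direction(value),
        }
        for name, value in (("A", a), ("B", b))
    }


# Made-up pair: A gains 20 index points, B loses 1.
pair = paired_circles(20.0, -1.0)
area_ratio = (pair["A"]["radius"] / pair["B"]["radius"]) ** 2
print(pair)
print(round(area_ratio, 6))  # areas differ by the same factor as the data
```

            <p>Leaving <code>reference</code> unset scales each pair on its own, which corresponds loosely to the within-country view. Passing the same <code>reference</code> to every pair is one simple way to put all pairs on a common scale; which quantity to use for that scale remains the designer's choice.</p>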
          ]]></content:encoded>
          <description><![CDATA[ A discussion of some design decisions ]]></description>
        </item>
        <item>
          <title><![CDATA[ Getting metrics right is half the battle ]]></title>
          <link>https://www.junkcharts.com/getting-metrics-right-is-half-the-battle/</link>
          <guid isPermaLink="false">69b5c63f22e3d40001470571</guid>
          <category><![CDATA[ measurement ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 18 Mar 2026 08:01:29 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>I've had a chance to look at some class projects from Ray Vella's NYU class recently. I've featured work from prior classes on this blog before (<a href="https://www.junkcharts.com/tag/ray-vella/" rel="noreferrer">link</a>).</p><p>The project objective is to improve a chart on income inequality published by the Economist.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/economist_richgetricher.png" class="kg-image" alt="" loading="lazy" width="666" height="860" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/economist_richgetricher.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/economist_richgetricher.png 666w"></figure><p>This term, I was most interested in two solutions that involve rethinking how inequality should be measured.</p><p>One of the most confusing parts of the Economist's original chart is the unit of measuring inequality. You see the data labels 848 and 1,150 up top, and it's puzzling what those numbers mean. The subtitle claims the data concern "GDP per person": are they expressed in British pounds? 848 pounds would be too little, while 848,000 pounds could be too high; then again, the blue dots represent "the rich," and if that means the "super-rich," even 848,000 might not be large enough.</p><p>The answer is none of the above. Following the asterisk, the reader learns that the GDP per capita data have been adjusted. You'd have to know enough about economics data to see that "at purchasing-power parity" implies that all values are in US dollars.</p><p>Swapping pounds for dollars, I'm still perplexed in much the same way. No text suggests that I should add 000 to these units. That's when one has to return to the titles, and notice "National average = 100".</p><p>The plotted data are evidently index values, with the national average set to 100. 
848 is 8.48 times the national average while 1,150 is 11.5 times the national average. </p><p>Strictly speaking, the plotted data reflect the values relative to the national average in each year. The national average for the U.K. is 100 in both 2000 and 2015 even though the average in pounds for 2015 is surely quite a bit higher than the average in pounds for 2000.</p><p>The point of taking you down this dark tunnel is to demonstrate how much work it takes to explain to the reader how the designer has transformed the data, and to convince you, hopefully, not to venture into the dark side.</p><p>[To really explain it fully, I'd need another blog post because the above description is still missing one important piece.]</p><hr><p>The first piece of student work, by Thomas Carlson, reverts to more conventional measures of income inequality. (He submitted several views, one of which I'm discussing here.)</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/rayvellanyu_tcarlson_1.png" class="kg-image" alt="" loading="lazy" width="1228" height="712" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/rayvellanyu_tcarlson_1.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/03/rayvellanyu_tcarlson_1.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/rayvellanyu_tcarlson_1.png 1228w" sizes="(min-width: 720px) 720px"></figure><p>The rich are represented by "top 10%" average pre-tax income; the poor, by "bottom 50%" average pre-tax income. This pair of metrics is much easier to understand. The poor are the bottom half of the distribution, i.e. below the national median.</p><p>The vertical axis shows percentages. 
Instead of showing what their average income is in dollars, which is also hard to interpret without comparisons, he's showing what share of the nation's pre-tax income each group receives. Taking the USA (on the far right) for example, the top 10% of Americans receive almost half of the nation's income while the bottom half of Americans receive just 12%. (By inference, the other 40% of the population have roughly 40% of the income. An interesting symmetry is revealed in this data.)</p><p>By Thomas's metric, the U.K. is not particularly remarkable. This underlines the point that units of measurement and definitions of metrics matter a lot. </p><p>Thomas improved the chart a lot by addressing what I call a Type D problem, in the style of the Trifecta Checkup (featured in this recent <a href="https://www.junkcharts.com/get-your-automated-junk-charts-clone/">post</a>). He also changed the chart form (a Type V problem). </p><hr><p>From a visual perspective, the most striking effort was the work by Alexis Duhaney.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/rayvellanyu_alexisduhaney_2-1.png" class="kg-image" alt="" loading="lazy" width="1450" height="1446" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/rayvellanyu_alexisduhaney_2-1.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/03/rayvellanyu_alexisduhaney_2-1.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/rayvellanyu_alexisduhaney_2-1.png 1450w" sizes="(min-width: 720px) 720px"></figure><p>She used circular areas instead of bars to visualize the data. Each country's situation is depicted by two numbers, one each for the rich and the poor. Each number is a pre-post change in the underlying measure of wealth. 
</p><p>The underlying metric is taken from the original Economist chart, so it would take a day and a half to explain it to readers. But the instinct of expressing a pre-post change is a sharp one. One could switch to Thomas's metrics, for example; the underlying metric would then become the change in share of pre-tax national income.</p><p>The visual appeal of this circular design is beyond question. That said, switching to circles introduces a whole set of issues, which I'll cover in a separate <a href="https://www.junkcharts.com/guide-to-using-pairs-of-circles/">post</a>.</p><p>P.S. [3/24/26] Added link to follow-up <a href="https://www.junkcharts.com/guide-to-using-pairs-of-circles/">post</a>. </p><p></p>
          ]]></content:encoded>
          <description><![CDATA[ Learning from some student projects ]]></description>
        </item>
        <item>
          <title><![CDATA[ The emerging AI agents war ]]></title>
          <link>https://www.junkcharts.com/the-emerging-ai-agents-war/</link>
          <guid isPermaLink="false">69b25243a0407b0001c02d05</guid>
          <category><![CDATA[ AI ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Fri, 13 Mar 2026 08:29:53 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Mark Palko sent me news that Amazon obtained an injunction against Perplexity's shopping bot (<a href="https://www.cnbc.com/2026/03/10/amazon-wins-court-order-to-block-perplexitys-ai-shopping-agent.html?ref=junkcharts.com" rel="noreferrer">link</a>). </p><p><a href="https://www.perplexity.ai/?ref=junkcharts.com" rel="noreferrer">Perplexity</a> is best known as a pioneer of AI-assisted web search, a product that I'd confidently say will find its market. It will succeed not because it delivers better search results, but because it offers a far more natural, far simpler user experience.</p><p>The recent news concerns something else – Perplexity's shopping bot that browses around and shops for things on behalf of users. This shopping bot is an example of an "AI agent," a term you must have heard if you follow any <a href="https://www.junkcharts.com/tag/webtech/" rel="noreferrer">tech</a> news.</p><p>First, here's an incidental demonstration of my point about AI search.</p><p>While researching for this post, I typed keywords like "Perplexity shopping with Comet" into a traditional search engine, which yielded pages upon pages of recent pieces about Amazon's lawsuit, despite my deliberate omission of the word "Amazon" or anything legal. Using an AI chatbot, with a prompt like "I want to find links to articles that introduce the Comet browser shopping feature offered by Perplexity starting last year. I don't want recent links about Amazon's lawsuit", I got exactly what I wanted. Here's a <a href="https://www.envive.ai/post/how-perplexity-comet-will-change-agentic-commerce?ref=junkcharts.com">link</a> to an article about the Comet shopping feature. (It's an "ad" by a company in the AI agent space, which is a different issue altogether.)</p><p>Based on currently available products, an AI agent is an automated workflow. 
In the article linked above, Perplexity Comet's edge is said to be: </p><blockquote>Instead of waiting for your next search query, it actively completes tasks, negotiates purchases, and automates shopping workflows that previously required dozens of manual steps.</blockquote><p>In the course of online shopping, one might start with an idea of what to buy. Then, one might find articles written about the "best" items in that category, noting pros and cons, and prices. One might then shortlist some options, and pick one. Then, one might select a retailer that sells the selected item, figure out its shipping and return policies, and if satisfactory, complete the transaction.</p><p>Perplexity's Comet browser does all these tasks:</p><blockquote>The AI agent can auto-fill forms, conduct multi-site research, aggregate reviews, compare pricing, and — critically for commerce — initiate and complete purchase transactions.</blockquote><p>It's time to introduce the naughty word: "scraping". This is the crux of Amazon's grievance.</p><hr><p>In order for Comet (or any other AI agent) to fulfill those tasks, it must navigate around websites, extract data from webpages, analyze the data, and make decisions. Extracting data from webpages is the well-known activity called "web scraping". </p><p>Web scraping is a strange beast. It has no reason to exist and yet, it's everywhere. When the data science field was created some 15 years ago, a common starting point of a textbook teaching Python was <a href="https://www.junkcharts.com/tag/web-scraping/" rel="noreferrer">web scraping</a>. Open up some webpage, and grab the data on the page. </p><p>Imagine you're the owner of a small online seller of widgets. An engineer at a competitor writes a web scraper to compile a database of the products you sell, and the prices. This scraper browses your website page by page, extracting the product and pricing information. 
</p><p>As the owner, you consider your product pricing catalog either confidential or public information. </p><p>Most retailers treat it as a trade secret – if you take a notepad and start jotting down every product and price in a Walmart or Target, you'd most likely be stopped. Likewise, most online retailers deploy technology to detect and block web scrapers, typically by refusing to serve them webpages (HTTP 403 Forbidden errors). These retailers act as if the information presented publicly is protected. This stance has led to an arms race, as developers work around the anti-scraping tech. Regardless of one's view on this, we can agree that if the retailers treat their product catalogs as trade secrets, anyone trying to scrape the data is acting against the retailers' wishes, and may face legal jeopardy. (I never understood why college professors taught web scraping as the first example of a Python script.)</p><p>Alternatively, some retailers might view their product pricing data as public information, so that they are okay with third-party access. In this world, web scraping bears no legal risk, but it is a poor technical solution nonetheless. The proper approach is to create APIs so that developers can register themselves and request the data they want in an open, orderly fashion. </p><p>All retailers have databases that hold their product and pricing data. Their websites grab data from these databases, and present them in nice formats to customers. Web scraping code grabs the data, together with the layers of formatting, spread out across hundreds or thousands of pages, and then removes the packaging, and merges the page-level data, restoring the structure of the data. If successful, the output of the web scraper is similar to what the retailers hold in their databases! In practice, it's an inexact copy of the retailers' databases, riddled with errors. 
If these retailers consent to sharing the data, there are better ways to organize the data exchange.</p><p>Whether the retailers condone or condemn web scraping, there is still no reason to use it. </p><hr><p>The emergence of AI agents brings this touchy subject to the forefront. The only way shopping bots can function is if they are allowed to browse around websites, collecting data. If Amazon's lawsuit succeeds, it kills not only Perplexity's bot, but also all others. </p><p></p><p></p>
          ]]></content:encoded>
          <description><![CDATA[ Amazon kicks it off ]]></description>
        </item>
        <item>
          <title><![CDATA[ Another test of self-sufficiency ]]></title>
          <link>https://www.junkcharts.com/another-test-of-self-sufficiency/</link>
          <guid isPermaLink="false">69af239c75c90a00015d7bb3</guid>
          <category><![CDATA[ self-sufficiency test ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 10 Mar 2026 08:09:44 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>I came across this infographic by the Swiss paper, <a href="https://www.nzz.ch/?ref=junkcharts.com" rel="noreferrer">Neue Zürcher Zeitung</a>, on my LinkedIn feed (thanks, Markus Ikehata).</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/marcusikehata_swisselections.jpg" class="kg-image" alt="" loading="lazy" width="800" height="1171" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/marcusikehata_swisselections.jpg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/marcusikehata_swisselections.jpg 800w" sizes="(min-width: 720px) 720px"></figure><p>The piece is done in German, a language I don't speak. So reading this data visualization is like applying a "self-sufficiency test". That's the test I use to determine how much work the visual elements of a chart are doing to convey insights (as opposed to text and numbers). </p><hr><p>I'll now document what I've learned from reading just the visual elements of the infographic. Feel free to correct any errors in the comments below.</p><p>The first component is a semi-circle chart, which is quite canonical when it comes to representing parliaments. </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/nnz_elections_semicircle.png" class="kg-image" alt="" loading="lazy" width="838" height="662" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/nnz_elections_semicircle.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/nnz_elections_semicircle.png 838w" sizes="(min-width: 720px) 720px"></figure><p>I can count seven major parties represented by what I'd presume are their typical party colors. 
The deep orange party (SP) has the plurality although a coalition of possibly three parties is needed to claim a majority.</p><p>The second chart is a side-by-side <a href="https://www.junkcharts.com/tag/bar-chart/" rel="noreferrer">bar chart</a>. </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/nnz_elections_barchart.png" class="kg-image" alt="" loading="lazy" width="844" height="592" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/nnz_elections_barchart.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/nnz_elections_barchart.png 844w" sizes="(min-width: 720px) 720px"></figure><p>This chart uses the plurality party (deep orange) as the anchor, and is designed to display the importance of the other parties relative to it. The second-largest party has a bit more than half as many seats as the first party. </p><p>Curiously, this chart uses nine <a href="https://www.junkcharts.com/tag/color/" rel="noreferrer">colors</a> while the semi-circle chart has seven. On further inspection, the first seven bars correspond directly to the seven parties shown in the semi-circle. EVP (yellow) is omitted from the first chart: a mystery. The gray bar shows "others," as I came to realize below. 
(I also confirmed that "andere" is German for "others".)</p><p>Now, the third chart is a <a href="https://www.junkcharts.com/tag/line-chart/" rel="noreferrer">line chart</a>.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/nnz_elections_linechart.png" class="kg-image" alt="" loading="lazy" width="836" height="542" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/nnz_elections_linechart.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/nnz_elections_linechart.png 836w" sizes="(min-width: 720px) 720px"></figure><p>This chart shows a <a href="https://www.junkcharts.com/tag/time-series/" rel="noreferrer">trend</a> from 1970 to today. It has eight <a href="https://www.junkcharts.com/tag/color/" rel="noreferrer">colors</a>. This is how I figured out that the omission of the gray bar is different from that of the yellow bar. </p><!--members-only--><p>I'm scratching my head, trying to reconcile the line chart with the semi-circle. This chart shows the plurality party (orange) as mostly hovering near the zero level, with the two green lines clearly above it. I can see that all lines start at zero in 1970, thus, it displays an index relative to that year. I gather that the deep orange party (SP) has always been strong, and has maintained its number of seats throughout the decades. Meanwhile, the parties represented by the green lines (SVP, Grüne) have been gaining seats in the recent past. </p><p>Something happened in 2002 that merited a footnote. </p><p>Finally, on the right side, we have two columns of <a href="https://www.junkcharts.com/tag/map/" rel="noreferrer">maps</a>. 
(I have altered the shape of the grid below to save space.)</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/nnz_elections_maps.png" class="kg-image" alt="" loading="lazy" width="1448" height="1162" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/nnz_elections_maps.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/03/nnz_elections_maps.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/nnz_elections_maps.png 1448w" sizes="(min-width: 720px) 720px"></figure><p>This is a <a href="https://www.junkcharts.com/tag/small-multiples/" rel="noreferrer">small-multiples</a> presentation of the same eight parties, without the gray ("others") category.</p><p>The maps show the relative strength of each party across nine regions of Switzerland. Be attentive to the <a href="https://www.junkcharts.com/tag/scale/" rel="noreferrer">scales</a>. What's the question answered by this chart? It's the geographical distribution of the strength of each party. It's best to interpret each map as a separate entity.</p><p>We shouldn't fixate on one region, and compare the shades of color, to understand the relative strengths of each party in a given region. That's because every map has its own color scale, adapted to the range of data in that map. For example, in the region indicated by the red arrow below, EVP has maximum strength (value of 4.9 according to the legend) while FDP is weak. 
Nevertheless, "weak" by FDP standard is still significantly more than 4.9.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/junkcharts_nnz_elections_readingmaps.png" class="kg-image" alt="" loading="lazy" width="1022" height="696" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/junkcharts_nnz_elections_readingmaps.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/03/junkcharts_nnz_elections_readingmaps.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/junkcharts_nnz_elections_readingmaps.png 1022w" sizes="(min-width: 720px) 720px"></figure><hr><p>In conclusion, this NNZ effort passes the <a href="https://www.junkcharts.com/tag/sufficiency/" rel="noreferrer">self-sufficiency test</a> comfortably. Even though I haven't read any of the text, I'm still able to learn a lot about the Swiss elections. The text assists the reader but the visual elements are self-sufficient.</p><p>(If you see any misinterpretation, please make a comment below. I hope I don't have to overturn my own conclusion 😃 )</p>
          ]]></content:encoded>
          <description><![CDATA[ Reading an infographic of Swiss elections ]]></description>
        </item>
        <item>
          <title><![CDATA[ Get your automated Junk Charts clone ]]></title>
          <link>https://www.junkcharts.com/get-your-automated-junk-charts-clone/</link>
          <guid isPermaLink="false">69a880083338930001d40ac7</guid>
          <category><![CDATA[ AI ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 04 Mar 2026 15:05:20 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Prof. Hill at Samford University posted several class projects to LinkedIn, which are more than amusing. The students made clones of Junk Charts critics, using the Junk Charts <a href="https://www.junkcharts.com/junk-charts-trifecta-checkup-the-definitive-guide/" rel="noreferrer">Trifecta Checkup</a> framework. </p><p>Professor Chartworth (<a href="https://claude.ai/public/artifacts/76a9c843-c658-483a-b658-4f9d68384df1?ref=junkcharts.com" rel="noreferrer">link</a>), by Mary Hodges, is cheeky and irreverent. I threw in the <a href="https://www.junkcharts.com/a-cheesy-graphic/" rel="noreferrer">cheese</a> chart that recently showed up on the blog.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/jorgecamoes_eurostatcheeses.jpg" class="kg-image" alt="" loading="lazy" width="1280" height="1599" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/jorgecamoes_eurostatcheeses.jpg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/03/jorgecamoes_eurostatcheeses.jpg 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/jorgecamoes_eurostatcheeses.jpg 1280w" sizes="(min-width: 720px) 720px"></figure><p>The AI (it's Claude behind the scenes) gave this answer:</p><blockquote>📋 1. WHAT IS THE QUESTION?!</blockquote><blockquote>*adjusts glasses frantically* Oh, OH! Finally, FINALLY a chart that actually answers a clear question! Who are the top cheese producers in the EU? It's right there in the title! *wipes tear* I'm... I'm not used to this level of basic competence. Is this what normalcy feels like? I need to sit down.</blockquote><blockquote>📊 2. 
WHAT DOES THE DATA SAY?!</blockquote><blockquote>*nervous laughter* So Germany makes 2.4 million tonnes and gets 23%, France makes 1.9 million and gets 18%, and Italy makes 1.4 million and gets 13%... BY TUFTE'S GHOST, THESE THREE COUNTRIES ONLY ACCOUNT FOR 54% OF EU PRODUCTION! *screams internally* WHERE ARE THE OTHER 46%?! The Netherlands is probably sobbing in a corner with their Gouda, completely excluded from this cheese triumvirate! This is like showing a podium but forgetting to mention there's an entire stadium of competitors we're ignoring!</blockquote><blockquote>🎨 3. THE VISUAL CRIMES!</blockquote><blockquote>WHO APPROVED THESE PROPORTIONALLY INCORRECT CHEESE WHEELS?! *flips table* The German cheese wheel is MAYBE twice the visual area of Italy's, but it should be 1.7x based on the actual tonnage! In my 47 sleepless nights of measuring cheese illustrations with digital calipers, I've never seen such a flagrant violation of the sacred principle of proportional representation! And don't get me STARTED on how they're all the same shade of yellow-orange like some kind of processed American singles fever dream—where's the visual distinction?! It's like they asked an AI to draw 'generic cheese' and called it a day! *collapses dramatically* The only thing saving this from complete disaster is that at least—AT LEAST—they labeled the actual numbers clearly. Small mercies in this dairy-based nightmare.</blockquote><p>Not bad, eh? (Compare to my writeup <a href="https://www.junkcharts.com/a-cheesy-graphic/" rel="noreferrer">here</a>.)</p><p>Professor Chartworth does a great job maintaining a certain style. The content is better than I could have expected. Call me impressed.</p><hr><p>The framework can also be used seriously, as demonstrated by Larkin Camp's project (<a href="https://claude.ai/public/artifacts/42b84a0d-cf58-4508-84a1-13a993edb8f1?ref=junkcharts.com" rel="noreferrer">link</a>). 
I test-ran it using this New York Times production:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/nyt_internationalstudents_bubblebag.png" class="kg-image" alt="" loading="lazy" width="1302" height="1438" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/03/nyt_internationalstudents_bubblebag.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/03/nyt_internationalstudents_bubblebag.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/03/nyt_internationalstudents_bubblebag.png 1302w" sizes="(min-width: 720px) 720px"></figure><p>This no-nonsense critic implements my "type" classification, which pinpoints the directions of improvement. It also provides a type summary, like this:</p><blockquote>Critique Type: Type V</blockquote><blockquote>The visual design needs improvement, but question and data are good.</blockquote><blockquote>This visualization addresses a clear question with appropriate data but suffers from poor visual design choices. The beeswarm plot prioritizes aesthetics over functionality, making it nearly impossible to identify specific countries or understand patterns without extensive interaction. The design would benefit from a simpler, more readable format that allows viewers to quickly grasp both individual country performance and overall trends.</blockquote><p>It correctly recognizes the "beeswarm" design, and its weaknesses. </p><hr><p>Prof. Hill also features Hudson Hyneman's application. Check it out <a href="https://claude.ai/public/artifacts/2e4a5628-d309-47ca-b1c0-9bc678a1245f?ref=junkcharts.com" rel="noreferrer">here</a>.</p>
          ]]></content:encoded>
          <description><![CDATA[ I&#39;ll now haunt you every minute of every day ]]></description>
        </item>
        <item>
          <title><![CDATA[ Numbersense in sports commentary ]]></title>
          <link>https://www.junkcharts.com/numbersense-in-sports-commentary/</link>
          <guid isPermaLink="false">69a06ff75fad5000018a89ba</guid>
          <category><![CDATA[ sports analytics ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 02 Mar 2026 09:03:37 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Even though data and analytics are part and parcel of modern sports, it's still jarring to hear sports broadcasters invoke common statistical fallacies.</p><p>During an overtime period in the recent Champions League match between Italy's Juventus and Turkey's Galatasaray, one commentator attacked the Turkish team's strategy (at about the 97-minute mark, when neither side had yet scored in overtime):</p><blockquote>They [Galatasaray] showed you absolutely the way to not go about protecting a three-goal cushion tonight. From the very start, they never really played enough football. They were more content with trying to stop the game, break the game up, slow the game down.</blockquote><p>The background: the two teams were taking part in a two-match playoff for a spot in the Round of 16; in the first match, played the week before, Galatasaray seized a 3-goal lead, which, in football terms, is considered a massive advantage; and yet, on home soil, Juventus netted three goals in regular time, leveling the aggregate score (all of that despite playing ten vs eleven since the 49th minute).</p><p>According to this broadcaster, the outcome proved the Turkish side's strategy wrong. Instead of a conservative strategy of "slowing the game down," the Turkish side should have – I don't know what his unspoken alternative strategy would have been – treated it as if they did not have a 3-0 lead? Taken risks trying to pad the goal difference while leaving gaps in the defense?</p><p>The commentary reflects the classic "outcome bias" fallacy of evaluating a strategy based on the realized outcome, not on what information was available at the time of the decision. </p><p>Imagine a lottery with just two players, paying out $100,000 for bets of $100. Using the aforementioned flawed logic, the loser should not have played in the first place; simultaneously, the winner obviously made the right decision to participate. 
However, at the time each player makes the decision, both possess the same information, so either both should play or neither should. You can't have it both ways.</p><hr><p>Galatasaray ultimately scored two goals in overtime to join the Round of 16. The broadcaster didn't take back what he said earlier. The ultimate outcome should have confirmed the wisdom of the original conservative strategy, no?</p><hr><p>Notably, at the start of the broadcast, the hosts cited some damning statistics: we were told that, in the history of the Champions League at this stage of the competition, only four of the 49 teams that were down by three or more goals after the first match managed to overcome the deficit and advance to the next round. That's a probability of 8%. (With Juventus's loss, it's four out of 50.)</p><p>It would be amusing to analyze those 50 matches, and check how many of the teams that were leading after the first match deployed conservative tactics, and how they fared, relative to those that didn't.</p><p>P.S. [3/2/26] Corrected a typo. Clarified that Galatasaray didn't win the second match but won in aggregate. </p>
          ]]></content:encoded>
          <description><![CDATA[ Does the bad outcome prove the strategy bad? ]]></description>
        </item>
        <item>
          <title><![CDATA[ Beautiful chart to behold ]]></title>
          <link>https://www.junkcharts.com/beautiful-chart-to-behold/</link>
          <guid isPermaLink="false">699df37f37ce3200012c9343</guid>
          <category><![CDATA[ data visualization ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Thu, 26 Feb 2026 08:09:57 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>As <a href="https://www.chartography.net/p/bellissimi-data-graphics?ref=junkcharts.com">RJ</a> puts it, "Bellissimo"!</p><p>This 3-D graphic of population data – called a "stereogram" – by an Italian statistician from 1880 is striking in appearance. It's also a chart that requires – no, demands – one's time to dissect and devour. </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/rjandrews_stereogram1.jpeg" class="kg-image" alt="" loading="lazy" width="1456" height="1777" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/rjandrews_stereogram1.jpeg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/rjandrews_stereogram1.jpeg 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/rjandrews_stereogram1.jpeg 1456w" sizes="(min-width: 720px) 720px"></figure><p>RJ adds to it by turning it into a modern <a href="https://www.junkcharts.com/tag/interactive/" rel="noreferrer">interactive</a> chart, with tooltips that help clarify the multiple threads of information.</p><hr><p>The underlying dataset can be any table of population counts by age and year. Here is something similar from the U.S. 
Census Bureau (the 1880 graphic used old Swedish data when Sweden had high birth rates and high infant mortality).</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/kfung_uscensus_populationdata_1990_1999_byage.png" class="kg-image" alt="" loading="lazy" width="2000" height="808" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/kfung_uscensus_populationdata_1990_1999_byage.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/kfung_uscensus_populationdata_1990_1999_byage.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2026/02/kfung_uscensus_populationdata_1990_1999_byage.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/kfung_uscensus_populationdata_1990_1999_byage.png 2084w" sizes="(min-width: 720px) 720px"></figure><p>Each number in this table is a count of people of a given age in a given year. Summing down a column is adding up people of all ages in a given year, thus yielding the total population in that year. Scanning across a row shows the trend in the size of a given age cohort (adding across a row is not meaningful). In the first row of this table, we see that the number of births is declining during the 1990s.</p><p>I asked for calendar years 1990-1999, thus there are 10 columns, one for each year. There is one row per age from 0 to 100, followed by a catch-all row for anyone 101 and older, thus there are 102 rows of data.</p><p>If you think of any individual, this person does not ever stay in a cell. Each person moves diagonally down toward the right, one step at a time, as years pass – until the year of death, at which point that individual's line ends, contributing to a drop in the count at the next step. 
Every person must start on the first row. </p><p>For most people shown above, we only see part of their life line. It's truncated on the left (because they were born before 1990), and it's truncated on the right (if they died after 1999). In statistics, we call this left- and right-censoring.</p><hr><p>This is a case in which the graphic is quite a bit more involved than the original data – the dataset is deceptively simple. RJ also cites a contemporary critic who properly pointed out that the Italian has turned a 2-D dataset into a 3-D object.</p><p>In unpacking the 3-D graphic, RJ offered this helpful view:</p><!--members-only--><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/rjandrews_stereogram_1.jpeg" class="kg-image" alt="" loading="lazy" width="1024" height="1024" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/rjandrews_stereogram_1.jpeg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/rjandrews_stereogram_1.jpeg 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/rjandrews_stereogram_1.jpeg 1024w" sizes="(min-width: 720px) 720px"></figure><p>Like the table above, the years are laid out horizontally while the age groups (in groups of five years) are shown "vertically" (i.e., into the screen). The counts are the added third dimension, which represents the lift "up" from the base.</p><p>The red lines trace the counts by age in a given year. At the top, we have the total number of newborns, then the counts cascade down the cliff, eventually flattening out somewhat. 
This snapshot view is more familiarly presented in a population pyramid:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/population-pyramids-us-1990-l.jpg" class="kg-image" alt="" loading="lazy" width="2000" height="1500" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/population-pyramids-us-1990-l.jpg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/population-pyramids-us-1990-l.jpg 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2026/02/population-pyramids-us-1990-l.jpg 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w2400/2026/02/population-pyramids-us-1990-l.jpg 2400w" sizes="(min-width: 720px) 720px"></figure><p>If we ignore the gender split in the pyramid, and look at the total bar lengths, then these bar lengths map to the heights of a single red line in the stereogram. (Percent vs. count doesn't matter here since we fix the year.)</p><p>Back on the stereogram, the dark gray lines that run horizontally trace the change in the size of an age group over time. It's a cross-sectional, longitudinal view. Births were scaling rapidly during that period in Sweden but the number who lived past 55 years had not grown much.</p><p>Such trends are typically shown on <a href="https://www.junkcharts.com/tag/line-chart/" rel="noreferrer">line charts</a>. Imagine collapsing the age-group axis, plotting the count against year, and one line per age group.</p><p>The light gray lines on the stereogram are effectively <a href="https://www.junkcharts.com/tag/gridlines/" rel="noreferrer">gridlines</a>.</p><hr><p>Perozzo, the Italian scientist who made this stereogram, was even more ambitious. 
He also put in blue lines to trace individuals as they age over time.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/rjandrews_stereogram_2a.png" class="kg-image" alt="" loading="lazy" width="621" height="712" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/rjandrews_stereogram_2a.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/rjandrews_stereogram_2a.png 621w"></figure><p>A person born in 1750 who lived to 100 years old would follow the blue line all the way down the hill as the years passed. The "drop" in height from one point to the next represents those who left the cohort. In the 1700s, many Swedish kids didn't live to age 10, and then after age 45, the blue line plunged again. </p><p>The reader has to develop a feel for the rise and fall of the rolling terrain. Perozzo recognized this, and he applied shading to help out. I'm quite impressed by this little feature:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/perozzo_shading.png" class="kg-image" alt="" loading="lazy" width="348" height="786"></figure><p>RJ mimicked this effect using a formula-driven approach. </p><p>It's worthwhile to read his entire <a href="https://www.chartography.net/p/bellissimi-data-graphics?ref=junkcharts.com">post</a> as he gets into even more details. </p><hr><p>It appears that many readers find the stereogram too much of a good thing. It's a grand feast that ends in a food coma. </p><p>I think it does have its place but it shouldn't stand alone. The 3-D graphic makes it clear that there are three ways of slicing the hill. 
Each slice can be represented as line charts in 2-D but we need the 3-D chart as a kind of legend to show where the slices are coming from.</p><p>What do you think? Let me know below. </p>
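            <p>P.S. For readers who like to see the table-reading rules in code, here is a minimal Python sketch of the three ways of slicing described above: summing down a column, scanning across a row, and following a birth cohort along the diagonal. The counts are made up for illustration (they are not Census figures), and the function names are hypothetical.</p>

```python
# Toy age-by-year table: rows are ages, columns are calendar years.
# All counts are invented for illustration; they are NOT Census data.
years = [1990, 1991, 1992]
counts = [
    [100, 98, 97],  # age 0
    [95, 99, 97],   # age 1
    [90, 94, 98],   # age 2
]


def total_population(counts, col):
    """Sum down one column: people of all ages in a single year."""
    return sum(row[col] for row in counts)


def age_trend(counts, age):
    """Scan across one row: the size of one age group over the years."""
    return list(counts[age])


def cohort_path(counts, start_age, start_col):
    """Follow one birth cohort: one row down and one column right per year."""
    path = []
    age, col = start_age, start_col
    while age < len(counts) and col < len(counts[0]):
        path.append(counts[age][col])
        age += 1
        col += 1
    return path


if __name__ == "__main__":
    print(total_population(counts, 0))  # everyone alive in 1990
    print(age_trend(counts, 0))         # newborns in 1990, 1991, 1992
    print(cohort_path(counts, 0, 0))    # the cohort born in 1990
```

            <p>In the toy table, the cohort born in 1990 is read off the diagonal, and the step-to-step decline along that path plays the role of the "drop" along a blue line in the stereogram.</p>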
          ]]></content:encoded>
          <description><![CDATA[ Data visualization circa 1880 from Italy ]]></description>
        </item>
        <item>
          <title><![CDATA[ Coffee machine math ]]></title>
          <link>https://www.junkcharts.com/coffee-machine-math/</link>
          <guid isPermaLink="false">699764485d5713000140582c</guid>
          <category><![CDATA[ Food ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 24 Feb 2026 08:17:44 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>In a previous <a href="https://www.junkcharts.com/me-on-the-coffee-line/">post</a>, I noticed a change in the user interface of the office's coffee machine. Instead of showing default settings, the new interface shows the settings of the previous user, so every time I use it, I see something different. Is this retain-previous strategy better or worse than the common-default strategy?</p><p><a href="https://www.junkcharts.com/me-on-the-coffee-line/">Previously</a>, I argued that the retain-previous strategy is worse. The main reason is that I don't like the assumption that coffee preferences are serially correlated for people working in the same office. If we take away the serial correlation assumption, then using the most common settings as the default makes more sense. In this post, I attempt to quantify the argument.</p><hr><p>Let's set up the stylized problem as follows. </p><p>We only have two settings (Large and Small). We assume 70% of users want Large, and 30% want Small. If the machine uses common default, it shows Large to all users, and it would predict correctly 70% of the time. We therefore are interested in whether the retain-previous strategy can be at least 70% accurate.</p><p>The retain-previous strategy has to show something to the first-ever user. Let's assume it does the sensible thing, which is to show the most common setting, i.e. Large.</p><p>It therefore has a 70% chance of getting the first prediction correct.</p><p>If the prediction is correct, then the first user indeed prefers Large, and the machine shows a Large setting to the second user. If the second user also prefers Large, then the second prediction will also be correct. This has probability \( 70\% \times 70\% = 49\%.\)</p><p>If the first prediction is wrong, then we know the second user sees a Small setting. In this case, a correct second prediction happens if the second user wants Small. 
The probability is \( 30\% \times 30\% = 9\%.\)</p><p>Taken together, the chance that the second prediction is correct is \(49\% +9\% = 58\%.\) Notice that already in the second prediction, the probability of a correct prediction has dipped below 70% (the level of the common-default strategy). </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/kfung_coffeemachineui_tree.png" class="kg-image" alt="" loading="lazy" width="1756" height="1260" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/kfung_coffeemachineui_tree.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/kfung_coffeemachineui_tree.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2026/02/kfung_coffeemachineui_tree.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/kfung_coffeemachineui_tree.png 1756w" sizes="(min-width: 720px) 720px"></figure><p>If we roll this analysis over users 3, 4, 5, ..., each of the subsequent predictions will also have a 58% chance of being correct, and this is because the system only remembers one step prior. What the machine shows the third user is not affected by the user type of the first user. </p><p>We can check this by looking at the possible sequences up to User 3.</p><p>LLL - first user is accurately predicted, second user is shown Large and is accurately predicted, the third user is shown Large also, and is accurately predicted. This has a probability of \(70\% \times 70\% \times 70\%.\)</p><p>LLS - same as above except that the third user is incorrectly predicted, so won't count towards our correct probability.</p><p>LSL - first user is accurately predicted, second user is shown Large but prefers Small, third user is shown Small but prefers Large. 
Two wrong predictions in a row, and won't count towards our correct probability.</p><p>LSS - same as above, except that the third user is accurately predicted. This counts toward the correct probability of the third prediction, \(70\% \times 30\% \times 30\%.\)</p><p>SLL - first user is shown Large but prefers Small, the second user is shown Small but prefers Large, the third user is shown Large correctly. The contribution towards the correct probability is \(30\% \times 70\% \times 70\%.\)</p><p>SLS - same as above, except the third prediction errs. </p><p>SSL - first user is shown Large but prefers Small, the second user prefers Small and is accurately predicted, the third user is shown Small but prefers Large. No contribution to correct probability either.</p><p>SSS - first user is shown Large but prefers Small, the second and third users are both shown Small and they both prefer Small. The addition to the correct probability of the third prediction is \(30\% \times 30\% \times 30\%.\)</p><p>Now, group LLL and SLL. The sum \(70\% \times 70\% \times 70\% +  30\% \times 70\% \times 70\% = 70\% \times 70\%.\) There are two branches out of the first user but ultimately they converge to the same product. Similarly, group LSS and SSS. These two branches converge to the same product, \(30\% \times 30\%.\)</p><p>Thus, the correct probability of the third prediction is  \(70\% \times 70\% + 30\% \times 30\% = 49\% + 9\% = 58\%.\) Look familiar?</p><p>Under this retain-previous strategy, the first prediction is correct 70% of the time, then all subsequent ones are correct 58% of the time. Thus, the overall accuracy must be below 70%, under the level of the common-default strategy.</p><hr><p>For those who want to see more equations: the formula for the correct probability is \( p^{2} + (1-p)^{2}, \) where p is the proportion of users who prefer one of the two settings (Large, in my example). 
In my example, \( p = 70\%\); substituting that in, the formula gives 58% as I computed above.</p><p>As p ranges from 0 to 1, the probability curve is a "bowl" with minimum at \( p = 50\% \), and the value increases as p moves toward 0% or 100%. In other words, the more concentrated the preferences are, the more likely the retain-previous strategy is to make correct predictions. In a sense, the problem becomes easier because most users want the same settings.</p><p>The correct probability of the common-default strategy is the proportion of the majority user type, written as \( \max(p, 1-p) \). This curve also has a minimum at \( p = 50\% \), and rises toward 0% or 100%. Instead of a quadratic curve, it is a "V" made of two straight lines. </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/kfung_coffeemachinemath_curves.png" class="kg-image" alt="" loading="lazy" width="1472" height="958" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/kfung_coffeemachinemath_curves.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/kfung_coffeemachinemath_curves.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/kfung_coffeemachinemath_curves.png 1472w" sizes="(min-width: 720px) 720px"></figure><p>Because the straight lines never fall below the bowl, the common-default strategy "dominates" the retain-previous strategy. There are three points where the two strategies meet: \( p = 0\% \text{ or } 100\% \), meaning that everybody picks the same settings; and \( p = 50\% \). </p><p>In conclusion, while the retain-previous strategy improves in situations where the preferred settings are more concentrated, its predictive accuracy never exceeds that of the common-default strategy.</p>
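<p>For readers who want to verify the dominance claim algebraically, here is a short check. It is a sketch that takes \( p \ge 50\% \) without loss of generality (relabel the two user types otherwise), so that \( \max(p, 1-p) = p \). The gap between the two strategies is \( p - \left[ p^{2} + (1-p)^{2} \right] = -2p^{2} + 3p - 1 = (2p - 1)(1 - p). \) Both factors are non-negative when \( 50\% \le p \le 100\% \), so the gap is never negative, and it is zero only at \( p = 50\% \) and \( p = 100\% \); by symmetry, the remaining meeting point is \( p = 0\% \). At \( p = 70\% \), the gap is \( (2 \times 0.7 - 1)(1 - 0.7) = 0.4 \times 0.3 = 12\%, \) which matches \( 70\% - 58\%. \)</p>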
          ]]></content:encoded>
          <description><![CDATA[ Showing the previous setting to the next user is idiotic ]]></description>
        </item>
        <item>
          <title><![CDATA[ Me on the coffee line ]]></title>
          <link>https://www.junkcharts.com/me-on-the-coffee-line/</link>
          <guid isPermaLink="false">699601df5d57130001405716</guid>
          <category><![CDATA[ Food ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 23 Feb 2026 08:42:49 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>At the office, they swapped out the automated coffee machines. These are the ones in which you select "Espresso", then you select regular or decaf, then you choose the size of the cup, then you press the button, and in a few seconds, the espresso comes streaming out.</p><p>I noticed that the new UI operates differently from the old one. The current interface retains the previous settings while on the old machine, the settings return to the default ("regular" and "medium") after each use.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/oldcoffeemachine.jpg" class="kg-image" alt="" loading="lazy" width="600" height="450" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/oldcoffeemachine.jpg 600w"><figcaption><span style="white-space: pre-wrap;">UI of the old coffee machine showing the same default settings each time</span></figcaption></figure><p>The interface <a href="https://www.junkcharts.com/tag/design/" rel="noreferrer">design</a> reflects a choice by the developer, which embeds the developer's assumption about user <a href="https://www.junkcharts.com/tag/behavior/" rel="noreferrer">behavior</a>.</p><p>The current developer assumes a kind of serial correlation, that the next user is likely to require the same settings as the previous user. We can frame this problem as predicting the setting requested by the next user. We want to maximize the total number of correct predictions in a queue of users. The retain-previous strategy sounds reasonable. </p><p>The developer of the other machine adopts a different strategy: show the same default settings to everyone. Presumably, the default settings are the most commonly requested settings. 
Both strategies are easy to understand, and the latter is even simpler.</p><p>The common-default strategy discards the serial nature of the problem, as it treats every user identically regardless of their position in the queue. If, say, the most common settings are desired by 40% of users, then this strategy will predict correctly 40% of the time. Its effectiveness is a function of how common the most common is.</p><p>The retain-previous strategy is more complicated to analyze. In these coffee machines, there are three settings of caffeination (regular, half decaf, decaf); and three settings of size; thus, there are nine possible types of users. If we have historical data, we can take adjacent pairs of users and count what proportion are same-same pairs.</p><p>Without data, we may call upon some standard probability model for simulating a queue of users. This starts getting a little complicated. Any standard model assumes independence between samples, which should preclude serial dependence! That said, a standard model is obviously capable of generating adjacent pairs that have the same settings, i.e. sequential users who select the same settings.</p><p>Flipping a coin repeatedly will result in "runs" even though the coin is perfectly fair. The probability that the next flip is a head given the previous flip is a head is \( \frac{1}{2} \); ditto tail given tail. So the probability of seeing a run of length 2 is \( \frac{1}{2} \times \frac{1}{2} + \frac{1}{2} \times \frac{1}{2} = \frac{1}{2} \). But since the coin is fair and flips are independent, the serial correlation is zero!</p><hr><p>Let's get back on track. I don't like the serial correlation assumption anyway. Does knowing about the prior user really provide information about the next user's requirements? 
It might matter, for example, if the coffee machine is in a family home; but for a shared office, I don't think so.</p><p>If the developer of the new coffee machine assumes serial independence, then the probability of the next user's settings is the same whether or not we condition on the previous user's settings. So, the prediction is driven by the overall preferences amongst the nine possible settings. We are back to the common-default strategy.</p><p>This is sufficient to argue that the retain-previous strategy is suboptimal relative to the common-default strategy. The only way it may be better is if user preferences are correlated serially in a material way.</p><p>If this is not convincing, see my future post for a more quantitative argument.</p><hr><p>The retain-previous interface annoys me in another way. It adds variety when none is needed. There is a certain "comfort" that comes with seeing the same settings each time, even if they aren't my preferred settings. With retain-previous, I have to train myself to ignore the UI and just put in my requirements. If I accidentally press Start without looking, I'm not sure what I'm getting.</p>
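<p>To make the independence argument above a little more concrete, here is a sketch under assumptions that are mine, not measured from any machine: the possible settings are indexed \( k = 1, \dots, K \), the share of users wanting setting \( k \) is \( q_k \) (with \( \sum_k q_k = 1 \)), and users arrive independently. The retain-previous strategy is correct when two consecutive users want the same setting, which happens with probability \( \sum_{k} q_k^{2}. \) Since each \( q_k \le \max_j q_j \), we have \( \sum_{k} q_k^{2} \le \max_j q_j \sum_{k} q_k = \max_j q_j, \) and the right-hand side is the accuracy of the common-default strategy. For the fair coin, \( K = 2 \) and \( q_1 = q_2 = \frac{1}{2} \), so both sides equal \( \frac{1}{2} \), agreeing with the run-of-two calculation.</p>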
          ]]></content:encoded>
          <description><![CDATA[ Design, probability and more ]]></description>
        </item>
        <item>
          <title><![CDATA[ As usual, no one&#x27;s doping ]]></title>
          <link>https://www.junkcharts.com/as-usual-no-ones-doping/</link>
          <guid isPermaLink="false">698fa4f57bd5370001e718d6</guid>
          <category><![CDATA[ doping ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 17 Feb 2026 09:09:58 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>I like to watch the Olympics as much as anyone. One thing's for sure though: we're going to learn from this Milano-Cortina Winter Olympics 2026 that doping is as rare as a blue moon, just like we learned from other Olympics. </p><p>We will also learn that any failed test is due to a black-swan event that befell the unfortunate athlete, which, in fact, partly explains the first statement.</p><hr><p>Long-time reader Antonio R. points me to the first doping finding at these Olympics. Italian biathlete Rebecca Passler failed a test prior to the start of the Olympics, and was immediately suspended; now, according to this Italian report (<a href="https://www.corriere.it/sport/olimpiadi-invernali/26_febbraio_13/doping-la-nutella-e-il-cucchiaio-con-tracce-di-letrozolo-della-mamma-malata-cosi-rebecca-passler-ha-convinto-i-giudici-9135def8-56ab-4b14-bc4c-535b6f557xlk.shtml?ref=junkcharts.com">link</a>), her suspension has been lifted. (Perhaps only temporarily as there are other agencies that will eventually review the case.)</p><p>It's because she accidentally ingested the banned substance by eating ... Nutella. Yes, the famous Italian spread made of chocolate and hazelnuts. This surely is a new one. It's also a head-scratcher. If Nutella contains traces of the banned substance, then surely all Italian athletes are aware, no? Reading further, I learned that it's not Nutella but contaminated Nutella. The spoon used to share Nutella among the members is to blame. </p><p>How did the banned substance get on the spoon? It's from the athlete's sick mother's cancer medication. According to the reporter, past cases suggest that this still isn't enough to avoid punishment because it's the athlete's responsibility to take all possible precautions. </p><p>So, we also get some family drama. Passler's mother didn't want to affect her preparation for these Olympics, so she hid her diagnosis. In addition, she hid the cancer medication in some secret cabinet. 
What is likely true is that the mother doesn't know that the medication contains a substance that is banned by anti-doping agencies.</p><p>I'm not here to condemn or condone Passler. While the story requires an unlikely sequence of unlikely events, it is not impossible. I don't know of a foolproof way to know if she is a victim of a black-swan event or not – unless you're in the inner circle of her staff. </p><p>I wrote extensively about anti-doping tests in Chapter 4 of <strong>Numbers Rule Your World</strong>, in the years before Lance Armstrong confessed. My analysis leads me to believe that there are many more false negatives than false positives. Armstrong, you might recall, repeatedly pointed to years of negative test findings to push back on doping rumors. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/_nryw_bookcover.jpg" class="kg-image" alt="" loading="lazy" width="1006" height="1620" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/_nryw_bookcover.jpg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/_nryw_bookcover.jpg 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/_nryw_bookcover.jpg 1006w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">My </span><a href="https://amzn.to/2zssjj4?ref=junkcharts.com" rel="noreferrer"><span style="white-space: pre-wrap;">book page</span></a></figcaption></figure><p>Also, the term "false positive" is imprecise. To believe Passler's story requires us to accept the initial test result as correct. Her team is, in fact, endorsing the test finding as a true positive! </p><p>In the book, I differentiate between a lab false positive, and a real-world false positive. 
In Passler's case (as in the case of every athlete who happened to have eaten something that happened to contain trace amounts of some banned substance), the lab test is presumed correct; what these athletes are disputing is the <em>cause</em> of the positive result.</p><hr><p>There is a key computation in Chapter 4 of <strong>Numbers Rule Your World</strong>.</p><!--members-only--><p>The proportion of athletes caught doping is bounded above by the proportion of tests coming back positive in any Olympics. (I'm simplifying a bit by assuming one test per athlete.) If 100 athletes are tested, and 1 tested positive, there can be at most one true positive. If there is more than one doper amongst the 100 athletes, then surely the testing program has a false-negative problem. If there are 5 dopers, at least four of them will have negative findings (the false-negative rate is at least a staggering 80%!!). If there are 10 dopers, at least nine will be cleared. </p><p>So, pay attention to the number of positive tests. If this is like other Olympics, the number will be very small. That can be interpreted in two ways: either very few athletes are doping, or most dopers are evading detection.</p><p></p><p></p>
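<p>The arithmetic above generalizes, under the same simplifying assumption of one test per athlete. As a sketch, write \( d \) for the (unknown) number of dopers and \( m \) for the number of positive tests, both symbols being mine rather than the book's. At most \( m \) dopers can test positive, so at least \( d - m \) of them test negative, and the false-negative rate is at least \( \frac{d - m}{d} = 1 - \frac{m}{d} \) whenever \( d \ge m. \) With \( m = 1 \): \( d = 5 \) gives \( 1 - \frac{1}{5} = 80\% \), and \( d = 10 \) gives \( 1 - \frac{1}{10} = 90\%. \)</p>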
          ]]></content:encoded>
          <description><![CDATA[ Not me. ]]></description>
        </item>
        <item>
          <title><![CDATA[ Data hunting on the radar (chart) ]]></title>
          <link>https://www.junkcharts.com/data-hunting-on-the-radar-chart/</link>
          <guid isPermaLink="false">698f96897bd5370001e71848</guid>
          <category><![CDATA[ radar chart ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 16 Feb 2026 08:14:00 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Today, I take another look at the simple <a href="https://www.junkcharts.com/tag/radar-chart/" rel="noreferrer">radar chart</a> created for the previous <a href="https://www.junkcharts.com/alternatives-to-radar-chart-1/" rel="noreferrer">post</a> in this series. </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radar_area_singles-1.png" class="kg-image" alt="" loading="lazy" width="1238" height="1180" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/junkcharts_radar_area_singles-1.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/junkcharts_radar_area_singles-1.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radar_area_singles-1.png 1238w" sizes="(min-width: 720px) 720px"></figure><p>The most troubling part of this chart form is that it makes us look at things that distort the data, namely the shaded areas, and/or the perimeters.</p><p>The underlying data of the four students:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radarchart_sampledata-1.png" class="kg-image" alt="" loading="lazy" width="1076" height="782" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/junkcharts_radarchart_sampledata-1.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/junkcharts_radarchart_sampledata-1.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radarchart_sampledata-1.png 1076w" sizes="(min-width: 720px) 720px"></figure><p>All four students achieved two High and two Low grades 
in the four subjects, so they have the same GPA (assuming each subject has the same weight).</p><p>The radar charts divide these students into two sub-groups (top two rows, bottom two rows) if we go by the shaded areas. The area of Adam is the same as that of Betty (by symmetry). Each area is divided into four equal parts, each of which is a right-angled triangle, so the area is 4 x (1/2 x 2 x 1) = 4. (I'm setting the outer radius to be 2 and the inner radius to be 1.) Also by symmetry, the area of Chad is the same as that of Daisy. We compute Chad's area also by considering four right-angled triangles, so the area is (1/2 x 2 x 1) + (1/2 x 2 x 2) + (1/2 x 1 x 1) + (1/2 x 2 x 1) = 1/2 x (2+4+1+2) = 1/2 x (9) = 4.5. The area of Chad is therefore about 13% larger than the area of Adam.</p><p>The only difference between these two sub-groups is in which two subjects they achieved their High grades. If we claim that the difference in areas represents "data", then the radar chart must have implicitly assigned different weights to the four subjects, which defies our understanding.</p><p>What this is really saying is that the area shown on the radar chart is meaningless.</p><hr><p>The perimeter on the radar chart is also meaningless. 
</p><!--members-only--><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/radar_line_pairs-1.png" class="kg-image" alt="" loading="lazy" width="1066" height="1042" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/radar_line_pairs-1.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/radar_line_pairs-1.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/radar_line_pairs-1.png 1066w" sizes="(min-width: 720px) 720px"></figure><p>To compute the perimeters, we use Pythagoras's Theorem (the square of the hypotenuse is the sum of the squares of the other two sides).</p><p>For Adam, the perimeter is 4 x sqrt(4+1) = 4 x sqrt(5) = 8.9. Betty's perimeter is the same as Adam's. For Chad, the perimeter is 2 x sqrt(4+1) + sqrt(4+4) + sqrt(1+1) = 2 x sqrt(5) + 3 x sqrt(2) = 8.7. Daisy's perimeter is the same as Chad's.</p><p>Thus, Chad's or Daisy's perimeter is about 2.6% smaller than Adam's or Betty's perimeter. Again, if the difference in perimeters encodes anything, it can only be the subjects in which the students achieved their High grades.</p><p>Finally, not only do both area and perimeter distort the underlying data, they stretch in opposite directions!</p><hr><p>The radar chart doesn't really encode data in the area or the perimeter. The visual form makes us think that. The data are really to be found in the spokes of the chart; here is a chart from the first <a href="https://www.junkcharts.com/four-reasons-to-unplug-radar-charts/">post</a> of this series. 
</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_redo_ormsradar_radials-2.png" class="kg-image" alt="" loading="lazy" width="1214" height="920" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/junkcharts_redo_ormsradar_radials-2.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/junkcharts_redo_ormsradar_radials-2.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_redo_ormsradar_radials-2.png 1214w" sizes="(min-width: 720px) 720px"></figure><p>See the previous posts in this series (<a href="https://www.junkcharts.com/four-reasons-to-unplug-radar-charts/">1</a>, <a href="https://www.junkcharts.com/alternatives-to-radar-chart-1/">2</a>). </p><p></p><p></p><p></p>
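<p>For readers who want the geometry behind the area and perimeter calculations in this post, here is a sketch in general form; the symbols are mine. Suppose the chart has \( K \ge 3 \) equally spaced spokes and the values plotted on consecutive spokes are \( r_1, \dots, r_K \), with \( r_{K+1} = r_1 \). The angle between adjacent spokes is \( \theta = 2\pi / K \), so the shaded area is \( \frac{1}{2} \sin\theta \sum_{i=1}^{K} r_i \, r_{i+1} \) and the perimeter is \( \sum_{i=1}^{K} \sqrt{ r_i^{2} + r_{i+1}^{2} - 2 \, r_i \, r_{i+1} \cos\theta }. \) With four spokes, \( \sin\theta = 1 \) and \( \cos\theta = 0 \), which recovers the right-angled triangles and Pythagoras's Theorem used above. Both formulas depend on pairs of neighboring values, not just on the values themselves, which is why reordering the same two Highs and two Lows changes the area from \( \frac{1}{2}(2+2+2+2) = 4 \) to \( \frac{1}{2}(4+2+1+2) = 4.5. \)</p>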
          ]]></content:encoded>
          <description><![CDATA[ The data are not where you think they are ]]></description>
        </item>
        <item>
          <title><![CDATA[ Alternatives to radar chart 1 ]]></title>
          <link>https://www.junkcharts.com/alternatives-to-radar-chart-1/</link>
          <guid isPermaLink="false">698cb9ec93d0070001fb8ce1</guid>
          <category><![CDATA[ Table ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 11 Feb 2026 13:01:43 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>The radar chart is frequently used in the following setting: you are comparing some objects across K dimensions. Embedded in this data are K+1 rankings, which include K rankings, one for each dimension, plus an aggregate ranking.</p><p>My last <a href="https://www.junkcharts.com/four-reasons-to-unplug-radar-charts/">post</a> explains why I don't like the radar chart. In this post, I'll explain why the radar chart conveys the information worse than even a data table.</p><p>Here is a very simple dataset I'll be using:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radarchart_sampledata.png" class="kg-image" alt="" loading="lazy" width="1076" height="782" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/junkcharts_radarchart_sampledata.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/junkcharts_radarchart_sampledata.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radarchart_sampledata.png 1076w" sizes="(min-width: 720px) 720px"></figure><p>Four students are rated on four subjects. Each rating is either High or Low. Each student earns two Highs and two Lows. Chad and Daisy (last two rows) are strong in Math/Science and Language/Arts respectively. One is the mirror image of the other. Adam and Betty are also mirror images of one another.</p><p>On an unadorned <a href="https://www.junkcharts.com/tag/table/" rel="noreferrer">data table</a>, the reader can already find various insights. Who's the best at Math? Adam and Chad. Who's good at Math &amp; Science? Chad. What subjects is Betty performing well at? Science and Arts. What subjects does Daisy need help with? Math and Science. Is Betty or Daisy better at Arts? 
Daisy.</p><hr><p>Now, try finding answers to those types of questions from this radar chart?</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radarchart_area_fours.png" class="kg-image" alt="" loading="lazy" width="1270" height="900" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/junkcharts_radarchart_area_fours.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/junkcharts_radarchart_area_fours.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radarchart_area_fours.png 1270w" sizes="(min-width: 720px) 720px"></figure><p>OK, the overlapping areas are distracting and annoying. Try this line version:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radarchart_line_fours.png" class="kg-image" alt="" loading="lazy" width="1382" height="936" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/junkcharts_radarchart_line_fours.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/junkcharts_radarchart_line_fours.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radarchart_line_fours.png 1382w" sizes="(min-width: 720px) 720px"></figure><p>Not much better. In fact, this quick exploration reveals yet another reason to unplug the radar. It doesn't like categorical data, or any data with a good number of equal values. Equal data values cause lines (or perimeters of areas) to over-print.</p><!--members-only--><p>But it also doesn't handle continuous data. 
Imagine we add jitter to the equal values so they are minutely separated. This turns the overlapping lines into separate lines with different angles, producing even more criss-crossing!</p><p>Instead, let's do a <a href="https://www.junkcharts.com/tag/small-multiples/" rel="noreferrer">small-multiples</a> arrangement:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radar_area_singles.png" class="kg-image" alt="" loading="lazy" width="1238" height="1180" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/junkcharts_radar_area_singles.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/junkcharts_radar_area_singles.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radar_area_singles.png 1238w" sizes="(min-width: 720px) 720px"></figure><p>At least the data become visible. However, this arrangement makes it harder to answer many of the questions we care about. Who's the best at Math? We have to look at all four charts. Is Betty or Daisy better at Arts? We have to compare two charts. What help does Daisy need? 
This one can be read from a single chart but it's still not as easy as the data table.</p><hr><p>The data table can be "enhanced" by adding <a href="https://www.junkcharts.com/tag/color/" rel="noreferrer">color</a>, styling, and symbols.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radaralts_table_color.png" class="kg-image" alt="" loading="lazy" width="960" height="730" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/junkcharts_radaralts_table_color.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radaralts_table_color.png 960w" sizes="(min-width: 720px) 720px"></figure><p>Our eyes are really great at <a href="https://www.junkcharts.com/tag/sorting/" rel="noreferrer">sorting</a> out two categories. Just a little color and bolding is sufficient. </p><p>Symbols are also useful:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radaralts_table_1symbol.png" class="kg-image" alt="" loading="lazy" width="1130" height="726" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/junkcharts_radaralts_table_1symbol.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/junkcharts_radaralts_table_1symbol.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radaralts_table_1symbol.png 1130w" sizes="(min-width: 720px) 720px"></figure><p>With symbols, I better add a <a href="https://www.junkcharts.com/tag/legend/" rel="noreferrer">legend</a>.</p><p>Even better is if I vary both symbols and colors:</p><figure class="kg-card kg-image-card"><img 
src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radaralts_table_2symbols.png" class="kg-image" alt="" loading="lazy" width="1118" height="724" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/junkcharts_radaralts_table_2symbols.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/junkcharts_radaralts_table_2symbols.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_radaralts_table_2symbols.png 1118w" sizes="(min-width: 720px) 720px"></figure><p>In future posts, I'll explore other options.</p>
          ]]></content:encoded>
          <description><![CDATA[ Sometimes, a table is all you need ]]></description>
        </item>
        <item>
          <title><![CDATA[ Snow math ]]></title>
          <link>https://www.junkcharts.com/snow-math/</link>
          <guid isPermaLink="false">6986143a60ee7c0001f8e56e</guid>
          <category><![CDATA[ business analytics ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Sat, 07 Feb 2026 09:49:36 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Glad to see some reporter is on the <a href="https://www.msn.com/en-us/news/us/new-york-city-issues-nearly-2800-tickets-over-sidewalk-snow-removal/ar-AA1VK2sF?ref=junkcharts.com">case</a> about snow ploughing in NYC after the snowstorm a couple of weeks ago. (Despite what you might have read elsewhere, there has so far been just one day of snow, followed by icy cold conditions.)</p><p>The reporter said the city issued 2,800 or so tickets to home and business owners who did not fulfil their civic duties of "clearing a path at least four-feet wide with clear access to crosswalks". </p><p>In my neighborhood, I found something counter-intuitive. I ventured outside that night after the snow had mostly stopped. Most buildings made an effort to deal with the snow, and so it was quite easy to walk around. Notable exceptions were in front of Taco Bell and Starbucks. (Also, outside a Korean BBQ restaurant that is part of a national chain and one of the most popular businesses in the hood.) This creates the strange situation in which I could walk freely outside the little mom-and-pop stores that are barely surviving but must sink my feet in inches of snow in front of these large storefronts (that were ironically open for business).</p><p>That shocked me because you'd think that the large corporations should be the least likely offenders. I assumed the fines must not be large enough, or they must have found some loophole to avoid them. According to this <a href="https://pix11.com/news/local-news/you-can-be-fined-for-not-shoveling-the-sidewalk-in-nyc/?ref=junkcharts.com" rel="noreferrer">article</a>, the fine is $150 for the first offence, and up to $350 for subsequent offences. (I assume the money goes to funding the government workers who plough the snow instead.)</p><p>Those fines (if enforced) are clearly too low at current hourly wages. The businesses probably would have had to pay more than $150 to hire workers. 
So it comes down to whether the business owner wants to be a good citizen. I guess this is where the small businesses have an edge.</p>
          ]]></content:encoded>
          <description><![CDATA[ How big businesses snowed in locals ]]></description>
        </item>
        <item>
          <title><![CDATA[ Four reasons to unplug radar charts ]]></title>
          <link>https://www.junkcharts.com/four-reasons-to-unplug-radar-charts/</link>
          <guid isPermaLink="false">69819503c58aa4000196da51</guid>
          <category><![CDATA[ radar chart ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Thu, 05 Feb 2026 09:19:29 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>I remain unconvinced by <a href="https://www.junkcharts.com/tag/radar-chart/" rel="noreferrer">radar charts</a>. Here is another example that popped up in a recent issue of ORMS Today (<a href="https://pubsonline.informs.org/do/10.1287/orms.2025.04.14/full/?ref=junkcharts.com">link</a>).</p><p>Some entity is being rated on five dimensions. These dimensions are laid out as five spokes from the center, equally spaced. The ratings appear as five dots on these spokes. The five dots are connected cyclically to form a sequence, using straight line segments.</p><p>This use of the radar chart is very popular on business dashboards. The purpose is to rate entities along multiple dimensions. When multiple entities are rated, they each appear as a cyclical sequence.</p><p>In this post, I present four reasons why you should stay away from the <a href="https://www.junkcharts.com/tag/radar-chart/" rel="noreferrer">radar chart</a>.</p><ol><li><strong>The Radar chart foregrounds fake connections.</strong></li></ol><p>In the example above, the reader's attention is focused on the thick blue line. Perhaps its jagged shape carries the key to unlocking the insights in the dataset. (Perhaps not.) In other examples, the designer shades the area enclosed by the connected line segments. The area has no more relevance than the shape. </p><p>The data are actually encoded in the distances between the center and the dots on the spokes. Ironically, the radial lines are backgrounded in favor of the envelope. 
</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_redo_ormsradar_radials.png" class="kg-image" alt="" loading="lazy" width="1214" height="920" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/02/junkcharts_redo_ormsradar_radials.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/02/junkcharts_redo_ormsradar_radials.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/02/junkcharts_redo_ormsradar_radials.png 1214w" sizes="(min-width: 720px) 720px"></figure><p>If someone wants to understand the chart, they have to backfill these red lines while ignoring the blue envelope.</p><p>You might think of salvaging the radar chart by plotting those radial lines instead of the envelope. There is a reason why this isn't the common practice: the radial design of the radar chart is severely limited in scope. Imagine trying to compare two entities on those five dimensions. The radial axes overlap, messing up the comparison.</p><ol start="2"><li><strong>The Radar chart conveys fake neutrality.</strong></li></ol><p>Like all chart forms, the radar chart imposes a set of strict assumptions on the designer. Many of these assumptions are impractical, even harmful. One such restriction is the equal spacing between the spokes, which implies equal importance between the dimensions. </p>

          ]]></content:encoded>
          <description><![CDATA[ Ouch, ouch, ouch, ouch ]]></description>
        </item>
        <item>
          <title><![CDATA[ Visualizing hierarchies ]]></title>
          <link>https://www.junkcharts.com/visualizing-hierarchies/</link>
          <guid isPermaLink="false">697cedd7c58aa4000196d92b</guid>
          <category><![CDATA[ sports analytics ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 02 Feb 2026 09:56:59 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Long-time reader Chris P. sent me to an Instagram user (<a href="https://www.instagram.com/p/DTix2_RkW0g/?ref=junkcharts.com" rel="noreferrer">link</a>) who analyzed the travel schedules of all the NCAA men's volleyball teams. </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/cp_setthebic_2.jpg" class="kg-image" alt="" loading="lazy" width="640" height="800" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/01/cp_setthebic_2.jpg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/cp_setthebic_2.jpg 640w"></figure><p>The analysis is deceptively simple if we ignore data collection – the outputs are distances travelled for each team, grouped by conference. The inputs? At least, schedules from each conference, including home/away indicators; maps; campus maps. The analyst describes how much work it is to put together this "simple" dataset. The process also includes the elephant in the room – <a href="https://www.junkcharts.com/tag/assumptions/" rel="noreferrer">assumptions</a>! </p><p>For example, to be accurate, one wants to know where each team's home gym is but that venue is not always obvious to an outsider. So, in some cases, the analyst resorts to using the coordinates of the school's campus. </p><p>Unpacking assumptions is like pulling apart an onion. In that latter scenario, how does one determine the coordinates of any campus? Many schools are not one contiguous space, and even if it's one connected space, it almost surely has a highly irregular shape! In the other scenario, we must make another assumption: that teams always depart from their home gym. </p><hr><p>Enough about the data. 
We're here to talk about the visualization.</p><p>Here is a different chart in the series, focused on comparing schools in a particular conference:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/cp_setthebic_11.jpg" class="kg-image" alt="" loading="lazy" width="1080" height="1350" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/01/cp_setthebic_11.jpg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/01/cp_setthebic_11.jpg 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/cp_setthebic_11.jpg 1080w" sizes="(min-width: 720px) 720px"></figure><p>The basis of this chart is a <a href="https://www.junkcharts.com/tag/bar-chart/" rel="noreferrer">bar chart.</a> Each bar has exposed tiles although they don't have a quantitative interpretation. In principle, the last digit of each data label represents the outer edge of each bar; in practice, it's plainly obvious that the bar lengths do not accurately encode the travel distances. So it's a bar chart in form but not in content.</p><p>In the NEC conference, FDU's bar should be about a quarter of the length of Saint Francis's; and about half the length of D'Youville's. But it's not.</p><p>I suppose the distances are horizontally dispersed in a way that roughly – very roughly – conveys the ranking of the data. </p><p>Is there a better way to visualize this dataset?</p><hr><p>In re-thinking the graph, I want to retain several satisfying features of the original:</p><ul><li>The chart form preserves a nested <a href="https://www.junkcharts.com/tag/hierarchy/" rel="noreferrer">hierarchy</a> in the data: everything &gt; conference &gt; school. 
It works identically at each level, thus reducing cognitive load moving from one level to another.</li><li>The tiles, colors and fonts suggest a light-hearted, playful mental state.</li><li>The data concern distances.</li><li>Something other than a standard bar chart is desired.</li></ul><p>Here's what I came up with:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/redo_setthebic_volleytravel.png" class="kg-image" alt="" loading="lazy" width="503" height="529"></figure><p>This particular chart shows the data at the conference level. The leagues are arranged around the circumference of a circle. The arrow cues readers to read clockwise from the top. The first conference encountered, the ECC, has the least miles travelled. The conference that does the second-lowest mileage is the IVA. </p><p>The distance data are encoded as edge distances on the circle counting from the top. The gaps between consecutive dots represent the differences in travel distance between adjacently ranked leagues.</p><p>Next, I added "gridlines" to help readers gauge the <a href="https://www.junkcharts.com/tag/scale/" rel="noreferrer">scale</a> of the chart. These gridlines are the radii of the circle because the edge distance is proportional to the angle. In deciding the number of gridlines, I took a hint from the original chart, where the designer tells readers that a trip around the Earth is about 25,000 miles. 
The maximum traversed distance here is roughly 67,000 miles, so I plotted 3 round trips, and three gridlines (at 0, 1, and 2 round trips).</p><hr><p>Just to show that this design meets the first requirement above, here is the chart for NEC:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/junkcharts_redo_setthebic_volleydistances.png" class="kg-image" alt="" loading="lazy" width="988" height="1048" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/01/junkcharts_redo_setthebic_volleydistances.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/junkcharts_redo_setthebic_volleydistances.png 988w" sizes="(min-width: 720px) 720px"></figure><p>Once the reader figures out how to read one of these charts, the reader has learned how to read all of them.</p><hr><p>Last thing... moving back to the D corner of the <a href="https://www.junkcharts.com/junk-charts-trifecta-checkup-the-definitive-guide/" rel="noreferrer">Trifecta Checkup</a>. What would make this analysis even more compelling is if a "Y" variable (i.e. outcome) is included. How does the variation in travel distances affect the teams' performances?</p>
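<p>The circular encoding described above can be sketched in a few lines of Python. This is a minimal sketch, not the code used to draw the charts: the function names and the two sample mileages are hypothetical, and the only figures taken from the post are the 25,000-mile round trip and the three round trips that span the full circle.</p>

```python
import math

MILES_PER_ROUND_TRIP = 25_000   # from the post: one trip around the Earth
ROUND_TRIPS_ON_CIRCLE = 3       # from the post: the full circle spans 3 round trips
FULL_CIRCLE_MILES = MILES_PER_ROUND_TRIP * ROUND_TRIPS_ON_CIRCLE


def miles_to_angle(miles):
    """Clockwise angle in degrees from the top of the circle for a given mileage."""
    return 360.0 * miles / FULL_CIRCLE_MILES


def angle_to_xy(angle_deg, radius=1.0):
    """Point on the circle, measuring the angle clockwise from 12 o'clock."""
    theta = math.radians(angle_deg)
    return radius * math.sin(theta), radius * math.cos(theta)


def gridline_angles():
    """Angles of the radial gridlines at 0, 1 and 2 round trips."""
    return [miles_to_angle(k * MILES_PER_ROUND_TRIP) for k in range(ROUND_TRIPS_ON_CIRCLE)]


if __name__ == "__main__":
    # Hypothetical mileages, for illustration only (not the data in the chart).
    for label, miles in [("League A", 12_500), ("League B", 50_000)]:
        angle = miles_to_angle(miles)
        x, y = angle_to_xy(angle)
        print(f"{label}: {angle:.0f} degrees clockwise, at ({x:.2f}, {y:.2f})")
    print("Gridlines at", gridline_angles(), "degrees")
```

<p>Because arc length is proportional to angle, the same conversion serves both the dots and the gridlines, which is why the gridlines can be drawn as radii.</p>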
          ]]></content:encoded>
          <description><![CDATA[ Bonus: setting gridlines on a circular chart ]]></description>
        </item>
        <item>
          <title><![CDATA[ How MTA is spending its money ]]></title>
          <link>https://www.junkcharts.com/how-mta-is-spending-its-money/</link>
          <guid isPermaLink="false">69459202519f3800015d5f63</guid>
          <category><![CDATA[ experiments ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Thu, 29 Jan 2026 09:42:15 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>New York's subway and bus operator, MTA, has forever been facing budget crises. In recent years, it has had to contend with the work-from-home trend, and rampant fare evasion.</p><p>The MTA also appears eager to buy whatever vendors sell them, including snake oil and hot air. </p><p>I previously wrote about the so-called "Select Buses" (<a href="https://www.junkcharts.com/why-you-must-know-how-analytical-results-were-obtained/" rel="noreferrer">link</a>). The MTA spent money on a system that required installing special fare machines on sidewalks, from which all passengers, including those who held weekly or monthly passes, must obtain a paper receipt prior to boarding the bus. When the bus arrives, passengers may board from front <strong>and back</strong> doors. Effectively, this sets up an honor system: MTA does not validate whether someone has paid the fare. The driver can't be bothered to check those boarding from the front door either - what's the point when anyone can get on through the back doors?</p><p>Apparently, the vendor convinced the MTA that this Select Bus system would reduce waiting times; I have no reason to doubt this claim since the driver wouldn't have to deal with the hassle of passengers paying the fare while boarding the bus! Passengers are also streaming freely onto the bus through three open doors instead of just the front door. </p><p>My previous <a href="https://www.junkcharts.com/why-you-must-know-how-analytical-results-were-obtained/" rel="noreferrer">post</a> is highly recommended. I discovered that MTA management even purchased a "study" from a consultant in which they claimed to have found that the aforementioned system not only did not promote fare evasion but actually curbed fare evasion! </p><hr><p>Another cost-saving tactic favored in all subway systems around the world is replacing human operators with machines. We have all seen people jumping over, or crawling under, the turnstiles. 
No MTA staff is present to enforce fares anymore. </p><p>Recently, some vendor convinced the MTA to install "spikes" and "sleeves" on the turnstiles to "stop" fare evaders. I kid you not. Below is a "sleeve" (hat tip to <a href="https://nypost.com/2025/12/18/us-news/nyc-subway-fare-jumpers-easily-beat-anti-theft-fins-as-mta-spends-7-3m-to-bring-program-to-nearly-every-station/?ref=junkcharts.com" rel="noreferrer">New York Post</a> for the images):</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/nypost_mtasleeves.webp" class="kg-image" alt="" loading="lazy" width="2000" height="1334" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/12/nypost_mtasleeves.webp 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/12/nypost_mtasleeves.webp 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/12/nypost_mtasleeves.webp 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/nypost_mtasleeves.webp 2048w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Image from NY Post</span></figcaption></figure><p>How is this stopping anyone from jumping over? 
</p><p>And below are the "spikes" (they are on top of the side wall):</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/nypost_mtaspikes.webp" class="kg-image" alt="" loading="lazy" width="2000" height="1333" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/12/nypost_mtaspikes.webp 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/12/nypost_mtaspikes.webp 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/12/nypost_mtaspikes.webp 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/nypost_mtaspikes.webp 2048w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Image from NY Post</span></figcaption></figure><p>I have rarely seen anyone "climbing" over. The usual manoeuvre is hurdling over, so this is a mystery.</p><p>The same song is being played. A few months after these toothless interventions appeared, the MTA has declared victory, and will spend more money to install them everywhere! </p><p>It took the NY Post reporter one trip to two subway stations to witness what the study's authors apparently couldn't see: neither the sleeves nor the spikes are stopping fare evaders (<a href="https://nypost.com/2025/12/18/us-news/nyc-subway-fare-jumpers-easily-beat-anti-theft-fins-as-mta-spends-7-3m-to-bring-program-to-nearly-every-station/?ref=junkcharts.com" rel="noreferrer">link</a>). Or, it just takes some common sense.</p><hr><p>The MTA actually told reporters the following with a straight face: "At stations where the equipment has already been installed, fare evasion has dropped by about 60%." 
(<a href="https://www.timeout.com/newyork/news/spikes-and-paddles-are-being-added-to-basically-every-nyc-subway-station-turnstile-121825?ref=junkcharts.com" rel="noreferrer">link</a>)</p><p>Let's say this out loud: the MTA believed that those sleeves and spikes have caused a 60% drop in fare evasion. </p><p>Again: the MTA determined that those sleeves and spikes have stopped 6 out of every 10 prospective fare evaders.</p><p>This is the same MTA that told us that, by letting riders get onto Select buses from every door without validating tickets, they have curbed fare evasion below normal levels.</p><p>I can't find details as to how they conducted the study. So let's interpret the quote above.</p><p>First, they are describing only those stations with equipment. We don't know what's going on in stations without equipment. If they were to compare the two groups of stations, we'd need to know how they selected the stations for this pilot program. Are the stations with equipment similar to those without? (Probably not, unless they designed a rigorous testing program before the pilot started.)</p><p>Second, surely some of the reduction in fare evasion reflects a general trend. For example, as more companies are pushing employees back to the office, we have an influx of commuters who have well-paid jobs and are thus less likely to evade fares. Any pre-post type analysis must include factors like this.</p><p>Third, we also don't know if the equipment installation is all-or-none at each station, or what proportion of each station's turnstiles have those sleeves and spikes.</p><p>Fourth, the MTA is simultaneously rolling out many different interventions. I see warning notices, and hear warning messages. Sometimes, there are guards standing near the turnstiles (although I have never seen any guard stopping an ongoing act of fare evasion). 
How did the study account for these other factors?</p><p>Fifth, by claiming a "drop", they must be comparing a current measurement against some baseline. What is this baseline? </p><p>Sixth, how do they even measure fare evasion? Do they have staff counting fare evaders? Are they analyzing video footage?</p><p>Designing a proper <a href="https://www.junkcharts.com/tag/tests/" rel="noreferrer">test</a> to measure the effect of the sleeves and spikes is an interesting project. </p>
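<p>To make the points about baselines and general trends concrete, here is a minimal sketch of a difference-in-differences calculation, one common way to separate an intervention's effect from a background trend. It is not the MTA's method, which has not been disclosed. Every evasion rate below is hypothetical, invented only to show the arithmetic, and the function names are my own.</p>

```python
def pct_change(before, after):
    """Relative change from a baseline measurement, e.g. -0.6 for a 60% drop."""
    return (after - before) / before


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change at equipped stations minus the change at comparison stations."""
    return (treated_after - treated_before) - (control_after - control_before)


if __name__ == "__main__":
    # Hypothetical fare-evasion rates (share of entries), for illustration only.
    treated_before, treated_after = 0.10, 0.04   # stations with sleeves and spikes
    control_before, control_after = 0.10, 0.06   # stations without the equipment

    naive = pct_change(treated_before, treated_after)
    trend = pct_change(control_before, control_after)
    effect = diff_in_diff(treated_before, treated_after, control_before, control_after)

    print(f"Pre-post change at equipped stations: {naive:.0%}")
    print(f"Change at comparison stations (general trend): {trend:.0%}")
    print(f"Difference-in-differences: {effect:+.2f} (in rate points)")
```

<p>In this made-up example, the pre-post comparison at equipped stations overstates the effect because the comparison stations also improved on their own. Even the adjusted number is only credible if the two groups of stations are similar, which is the first point above.</p>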
          ]]></content:encoded>
          <description><![CDATA[ Wow, they found the secret to stopping fare evasion ]]></description>
        </item>
        <item>
          <title><![CDATA[ A cheesy graphic ]]></title>
          <link>https://www.junkcharts.com/a-cheesy-graphic/</link>
          <guid isPermaLink="false">69782dddb1c677000153a3a1</guid>
          <category><![CDATA[ Pie chart ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 28 Jan 2026 14:48:29 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Jorge Camoes featured this cheesy graphic by Eurostat in a recent LinkedIn post. It's a fun graphic that brings out the unexpected, at least amongst the uneducated. Who knew Germany makes more cheese than France or Italy?</p><p>What kind of chart is it?</p><p>It's side-by-side <a href="https://www.junkcharts.com/tag/pie-chart/" rel="noreferrer">pie charts</a>. The twist is that the chart does not encode the raw data, neither the tonnage, nor the proportion of tonnage. Instead, the chart plots index values, with Germany set to 100%. In that scale, France is 1.9/2.4 = about 80% and Italy is 1.4/2.4 = about 60%.</p><p>Therein lies the problem. The chunks of cheese bitten off France and Italy's rinds are roughly equal-sized so I don't think they are scaled properly.</p><p>Possibly, the designer is simultaneously manipulating the size of the pies, and the bitten-off chunks?</p><p>I took my ruler out and it's neither here nor there.</p><p>The closest is if we take whole pies of all three countries. I estimated that the radius of France is about 75% that of Germany, Italy is 56% of Germany, so close enough to 80% and 60% respectively. But even this encoding is problematic because we should be encoding the data in the areas not the radii of the pies. (If we take whole pies, we have moved from pie charts to <a href="https://www.junkcharts.com/tag/bubble-chart/" rel="noreferrer">bubble charts</a>.)</p><p>The ratio of areas is 56% and 32% respectively, which takes us further from the data. </p><p>If we now bite a chunk out of France and Italy but not Germany, as per the graphic, then the ratio further slides away to 44% for France, and 25% for Italy.</p>
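<p>The index and area arithmetic above can be checked with a short script. This is a sketch under stated assumptions: the tonnages and the radius estimates are the figures quoted in this post, the function names are hypothetical, and the radius estimates are rounded ruler measurements, so the computed areas may differ by a point or so from the figures in the text.</p>

```python
def index_to_leader(values):
    """Express each value as a fraction of the largest value."""
    top = max(values.values())
    return {name: value / top for name, value in values.items()}


def area_ratio_from_radius_ratio(radius_ratio):
    """Circle area scales with the square of the radius."""
    return radius_ratio ** 2


if __name__ == "__main__":
    # Tonnage figures as quoted in the post (same unit for all three countries).
    cheese = {"Germany": 2.4, "France": 1.9, "Italy": 1.4}
    for country, share in index_to_leader(cheese).items():
        print(f"{country}: index {share:.0%}")

    # Radius estimates relative to Germany, as measured with a ruler in the post.
    radius_vs_germany = {"France": 0.75, "Italy": 0.56}
    for country, r in radius_vs_germany.items():
        print(f"{country}: radius {r:.0%} -> area {area_ratio_from_radius_ratio(r):.0%}")
```

<p>Squaring is what pushes the encoded values away from the data: a radius that is three-quarters as long gives a circle with only a little over half the area.</p>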
          ]]></content:encoded>
          <description><![CDATA[ Which nation makes the most cheese? ]]></description>
        </item>
        <item>
          <title><![CDATA[ Instacart bows to pressure ]]></title>
          <link>https://www.junkcharts.com/instacart-bows-to-pressure/</link>
          <guid isPermaLink="false">69499dac519f3800015d6ac8</guid>
          <category><![CDATA[ Tests ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 27 Jan 2026 09:27:27 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>The public reaction to the Consumer Reports study on Instacart's pricing strategies has forced the company to "end all item price tests," which are part of its Eversight offering. (See also my previous <a href="https://www.junkcharts.com/when-will-they-personalize-pricing/" rel="noreferrer">post</a> about the study.)</p><p>Within days of the CR report, Instacart initially disputed some details of the study (<a href="https://www.instacart.com/company/updates/the-truth-about-pricing-tests-on-instacart?ref=junkcharts.com" rel="noreferrer">link</a>), but eventually announced the end of these AI pricing experiments (<a href="https://www.instacart.com/company/updates/ending-item-price-tests-on-instacart?ref=junkcharts.com" rel="noreferrer">link</a>).</p><p>These press releases provide a few more insights about what was happening.</p><p>Instacart says retailers already charge different prices at different physical stores for the same item. The revised policy does not prohibit online retailers from charging different prices for the same item based on IP addresses (or phone numbers or other ways to geo-locate shoppers). This itself is an intriguing admission. The Internet is supposed to "flatten" the world, bringing everyone closer together, but is that more hype than reality? By duplicating brick-and-mortar practices online, retailers are admitting that the online presence (which in theory can be <em>launched once for every location</em>) has not altered location-driven economics, if we believe what they're saying.</p><p>Retail partners who feature on Instacart can subscribe to a pricing tool called "Eversight." Instacart purchased this capability from a startup called Eversight Labs in 2022.</p><p>Eversight marketing materials said the platform is designed for retailers to run "millions of tests" "all the time". Instacart claimed that only "10 of its retail partners" use Eversight for pricing experiments, according to CR. 
Instacart suggested that using Eversight could increase sales by 1-3 percent, and margins by 2-5 percent. Let's think about how that may be possible.</p><hr><p>We first entertain a traditional test-and-learn setting, in which the price experiment involves a randomly selected subset of shoppers, and is turned on for a short period of time to collect enough samples for a statistical read of the result. </p><p>The objective of the pricing experiment is to determine the "optimal" price for an item, attained by measuring customers' price elasticity. The expected outcome of the experiment is a price adjustment; the revised price is both fixed and universal (for the population specified in the test). If the outcome is a price hike, the test result must have predicted that the loss of sales due to the higher price is more than offset by the additional revenues generated by the price hike from customers undeterred by it.</p><p>As discussed in my previous <a href="https://www.junkcharts.com/when-will-they-personalize-pricing/" rel="noreferrer">post</a>, the net improvement in revenues has to be quite large to justify trading away the comfort of inertia. For this reason, the outcome is much more likely to be a price increase than a price decrease. This behavior, I think, is a type of <a href="https://en.wikipedia.org/wiki/Endowment_effect?ref=junkcharts.com" rel="noreferrer">endowment effect</a> of interest to behavioral psychologists. </p><p>If the expected outcome is a price increase (or no change), it follows that the set of test price levels looks more like [base, +2%, +3%] than [-2%, base, +2%]. That's why I suggested computing the average displayed price as a way of learning whether the pricing experiment effectively raised prices. </p><p>Instacart's primary pushback on the CR study is that Eversight offers "testing," suggesting that these are temporary price changes that disappear after the tests are over. 
This defence makes no sense for a number of reasons.</p><p>If the retailer has no intention of changing prices, why conduct pricing tests?</p><p>If indeed list prices remain the same post-test, then the incremental revenues marketed by Eversight would have been achieved during the period of testing. Further, if the price changes were purely randomly applied, as Instacart asserted, then the set of test price levels is likely skewed toward price hikes rather than price decreases.</p><p>Assuming the retailer found out from the price testing that it makes more money by raising prices by 5%, why would they not roll out the price hike?</p><p>Finally, consider the possibility that a retailer is <em>always</em> running price tests. It gives a new meaning to "testing."</p><hr><p>Additionally, it is a fallacy to think that if the test price levels are symmetric, e.g. [-2%, base, +2%], then the experiment does not alter the status quo. This is a subtle point.</p><p>The no-change scenario only materializes if the tested prices do not affect consumer behavior. For example, test prices of [-1 cent, base, +1 cent] most likely result in all three test subgroups exhibiting the same buying propensity. This is of course a silly test.</p><p>A more probable outcome is that transactions shift in inverse proportion to the price changes. The +2% subgroup buys fewer units while the -2% subgroup buys more units. The price elasticity may be nonlinear, in which case the total revenues obtained during the test may be higher, or lower, than the pre-test amount. </p><p>Running tests "all the time" only makes sense if the vendor is confident that these tests in aggregate improve business outcomes. This setting is incompatible with the idea of an unbiased test with symmetric price levels. If management has such a crystal ball, they should just implement the price changes, without a need to run tests!</p><p></p><p></p>
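            <p>The point about symmetric price levels can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: three equal-sized test groups, and a constant-elasticity demand curve (my assumption for illustration, not something Instacart or CR has described). The function <code>revenue_ratio</code> and its parameters are hypothetical.</p>

```python
def revenue_ratio(price_changes, elasticity):
    """Average revenue across equal-sized test groups, relative to baseline.

    Toy assumption: each group faces price (1 + change) and buys
    (1 + change) ** (-elasticity) units, i.e. a constant-elasticity
    demand curve with base price and base quantity both set to 1.
    """
    revenues = [(1 + c) * (1 + c) ** (-elasticity) for c in price_changes]
    return sum(revenues) / len(revenues)


# Symmetric test levels: [-2%, base, +2%]
symmetric = [-0.02, 0.0, 0.02]

for elasticity in (0.5, 1.0, 3.0):
    ratio = revenue_ratio(symmetric, elasticity)
    print(f"elasticity {elasticity}: revenue ratio {ratio:.5f}")
```

            <p>In this toy model, revenue during the test matches the pre-test level only when the elasticity is exactly 1; it comes out slightly lower when demand is inelastic and slightly higher when demand is elastic.</p>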
          ]]></content:encoded>
          <description><![CDATA[ What is the point of pricing tests if management isn&#39;t intending to change prices? ]]></description>
        </item>
        <item>
          <title><![CDATA[ VAR technology is ruining football ]]></title>
          <link>https://www.junkcharts.com/var-technology-is-ruining-football/</link>
          <guid isPermaLink="false">695fc8338ac3410001833ffa</guid>
          <category><![CDATA[ Analytics-business interaction ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Thu, 22 Jan 2026 10:00:41 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>I have been watching quite a bit of football (mostly Italy's Serie A) recently, and it's become obvious that VAR technology is ruining the beautiful game.</p><p>Goal celebrations have always been fun to watch, especially after spectacular <em>golazos</em>. But nowadays, many an extravagant exultation is fake, because the goal is subject to a VAR review, which is a microscopic, backroom operation far away from the action on the field. The referee's (human) decision is not final. VAR's decision, made by VAR referees who are aided by technology, is final.</p><p>That's not the official word. But in reality, that's what's happening. The VAR either agrees with the referee's decision, or it doesn't. When the VAR dissents, depending on the situation, either it directly overturns the referee's ruling on the field, or it sends the referee to a viewing booth. The referee's walk to the viewing booth is always cheered by the team that is on the short end of the original decision; in almost every case, the referee accepts the VAR's view. Therefore, effectively, VAR is the real referee; the on-the-field referee is the intern standing in for the boss.</p><p>This process ruins goal celebrations. The player will celebrate but he knows it's just an act because until and unless VAR accepts the decision, the goal is not official. The review process may take many minutes, particularly if the alleged infraction is a matter of centimetres. We watch the players theatrically arguing for their respective cases. If the goal is ultimately allowed, it feels weird. Should the player restart the dance of joy? If VAR takes away the goal, the already-seen celebration has turned into a caricature. Fans switch from excitement to disbelief, then to anger. </p><p>Spontaneity is the casualty. </p><hr><p>Despite the orthodoxy, in some cases, it's not clear that the VAR decision, even when aided by video, is the better one. 
</p><p>A couple of recent examples.</p><p>During an Atalanta-Roma match on Jan 3, 2026, Atalanta's forward Scamacca scored a scorching header from right in front of goal, heading in a cross from the left side. Eventually, the goal was annulled by VAR officials for an "off-side" violation. This review took forever. </p><p>The off-side violation took place multiple passes before the final shot. That moment had no bearing on the ultimate goal, other than that Scamacca was momentarily in an off-side position while he was near the midfield circle. That is to say, <em>where</em> he was judged off-side was nowhere near the spot from which he scored, and <em>when</em> he was judged off-side was long before his teammate sent the cross to meet his head.</p><p>It was worse than that... because it was an opposing player who gifted Scamacca the ball near mid-field. If one takes the official view, the entire sequence started when an Atalanta player (in blue) attempted to pass the ball to Scamacca in mid-field. 
</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/ata_rom_official_offside_claim.png" class="kg-image" alt="" loading="lazy" width="1924" height="826" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/01/ata_rom_official_offside_claim.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/01/ata_rom_official_offside_claim.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2026/01/ata_rom_official_offside_claim.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/ata_rom_official_offside_claim.png 1924w" sizes="(min-width: 720px) 720px"></figure><p>If Scamacca had received that pass, dribbled the ball towards goal, and scored, then the goal should have been disallowed because of the off-side rule. No complaint.</p><p>But the Atalanta defender kicked a very poor ball that was way out of Scamacca's reach. ("Unreachable" would have been the call if this were the NFL.) In fact, the ball went straight to a Roma player (in white), who failed to control the ball, gifting it to Scamacca in a backward "pass". So in fact, Scamacca did not receive the ball in an off-side position from his own teammate; he took a gift from the opponent, in which case off-side wasn't even pertinent. 
</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/ata_rom_errant_long_pass.png" class="kg-image" alt="" loading="lazy" width="2000" height="1048" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/01/ata_rom_errant_long_pass.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/01/ata_rom_errant_long_pass.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2026/01/ata_rom_errant_long_pass.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/ata_rom_errant_long_pass.png 2020w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/ata_rom_loose_ball_pass.png" class="kg-image" alt="" loading="lazy" width="1934" height="994" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/01/ata_rom_loose_ball_pass.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/01/ata_rom_loose_ball_pass.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2026/01/ata_rom_loose_ball_pass.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/ata_rom_loose_ball_pass.png 1934w" sizes="(min-width: 720px) 720px"></figure><p>Scamacca then dribbled the ball half way to goal, then sent it to a teammate left. It then went to another teammate, who dribbled it to the goal-line and sent in an excellent cross. 
</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/ata_rom_pass_goalline.png" class="kg-image" alt="" loading="lazy" width="1948" height="794" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/01/ata_rom_pass_goalline.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/01/ata_rom_pass_goalline.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2026/01/ata_rom_pass_goalline.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/ata_rom_pass_goalline.png 1948w" sizes="(min-width: 720px) 720px"></figure><p>By this time, Scamacca had positioned himself right in front of goal, and headed the cross in. The referee immediately signaled goal. Celebration ensued. The (home) stadium erupted.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/ata_rom_cross_header.png" class="kg-image" alt="" loading="lazy" width="2000" height="792" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/01/ata_rom_cross_header.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/01/ata_rom_cross_header.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2026/01/ata_rom_cross_header.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/ata_rom_cross_header.png 2080w" sizes="(min-width: 720px) 720px"></figure><p>Even after the match was over, the officials' official stuck by the VAR ruling (<a 
href="https://www.yardbarker.com/soccer/articles/referees_confirm_atalanta_1_0_roma_had_correct_decisions_on_controversial_goals/s1_17344_43290193?ref=junkcharts.com" rel="noreferrer">link</a>). With the greatest benefit of the doubt, this decision can only be justified if we go by some ludicrously strict interpretation of the rules of the game. If they go down this path on every goal, the players might as well wait until the VAR officials have had their say before starting their goal celebrations. Spectators can expect five to ten minutes' delay to confirm every goal. (This also has the side effect of adding loads of "injury time" to the end of each half, another negative.)</p><p>This process is inherently unfair because not even the officials' official would advocate combing through the rulebook word by word to adjudicate every goal. At best, they might do this when there is obvious controversy. This is precisely why I was so annoyed with that Atalanta-Roma decision. They are not targeting controversial goals. Almost everyone who watched that match would have accepted it as a clear goal, before the VAR process drowned us in minutiae. </p><p>In my understanding, the spirit of the off-side rule is to prevent the striker from gaining undue advantage by camping out behind the defense. Nothing of the sort was happening there.</p><p>Further, the above goal was an exceptional team and individual effort, a fantastically conceived and executed sequence. Now, all that is in the dustbin of history, soon to be completely forgotten. I really fail to see how this use of VAR technology improved the experience.</p><p>(I previously <a href="https://www.junkcharts.com/illusion-of-perfection/" rel="noreferrer">wrote</a> about how VAR technology leads to "off-side" calls by a fingernail, which sends another bunch of beautiful goals to the dustbin, for the bragging title of I-go-strictly-by-the-book.)</p><hr><p>A second recent example. 
In the Lazio-Fiorentina match on Jan 7, 2026, Fiorentina was awarded a penalty kick in the dying minutes after a VAR review that overturned the on-field referee's original decision (of no penalty). </p><p>The video technology compiled a sequence of views to justify its decision. It showed that Gudmundsson, the Fiorentina striker (in purple), fell to the ground inside the penalty box with the Lazio defender (in light blue) hot on his back. There was no doubt that there was a tangle of legs. The Lazio player fell first but it appeared that his leg might have obstructed Gudmundsson, causing him to lose balance. </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/laz_fio_penalty_or_no.png" class="kg-image" alt="" loading="lazy" width="512" height="528"></figure><p>The intern (oops, I mean, the referee) was summoned to watch the video replay. The referee had signaled no penalty, which meant he had determined that Gudmundsson deliberately put his leg out to touch the Lazio player, simulating a fall. As expected, he decided to change his call and awarded Fiorentina a penalty. </p><p>Whether it was a foul or simulation is a question of the level of force, not a question of the relative positions of the legs. During the broadcast, viewers were shown a video replay. We kept seeing the legs tangling up. We agreed that they touched but who pushed whom? In my view, no amount of video can answer the question. </p><p>Video replays present deceptively objective views of reality. They aren't what they appear to be. The videos are compiled by the VAR technology to explain its decision. They cherry-pick the angles and vantage points to build evidence. We'll never see a video replay contradict the VAR decision. 
(Similarly in tennis, no replay will ever show a ball falling inside the line, if the line-calling bot has said it's out!)</p><hr><p>This subject is near and dear to me because it's a real-world example of what happens when we use automated models to make real-life decisions.</p><p>Machines are only valuable if their conclusions differ from those of humans. If the machine always agrees with humans, we don't need it. When the machines disagree with humans, we have two disagreeing points of view. How should a final decision be made?</p><p>Machines don't have special access to a reality not visible to humans. Machines embody "models" of reality. These models express assumptions when there are not enough data. Embodied models are rarely explained, so these assumptions are not exposed, and thus not reviewable. In the case of VAR, particularly in the video replay cartoons, audiences have never been informed of even one of the many assumptions that must have been adopted by the modelers. </p><p>Because they use models, machines can also make mistakes. They also have built-in biases, just possibly different biases than those found in humans. There will be certain aspects in which human senses may work better, e.g. in judging the amount of force applied.</p><p>Machines have advantages, such as not being subject to the variability between human referees. Think about that for a moment. We have made a trade-off: we agree to standardize on a single point of view (held by the developers of the technology); it's not that the problem of different opinions has vanished, we make it go away by adopting one viewpoint. It's like employing the same human referee for all matches.</p><p> </p>
          ]]></content:encoded>
          <description><![CDATA[ Two examples of calls that don&#39;t improve the experience ]]></description>
        </item>
        <item>
          <title><![CDATA[ The smell of pie charts ]]></title>
          <link>https://www.junkcharts.com/the-smell-of-pie-charts/</link>
          <guid isPermaLink="false">696d61a201d09700012d62ea</guid>
          <category><![CDATA[ Pie chart ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 19 Jan 2026 09:44:06 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>So many experts have been dumping on <a href="https://www.junkcharts.com/tag/pie-chart/" rel="noreferrer">pie charts</a> for so long that they have acquired a stench. People still want to make pie charts but they're worried about rattling the experts. So they put on lipstick.</p><p>Reader Kirsten P. sent me this graphic by Minneapolis Star Tribune, illustrating the disproportionate forces the Federal Government has sent to a metropolitan region of the country. The graphic accompanies an article which gives further background (<a href="https://www.msn.com/en-us/news/us/homeland-security-presence-in-minnesota-dwarfs-twin-cities-largest-police-forces/ar-AA1U9Cid?ref=junkcharts.com" rel="noreferrer">link</a>).</p><p>At first sight, this looks like a new kind of chart: part <a href="https://www.junkcharts.com/tag/dot-plot/" rel="noreferrer">dot plot</a> and part <a href="https://www.junkcharts.com/tag/pie-chart/" rel="noreferrer">pie chart</a>.</p><p>Not really. You're looking at two side-by-side <a href="https://www.junkcharts.com/tag/pie-chart/" rel="noreferrer">pie charts</a>. Nothing more, nothing less. The dots play no role other than to disguise these pie charts. A text note even tells readers that each dot represents one officer. 
I'll take a wild guess: no one is out there counting dots.</p><hr><p>Here I strip away the dots:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/redo_mst_twincitiesforces.png" class="kg-image" alt="" loading="lazy" width="1800" height="1074" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/01/redo_mst_twincitiesforces.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/01/redo_mst_twincitiesforces.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2026/01/redo_mst_twincitiesforces.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/redo_mst_twincitiesforces.png 1800w" sizes="(min-width: 720px) 720px"></figure><p>The pie on the left shows federal forces in the Twin Cities. (The original label says Minnesota for an unknown reason but in the article, they wrote "Homeland Security Secretary Kristi Noem told Fox News on Jan. 6 that her agency sent 2,000 Immigration and Customs Enforcement agents to the Twin Cities" which referred to Twin Cities.) </p><p>The pie on the right depicts the sizes of the top 10 metro police forces within the Twin Cities area. The point is that the two pies are roughly the same size (the right pie should be about 1/6th smaller.) [Maybe there is a bigger point: how the Republican trifecta of president, Congress and Supreme Court is imposing "big government", and in particular, federal authority over states, both supposedly misguided Democratic policies. Confusing, no?]</p><hr><p>Inside the chart is a puzzle. 
What do the <a href="https://www.junkcharts.com/tag/color/" rel="noreferrer">colors</a> on the right signify?</p><p>I suppose Minnesotans will ace this test but as someone not from there, it took me a while to figure this out.</p><p>My first guess was red for Republican and blue for Democrat. This doesn't feel right as there shouldn't be many Republican counties in a metropolitan area. (And indeed, this guess does not pan out.)</p><p>The puzzle unlocked when I noticed the label "MSP": it is out of place because M = Minneapolis and SP = St Paul, each of which has its own slice of the pie. MSP is also the acronym for the airport, and the airport probably has its own police force.</p><p>Thus, the red slices are police forces belonging to counties while the blue slices are entities other than counties. Squinting harder, one can differentiate two shades of blue. MSP and Metro Transit are the lighter blue while the other blue shows police forces associated with cities.</p><hr><p>The reader's attention is drawn to the divisions within each pie when the article's story is about the difference in the sizes of the two pies.</p><p>Here's a version that points the readers directly at the story:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/junkcharts_redo_twincitiesforces_2.png" class="kg-image" alt="" loading="lazy" width="1334" height="988" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/01/junkcharts_redo_twincitiesforces_2.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/01/junkcharts_redo_twincitiesforces_2.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/junkcharts_redo_twincitiesforces_2.png 1334w" sizes="(min-width: 720px) 720px"></figure><p></p><p>&nbsp;</p><p>&nbsp;</p><p>&nbsp;</p>
          ]]></content:encoded>
          <description><![CDATA[ An attempt to deodorize ]]></description>
        </item>
        <item>
          <title><![CDATA[ The failed coup against standardized testing ]]></title>
          <link>https://www.junkcharts.com/the-failed-coup-against-standardized-testing/</link>
          <guid isPermaLink="false">694caf69519f3800015d6c21</guid>
          <category><![CDATA[ Education ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 14 Jan 2026 09:06:34 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Standardized testing such as the SAT and ACT has attracted some relentless critics over the years. These adversaries argue that the tests are biased against minorities and the less well-to-do. Recently, many top U.S. colleges have been running an experiment, having suspended the SAT/ACT requirement for admissions since Covid-19.</p><p>By the end of 2025, one school after another had backtracked and reinstated the standardized test requirement. MIT (2022, <a href="https://mitadmissions.org/blogs/entry/we-are-reinstating-our-sat-act-requirement-for-future-admissions-cycles/?ref=junkcharts.com">link</a>), Yale (2024, <a href="https://abcnews.go.com/US/yale-reintroduces-standardized-test-requirement-expands-list-test/story?id=107446872&ref=junkcharts.com">link</a>), Harvard (2024, <a href="https://news.harvard.edu/gazette/story/2024/04/harvard-announces-return-to-required-testing/?ref=junkcharts.com">link</a>), Princeton (2025, <a href="https://www.dailyprincetonian.com/article/2025/10/princeton-news-sat-act-standardized-test-optional-required-admissions?ref=junkcharts.com">link</a>; Alma mater, why so late?), and Stanford (2025, <a href="https://stanforddaily.com/2025/08/06/stanford-to-continue-legacy-admissions-reinstate-standardized-test-requirements/?ref=junkcharts.com">link</a>) have all changed their minds. Every school that reversed course pointed to data showing that the cohorts that enrolled during the test-optional period are plainly unprepared for college. 
</p><p>This Wall Street Journal <a href="https://www.wsj.com/opinion/a-math-horror-show-at-cal-at-san-diego-c91f2035?ref=junkcharts.com" rel="noreferrer">editorial</a> cited a dismal finding from the University of California (a top public university system):</p><blockquote>About half of UC campus math chairs say that the “number of first-year students that are unable to start in college-level precalculus”—which used to be a standard course for California’s top high school sophomores—doubled over the last five years.</blockquote><p>Wait, how about the other half of the campuses?</p><blockquote>The other half of chairs said the number tripled.</blockquote><p>Harvard started offering a remedial high-school math course (<a href="https://www.math.harvard.edu/course/ma5/?ref=junkcharts.com" rel="noreferrer">link</a>) to incoming first-year students to help them catch up. University of California, San Diego launched its remedial course 10 years ago, and recently saw enrollment jump tenfold (<a href="https://www.insidehighered.com/news/quick-takes/2025/11/12/uc-san-diego-sees-students-math-skills-plummet?ref=junkcharts.com">link</a>).</p><hr><p>The move to drop standardized testing was always going to be a disaster. As someone who has read applications (for graduate schools), it's clear to me that test scores represent the only item in an applicant's file that is interpretable. </p><p>High school GPAs are meaningless because the admissions officer lacks any context to interpret the data. Like college professors, high school teachers are giving away top grades like Halloween candy. </p><p>Transcripts present the same problem as GPAs – no context. There is never enough time to read course titles to infer what level they are at, and certainly no point staring at individual grades, having no idea what proportion of the class received the same grades.</p><p>Teacher references are not much better. 
Most references are vapidly nice, without distinguishing one student from another. </p><p>Once in a blue moon, you come across a teacher slamming a student. My first reaction is: why did the teacher bother to write it? My second reaction is: Poor student, who mistakenly assumed that s/he was on good terms with said teacher. My third reaction is: what a mess! Two vapidly nice refs + one viciously cruel ref = ?? The truth is I know neither the student nor the reference writers, and I don't feel like choosing who to believe.</p><p>Essays are somewhat informative but the prevalence of hired help, coupled with the availability of AI writers, ensures that many essays do not say much that is real about the applicants. It's also a medium that favors those with better writing and story-telling skills. Besides, this medium favors the well-to-do, who can afford more expensive coaches, and send their kids to far-away places for save-the-world type experiences. (Somehow, those critics who like to bash standardized test biases are quiet about obvious biases of other forms of assessment, such as essays.)</p><p>Without the standardized test scores, the application portfolio contains only subjective items. Picking one applicant over another is an act of randomness. No wonder the colleges admitted under-prepared students during the test-optional period.</p>
          ]]></content:encoded>
          <description><![CDATA[ Why the SAT found its way back in the admissions offices ]]></description>
        </item>
        <item>
          <title><![CDATA[ Know your data 47: bait and switch prices ]]></title>
          <link>https://www.junkcharts.com/know-your-data-47-bait-and-switch-prices/</link>
          <guid isPermaLink="false">6965ef9b8ac34100018340dd</guid>
          <category><![CDATA[ pricing ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 13 Jan 2026 09:08:01 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Long-time reader Mark P. sent me to this Guardian investigation of pricing discrepancies in "dollar stores" in the U.S. (<a href="https://www.theguardian.com/us-news/2025/dec/03/customers-pay-more-rising-dollar-store-costs?ref=junkcharts.com" rel="noreferrer">link</a>)</p><p>The dollar store is really the five-dollar store these days. I don't think one can find anything in a dollar store that costs a dollar!</p><p>The reporter did a great job finding customers and former employees of these stores to tell their stories. Many of the customers live in isolated areas, and on tight budgets. A dollar store is often the only retailer within walking distance of where they live, and so in a sense, they represent a captive audience.</p><p>The headline is bait-and-switch pricing. It turns out that these dollar stores (the article names two chains: Dollar General, and Family Dollar) frequently charge customers higher prices than the advertised shelf prices. </p><p>It's not a one-time anomaly. In some stores, as many as 80% of the register prices have been found to be higher than the respective shelf prices! The differences aren't mere rounding up. Examples given in the article include $5 frozen pizzas charged $7.65, and $11 paper towels charged $15.50. Even after authorities have complained about this practice, and even assessed penalties, many such stores continue to bait and switch.</p><p>The situation is really shameful:</p><blockquote>Dollar General stores have failed more than 4,300 government price-accuracy inspections in 23 states since January 2022, a Guardian review found. Family Dollar stores have failed more than 2,100 price inspections in 20 states over the same time span, the review found.</blockquote><hr><p>"Industry watchers" want our pity. They claim that stores do not have sufficient staff to update shelf prices, leading to pricing discrepancies. 
In other words, they're saying that the register prices are correct, and the shelf prices are incorrect.</p><p>For the data nerds, this admission raises grave concerns about data collection. Imagine wanting to collect prices to estimate inflation. One method is to send people into stores to jot down prices. The data correspond to shelf prices. We now know that in dollar stores, the shelf prices may be much lower than the actual prices paid by consumers. So, the collected data are inaccurate.</p><p>Reading between the lines, we learn from the Guardian article that government bean-counters know about this issue. This is why they conduct inspections that have uncovered these price discrepancies. </p><hr><p>The red-faced publicists for these stores had more to say:</p><blockquote>[the Dollar General's] store teams “are empowered to correct the matter on the spot.”</blockquote><p>This statement contradicts the other claim. If we believed the industry insiders cited above, the correct prices are the register prices, so there is nothing to "correct." By "correcting the matter," they must mean charging customers the shelf prices instead of the register prices. So, they "correct" the matter by charging the <em>incorrect</em> prices. </p><p>Is this a Freudian slip? Did they admit that the shelf prices are real, and they overcharge unsuspecting customers? If and when this bait-and-switch scheme is noticed, they will do the right thing. </p><hr><p>The consumer advocates are equally confused. Since they use the term "overcharges," they must believe that the shelf prices are correct while the register prices are incorrect. If that is the case, then they can't accept that the cause of these "overcharges" is that the stores don't have the staff to update the shelf prices! But they swallowed that whole.</p>
          ]]></content:encoded>
          <description><![CDATA[ Can we measure inflation using in-store prices? ]]></description>
        </item>
        <item>
          <title><![CDATA[ The largest gambling market in Europe, and the largest online ]]></title>
          <link>https://www.junkcharts.com/the-largest-gambling-market-in-europe-and-the-largest-online/</link>
          <guid isPermaLink="false">695abbdd519f3800015d6d2d</guid>
          <category><![CDATA[ visual storytelling ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 07 Jan 2026 09:37:43 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>In a prior <a href="https://www.junkcharts.com/forced-roommates-favoritism-and-more-in-data-visualization" rel="noreferrer">post</a>, I discussed why the dual-axes chart about the European gambling market is mind-boggling. </p><p>Here is an alternative visualization that focuses on the story behind the dataset:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/junkcharts_redo_europegamblingmarket.png" class="kg-image" alt="" loading="lazy" width="1658" height="1090" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2026/01/junkcharts_redo_europegamblingmarket.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2026/01/junkcharts_redo_europegamblingmarket.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2026/01/junkcharts_redo_europegamblingmarket.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2026/01/junkcharts_redo_europegamblingmarket.png 1658w" sizes="(min-width: 720px) 720px"></figure><p>I'd like to draw attention first to each country's share of gross gambling revenues. The top five are Italy, U.K., Germany, France and Spain, each accounting for 10-18% of the market. Everybody else is relatively insignificant, with less than 5% share.</p><p>The next important insight from the data is the over/under performance of the online sector compared to the aggregate. 
I decided to use only the online data series because online better implies offline worse, and vice versa.</p><p>The countries are divided into two groups, those with online share higher than their aggregate share (shown in purple), and those with online share smaller than their aggregate share (shown in orange).</p><p>For example, Italy's overall share is about 18% but its online share is only 11%. By contrast, the U.K.'s overall share is 17% while its online share is 26%.</p><p>I'm using a different measure of online share from the designer of the original. On my chart, "online share" is each country's share of the aggregate European online gambling revenues. These online shares sum to 100%. On the original chart, "online share" is defined as online's share of total gambling revenues within each country. The total of these online shares across countries is meaningless. The online share and offline share sum to 100% for each country.</p>
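<p>To make the two definitions of "online share" concrete, here is a minimal Python sketch. The three countries (A, B, C) and their revenue figures are made up for illustration; they are not the values behind either chart. The purple/orange labels follow the grouping rule described above.</p>

```python
# Hypothetical revenues in billions of euros (made-up numbers, not chart data).
revenues = {
    "A": {"online": 1.0, "offline": 4.0},
    "B": {"online": 6.0, "offline": 2.0},
    "C": {"online": 3.0, "offline": 9.0},
}

total_online = sum(r["online"] for r in revenues.values())
total_all = sum(r["online"] + r["offline"] for r in revenues.values())

# My chart: each country's share of the Europe-wide online revenues.
# These shares sum to 100% across countries.
share_of_online_market = {
    c: r["online"] / total_online for c, r in revenues.items()
}

# Each country's share of overall (online plus offline) revenues.
share_of_overall_market = {
    c: (r["online"] + r["offline"]) / total_all for c, r in revenues.items()
}

# Original chart: online's share of revenues within each country.
# Online and offline sum to 100% per country; the sum across countries means nothing.
online_share_within_country = {
    c: r["online"] / (r["online"] + r["offline"]) for c, r in revenues.items()
}

# Purple if a country's online share exceeds its overall share, else orange.
group = {
    c: "purple" if share_of_online_market[c] > share_of_overall_market[c] else "orange"
    for c in revenues
}

for c in revenues:
    print(
        c,
        f"overall {share_of_overall_market[c]:.0%}",
        f"online-market {share_of_online_market[c]:.0%}",
        f"within-country {online_share_within_country[c]:.0%}",
        group[c],
    )
```

<p>With these made-up numbers, the two measures can rank countries differently, which is why the choice of definition matters for the story.</p>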
          ]]></content:encoded>
          <description><![CDATA[ Kaiser re-imagines the chart about Europe&#39;s gambling revenues. ]]></description>
        </item>
        <item>
          <title><![CDATA[ Self-cancelling actions ]]></title>
          <link>https://www.junkcharts.com/self-cancelling-actions/</link>
          <guid isPermaLink="false">695cb757519f3800015d6dd1</guid>
          <category><![CDATA[ business analytics ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 06 Jan 2026 09:43:39 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>One use of machine vision is to monitor road intersections to catch unruly drivers. The advantages of using machines over humans are plenty. Cops can only scope out a limited number of intersections for a limited duration of time while AI cameras can find every last offender at every intersection at all times of the day and night.</p><p>One possible business model for the AI company is to split "revenues" from traffic tickets issued for these infractions with local governments, e.g. for a $100 fine, the AI company gets $30.</p><p>This type of business model is challenging because of a self-cancelling property. After word gets out that a certain intersection is being monitored 24/7 by AI, most drivers will react by curbing their rule-breaking behavior, or taking a different, un-surveilled route. These counter-actions reduce the number of fines assessed, which depresses the amount of sharable revenues.</p><p>Ironically, to prove the success of such policies, one should look for a revenue reduction, not revenue growth. (There might be an initial spurt in fines before most drivers become aware of the AI traffic cop.)</p><hr><p>The Trump Republican tariffs have the same self-cancelling property.</p><p>If the tariffs are sufficiently high so that imported goods become unaffordable, then consumers will switch to domestic suppliers. This behavioral shift should reduce the amount of imports, which in turn should suppress tariff collection.</p><p>Thus, if the policy is successful, we should observe lower imports, and lower tariff collections.</p><p>If tariff collections go up instead of down, it implies that consumers are paying higher prices than before for the imported products, probably because no domestic substitutes are available. (There might be an initial spurt before prices catch up to the new reality, before businesses run out of workarounds, or before consumer behavior changes.)</p><p>If the stated goal of returning manufacturing to the U.S. 
is achieved, there should be fewer imports, and thus less tariff revenue collected.</p><hr><p>The most notorious self-cancelling product is 100% effective medication. </p><p>If a new medicine is 100% effective, then patients are cured, removing the need to buy more drugs. The pharmaceutical company will eventually suffer a collapse of this line of business. This is why many observers believe that pharmas don't have a strong incentive to cure any disease.</p>
          ]]></content:encoded>
          <description><![CDATA[ And how to measure their effects ]]></description>
        </item>
        <item>
          <title><![CDATA[ Forced roommates, favoritism, and more in data visualization ]]></title>
          <link>https://www.junkcharts.com/forced-roommates-favoritism-and-more-in-data-visualization/</link>
          <guid isPermaLink="false">695aabe1519f3800015d6c66</guid>
          <category><![CDATA[ dual axes ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 05 Jan 2026 09:36:40 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Long-time reader Antonio R submitted the featured chart shown above.</p><p>From a visual perspective, the chart is overly ambitious. </p><p>It uses dual <a href="https://www.junkcharts.com/tag/axis/" rel="noreferrer">axes</a>, which is almost always a bad idea. The left-side axis is related to the orange <a href="https://www.junkcharts.com/tag/line-chart/" rel="noreferrer">line</a>, which depicts the "online share" of gambling revenues within each country, expressed as percentages. The right-side axis concerns the stacked <a href="https://www.junkcharts.com/tag/bar-chart/" rel="noreferrer">columns</a>, which display each country's total gambling revenues, split by online (green) versus offline.</p><p>What are the reasons why this chart is mentally taxing?</p><p><em>Dual axes</em> </p><p>How is the reader supposed to figure out which axis pairs with which chart? The two <a href="https://www.junkcharts.com/tag/axis/" rel="noreferrer">axis</a> titles are "Online share (%)" and "Gross Gambling Revenues (€bn)". We'd have to move our eyes to the bottom of the chart, read the <a href="https://www.junkcharts.com/tag/legend/" rel="noreferrer">legend</a> <a href="https://www.junkcharts.com/tag/text/" rel="noreferrer">text</a>, and then mentally connect those with the axis titles. The "online share" gives us the first hint, then we presume that the "land-based revenue" and the "online revenue" must be the components of "gross gambling revenues". Without that legend, we'd have been lost.</p><p><em>Redundancy</em></p><p>The online shares of revenues depicted in orange refer to the green sections of the columns. The orange and green objects are re-<a href="https://www.junkcharts.com/tag/scale/" rel="noreferrer">scaled</a> versions of the same revenues. 
Use the same color to represent the same quantity.</p><p><em>Stack order</em></p><p>Given the focus on online gambling revenues, the green sections of the column chart should be placed at the bottom of the columns. The bottom layer of a stacked column chart is the only layer with a uniform base, making it the easiest to read.</p><p><em>Forced roommates</em></p><p>Notice that the two axes share the same set of gridlines. Because of this arrangement, it is as if 50% equals €16 bn. That would be true if the two data series were re-scaled versions of the same underlying data but on this chart, the "Online share" is a re-scaled version of one component of "Gross gambling revenues", and therefore they represent different data. Since the total revenues vary by country, 50% share maps to a different amount in each country so there does not exist a set of gridlines that can meet the desired sharing objective.</p><p>The graphing software has taken on the hopeless role of assigning roommates. It wants gridlines for both axes but having two sets of competing gridlines would kill many brain cells. It decides to make the early-rising athlete share a room with the night-owl hacker; they are just going to have to make it work.</p><p>How does the software designer decide where to put the shared <a href="https://www.junkcharts.com/tag/gridlines/" rel="noreferrer">gridlines</a>? One way is to fix the grid of the primary axis (left-side). This sets the number of lines on the chart. Now, choose a <a href="https://www.junkcharts.com/tag/scale/" rel="noreferrer">scale</a> for the other data series so that the grid labels on the secondary axis are the "least ugly".</p><p>No matter what the designer does, the final gridlines serve one side better than the other.</p><p><em>Favoritism</em></p><p>The gridlines favor the orange series, and so does the <a href="https://www.junkcharts.com/tag/sorting/" rel="noreferrer">sorting</a> of countries. 
</p><p>When you have two data series, you can sort with respect to one series, not both (unless they are perfectly correlated in rank). In the featured chart, the countries are sorted by the online share of gambling revenues. </p><p>This sorting scheme arranges the other data series awkwardly: as it turns out, four of the top five markets have seen low online penetration. Italy, the largest market, ends up on the right side of the chart.</p><p>If you pay attention to the green sections only, you'll learn that the online segment in Italy (in terms of Euros) is still the second largest behind that of the U.K. The peculiar sorting scheme highlights five countries that have small gross gambling revenues.</p><hr><p>There is a good story behind this data. The top markets are much larger than the rest; most of these countries (except the United Kingdom) are stronger in offline than online segments. </p><p>In a future post, I'll offer an alternative view of this dataset.</p><p>P.S. [1/7/2026] Next post is <a href="https://www.junkcharts.com/the-largest-gambling-market-in-europe-and-the-largest-online/" rel="noreferrer">here</a>.</p>
          ]]></content:encoded>
          <description><![CDATA[ Why is this dual-axes chart so taxing to read? ]]></description>
        </item>
        <item>
          <title><![CDATA[ Making online help more helpful ]]></title>
          <link>https://www.junkcharts.com/making-online-help-more-helpful/</link>
          <guid isPermaLink="false">69483142519f3800015d6039</guid>
          <category><![CDATA[ Algorithms ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 24 Dec 2025 09:11:02 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>A jocular post by Andrew (<a href="https://statmodeling.stat.columbia.edu/2025/12/20/hey-im-in-the-dictionary?ref=junkcharts.com" rel="noreferrer">link</a>) sent me on an unlikely excursion. </p><p>In the post, Andrew mused about finding quotes of himself in the online Merriam-Webster dictionary. The first example he cited was this quote:</p><blockquote>My quick answer, was, no, the <strong><em>persistence</em></strong> method would not have worked.</blockquote><p>This perked me up because of how silly this sentence is in teaching how to use the word "persistence" in a sentence.</p><p>The same sentence can be written for any number of nouns. The <em>regression</em> method, the <em>factor</em> method, the <em>omission</em> method, the <em>research </em>method, the <em>drinking</em> method, the <em>kissing</em> method, ....</p><p>Visiting the M-W page for <a href="https://www.merriam-webster.com/sentences/persistence?ref=junkcharts.com" rel="noreferrer">persistence</a>, I found a total of 27 example sentences with attribution, in addition to three generic examples without attribution.</p><p>The first attributed quotation is:</p><blockquote>Nothing in the world can take the place of <strong><em>persistence.</em></strong></blockquote><p>This has the same nature as Andrew's sentence. One can substitute many a noun for "persistence", without destroying the sentence. This implies that the sentence cannot explain how to use the specific word "persistence".</p><p>Not all examples are useless. 
The following each contains enough context to learn the meaning of "persistence":</p><blockquote>These steps aren't easy, and can take some time and <strong>persistence</strong>.</blockquote><blockquote>Hall, who lives in Granbury, returned to the lake this winter and his <strong>persistence </strong>paid off on the last day of his trip.</blockquote><blockquote>The finish offers notes of black and brown spice notes and there is good <strong>persistence</strong>.</blockquote><p>It would be more helpful if Merriam-Webster grouped the examples by word sense. The third sentence shown above is distinct in using persistence to refer to lingering sensation. (The original article is found <a href="https://www.forbes.com/sites/tomhyland/2021/04/26/napa-valley-cabernet-sauvignonnew-releases/?ref=junkcharts.com" rel="noreferrer">here</a>.)</p><p>The usage by Andrew is even stranger. In the <em>Wired</em> feature, the "persistence method" is defined in the sentence immediately before the one cited by Merriam-Webster. Andrew mentioned a climate scientist who used "persistence" to describe "the assumption [used in climate models] that conditions remain unchanged from one year to the next." This word sense maps to Merriam-Webster's second <a href="https://www.merriam-webster.com/dictionary/persisting?ref=junkcharts.com" rel="noreferrer">definition</a> of "persist" (i.e., "to remain unchanged or fixed in a specified character, condition, or position"), which its editors have tagged as "obsolete." </p><p>In short, a reader can't figure out the meaning of persistence from reading Andrew's quotation. 
</p><hr><p>If the Merriam-Webster examples are representative, they suggest that "persistence" is most often used in the sense of a human trait, and when used in this way, authors like to pair it up with related traits and concepts, such as "patience and persistence", "vision, persistence, and sweat", "hunger and persistence", "time and persistence", and "hard work and dogged persistence". The bounty of these specimens feels redundant.</p><p>How does Merriam-Webster select these quotations? This is what they disclose:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/merriamwebster_description.png" class="kg-image" alt="" loading="lazy" width="1448" height="260" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/12/merriamwebster_description.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/12/merriamwebster_description.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/merriamwebster_description.png 1448w" sizes="(min-width: 720px) 720px"></figure><p>I assume they use an <a href="https://www.junkcharts.com/tag/algorithm/" rel="noreferrer">algorithm</a>. I kept digging.</p><p>I investigated a quotation attributed to a Forbes feature:</p><blockquote>Coupled with <strong>persistence</strong>, passion lit a path in the sky for the WASP.</blockquote><p>This is a variation of the pattern "X and persistence" where X = "passion". But. But what is the casual reader supposed to make of "lit a path in the sky"? What is "WASP"?</p><p>If you know WASP, it's not what you're thinking. That meaning has little to do with paths in the sky. 
Read the Forbes <a href="https://www.forbes.com/sites/melissarowley/2021/04/14/the-dreammaker-chronicles-meet-the-woman-bringing-the-forgotten-story-of-wwiis-wasp-women-airforce-service-pilots-to-the-stage/?ref=junkcharts.com" rel="noreferrer">article</a>, and you'll learn that WASP stands for Women Airforce Service Pilots.</p><p>The word "persistence" appears in that article three other times.</p><blockquote>At the root of all dreams lies <strong>persistence</strong>.</blockquote><blockquote>The Power of <strong>Persistence</strong> and Honoring a Legacy</blockquote><blockquote>So one of the lessons I’ve learned from doing this project has definitely been <strong>persistence</strong>. I mean, they kept fighting for military status until 1977 under President Carter, and that was the first time that that happened.</blockquote><p>The algorithm evidently picked the most abstruse sentence, and also the one appearing furthest down the page. I'd have selected the third of this set. (I cheated by including two sentences in the quote, but without the second sentence, the meaning is elusive.)</p><hr><p>How should we make an ideal section for word usage in sentences? I'd want fewer but sharper sentences; self-contained sentences, or including surrounding sentences that provide the context for comprehension; and sentences grouped by word sense.</p><p>Implementing this type of algorithm takes a lot of work. You have to deploy a "spider" or some way of compiling a collection of text from which to extract sentences. You need a search engine to find keywords. You hope your text extraction process successfully pulled down author, date and document source (not standardized across different websites). You have to design a scoring rubric to select which sentences to show. </p>
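            <p>To make the idea of a scoring rubric concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the function name <code>score_sentence</code>, the penalties, and the thresholds are invented, and I have no knowledge of how Merriam-Webster actually scores sentences. It simply penalizes unexplained acronyms and very short sentences, then ranks three of the quotations discussed above.</p>

```python
import re

def score_sentence(sentence, keyword):
    """Hypothetical rubric: higher scores suggest a more self-contained example."""
    words = re.findall(r"[A-Za-z']+", sentence)
    if keyword.lower() not in [w.lower() for w in words]:
        return 0.0
    score = 1.0
    # Assumed heuristic: unexplained all-caps acronyms hurt comprehension.
    acronyms = [w for w in words if len(w) >= 3 and w.isupper()]
    score -= 0.5 * len(acronyms)
    # Assumed heuristic: very short sentences carry too little context.
    if len(words) < 9:
        score -= 0.25
    return score

# Three quotations taken from the Merriam-Webster examples discussed in this post.
candidates = [
    "Coupled with persistence, passion lit a path in the sky for the WASP.",
    "At the root of all dreams lies persistence.",
    "These steps aren't easy, and can take some time and persistence.",
]

# Rank from most to least self-contained under this toy rubric.
ranked = sorted(candidates, key=lambda s: score_sentence(s, "persistence"), reverse=True)
for sentence in ranked:
    print(f"{score_sentence(sentence, 'persistence'):.2f}  {sentence}")
```

            <p>Under these made-up weights, the WASP sentence ranks last. A real rubric would also need to handle word senses and surrounding context, which this sketch ignores.</p>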
          ]]></content:encoded>
          <description><![CDATA[ Investigating an algorithm from an online dictionary ]]></description>
        </item>
        <item>
          <title><![CDATA[ Another type of algo pricing ]]></title>
          <link>https://www.junkcharts.com/another-type-of-algo-pricing/</link>
          <guid isPermaLink="false">69432dca519f3800015d5f21</guid>
          <category><![CDATA[ dynamic pricing ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 23 Dec 2025 09:45:25 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>In my previous <a href="https://www.junkcharts.com/when-will-they-personalize-pricing/" rel="noreferrer">take</a> on "algorithmic pricing," I deliberately glossed over one nuance. </p><p>Typically, the inputs to a pricing algorithm are demographics, behavioral data (e.g. how many times the user has revisited the product page), estimated price sensitivity, and so on. Most of these are data about the individual customer. Thus, it has been shaded as "surveillance pricing."</p><p>It is also possible to build a pricing model using purely supply and demand data that do not identify individuals. How much inventory does the retailer have? What's the projected demand? How do price changes affect the demand? At what price does the retailer generate maximum profits?</p><p>Demand forecasting is likely to benefit from "surveillance" data such as the frequency of users browsing the item's page, or placing it in the shopping cart. In this setting, however, the data will be aggregated. That's because this design allows the price of an item to change over time but <strong>not</strong> to vary across individuals at a given moment.</p><p>The observations of Consumer Reports do not suggest this type of pricing algorithm; thus, I didn't mention it in the other <a href="https://www.junkcharts.com/when-will-they-personalize-pricing/" rel="noreferrer">post</a>.</p><hr><p>Dynamic pricing is normal in various industries. Airlines, hotels and the hospitality industry have long priced their products based on supply and demand data. That's why plane tickets and rooms get more expensive, the closer it is to the use date. (But excess inventory might go on fire sale for last-minute bookings.)</p><p>Customers end up paying different prices for similar products (note: never the same room or seat at the same time) but it doesn't feel unfair. That's because a room on Christmas Eve is clearly more valuable than the same room the week before. 
Besides, everyone who's willing to lock down the reservation months in advance gets a discount. This dynamic pricing isn't offensive.</p><p>Differential grocery prices don't give the same vibes. The same can of tomatoes isn't worth more from one week to another, and sellers can order more inventory instead of hiking prices if demand exceeds expectation. </p><p>Turkey during Thanksgiving week doesn't have to be more expensive; the markets can stock up. Consider the alternative of dynamically adjusting prices. Imagine there are 10 turkeys on the shelf, and 50 shoppers will be looking to buy a turkey. If the current price is affordable to everyone, then the sales become effectively first-come, first-served. This feels fair because if you want a turkey, you hustle there before the others. If the vendor raises the price to price out 40 of the 50 shoppers, then the turkeys end up with the highest bidders. In reality, the algorithm isn't so precise but the overall effect is to sell the birds to those willing and able to pay more. </p><p>Economists may praise the dynamic pricing setting as more "optimal." It certainly maximizes the total revenues received by the sellers. The average customer of everyday items finds it unfair. Perhaps part of the opposition is against asymmetric application. I find it hard to believe that the same dynamic pricing algorithm would be allowed to lower prices in response to poor demand. Each such price adjustment is a bet by the seller that the price drop would generate sufficient additional purchases to pay for itself. It represents trading a sure thing for an uncertainty.</p><p>Interestingly, supermarkets don't tend to play with lead times, unlike airlines or hotels. Almost all grocery items have best-before dates but only a few stores I know put discounts on items that are about to expire. Why? I'm not sure. Too much administrative hassle? Too many customers shifting from paying list prices? Do you have a guess?</p>
          ]]></content:encoded>
          <description><![CDATA[ How does dynamic pricing work? ]]></description>
        </item>
        <item>
          <title><![CDATA[ When will they &quot;personalize&quot; pricing? ]]></title>
          <link>https://www.junkcharts.com/when-will-they-personalize-pricing/</link>
          <guid isPermaLink="false">693e68b17ccc52000107c211</guid>
          <category><![CDATA[ pricing ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 17 Dec 2025 09:17:50 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>One of the oft-cited benefits of Web or mobile technology is "personalizing" the user experience. This concept starts with little conveniences such as remembering your log-in user name (via cookies). The most obvious tool of personalization is probably recommendation engines, made famous by Netflix. When a service is personalized, different users expect to encounter different experiences.</p><p>Consumers don't welcome all personalization modes. A recent survey by Consumer Reports found that 7 out of 10 respondents reject personalized pricing for groceries (<a href="https://www.consumerreports.org/money/questionable-business-practices/instacart-ai-pricing-experiment-inflating-grocery-bills-a1142182490/?ref=junkcharts.com" rel="noreferrer">link</a>). Something about paying different prices for the same can of tomatoes offends our sensibilities. Nevertheless, it's obvious that businesses will make more money if they are able to charge more to customers able or willing to pay more; and it's equally obvious that Web and mobile tools of personalization can be extended to pricing decisions. So, it's a matter of when, not if, that we will be charged different prices from our friends for the same things. </p><p>Huge props to Consumer Reports for conducting a rigorous study that confirms something many of us already suspect is happening: personalized pricing (<a href="https://www.consumerreports.org/money/questionable-business-practices/instacart-ai-pricing-experiment-inflating-grocery-bills-a1142182490/?ref=junkcharts.com" rel="noreferrer">link</a>).</p><hr><p>The CR study focused on Instacart, a popular shopping concierge service, by which the company dispatches shopping assistants to pick groceries from brick-and-mortar stores and deliver them to customers. 
Consumer Reports found that the Instacart website shows many different prices to different customers for the same item ordering from the same stores, with the maximum price sometimes as much as 20 percent higher. </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/cr_instacart_pricing.png" class="kg-image" alt="" loading="lazy" width="2000" height="633" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/12/cr_instacart_pricing.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/12/cr_instacart_pricing.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/12/cr_instacart_pricing.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/cr_instacart_pricing.png 2122w" sizes="(min-width: 720px) 720px"></figure><p>Today, those customers who paid $123.93 are likely to presume that everyone is being charged the same price - that's the norm when they shop in brick-and-mortar stores with posted price tags. We're assuming they know how much they're paying. But, as described in Chapter 7 (on inflation statistics) of <strong>Numbersense (</strong><a href="https://amzn.to/2DtUH8e?ref=junkcharts.com" rel="noreferrer"><strong>link</strong></a><strong>)</strong>, many American supermarket customers have no clue how much items they placed in their shopping carts cost. (I cited this <a href="https://www.jstor.org/stable/1251815?ref=junkcharts.com" rel="noreferrer">research</a> by marketing professors.) </p><p>Furthermore, there is not much a skeptical online shopper can do to learn if the store charges everyone the same prices. 
It's a bit easier to collect posted prices at physical stores for comparison with the online prices, but still too much of a hassle for most shoppers.</p><p>Thus, online retailers have an incentive to personalize pricing because they can find more revenues from unsuspecting customers.</p><p>This is why the Consumer Reports study is so valuable. </p><p>How then did CR get around the data challenge? They recruited hundreds of people, arranging simultaneous Instacart shopping sessions for the same retailers, during which everyone placed the same basket of groceries into their shopping carts. Then, they recorded the prices. The variations in prices were visualized in the type of dot plots shown above. </p><hr><p>The CR team seemed to be of two minds about whether Instacart is really doing "personalized" pricing. They call it "algorithmic pricing experiments."</p><p>This coinage merges two distinct concepts: algorithmic pricing, and pricing experiments. Algorithmic pricing is what I call "personalized pricing": the price differentiation is most commonly achieved by deploying an algorithm that computes each item's price while the shopper is browsing the site or using the shopping app. The goal of such an algorithm is to maximize the store's revenues.</p><p>A pricing experiment is another species. The retailer might set up five treatment "cells," say, the base price, and four variations (±5%, ±10%). Every time an item's price is required during a shopping session, a virtual die is thrown to pick one of those five price levels. Thus, those shoppers facing steeper-than-normal prices are just "unlucky." This is how Instacart staff explained the CR observations.</p><p>Normally, a pricing experiment does not use a pricing algorithm because an experiment should be designed like a clinical trial requiring random assignment of treatments (i.e. prices). Therefore, I don't use the term "algorithmic pricing experiment".  
</p><p>If I say "algorithmic pricing experiment," I mean something else. This test would also appear like a clinical trial, in which the treatment group comprises shoppers subjected to a pricing algorithm, and the control group contains shoppers being shown the standard, non-personalized prices. The treatment group itself would split into multiple cells with different pricing (analogous to testing dosages of medicine). The control group is included in order to measure the business-as-usual state.</p><hr><p>Whether Instacart is running experiments or personalized pricing, an outside observer should find price variability. How then can we tell one from the other?</p><p>First, look at how widespread the price variations are. Typically, experiments affect a subset of shoppers, especially for a website with millions of customers, while an algorithm represents a pricing strategy applied to all.</p><p>Second, look at how sticky the price variations are. Experiments are run to answer strategic questions, after which a strategic decision is made whether to "roll out" a change to all customers. The alternate prices are not supposed to last. </p><p>Third, look at the average prices. In my design, randomization occurs at the item level; therefore, if we compute the average price differential (0%, ±5%, ±10%) across all purchases by customer over a time window, those averages should be roughly zero (provided the number of items is not too small).</p><p>In the case of a personalized pricing algorithm, which sets prices to match a customer's ability or willingness to pay, we should see some customers with elevated prices, and others with deflated prices. It's hard to imagine that the average price differentials stay close to zero for most shoppers.</p><hr><p>The rub is that external observers have almost nothing to work with. </p><p>In order to assess how widespread the price variations are, the study would have to recruit all subtypes of customers. 
To measure how long-lasting the price differentials are, the study must be repeated regularly. Without access to transaction databases, outsiders can't gauge average prices by customer. </p><p>The evidence collected by Consumer Reports is very important; it's hard to ask for more. The study suggests to me that personalized pricing will become widespread within a few years, whether we like it or not.</p>
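            <p>The third test above (average price differentials by customer) can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of Instacart: the five cells (0%, ±5%, ±10%) come from my hypothetical experiment design, while the "personalized" scenario is a deliberately crude stand-in in which each customer is assigned one fixed markup for every item. The customer and item counts are arbitrary.</p>

```python
import random
import statistics

# Five price cells from the hypothetical design: base price and four variations.
CELLS = [-0.10, -0.05, 0.0, 0.05, 0.10]

def experiment_averages(n_customers, n_items, rng):
    """Item-level randomization: every item's differential is an independent draw."""
    return [
        statistics.mean([rng.choice(CELLS) for _ in range(n_items)])
        for _ in range(n_customers)
    ]

def personalized_averages(n_customers, n_items, rng):
    """Assumed stand-in for an algorithm: one fixed markup per customer."""
    averages = []
    for _ in range(n_customers):
        markup = rng.choice(CELLS)  # hypothetical per-customer price level
        averages.append(statistics.mean([markup] * n_items))
    return averages

rng = random.Random(0)  # fixed seed so the run is repeatable
exp = experiment_averages(1000, 200, rng)
per = personalized_averages(1000, 200, rng)

# Spread of the per-customer average differential under each scenario.
exp_spread = statistics.pstdev(exp)
per_spread = statistics.pstdev(per)
print(f"experiment:   mean {statistics.mean(exp):+.4f}, spread {exp_spread:.4f}")
print(f"personalized: mean {statistics.mean(per):+.4f}, spread {per_spread:.4f}")
```

            <p>In the experiment scenario, each customer's average differential hugs zero; in the personalized scenario, the averages stay spread out across customers. As noted above, computing these averages requires the transaction data that outsiders don't have.</p>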
          ]]></content:encoded>
          <description><![CDATA[ A Consumer Reports study reviews how Instacart manipulates prices of groceries ]]></description>
        </item>
        <item>
          <title><![CDATA[ Facetune your charts ]]></title>
          <link>https://www.junkcharts.com/facetune-your-charts/</link>
          <guid isPermaLink="false">6938a008f93d760001294019</guid>
          <category><![CDATA[ visual storytelling ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 15 Dec 2025 09:02:20 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>In a prior <a href="https://www.junkcharts.com/the-story-on-international-students-coming-to-the-u-s/" rel="noreferrer">post</a>, I featured the above chart to tell the story of international students in the U.S. </p><p>The story told by this chart is clean. Divide the total into two halves: the first half are the Indian and Chinese students; the other half comprises everyone else. Within the top half, India has 5/8 to China's 3/8. The bottom half is spliced into 8 parts. Europe is one, Canada+Mexico has one, the rest of the Americas occupy one, etc.</p><p>As I write this, I can hear purists screaming in my ears. "Your chart distorts the data", "You're spreading misinformation".</p><p>I confess. The chart is, for lack of a better word, face-tuned. </p><p>Below is the version of the chart that is "faithful" to the dataset:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/fung_junkcharts_redo_nytinternationalstudents_faithful.png" class="kg-image" alt="" loading="lazy" width="1918" height="1044" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/12/fung_junkcharts_redo_nytinternationalstudents_faithful.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/12/fung_junkcharts_redo_nytinternationalstudents_faithful.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/12/fung_junkcharts_redo_nytinternationalstudents_faithful.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/fung_junkcharts_redo_nytinternationalstudents_faithful.png 1918w" sizes="(min-width: 720px) 720px"></figure><p>India and China together made up 53% of the total, more than the 50% shown in my preferred version. 
</p><p>The eight country groups contributed between 4 and 8 percent of the students, not exactly 6 percent as my preferred chart suggests. To be precise: Europe 8%, East Asia (excluding China) 7%, Africa 6%, South Asia (excluding India) 6%, Canada+Mexico 5%, Southeast Asia 5%, Middle East 4%, and Americas (excluding Canada+Mexico) 4%. </p><p>The question: is the "faithful" version better than the "approximated" version? </p><hr><p>If my goal is for readers to walk away with insights that they can pass along to others, then I don't hesitate to use approximations.</p><p>If the readers thought the sum of Indian and Chinese students made up 50% of the total (rather than 53%), what is the harm?</p><p>If the readers thought that Europe, Middle East, Southeast Asia, etc. all contributed equal shares of international students - and they would be adrift by a few percentage points one way or the other, is it worse than them trying to recall which region had 8 percent, and which region had 5 percent?</p><p>Our brains are not designed to hold raw data. This is why we don't want to - and can't - remember passwords that are long strings of randomly selected, alphanumeric characters. This is also why rejecting approximations is frequently harmful. Face-tuning your charts is often beneficial!</p>
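            <p>For readers who want to see how far "adrift" the approximation is, here is a short sketch that compares the rounded regional shares listed above with the equal share implied by the face-tuned chart (one cell out of 16, or 6.25%). The variable names are mine; the percentages are the rounded figures quoted in this post.</p>

```python
# Rounded shares (percent of all international students) as listed in the post.
faithful = {
    "Europe": 8,
    "East Asia (excl. China)": 7,
    "Africa": 6,
    "South Asia (excl. India)": 6,
    "Canada+Mexico": 5,
    "Southeast Asia": 5,
    "Middle East": 4,
    "Americas (excl. Canada+Mexico)": 4,
}

# In the face-tuned chart, every region occupies one of 16 equal cells.
APPROX_SHARE = 100 / 16

# Signed error in percentage points: approximated share minus faithful share.
errors = {region: APPROX_SHARE - share for region, share in faithful.items()}
max_abs_error = max(abs(e) for e in errors.values())

for region, err in errors.items():
    print(f"{region}: {err:+.2f} points")
print(f"largest gap: {max_abs_error:.2f} points")
```

            <p>No region is off by more than a couple of percentage points, which is the price paid for a chart that is easier to remember.</p>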
          ]]></content:encoded>
          <description><![CDATA[ The art of sacrificing precision ]]></description>
        </item>
        <item>
          <title><![CDATA[ The story on international students coming to the U.S. ]]></title>
          <link>https://www.junkcharts.com/the-story-on-international-students-coming-to-the-u-s/</link>
          <guid isPermaLink="false">693895b5f93d760001293f8d</guid>
          <category><![CDATA[ visual storytelling ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 10 Dec 2025 09:26:24 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>The New York Times noted the drop in international students arriving in the U.S. in 2025 (<a href="https://www.nytimes.com/interactive/2025/10/06/upshot/us-international-student-travel.html?ref=junkcharts.com" rel="noreferrer">link</a>; paywall). As the following charts show, the schools have nearly recovered from the Covid-19 related dip but in the last year or so, the trend has reversed, probably due to the current hostility toward foreign-born persons.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/nyt_internationalstudents_linechart.png" class="kg-image" alt="" loading="lazy" width="1342" height="1264" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/12/nyt_internationalstudents_linechart.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/12/nyt_internationalstudents_linechart.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/nyt_internationalstudents_linechart.png 1342w" sizes="(min-width: 720px) 720px"></figure><p>(Note that each chart above has a different <a href="https://www.junkcharts.com/tag/scale/" rel="noreferrer">scale</a>.)</p><p>These <a href="https://www.junkcharts.com/tag/line-chart" rel="noreferrer">line charts</a> are incredibly ugly because of the Covid-19 "shock." </p><p>Later in the article, the focus shifts to the change from 2024 to 2025. The time dimension is thus removed. 
They choose a bag of <a href="https://www.junkcharts.com/tag/bubble-chart" rel="noreferrer">bubbles</a> design: </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/nyt_internationalstudents_bubblebag.png" class="kg-image" alt="" loading="lazy" width="1302" height="1438" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/12/nyt_internationalstudents_bubblebag.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/12/nyt_internationalstudents_bubblebag.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/nyt_internationalstudents_bubblebag.png 1302w" sizes="(min-width: 720px) 720px"></figure><p>Some readers will find this design engaging. You positively must play with the chart in order to learn something about the data. Which bubble represents which country? Is the declining trend affecting all regions? </p><p>Unlike the line charts above, if the reader is interested in the year-on-year change in student arrivals, this bubble chart gives out that information directly. </p><p>The size of the bubbles shows the 2025 data. This signals the relative importance of the bubbles. The main takeaway is that the erosion was widespread: most circles sit below the axis of no change.</p><p>The aggregate drop in arrivals was almost 20%. This value is printed on the chart as an annotation. Without the text, it would be impossible to figure it out. You'd have to do an average of the individual decline rates, using the relative bubble sizes as weights.</p><hr><p>Let's switch the perspective, and make a chart that gives readers some high-level takeaways. 
</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/fung_junkcharts_redo_nytinternationalstudents.png" class="kg-image" alt="" loading="lazy" width="1890" height="1046" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/12/fung_junkcharts_redo_nytinternationalstudents.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/12/fung_junkcharts_redo_nytinternationalstudents.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/12/fung_junkcharts_redo_nytinternationalstudents.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/fung_junkcharts_redo_nytinternationalstudents.png 1890w" sizes="(min-width: 720px) 720px"></figure><p>In this chart, the outer square displays all 1.2 million international students in 2024-5. The population of all students is divided into four rows, each containing ~300K students. </p><p>From this, you can see that India and China together account for half of the total. India is the top source of international students, comprising ~30% of the total (25% + 25%/4 =31%). </p><p>Each row is subdivided into four parts, so each "cell" covers about 75,000 students, ~6% of the total.</p><p>The bottom two rows show a classification of countries into eight regions with roughly equal contributions: South Asia (excl. India), East Asia (excl. China), Europe, Canada+Mexico, Southeast Asia, Middle East, Africa and Americas (excl. Canada+Mexico).</p><p>As an extra, I also show the relative sizes of Canada vs. Mexico. 
</p><p>The challenge of visualizing complex datasets like this one is to pick a problem of manageable size, and then to distill the stories contained in the data.</p><p>(Note that I obtained data from OpenDoorsData.org, which is a different source than what the Times used. As a result, I have full data on Canada and Mexico.)</p>
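            <p>The arithmetic behind the grid can be checked in a few lines. This sketch assumes the layout described above (four rows of four cells, about 1.2 million students in total) and the allocation of one full row plus one cell to India and three cells to China; the variable names are illustrative.</p>

```python
from fractions import Fraction

TOTAL = 1_200_000           # approximate total from the post
ROWS, CELLS_PER_ROW = 4, 4  # grid layout of the chart
cell = Fraction(1, ROWS * CELLS_PER_ROW)  # one cell is 1/16 of the total

india = 5 * cell   # one full row plus one cell
china = 3 * cell   # the remaining three cells of the second row
region = cell      # each of the eight regions in the bottom two rows

for name, share in [("India", india), ("China", china), ("each region", region)]:
    students = int(share * TOTAL)
    print(f"{name}: {float(share):.2%} of total, about {students:,} students")
```

            <p>India and China together fill exactly half of the grid, and the eight regions fill the other half.</p>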
          ]]></content:encoded>
          <description><![CDATA[ How to visualize complex datasets ]]></description>
        </item>
        <item>
          <title><![CDATA[ MTA acknowledges OMNY defects ]]></title>
          <link>https://www.junkcharts.com/mta-acknowledges-omny-defects/</link>
          <guid isPermaLink="false">6931bc9d2b51b20001a2db29</guid>
          <category><![CDATA[ MTA ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Fri, 05 Dec 2025 09:59:52 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>MTA (New York's subway operator) is poised to phase out the old swipe cards by the end of the year. I previously wrote about many issues with the new OMNY chip card (<a href="https://www.junkcharts.com/omny-needs-a-facelift/">here</a>, <a href="https://www.junkcharts.com/omnys-math-problem/">here</a>, and <a href="https://www.junkcharts.com/one-solution-to-omnys-math-problem/">here</a>). Recently, MTA has acknowledged these issues. Yet, they will retire the old system without fixing these problems!</p><p>First up. The OMNY card uses a tablet for scanning, and yet the spacious screen real estate is wasted without showing riders useful data: not how much is being charged for the trip, not a list of recent charges, not how much value remains on the card. According to this news report (<a href="https://www.msn.com/en-us/money/other/omny-users-may-soon-see-remaining-balances-in-nyc-subways/ar-AA1RIbxE?ref=junkcharts.com" rel="noreferrer">link</a>), MTA has raised the "possibility" of showing remaining balances. I want to be a fly on the wall to hear the opponents of displaying the data. The old swipe card system using the tiniest screen still managed to show such data. </p><p>Second, many riders complained about the lack of "visibility and transparency" relating to free rides. Absolutely agree. The real problem, as I explained in two blog posts (<a href="https://www.junkcharts.com/omnys-math-problem/">here</a> and <a href="https://www.junkcharts.com/one-solution-to-omnys-math-problem/">here</a>), is the mind-numbingly complex new method of rewarding free trips. The PR agency decided to dumb down the math, which compounds the problem because what they are promoting on the trains is a lie. They can't possibly be computing the free rides the way they are described to the public.</p><p>In a prior post, I guessed at what the real method of rewarding free rides is. 
While that method gets the job done, it is difficult to explain, and impossible for riders to audit without lots of data.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/kfung_omny_countbackwards.png" class="kg-image" alt="" loading="lazy" width="1218" height="1076" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/12/kfung_omny_countbackwards.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/12/kfung_omny_countbackwards.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/kfung_omny_countbackwards.png 1218w" sizes="(min-width: 720px) 720px"></figure><p>Comparing the OMNY way to the old swipe cards highlights the problem. Riders used to pay upfront the fixed fee for a 7-day travel card, and after the purchase, they could do as many rides as they like, without a care. Now, they don't know what's going on. Nevertheless, we are relieved that the MTA heard the correct feedback: "riders...want... some help building a little trust that this new unlimited ride fare cap is giving them free rides".</p><p>They have committed a big rookie mistake of marketing. When you're giving customers a discount or freebie, you better make it super obvious what they are getting. </p><p>The first proposed "solution" doubles down on the opaqueness - they are asking riders to spend time going to a website to inspect their historical trips. How is this better than the old swipe-card system, in which riders know at the turnstile that they just received a free ride without needing to do anything else?</p><p>In fact, I did a transfer from subway to bus today, and I had no idea if I was charged once or twice. (The transfer to bus should have been free.) 
If I had used the old swipe card, I'd have been told right after the swipe that the ride was a free transfer. With OMNY, the same green light greeted me whether or not I was transferring. </p>
          ]]></content:encoded>
          <description><![CDATA[ But it&#39;s not in a hurry to fix them ]]></description>
        </item>
        <item>
          <title><![CDATA[ Say Jon without the h in Chinese ]]></title>
          <link>https://www.junkcharts.com/say-jon-without-the-h-in-chinese/</link>
          <guid isPermaLink="false">6924b9e3de26d20001464813</guid>
          <category><![CDATA[ demographics ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 01 Dec 2025 09:44:16 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>When you speak to any of the so-called "smart" devices, they can hear you, and perform tasks as you request them. One of the key components of such an application is voice-to-text software. There are many nuances that trip up such software. One puzzle is homophones: since John and Jon are pronounced the same, how can the "smart" device decide which one was spoken?</p><p>Humans encounter the same problem. We make the intention clear by saying "John with the h" or "Jon without the h". How does this issue arise with Chinese names?</p><p>My friend Ray V. sent me to a nice data visualization <a href="https://vis.csh.ac.at/notmyname/?ref=junkcharts.com" rel="noreferrer">project</a> by Liuhuaying Yang tackling this tricky subject.</p><hr><p>The situation with Chinese names is even more complex. Chinese names are made up of ordered characters (typically one, possibly two, characters for the surname, and typically one or two, possibly three, characters for the "given name"). The surname is written before the given name.</p><p>Each character is a single syllable. Homophones are numerous. The designer illustrates this as a tree:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/notmyname_li_givennames.png" class="kg-image" alt="" loading="lazy" width="906" height="962" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/12/notmyname_li_givennames.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/notmyname_li_givennames.png 906w" sizes="(min-width: 720px) 720px"></figure><p>This "li" tree contains forty "fruits," each being a Chinese character sharing the same sound. If all one has is "Li," it could be any of these characters (of course, some are more likely than others.) Thus, Li has to augment "li" by saying something like "the 'li' as used in pear." 
Jon without the h.</p><p>The situation is a bit better if the name is spoken out loud, because the "tone" is heard. Mandarin Chinese uses four tones, indicated by the <a href="https://www.junkcharts.com/tag/color/" rel="noreferrer">color</a> of the fruit in the tree above. Only three tones appear there, so apparently the first tone of "li" (shown in pink) is rarely used in Chinese names. In written text, the tone indicator is usually dropped, making it much harder to figure out which of these "li"s is the right one. </p><p>The canopy of the tree casts a shadow on the ground, the size of which encodes the difficulty of the puzzle. According to their statistics, the 40 "li"s show up in given names with a popularity of 38 per thousand people. If you look more closely, there are two shadows. The thicker shadow relates to surname usage while the thinner one relates to usage in given names. One of the "li"s is one of the top five surnames in China, and so the thicker shadow is the outer one (76 per thousand).</p><p>The writeup neglected to explain the two shadow rings. So, let's find a tree that has the opposite characteristic to "li" for comparison.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/notmyname_rong.png" class="kg-image" alt="" loading="lazy" width="584" height="552"></figure><p>"Rong" is rarely a surname so the thicker shadow is right at the base of the trunk while the thinner shadow related to given names is more visible. Interestingly, only one of the four tones appears in this "rong" tree. 
</p><hr><p>Returning to the "li" tree, let's analyze "Li" as a surname.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/notmyname_li_surnames.png" class="kg-image" alt="" loading="lazy" width="936" height="926" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/12/notmyname_li_surnames.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/12/notmyname_li_surnames.png 936w" sizes="(min-width: 720px) 720px"></figure><p>The character in green is the second most common surname in China, but there are about a dozen other characters that can be someone's family name. Unsurprisingly, many of these characters only show up in given names. Because they are using the tint of the <a href="https://www.junkcharts.com/tag/color/" rel="noreferrer">color</a> to show relative popularity, we don't really know the drop in popularity relative to the green character (it's a huge drop-off).</p><hr><p>The value of this data visualization project is in structuring and presenting the data in a way that engages readers. This is a project that keeps readers focused on the trees, while losing themselves inside the forest, hopefully at will. </p><p>Enter the forest of Chinese names <a href="https://vis.csh.ac.at/notmyname/?ref=junkcharts.com" rel="noreferrer">here</a>. </p><p></p><p></p>
          ]]></content:encoded>
          <description><![CDATA[ Unraveling the Chinese name puzzle ]]></description>
        </item>
        <item>
          <title><![CDATA[ Light entertainment: Pi-orities ]]></title>
          <link>https://www.junkcharts.com/light-entertainment-pi-orities/</link>
          <guid isPermaLink="false">6924b834de26d200014647f6</guid>
          <category><![CDATA[ Pie chart ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 26 Nov 2025 10:11:59 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Screaming for attention in my twitter feed: Attila's <a href="https://attilabatorfy.substack.com/p/pie-chart-frenzy-from-brazil?ref=junkcharts.com" rel="noreferrer">post</a> in which he dug up loads of pie charts from old Brazilian publications.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/11/attila_brazilpies_1.webp" class="kg-image" alt="" loading="lazy" width="474" height="364"></figure><p>It's Thanksgiving week in the U.S. Supermarkets sell lots of pies. Enjoy!</p>
          ]]></content:encoded>
          <description><![CDATA[ Stuffed full of pies ]]></description>
        </item>
        <item>
          <title><![CDATA[ More than a penny ]]></title>
          <link>https://www.junkcharts.com/more-than-a-penny/</link>
          <guid isPermaLink="false">691f43a3cb30200001c05585</guid>
          <category><![CDATA[ Economics ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Fri, 21 Nov 2025 09:07:47 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Farewell to the penny. </p><p>The current U.S. administration has decided to get rid of the "penny."</p><p>One reason cited in many reports on the penny's demise is that it costs more than a penny to make a penny. Here's a quote from AP (<a href="https://www.msn.com/en-us/money/markets/us-mint-in-philadelphia-to-press-final-penny-as-the-1-cent-coin-gets-canceled/ar-AA1QipLr?ref=junkcharts.com" rel="noreferrer">link</a>):</p><blockquote>“For far too long the United States has minted pennies which literally cost us more than 2 cents,” Trump wrote in an online post in February. “This is so wasteful!”</blockquote><p>This is an example of a false friend: it sounds reasonable, but in fact the reasoning buckles.</p><p>You can see this by asking: does it cost $100 to print a $100 bill? Should it cost $100, or anywhere close to it?</p><p>The cost efficiency of money-printing must be judged in aggregate; one shouldn't pick out one unit of currency and analyze it separately. The system is set up so that the printing of $100 bills subsidizes the printing of pennies. By that metric of "wastefulness," a lower-denominated currency is always going to look wasteful relative to a higher-denominated one.</p><p>A better reason to get rid of the penny is that inflation has rendered it useless. Nothing can be bought for a penny; very few items can be had for even one dollar these days in the U.S. So, its death is better attributed to loss of utility.</p>
          ]]></content:encoded>
          <description><![CDATA[ False friends in analytics ]]></description>
        </item>
        <item>
          <title><![CDATA[ Avinash&#x27;s scoring rubric for data visualization ]]></title>
          <link>https://www.junkcharts.com/avinashs-scoring-rubric-for-data-visualization/</link>
          <guid isPermaLink="false">691b940e237016000130b94d</guid>
          <category><![CDATA[ data visualization ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 18 Nov 2025 09:04:33 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Here's the always-entertaining Avinash on data visualizations (<a href="https://www.kaushik.net/avinash/smart-data-visualizations-quality-assessment-algorithm/?utm_source=newsletter&utm_medium=email&utm_campaign=tinyletter" rel="noreferrer">link</a>).</p><p>tldr; He presents a (pseudo-)"algorithm" for great data visualizations. His scoring rubric consists of eight items: time to insight, effort to comprehend, trust, hierarchy, logic, nuance, no gimmicks, and influence.</p><p>All of these should be familiar to Junk Charts readers. I like to express the first two in terms of a "return on effort" metric. See this <a href="https://www.junkcharts.com/the-return-on-effort-in-data-graphics/" rel="noreferrer">post</a>. It's not that every graphic that requires a long time to process is bad; the issue is when we expend the effort but don't receive the reward.</p><p>The last metric ("influence") is a very high bar. It's something we dream of, but rarely achieve. Worse, it may be easier to attain influence by deception using flawed graphics.</p><p>Avinash then analyzes four infographics that each explain Covid risks to illustrate his scoring mechanism. (The post was originally published during the Covid era.)</p><hr><p>Since I included <a href="https://xkcd.com/2333/?ref=junkcharts.com" rel="noreferrer">xkcd</a>'s cartoon up top, let's take a closer look. Like Avinash, I'm treating it as a data visualization, which was not the intention – so be warned.</p><p>We are looking at a grid. Based on the <a href="https://www.junkcharts.com/tag/color/" rel="noreferrer">color</a> scheme, it's a 4x4 grid. There's something of a <a href="https://www.junkcharts.com/tag/scatter-plot/" rel="noreferrer">scatter plot</a> living on this grid. Think of each "dot" in the scatter plot as a text box. Each text box contains an activity. Each activity is rated on two dimensions: Covid risk, and non-Covid risk. 
</p><p>These <a href="https://www.junkcharts.com/tag/text/" rel="noreferrer">axis labels</a> are concise but imprecise. "Covid risk" really means the risk of catching Covid while doing said activity while "non Covid risk" signifies the general risk(s) of said activity other than catching Covid. For example, "staying home" has negligible risk of catching Covid (assuming there isn't an infected family member), and "staying home" presents low risks in general to someone, even if we ignore Covid risk (top left corner). By contrast, "singing in the church" is not typically regarded as a risky activity, but during Covid, it was a super-spreader event (top right corner).</p><p>This leads us to one avenue to consume this infographic. The diagonal going from top left to bottom right represents status quo: activities that didn't change in risk profile due to Covid. Our attention should be drawn to the top right corner, where those activities have elevated risks of catching Covid, relative to doing them prior to the pandemic. As a matter of curiosity, the activities shown in the bottom left corner are ironically less risky during Covid than prior. </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/11/junkcharts_xkcd_covidrisk.png" class="kg-image" alt="" loading="lazy" width="1514" height="1124" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/11/junkcharts_xkcd_covidrisk.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/11/junkcharts_xkcd_covidrisk.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/11/junkcharts_xkcd_covidrisk.png 1514w" sizes="(min-width: 720px) 720px"></figure><p>Top right corner: singing in church, going to a restaurant, going to a bar, going to a party, attending in-person classes, etc. 
are not usually considered dangerous but during Covid, that was how people got infected.</p><p>Bottom left corner: bungee jumping while doing sword tricks, going down a waterslide on an electric scooter, running and sliding headfirst into the pins at a bowling alley, etc. For these activities, the risk during Covid was rated lower than prior, probably because many of these recreational centers were closed, and they don't involve crowds of people.</p><p>xkcd actually makes a subtle point that isn't conveyed in the other infographics: that the risk profiles of some activities changed dramatically during a pandemic.</p><hr><p>I can't make sense of the <a href="https://www.junkcharts.com/tag/color/" rel="noreferrer">color</a> scheme of the graphic. The green, yellow, orange, and red colors correspond to the distance from the top left corner ("origin"), which represents low Covid risk and low non-Covid risk. </p><p>Take the red boxes. They show activities that have <em>either</em> high risk of catching Covid <em>or</em> high non-Covid risk (or both). The latter segment includes some activities with low Covid risk. It's confusing.</p><hr><p>Though I said above the form of the plot is that of a <a href="https://www.junkcharts.com/tag/scatter-plot/" rel="noreferrer">scatter plot</a>, I really should make a clarification. </p><p>The notion of a "scatter" or "cluster" does not exist here. What xkcd did was to fill the entire grid with evenly spaced data. The data are made up to represent all points in the grid; the density of the data does not vary, and as such, they do not contain any statistical meaning, unlike the usual scatter plots.</p>
          ]]></content:encoded>
          <description><![CDATA[ Eight ingredients of great graphics ]]></description>
        </item>
        <item>
          <title><![CDATA[ We aren&#x27;t getting $2,000 checks ]]></title>
          <link>https://www.junkcharts.com/we-arent-getting-2-000-checks/</link>
          <guid isPermaLink="false">6913fc43e44f8e00013560f1</guid>
          <category><![CDATA[ Economics ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Fri, 14 Nov 2025 09:29:43 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>The $2,000-per-person checks "funded by tariffs" do not make sense.</p><p>The math doesn't work. For simpler math, I'd take the U.S. population to be 300 million. Multiply 300 million x $2,000, and we get $600,000 million, or $600 billion.</p><p>Is there $600 billion of tariff money to spend? Of course not.</p><p>The U.S. government collected about $200 billion in tariffs in 2025 (<a href="https://www.crfb.org/blogs/tariff-revenue-soars-fy-2025-amid-legal-uncertainty?ref=junkcharts.com" rel="noreferrer">link</a>) but wait, a chunk of that number comes from tariffs that existed before. The incremental Trump tariffs amounted to ~$120 billion. </p><p>At that rate, four years of incremental tariffs bring in about $480 billion, so they will still be more than $100 billion short. </p><p>This is assuming every dollar of "revenue" can be spent: there is no cost of setting up and administering the tariff system.</p><p>(Note: If they distributed the whole $200 billion, instead of the incremental $120 billion, they must somehow raise $80 billion from somewhere, because that money would have been spent, since the U.S. government runs a gigantic deficit.)</p><hr><p>Sure, they already said they will exclude "high earners". We can turn this around. Let's say they get $500 billion to spend. That works out to $500,000 million / $2,000 = 250 million checks. About one in six Americans (roughly 17%) will not get them. </p><p>This calculation assumes they send checks out on the last day of Trump's term. Otherwise, if they issued checks in 2026, they would be spending money they didn't have. </p><p>They can also bait and switch, and ultimately give $2,000 checks only to the poorest Americans. Kind of like retailers who say "sale up to 50%" except you can find only one item at half price.</p><hr><p>Even if we buy snake oil from these people, the whole tariff business still makes no sense. 
</p><p>Because Americans will then spend the $2,000 "windfall" on groceries, and everything else with rapidly inflating prices - primarily caused by these tariffs!</p><p>Imagine this scenario: Uncle Sam forces McDonald's to pay additional taxes (i.e. tariffs) amounting to 20% of revenues. McDonald's immediately passes the additional cost to consumers, raising the Big Mac price by 20%. Uncle Sam takes the incremental taxes, and sends Big Mac coupons to Americans, who then go to McDonald's to buy the iconic burgers at the new higher price with the coupons covering the 20% tax-related price hike.</p><p>Nothing of value has happened. McDonald's profits stay the same since the incremental revenues from the price hike counteract the additional taxes paid to Uncle Sam. The fiscal situation of the U.S. government does not change, since all incremental tax receipts are immediately paid out. The consumers effectively do not feel the tariff-related price hike. So, what's the point?</p><p>Interestingly, McDonald's may like this arrangement because its revenue line gets inflated. I suppose the U.S. government can call this nominal GDP growth.</p><p>(Note: For reasons explained in this prior <a href="https://www.junkcharts.com/doing-tariff-math-right/" rel="noreferrer">post</a>, if McDonald's raised prices by the tariff rate, Uncle Sam's collected tariffs would in fact not be sufficient to cover the entire price hike.)</p>
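<p>For anyone who wants to replay the back-of-envelope math from the first half of this post, here is a minimal sketch. It uses only the rounded figures quoted above (300 million people, ~$120 billion a year of incremental tariffs, a hypothetical $500 billion pot); the variable names are my own, and it assumes revenue stays flat for four years.</p>
<pre><code class="language-python"># Back-of-envelope check of the $2,000 check math (rounded figures from the post).
population = 300_000_000                 # simplified U.S. population
check_amount = 2_000                     # dollars per person
incremental_tariffs = 120_000_000_000    # approx. new tariff dollars per year
years = 4                                # assumption: flat revenue over four years

total_cost = population * check_amount       # cost of sending everyone a check
collected = incremental_tariffs * years      # incremental tariffs over four years
shortfall = total_cost - collected           # what is still missing

budget = 500_000_000_000                     # hypothetical amount available
checks_funded = budget // check_amount       # number of $2,000 checks it covers
share_excluded = 1 - checks_funded / population

print(f"Total cost:      ${total_cost / 1e9:,.0f} billion")
print(f"Collected:       ${collected / 1e9:,.0f} billion")
print(f"Shortfall:       ${shortfall / 1e9:,.0f} billion")
print(f"Checks funded:   {checks_funded / 1e6:,.0f} million")
print(f"Share excluded:  {share_excluded:.1%}")
</code></pre>
<p>Changing <code>population</code> to a more realistic figure only widens the gap, which is why I stuck with the simpler number.</p>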
          ]]></content:encoded>
          <description><![CDATA[ Numbersense for tariffs ]]></description>
        </item>
        <item>
          <title><![CDATA[ Let&#x27;s be Finn-icky ]]></title>
          <link>https://www.junkcharts.com/lets-be-finn-icky/</link>
          <guid isPermaLink="false">69094dca32c28900017f7616</guid>
          <category><![CDATA[ statistical literacy ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 11 Nov 2025 09:07:14 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Long-time reader Aleks J. sent me to this very nice statistical literacy effort, produced by a team at Tampere University and Tampere University of Applied Sciences in Finland (<a href="https://webpages.tuni.fi/gamelab/2022/mediawatch/?ref=junkcharts.com" rel="noreferrer">link</a>).</p><p>Mediawatch uses a background story that resonates with me: the modern world is flooded with lots of bad graphs and statistics, and we don't have enough time or bodies to debunk them. The technology of debunking is not keeping up with the tools of dispatching.</p><p>The Mediawatch player navigates around the website, interpreting various charts that pop up. </p><p>Here's one example:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/11/mediawatch_detail.png" class="kg-image" alt="" loading="lazy" width="2000" height="1441" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/11/mediawatch_detail.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/11/mediawatch_detail.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/11/mediawatch_detail.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/11/mediawatch_detail.png 2038w" sizes="(min-width: 720px) 720px"></figure><p>This chart explains the start-at-zero rule for <a href="https://www.junkcharts.com/tag/bar-chart/" rel="noreferrer">column charts</a>. </p><p>Spread the word! It's an excellent project worthy of our support.</p>
          ]]></content:encoded>
          <description><![CDATA[ Excellent project for data graphics literacy ]]></description>
        </item>
        <item>
          <title><![CDATA[ Govt shutdown shines light on missing data ]]></title>
          <link>https://www.junkcharts.com/govt-shutdown-shines-light-on-missing-data/</link>
          <guid isPermaLink="false">690e6fa713bdf900013885f8</guid>
          <category><![CDATA[ missing data ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 10 Nov 2025 08:06:40 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>You haven't seen me on TikTok, but I'm there now, thanks to FullTake, who interviewed me last week about how the government shutdown affects data collection.</p><p>Here's the clip:</p><figure class="kg-card kg-embed-card"><blockquote class="tiktok-embed" cite="https://www.tiktok.com/@fulltake/video/7569992315406716190" data-video-id="7569992315406716190" data-embed-from="oembed" style="max-width:605px; min-width:325px;"> <section> <a target="_blank" title="@fulltake" href="https://www.tiktok.com/@fulltake?refer=embed&ref=junkcharts.com">@fulltake</a> <p>The longest government shutdown in U.S. history has stalled data reports from the Bureau of Labor Statistics. Former Columbia University program director and data science consultant, Kaiser Fung, tells us what will happen as a result of these report suspensions. <a title="data" target="_blank" href="https://www.tiktok.com/tag/data?refer=embed&ref=junkcharts.com">#data</a></p> <a target="_blank" title="♬ original sound - fulltake" href="https://www.tiktok.com/music/original-sound-7569992302542867231?refer=embed&ref=junkcharts.com">♬ original sound - fulltake</a> </section> </blockquote> <script async="" src="https://www.tiktok.com/embed.js"></script></figure><p>Statisticians don't like holes in the data, especially avoidable ones.</p><p>The government shutdown is punching holes in the datasets that underlie U.S. economic reports. These datasets rely on "shoe leather," staff conducting interviews about the employment situation, or visiting retail stores to compile lists of prices. During the impasse, data collection has been suspended. </p><p>What happens after the government reopens?</p><p>We know that the furloughed employees typically get back pay, undoing the damage in one sense. However, data that weren't collected could not be replaced.</p><p>Prices displayed on store shelves one or two or three months later aren't necessarily the prices during the shutdown. 
When it comes to employment, it is possible to ask someone how many hours they were working several months ago. But such replacement data introduce recall bias. The more unsteady a person's employment, the more inaccurate his/her answer. In fact, anyone with a steady job isn't contributing to recall error.</p><p>Alternatively, BLS can apply statistical methods to fill data gaps. Think of these fillers as part data, part assumptions. The most famous simple backfill method is "mean imputation," which is a jargonistic way of saying "replace missing values with the average value of the non-missing." Backfilling is typically biased toward maintaining the status quo, because the most common – and least assailable – assumption is that the future replays the past. This assumption is likely to misfire in light of high economic uncertainty.</p><p>The government statisticians can elect not to fill in the gaps. This is an act of passing the buck because analysts who use these data series would then have to prepare their own filling materials.</p><hr><p>How will any of this affect you and me? </p><p>Here's one way. The CPI is used by the government to determine cost-of-living adjustments for Social Security payments. Similarly, employers may use CPI to figure out annual pay increases. </p><p>Let's say BLS economists backfilled the missing values caused by the pause in data collection. These fillers mostly reflect assumptions as there are few, if any, data. The key assumption is likely rolling forward the status quo. If the inflation trend continues, then we would have a few months in which the CPI is under-estimated. This could lead to lower-than-warranted cost-of-living adjustments. (Imagine, for example, the adjustment formula is based on an average of some number of historical monthly inflation figures.)</p>
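            <p>To make "mean imputation" concrete, here is a minimal sketch in Python. The monthly figures are made up for illustration, and the function name is my own; nothing here is meant to describe what BLS actually does.</p>

```python
from statistics import mean


def mean_impute(series):
    """Replace None entries with the average of the observed entries."""
    observed = [x for x in series if x is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if x is None else x for x in series]


# Hypothetical monthly inflation readings (percent).
# None marks the months in which no data were collected.
monthly = [0.2, 0.3, 0.3, 0.4, None, None]

filled = mean_impute(monthly)
print(filled)
```

            <p>In this made-up series, the readings were rising before the gap, yet both fillers equal the average of the earlier months, which sits below the latest observed value. That is the status-quo bias in miniature.</p>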
          ]]></content:encoded>
          <description><![CDATA[ What data will be missing, how will they be backfilled, and how do they affect us ]]></description>
        </item>
        <item>
          <title><![CDATA[ The way of the statistician ]]></title>
          <link>https://www.junkcharts.com/the-way-of-the-statistician/</link>
          <guid isPermaLink="false">690acb0cf7d40900014deccc</guid>
          <category><![CDATA[ Statisticians ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Thu, 06 Nov 2025 08:57:27 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>What do statisticians do? A lot of people seem not to know.</p><p>If you are curious to know, try reading Andrew's post about Tuesday's local elections in the U.S. (<a href="https://statmodeling.stat.columbia.edu/2025/11/04/polls-betting-odds-nonsampling-errors-win-probabilities-vote-margins/?ref=junkcharts.com#comment-2405452" rel="noreferrer">link</a>) Living in the northeast, we were served a flurry of late-breaking reports, claiming that the New York City mayor's race was tightening (<a href="https://www.newsweek.com/andrew-cuomo-zohran-mamdani-polls-nyc-mayor-election-10984438?ref=junkcharts.com" rel="noreferrer">link</a>), and the New Jersey governor's race was a toss-up (<a href="https://thehill.com/homenews/campaign/5524533-new-jersey-governors-race-tightens/?ref=junkcharts.com" rel="noreferrer">link</a>, <a href="https://www.axios.com/2025/10/04/democrats-worry-new-jersey-governors-race?ref=junkcharts.com" rel="noreferrer">link</a>). Andrew's analysis would suggest comfortable wins for the Democratic candidates in each case; and since I'm writing this post after polls closed, I can report that his findings weren't far from the actual outcomes.</p><p>How did Andrew determine that the Democratic candidate in each case has around an 80-90% chance of winning?</p><p>The starting point is the poll numbers. People are asked which candidate they intend to vote for in each election. These data are then converted into vote shares. There are multiple pollsters, and each runs periodic polls so we have a dataset consisting of a series of vote share values for each candidate in each election.</p><p>Here is New Jersey Democrat Mikie Sherrill's vote-share series: 56%, 51%, 51%, 54%, 55%, and 55%. Each value came from a different poll. The average vote share is 53%. Crudely, one predicts that Sherrill will win 53% of the election-day votes. </p><p>Statisticians don't like that answer. 
A moment's thought should convince you that the ability of prior polls to predict the election outcome depends, in part, on the variability in that vote-share series. Two of the six values sit uncomfortably close to 50%. How do we capture this observation quantitatively?</p><p>The canonical tool used by statisticians is the <a href="https://www.junkcharts.com/tag/margin-of-error/" rel="noreferrer">margin of error</a>. Here, it's ±4.4%. (This number is derived from the standard deviation of the vote-share data series.) Notably, the left side of the margin of error dips below 50%. </p><p>On election day, Sherrill needs at least 50% of the votes to win. How likely is she to get half or more of the votes, given the series of poll numbers averaging 53%? We now appeal to the statistics gods.</p><p>The gods tend to a pool of "truth". The prior polls are random samples of this truth. Since they didn't measure every likely voter, each polling sample is different, so the series of polling averages exhibited variability. The margin of error quantifies such uncertainty: the probability that a poll average falls between 49.6% and 57.4% is 95%.</p><p>That doesn't directly answer our question. Using the same tools, we can show that there is a 91% chance of obtaining a sample average of 50% or higher. In other words, the New Jersey Governor's election is not a toss-up as the media led us to believe.</p><p>Andrew also explained why he lowered Sherrill's chance to 84%. The margin of error only accounts for sampling <a href="https://www.junkcharts.com/tag/variability/" rel="noreferrer">variability</a> – think of it as random error. As recent elections have shown, polls also suffer from systematic error, that is to say, some other factor causes most polls to skew in the same direction. Andrew modeled this source of error by adjusting the margin of error upwards, to ±6%, which leads to a downward revision of her winning probability. 
(She won handily with 56% of the votes, at the time of writing.)</p><p>For the NYC mayor election, Andrew gave several reasons why he lowered Mamdani's chance of winning further. This election is a three-way race, while the above methodology uses the two-candidate vote shares, ignoring the Republican candidate's values. It's reasonable to assume that on election day, some of the voters who had intended to vote for the Republican would decide not to waste their vote, and most of them were expected to gift their votes to Cuomo.</p><p>By making assumptions about the size of this group of late switchers, and the amount of skew towards Cuomo, we can adjust the expected two-candidate vote shares between Mamdani and Cuomo. This maneuver only moved the average vote share by a couple of percentage points (because the Republican wasn't going to get that many votes). Even after inserting more uncertainty to account for more variability in a three-horse race, Andrew's analysis still shows Mamdani's chance of winning to be over 86%.</p><p>The prediction market Kalshi was heavily advertising at the bus stops in NYC last week. These displays consistently showed Mamdani's winning probability in the 80-90% range. 
As Andrew indicated, since the people betting on these markets had access to the same polling data, it's not surprising they arrived at a similar place.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/11/nyt_nycmayor_result.png" class="kg-image" alt="" loading="lazy" width="1852" height="812" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/11/nyt_nycmayor_result.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/11/nyt_nycmayor_result.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/11/nyt_nycmayor_result.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/11/nyt_nycmayor_result.png 1852w" sizes="(min-width: 720px) 720px"></figure><hr><p>There are some details I and/or Andrew shoved under the rug but these minor items need not bother anyone except statisticians.</p><p>For example, we effectively treated the values in each vote-share series as a different "random sample" from an underlying population. One can complain that these are non-random samples due to different pollsters, and different polling periods.</p><p>Another complaint may be that the series of polls is too short, only about six values per race. One can, in theory, fetch a longer series but there is a trade-off; polls conducted far from the election day are generally less reliable, and the further back, the more unreliable.</p><p>We used an "empirical" estimate of the sampling variability by computing the variability of the series of numbers. As the series is short, this estimate is error-prone. Nevertheless, it's better than the so-called "parametric" alternative, which results in a severe underestimate of the uncertainty. 
(This parametric estimate arises from a theoretical model.)</p><p>Finally, all steps above require <a href="https://www.junkcharts.com/tag/assumptions/" rel="noreferrer">assumptions</a>. If one uses a different guess of how many Republican voters would shift their vote to Cuomo, the estimated vote shares would have been different. When it comes to assumptions, what's certain is that not making assumptions is the worst possible strategy. In this case, not making assumptions is the same as assuming that no Republican voters would vote for Cuomo.</p><p></p>
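            <p>For readers who want to see the mechanics, here is a rough sketch in Python of the "empirical" calculation applied to Sherrill's six poll values. The normal approximation, the two-standard-deviation margin, and the 1.5x widening factor are my simplifying assumptions, not Andrew's exact method, so the output should not be expected to match the ±4.4%, 91% or 84% quoted above. (Taken at face value, the six listed values also average a little above 53%.)</p>

```python
from statistics import NormalDist, mean, stdev

# Sherrill's vote shares (percent) from the six polls quoted in the post.
polls = [56, 51, 51, 54, 55, 55]

avg = mean(polls)   # simple average of the poll values
sd = stdev(polls)   # sample standard deviation (n - 1 denominator)


def win_probability(center, spread, threshold=50.0):
    """Chance that a normal draw with this center and spread is at or above the threshold."""
    return 1.0 - NormalDist(mu=center, sigma=spread).cdf(threshold)


# Assumption: a rough 95% margin of error is about two standard deviations.
margin = 2 * sd
p_win = win_probability(avg, sd)

# Assumption: widen the spread by an arbitrary 50% to mimic an allowance
# for systematic polling error; the wider spread lowers the winning chance.
p_win_wider = win_probability(avg, 1.5 * sd)

print(f"average {avg:.1f}, margin of error +/- {margin:.1f}")
print(f"P(share at or above 50) = {p_win:.2f}; with wider spread = {p_win_wider:.2f}")
```

            <p>The point of the sketch is the direction of the effect: the same average vote share produces a smaller winning probability once the uncertainty around it is enlarged.</p>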
          ]]></content:encoded>
          <description><![CDATA[ It wasn&#39;t that close after all ]]></description>
        </item>
        <item>
          <title><![CDATA[ Turnout tuneout ]]></title>
          <link>https://www.junkcharts.com/turnout-tuneout/</link>
          <guid isPermaLink="false">6909004032c28900017f7537</guid>
          <category><![CDATA[ interaction ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 05 Nov 2025 08:44:03 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>I enjoy the <a href="https://www.junkcharts.com/tag/scatter-plot/" rel="noreferrer">scatter plot</a> published by the <a href="https://www.junkcharts.com/tag/nyt/" rel="noreferrer">New York Times</a> team about New Jersey's elections (<a href="https://www.nytimes.com/2025/10/30/us/politics/republicans-hispanics-new-jersey.html?nl=The+Upshot&ref=junkcharts.com" rel="noreferrer">link</a>; paywall).</p><p>On this plot, each dot represents a "township" (with at least 500 votes cast in 2021). The yellow dots depict "majority non-white" towns; based on the accompanying article, the driving force is Hispanic voters. The gray dots, unlabeled, show the majority white towns. </p><p>When the analyst classifies the towns in this manner, a clear pattern emerges. Almost all the yellow dots are found in the lower right quadrant while the gray dots cluster in the upper left of the chart. The data tell a compelling story; what is it?</p><p>The backdrop of the chart is two successive recent elections: the 2020 Presidential election, and the 2021 state Governor's election. The horizontal <a href="https://www.junkcharts.com/tag/axis/" rel="noreferrer">axis</a> shows the vote margin in the 2021 election: the right side (of zero) represents the towns Democrats won while the left side (of zero) represents the towns Republicans won. It's not surprising that the Democrats are stronger in the majority non-white towns; almost all the yellow dots are on the right side of zero. (Much of the NYT article concerns the shift of Hispanic voters towards Trump in 2024 but this isn't the story of this scatter plot.)</p><p>The vertical <a href="https://www.junkcharts.com/tag/axis/" rel="noreferrer">axis</a> shows the drop in turnout from 2020 to 2021. The change was dramatic, ranging from about -20% to above -60%. </p><p>The story: the vast majority of the gray dots lie above the yellow dots. 
This means that the tuneout was much more severe in the majority non-white towns relative to the majority white towns. Add to that, the Democrats' strongholds are majority non-white towns. So, the turnout of Democratic voters deteriorated much more than that of Republican voters. We expect the Governor's margin of victory to be much smaller than the President's. </p><hr><p>Lurking behind this scatter plot are four quadrants, with most towns found in two of the quadrants.</p><p>It's easy to delineate the left and right sides: just use the 0% voting margin divider so that left are Republican towns while right are Democratic towns.</p><p>How about the top and bottom divider? Here, I can find the average change in turnout in the whole of New Jersey. This turns out to be 40% - 72% = -32%. </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/11/junkcharts_redo_nytnjturnout_averages.png" class="kg-image" alt="" loading="lazy" width="1178" height="1202" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/11/junkcharts_redo_nytnjturnout_averages.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/11/junkcharts_redo_nytnjturnout_averages.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/11/junkcharts_redo_nytnjturnout_averages.png 1178w" sizes="(min-width: 720px) 720px"></figure><p>The location of the average turnout drop is a bit odd. It makes me wonder if NYT is using a different data source. I'd have expected the line to sit lower since the large dots are mostly below the line, and in addition, the pile of small dots also appears below the line. </p><hr><p>What we just observed is an example of an "<a href="https://www.junkcharts.com/tag/interaction/" rel="noreferrer">interaction</a>" effect. 
The observed data result from the simultaneous operation of two effects. We cannot artificially impose "change one while keeping the other constant."</p><p>Effect 1 is the correlation between vote margin and race; majority non-white towns skew Democrat. Effect 2 is the correlation between vote margin and change in turnout; towns with larger turnout drops skew Democrat. Both effects are driven by the Hispanic voters so they happen simultaneously.</p><p>Let's see what we should observe if only one of the effects exists.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/11/junkcharts_redo_nytnjturnout_scenarios.png" class="kg-image" alt="" loading="lazy" width="1856" height="938" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/11/junkcharts_redo_nytnjturnout_scenarios.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/11/junkcharts_redo_nytnjturnout_scenarios.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/11/junkcharts_redo_nytnjturnout_scenarios.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/11/junkcharts_redo_nytnjturnout_scenarios.png 1856w" sizes="(min-width: 720px) 720px"></figure><p>On the LHS chart, I assume that turnout is dropping everywhere about the same in 2021 so Effect 2 is absent. The relationship between Turnout Drop and vote margin is modeled as a flat line. Since Effect 1 is present, I expect the majority non-white towns to skew Democratic, and therefore the cluster of yellow dots is situated to the right of the cluster of gray dots. </p><p>On the RHS chart, I assume Effect 1 is absent, meaning that vote margin is not associated with race. (This is a thought experiment.) 
This assumption implies that the yellow and gray clusters must overlap, so that voting behavior does not depend on the majority race in these towns. If Effect 2 is present, e.g. if I assume that towns suffering a higher drop in turnout skew Democratic, then the relationship between the two variables plotted is a negatively-sloped line. </p><p>The actual pattern is the combination of these two, which is what statisticians mean by an "interaction." </p><p>It's from the RHS chart that we can see why it's silly to impose "change one while keeping the other constant." To keep the other effect at bay, we have to assume that towns with majority non-white populations have similar voting margins as towns with majority white populations, a clear misrepresentation of reality. Said differently, many of the voters who are skipping the 2021 elections are the same voters who live in majority non-white towns, and we can't keep them in one column while deleting them from another column.</p>
          ]]></content:encoded>
          <description><![CDATA[ A clever scatter plot reveals an interaction ]]></description>
        </item>
        <item>
          <title><![CDATA[ Light entertainment: home work ]]></title>
          <link>https://www.junkcharts.com/light-entertainment-home-work/</link>
          <guid isPermaLink="false">690a0f4e32c28900017f7628</guid>
          <category><![CDATA[ Light entertainment ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 04 Nov 2025 10:49:47 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Trick or treat seems to have died in the city. Glad to see it's alive somewhere.</p><p>(Tip from long-time reader Chris P.)</p>
          ]]></content:encoded>
          <description><![CDATA[ What happened on Halloween night ]]></description>
        </item>
        <item>
          <title><![CDATA[ Big drops and big dots ]]></title>
          <link>https://www.junkcharts.com/big-drops-and-big-dots/</link>
          <guid isPermaLink="false">69043ea432c28900017f7466</guid>
          <category><![CDATA[ line chart ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 03 Nov 2025 09:15:33 -0500</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>For some unclear reasons, the response rates for various instruments used by the Bureau of Labor Statistics (BLS) to measure the nation's well-being have been dropping. As the above chart shows, the CPS survey (which is used to measure the unemployment rate) is seeing 15% fewer responses than about 10 years ago.</p><p>The chart is unnecessarily busy. This combination of dots and <a href="https://www.junkcharts.com/tag/line-chart/" rel="noreferrer">line</a> segments curiously elevates the dots above the lines. As a result, the chart spotlights the "noise" in the data series. </p><p>Two years later, someone at the BLS noticed this problem, and published a new design, which has definitely improved:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/household-survey-response-rate-2025.png" class="kg-image" alt="" loading="lazy" width="1160" height="1060" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/household-survey-response-rate-2025.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/household-survey-response-rate-2025.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/household-survey-response-rate-2025.png 1160w" sizes="(min-width: 720px) 720px"></figure><p>The dots are no longer there, and so they don't steal all of our attention.</p><p>Nevertheless, this revised design still lets the background noise drown out the signal.</p><p>Besides, you're involuntarily twisting your neck as you work out which color and which line is which survey.</p><hr><p>In this revision, I put the dots back but push them to the background. I add a smoothed line for each survey that depicts the downward trend in response rate. 
The line labels are at the end of each line.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_blsresponserates.png" class="kg-image" alt="" loading="lazy" width="1910" height="1188" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/junkcharts_redo_blsresponserates.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/junkcharts_redo_blsresponserates.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/10/junkcharts_redo_blsresponserates.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_blsresponserates.png 1910w" sizes="(min-width: 720px) 720px"></figure><p>I didn't check why ATUS has zero response since 2024. It might just have been suspended.</p>
          ]]></content:encoded>
          <description><![CDATA[ Droppings at BLS ]]></description>
        </item>
        <item>
          <title><![CDATA[ How many gigs? ]]></title>
          <link>https://www.junkcharts.com/how-many-gigs/</link>
          <guid isPermaLink="false">6904227832c28900017f740e</guid>
          <category><![CDATA[ measurement ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Fri, 31 Oct 2025 08:41:14 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>During a recent interview, I was asked an excellent question: how does the gig economy affect the accuracy of our employment statistics?</p><p>The short answer is: it shouldn't have a large impact.</p><p>Nevertheless, a quick web (or AI) search suggests that certain experts have put up arguments claiming that official statistics under-count gig workers.</p><p>That's hard to believe.</p><p>Let's start with how BLS (Bureau of Labor Statistics) counts employment. The primary source is the Current Population Survey (CPS). A random sample of households is contacted and asked a bunch of questions. The key question related to their employment status is whether they worked at least one hour during a so-called "reference week." Anyone who responded yes is counted as an employed person.</p><p>I did just say "person." That's because the CPS - and, by extension, the unemployment rate - counts people, not jobs. It doesn't matter if a gig worker has five jobs; working just one hour during that week for any single employer is sufficient for him/her to be counted as employed. Thus, all gig workers who work one hour or more should already be counted.</p><p>Those people who allege under-counting make the following argument: they assert that some gig workers do not see their gigs as "work," and therefore when contacted by BLS data collectors, they would proclaim themselves unemployed.</p><p>As we are now years into the gig economy, I don't believe the claim that gig workers see themselves as not working.</p><hr><p>Conversely, there is an argument to be made that the transition into the gig economy may artificially inflate the number of employed people.</p><p>Take an adjunct professor as an example of a gig worker. The university has turned a single teaching job previously held by a single person into several adjunct teaching jobs held by different people. 
In fact, the administration probably hires an accountant whose job is to make sure that each adjunct professor doesn't work too many hours, as otherwise, the school must treat him/her as an employee with benefits.</p><p>Since only one hour of work suffices to qualify as employed, each of the adjunct professors counts as an employed person. This splitting of one job into several pushes the number employed upwards. </p>
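<p>To make the counting rule concrete, here is a minimal Python sketch. It is only an illustration: the records, names and hours are invented, and the one-hour threshold is applied to a person's total hours in the reference week, which is my simplification of the survey question rather than the BLS's actual procedure. It shows that one teaching job split among three adjuncts yields three employed people, while a person holding two gigs is still counted once.</p>

```python
from collections import defaultdict

# Hypothetical reference-week records: (person, employer, hours worked).
# All names and hours are made up for illustration.
records = [
    ("Ana", "University A", 6.0),    # adjunct who also drives for an app
    ("Ana", "Ride-share app", 10.0),
    ("Ben", "University A", 6.0),    # adjunct
    ("Cy", "University A", 6.0),     # adjunct
    ("Dee", "Delivery app", 0.5),    # worked less than one hour
]


def count_employed(records, min_hours=1.0):
    """Count people (not jobs) whose total hours reach min_hours."""
    hours_by_person = defaultdict(float)
    for person, _employer, hours in records:
        hours_by_person[person] += hours
    return sum(1 for total in hours_by_person.values() if total >= min_hours)


def count_jobs(records):
    """Count distinct (person, employer) pairs, i.e. jobs held."""
    return len({(person, employer) for person, employer, _hours in records})


employed = count_employed(records)
jobs = count_jobs(records)
print("employed persons:", employed)
print("jobs held:", jobs)
```

<p>In this toy data, the three adjuncts sharing what used to be one teaching job each count as employed, which is the upward push described above.</p>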
          ]]></content:encoded>
          <description><![CDATA[ Counting people in the gig economy ]]></description>
        </item>
        <item>
          <title><![CDATA[ Ten pie charts. Are you worried yet? ]]></title>
          <link>https://www.junkcharts.com/ten-pie-charts-are-you-worried-yet/</link>
          <guid isPermaLink="false">68fec1e5047b3800011c7734</guid>
          <category><![CDATA[ Pie chart ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 27 Oct 2025 09:35:14 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Statista published this series of <a href="https://www.junkcharts.com/tag/pie-chart/" rel="noreferrer">pie charts</a> that illustrate results from a <a href="https://www.junkcharts.com/tag/survey/" rel="noreferrer">survey</a> asking Americans what they are worried about (<a href="https://www.statista.com/chart/35340/respondents-rank-issues-united-states/?__sso_cookie_checker=failed&ref=junkcharts.com" rel="noreferrer">link</a>). The survey question has 18 options, while the chart covers the top 10 issues. "Top" is defined by the proportion of respondents who ranked the issue as their topmost concern.</p><p>The chart form is a <a href="https://www.junkcharts.com/tag/small-multiples/" rel="noreferrer">small multiples</a> of pie charts. Each pie chart addresses a specific issue, and contains one data point – the proportion of people who ranked that issue as their top worry. The data series is encoded twice, first in the area (or angle) of the sector, and also in its <a href="https://www.junkcharts.com/tag/color/" rel="noreferrer">color</a>. </p><hr><p>This chart fails our <a href="https://www.junkcharts.com/tag/self-sufficiency-test/" rel="noreferrer">self-sufficiency test</a>. 
If stripped of the <a href="https://www.junkcharts.com/tag/data-labels/" rel="noreferrer">data labels</a>, we are left with:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_statista_americanworries_1.png" class="kg-image" alt="" loading="lazy" width="1030" height="1074" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/junkcharts_redo_statista_americanworries_1.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/junkcharts_redo_statista_americanworries_1.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_statista_americanworries_1.png 1030w" sizes="(min-width: 720px) 720px"></figure><p>It takes some effort to figure out the proportion of each sector. It also shows the minimal contribution from the use of color. Using <a href="https://www.junkcharts.com/tag/color/" rel="noreferrer">color</a> alone, no reader can possibly learn the data in any of these pie charts.</p><p>It's not clear to me that the color assignments were applied using a formula. The change between Immigration and Unemployment on the second row is quite noticeable, and feels larger than the change between Health and social security and Poverty on the first row. 
Yet, the former is a fraction of a percent while the latter represents 3 percent.</p><hr><p>Here is a <a href="https://www.junkcharts.com/tag/bar-chart/" rel="noreferrer">bar chart</a> showing the same data:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_statista_americanworries_2.png" class="kg-image" alt="" loading="lazy" width="1664" height="946" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/junkcharts_redo_statista_americanworries_2.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/junkcharts_redo_statista_americanworries_2.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/10/junkcharts_redo_statista_americanworries_2.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_statista_americanworries_2.png 1664w" sizes="(min-width: 720px) 720px"></figure><p>I like to extend the <a href="https://www.junkcharts.com/tag/axis/" rel="noreferrer">axis</a> to the full 100 percent, making it easier to judge the length of each bar, as a proportion of the total.</p><p>I chose only two shades because the gaps between successive data points are modest for the most part.</p><p>The bar chart does not require printing the entire dataset to be understood.</p>
          ]]></content:encoded>
          <description><![CDATA[ Ten pie charts about American worries ]]></description>
        </item>
        <item>
          <title><![CDATA[ Peak social media? Depends on how you measure it ]]></title>
          <link>https://www.junkcharts.com/peak-social-media-depends-on-how-you-measure-it/</link>
          <guid isPermaLink="false">68f40d3acb7b040001087e92</guid>
          <category><![CDATA[ cohort ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 20 Oct 2025 08:00:52 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>John at the Financial Times produced the above graph, with the headline "Time on social media peaked in 2022, with young people cutting back first" (<a href="https://www.ft.com/content/a0724dd9-0346-4df3-80f5-d6572c93a863?ref=junkcharts.com" rel="noreferrer">link</a>).</p><p>What does he really mean? </p><p>The first part probably refers to the first chart on the left, showing the <a href="https://www.junkcharts.com/tag/aggregation/" rel="noreferrer">aggregate</a> population in the study. The study measures the "average number of hour spent on social media per day." It's a double average: average per day, and average per person. He's describing the peak in the curve observed in 2022, followed by a downtrend in the last two years.</p><p>The second part probably refers to the set of <a href="https://www.junkcharts.com/tag/line-chart/" rel="noreferrer">line charts</a> starting from the second-left chart. Here, he disaggregated the dataset by age group. Each line represents social-media usage of people in the labeled age group. The first three lines exhibit dips at the end, similar to the aggregate while the last two age groups do not show a downward slope. This is summarized as "young people cutting back first".</p><p>There is more we can see from these very nicely drawn charts. For example:</p><ul><li>The older someone is, the less time they tend to spend on social media (the entire lines shift lower as we move left to right)</li><li>From 2014 to 2020 (roughly speaking), the average person increased their social-media usage, so did the average person representing each of these age groups (every line rose steeply during that period)</li></ul><hr><p>"Young people are cutting back first." This statement is quite ambiguous. An unspoken corollary is that older people are not yet cutting back. 
The statement also implies that young people are reducing usage <strong>as they become older</strong>, as there can be no other way.</p><p>So, there are two distinct issues to consider:</p><ul><li>young people reduce social-media usage as they age (e.g. have family commitment, less free time)</li><li>young people of today use less social media than young people of the past (e.g. no longer "cool", has other pastimes)</li></ul><p>The first point concerns the aging process of individuals while the second point suggests a cultural or generational factor. These are different things.</p><p>The FT charts address the second point only. The reduction in social-media usage is observed between one "generation" and the previous. </p><p>In order to study the first point, one needs a "cohort analysis" in which the <a href="https://www.junkcharts.com/tag/cohort/" rel="noreferrer">cohort</a> is defined by birth year. Think about tracking individuals as they age.</p><hr><p>In the FT analysis, the building block is the single-age <a href="https://www.junkcharts.com/tag/subgroup-analysis/" rel="noreferrer">subgroup</a>, e.g. 18 years old. Between 2014 and 2024, anyone who's 18 years old belongs to the 18 group; these people have birth years from 1996 to 2006.</p><p>A lot changed between 2014 and 2024. In 2014, Instagram had 200 million active users and was a photo app; by 2024, it claimed over 2 billion users, and had pivoted to videos. In 2014, Snapchat and WhatsApp were just gaining traction; TikTok hadn't even launched; the concept of an influencer was novel. Being 18 in 2024 is very different from being 18 in 2014!</p><p>When analysts form a subgroup, we are claiming that people in the subgroup can be treated as "alike." Sometimes, this isn't the case. The FT analysis further combines several single-age subgroups into larger age groups. 
For example, the 16-24-year-old age group contains nine single-year subgroups, and together, the people in this age group have birth years spanning 1990 to 2008.</p><hr><p>Alternatively, we can build birth-year cohorts. The most recent cohort that contributes data to John's dataset consists of those born in 2008. These youngsters reached 16 years old in 2024, the last year of data collection. This group is not very informative as we only have one year of observations. One year does not make a trend. We don't yet know how much social media they will consume when they reach 30.</p><p>A more interesting cohort consists of those born in 1998. In 2014, they were 16 years old and thus became part of this study. By 2024, they were 26 years old. The study followed them for 10 years. They contribute to our understanding of social-media usage of people between 16 and 26 years old.</p><p>A birth-year cohort analysis directly examines the effect of aging. Generational change can be captured by modeling level shifts between different birth-year cohorts. For example, the curve for those born in 2000 should probably start at a higher level of usage and possibly remain at a higher level than the curve for those born in 1970. </p><hr><p>In this last section, I attempt to illustrate the ties between the age cohorts and the birth-year cohorts, using FT's charts. I can only tell part of the story because the aggregation has wiped out some of the necessary data.</p><p>Let's consider a 20-year-old in 2014. This person sits in the middle of the 16-to-24 age group. According to the chart, the average such person consumed about 2.1 hours per day of social media in 2014.</p><p>Five years later (2019), this person was 25 years old, and therefore, his/her data fell into the 25-34 age group. In 2019, that age group on average used 2.6 hours per day of social media. Another five years later (2024), the person is 30 years old. 
Still part of the 25-34 age group, s/he is still associated with 2.6 hours per day (we can't say anything more unless we have single-age cohorts.)</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_ftsocialtimebyage_1.png" class="kg-image" alt="" loading="lazy" width="1808" height="1166" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/junkcharts_redo_ftsocialtimebyage_1.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/junkcharts_redo_ftsocialtimebyage_1.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/10/junkcharts_redo_ftsocialtimebyage_1.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_ftsocialtimebyage_1.png 1808w" sizes="(min-width: 720px) 720px"></figure><p>Let's take another group (55-to-64), for which the middle age is 59.5. In 2024, the average usage was around 1.6 hours per day. Five years earlier, in 2019, the 59.5-year-old was 54.5 years old, which means the data fell into the 45-to-54 age group. The average for that age group in 2019 was also 1.6 hours per day. Another five years earlier, in 2014, the 59.5-year-old was 49.5. The age group remains the same; the usage level in 2014 though was much lower, at 1.1 hours per day. 
(Once again, we need single-age cohort data to know if this aggregate number is representative or not.)</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_ftsocialtimebyage_2.png" class="kg-image" alt="" loading="lazy" width="1836" height="1162" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/junkcharts_redo_ftsocialtimebyage_2.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/junkcharts_redo_ftsocialtimebyage_2.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/10/junkcharts_redo_ftsocialtimebyage_2.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_ftsocialtimebyage_2.png 1836w" sizes="(min-width: 720px) 720px"></figure>
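<p>The bookkeeping in the two walk-throughs above can be sketched in a few lines of Python. This is a rough illustration, not the FT's method: the usage figures are the approximate readings quoted in this post, the 35-44 band is my assumption (the post only names the other four), and I use whole birth years (1994 and 1965) as stand-ins for the 20-year-old of 2014 and the roughly 59.5-year-old of 2024.</p>

```python
# Age bands as read off the FT charts; the 35-44 band is assumed here.
AGE_BANDS = [(16, 24), (25, 34), (35, 44), (45, 54), (55, 64)]

# Approximate hours per day quoted in the post, keyed by (year, age band).
# Cells that the post does not quote are simply absent.
USAGE = {
    (2014, "16-24"): 2.1,
    (2019, "25-34"): 2.6,
    (2024, "25-34"): 2.6,
    (2014, "45-54"): 1.1,
    (2019, "45-54"): 1.6,
    (2024, "55-64"): 1.6,
}


def age_band(age):
    """Return the label of the band containing a whole-number age, or None."""
    for low, high in AGE_BANDS:
        if age in range(low, high + 1):
            return f"{low}-{high}"
    return None


def trace_cohort(birth_year, years):
    """Follow one birth-year cohort through the age-band averages."""
    path = []
    for year in years:
        band = age_band(year - birth_year)
        path.append((year, band, USAGE.get((year, band))))
    return path


young = trace_cohort(1994, [2014, 2019, 2024])
older = trace_cohort(1965, [2014, 2019, 2024])
for row in young + older:
    print(row)
```

<p>Each row pairs a year with the age band the cohort falls into and that band's average. Because the lookup only ever returns a band average, the sketch inherits the same limitation noted above: without single-age data, we cannot tell how representative the band is of the cohort.</p>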
          ]]></content:encoded>
          <description><![CDATA[ Separating individual and group effects over time ]]></description>
        </item>
        <item>
          <title><![CDATA[ Can measure vs should measure ]]></title>
          <link>https://www.junkcharts.com/can-measure-vs-should-measure/</link>
          <guid isPermaLink="false">68efd893cb7b040001087dd4</guid>
          <category><![CDATA[ data quality ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Thu, 16 Oct 2025 08:55:57 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p><a href="https://www.junkcharts.com/tag/enrico-bertini/" rel="noreferrer">Enrico Bertini</a> has been putting out a lot of good content lately. In this post (<a href="https://filwd.substack.com/p/good-charts-wrong-data-a-data-sanity?ref=junkcharts.com" rel="noreferrer">link</a>), he advises that "no amount of design or data processing skills can overcome problems inherent in the data due to the way it was generated and collected."</p><p>Readers here will be familiar with this sentiment. This is one of the reasons why I created the <a href="https://www.junkcharts.com/junk-charts-trifecta-checkup-the-definitive-guide/" rel="noreferrer">Junk Charts Trifecta Checkup</a>. Under this framework, the problem raised by Enrico is identified as a "Type D" chart, defined as a chart that deploys a good visual design to answer a well-posed problem but fails as data visualization because it does not convey the meaning of the underlying data.</p><p>Enrico goes on to delineate the modes of failure:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/enricobertini_modesoffailure.webp" class="kg-image" alt="" loading="lazy" width="1456" height="425" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/enricobertini_modesoffailure.webp 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/enricobertini_modesoffailure.webp 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/enricobertini_modesoffailure.webp 1456w" sizes="(min-width: 720px) 720px"></figure><p>Let's walk through these concepts, while mapping his terminology to other similar concepts you may have heard of. 
For simplicity, I'm going to imagine below a dataset that measures consumer behavior, so the unit of measurement is a person.</p><p><strong>Representation gap</strong> – this happens when the observed sample of people does not fully represent the "population" of people that the analyst intends to describe. This is otherwise known as <strong>sampling bias</strong> but I do like Enrico's alternative phrasing. Depending on your discipline, you may also call it selection or filtering effect.</p><p><strong>Accuracy gap</strong> – this is familiar to statisticians as <strong>measurement error</strong>, defined as the gap between the (unobservable) true values and the observed values. If we are measuring someone's weight, it may be the case that the scale does not give accurate measurements - the inaccuracy can be a systematic bias (e.g. it starts not from zero) or the scale can suffer from a high margin of error, or both at the same time.</p><p><strong>Interpretation gap –</strong> this issue is sometimes called low <strong>construct validity</strong>. The problem is that what is measured isn't quite what one wants to measure. This happens in real life because we measure what we can measure, which is not necessarily the same as what one ought to measure. Note that this issue exists even where there is no measurement error (or accuracy gap). The act of using a "proxy" measure creates this interpretation gap.</p><p><strong>Consistency gap</strong> – I see this as a component of the measurement error mentioned above. This has to do with variability across multiple measurement devices, for example, different interviewers when conducting surveys. In addition to variability within each instrument, there may be differences between instruments.</p><hr><p>I have always been fascinated by how a device like the Apple Watch measures a user's blood oxygen level. 
Below is an excerpt of what AI tells me about how Apple's technology works:</p><blockquote>**Infrared Light and Red Light**: The watch features a specific set of sensors that emit both infrared and red light. These lights penetrate the skin and are absorbed differently by oxygen-rich (arterial) and oxygen-poor (venous) blood.</blockquote><blockquote>**Photodiodes**: These sensors then measure the amount of light that is reflected back to the watch. By analyzing the reflected light, the device can calculate the proportion of oxygenated versus deoxygenated blood.</blockquote><p>This answer is actually incomplete because what shows up on the Apple Watch is a single percentage that presumably ranges from 0 to 100%. It is not a proportion. </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/applewatch_bloodoxygen-1.jpg" class="kg-image" alt="" loading="lazy" width="1248" height="702" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/applewatch_bloodoxygen-1.jpg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/applewatch_bloodoxygen-1.jpg 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/applewatch_bloodoxygen-1.jpg 1248w" sizes="(min-width: 720px) 720px"></figure><p>So, I asked a follow-up question:</p><blockquote>Explain this more. 
"By analyzing the reflected light, the device can calculate the proportion of oxygenated versus deoxygenated blood."</blockquote><p>The key bit is this:</p><blockquote>**Ratio Analysis**: The device analyzes the ratio of red light to infrared light that has been reflected back.</blockquote><blockquote>&nbsp; - High levels of infrared light indicate a higher concentration of deoxygenated blood, while high levels of red light indicate oxygenated blood.</blockquote><blockquote>**Algorithm Implementation**: Using complex algorithms, the watch converts these measurements into a percentage of blood oxygen saturation, usually expressed as SpO2.</blockquote><blockquote>&nbsp; - Normal SpO2 levels typically range from **95% to 100%**; levels below this may indicate potential health issues.</blockquote><p>Note where it mentions "complex <a href="https://www.junkcharts.com/tag/algorithm/" rel="noreferrer">algorithms</a>". So, what the watch actually measures are the levels of red and infrared light reflected back. What the user expects to be measured is the blood oxygen saturation percentage. There is an "interpretation gap" because the algorithms take what is measured, and transform it into what we ought to measure; the value shown on the watch is an indirect, proxy measurement that inherently raises the question of construct validity.</p><p>Further, any device has measurement error. Some of this is caused by the user, e.g. whether the watch is tightly fitted on the wrist. Even if we accept the proxy measure as sufficient, the observed values may still deviate from the truth. Add to this the inconsistency from one Apple Watch to the next.</p><hr><p>I further asked AI what methods of measurement are used in a clinic. 
</p><p>The answer describes two options: a non-invasive method that uses fingertip sensors and relies on "complex algorithms" similar to the Apple Watch; and an invasive procedure (arterial blood gas analysis) that draws blood out, and more directly measures the blood oxygen level (along with other tests).</p><p>In this case, what ought to be measured can be measured more directly. The trade-off is a less accurate but more available method. </p>
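<p>Going back to the weighing-scale example under the accuracy gap, a small simulation can separate the two kinds of inaccuracy. This is only a sketch under invented assumptions: the true weight, the 2 kg offset and the noise levels are made-up numbers, and the errors are drawn from a normal distribution for convenience.</p>

```python
import random
import statistics

TRUE_WEIGHT = 70.0  # hypothetical true value, in kg


def simulate_readings(true_weight, offset, noise_sd, n, seed):
    """Simulate n readings from a scale with a fixed offset and random noise."""
    rng = random.Random(seed)
    return [true_weight + offset + rng.gauss(0.0, noise_sd) for _ in range(n)]


def summarize(readings, true_value):
    """Return (mean error, spread of errors) relative to the true value."""
    errors = [reading - true_value for reading in readings]
    return statistics.mean(errors), statistics.pstdev(errors)


# Scale A: systematic bias (does not start from zero) but very little noise.
scale_a = simulate_readings(TRUE_WEIGHT, offset=2.0, noise_sd=0.1, n=10_000, seed=0)
# Scale B: no systematic bias but a high margin of error.
scale_b = simulate_readings(TRUE_WEIGHT, offset=0.0, noise_sd=3.0, n=10_000, seed=1)

bias_a, spread_a = summarize(scale_a, TRUE_WEIGHT)
bias_b, spread_b = summarize(scale_b, TRUE_WEIGHT)
print(f"scale A: mean error {bias_a:.2f} kg, spread {spread_a:.2f} kg")
print(f"scale B: mean error {bias_b:.2f} kg, spread {spread_b:.2f} kg")
```

<p>Averaging many readings shrinks the error of scale B but does nothing for scale A, whose error is baked into every reading. Neither scale says anything about the interpretation gap, which remains even when both the bias and the noise are zero.</p>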
          ]]></content:encoded>
          <description><![CDATA[ Sources of inaccuracy in data ]]></description>
        </item>
        <item>
          <title><![CDATA[ Putting the ladder of abstraction into practice ]]></title>
          <link>https://www.junkcharts.com/putting-the-ladder-of-abstraction-into-practice/</link>
          <guid isPermaLink="false">68ec03981dcae200014d944f</guid>
          <category><![CDATA[ abstraction ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 14 Oct 2025 08:41:45 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>In a previous post, I mentioned <a href="https://www.junkcharts.com/tag/andrew-gelman/" rel="noreferrer">Gelman's</a> advice to use the ladder of abstraction to explain complex charts (<a href="https://www.junkcharts.com/an-abstract-chart-only-statisticians-love/" rel="noreferrer">link</a>).</p><p>The complex chart is this <a href="https://www.junkcharts.com/tag/nyt/" rel="noreferrer">New York Times</a>'s graphic showing the cloud cover at different locations along the path of totality during 2024's full solar eclipse.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/nyt_solareclipse_clouds_linechart-2.png" class="kg-image" alt="" loading="lazy" width="1178" height="1426" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/nyt_solareclipse_clouds_linechart-2.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/nyt_solareclipse_clouds_linechart-2.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/nyt_solareclipse_clouds_linechart-2.png 1178w" sizes="(min-width: 720px) 720px"></figure><p>Using this example, I drafted how one can construct a series of charts that build up to this <a href="https://www.junkcharts.com/tag/line-chart/" rel="noreferrer">line chart</a>.</p><p>Start with the least abstract graphic that stays closest to the human experience of our world:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/nyt_solareclipse_pathoftotality-2.png" class="kg-image" alt="" loading="lazy" width="2000" height="1484" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/nyt_solareclipse_pathoftotality-2.png 600w, 
https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/nyt_solareclipse_pathoftotality-2.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/10/nyt_solareclipse_pathoftotality-2.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/nyt_solareclipse_pathoftotality-2.png 2008w" sizes="(min-width: 720px) 720px"></figure><p>Our sense of space is tied up in <a href="https://www.junkcharts.com/tag/map/" rel="noreferrer">maps</a>, even though maps are also abstractions, subject to the specific distortions of the chosen projection scheme. This map presents the background story of the moon's shadow sweeping along the path of totality from the bottom left to the top right of the map.</p><p>At the same time, the map also foreshadows which parts of the map are less relevant to the topic at hand. This prepares the readers for when I drill down to specific locations in forthcoming graphics.</p><p>The map also couples space and time. <a href="https://www.junkcharts.com/tag/time-series/" rel="noreferrer">Time</a> is provided in <a href="https://www.junkcharts.com/tag/data-labels/" rel="noreferrer">data labels</a> at specific locations. 
Time increments as one moves along the path of totality.</p><hr><p>Next, I pick a specific location (Rochester, NY) to demonstrate how an entire experience is reduced to a point in time.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_rochester_solareclipse_map.png" class="kg-image" alt="" loading="lazy" width="1596" height="958" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/junkcharts_redo_rochester_solareclipse_map.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/junkcharts_redo_rochester_solareclipse_map.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_rochester_solareclipse_map.png 1596w" sizes="(min-width: 720px) 720px"></figure><p>We focus our attention only on the center of the path of totality, as depicted by the added red line. Recall that in the <a href="https://www.junkcharts.com/tag/interactive/" rel="noreferrer">interactive</a> map in the original New York Times's <a href="https://www.junkcharts.com/an-abstract-chart-only-statisticians-love/" rel="noreferrer">article</a>, the designer lets the reader play with the top slider bar to control the moon's shadow as it moves through time and space. </p><p>Here, I nail the moon's shadow to the middle of the observation period, presumably the best time for viewing. This graphic explains how each location will eventually become a single point, which corresponds to a geographical location and a point in time.</p><hr><p>In the above map, I'd have left out the cloud cover data, holding them for later. 
</p><p>Then, I make use of the historical cloud cover map to introduce the new variable of visibility.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/nyt_solareclipse_clouds_historical-1.png" class="kg-image" alt="" loading="lazy" width="2000" height="1108" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/nyt_solareclipse_clouds_historical-1.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/nyt_solareclipse_clouds_historical-1.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/10/nyt_solareclipse_clouds_historical-1.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/nyt_solareclipse_clouds_historical-1.png 2242w" sizes="(min-width: 720px) 720px"></figure><p>A key takeaway is the <a href="https://www.junkcharts.com/tag/color/" rel="noreferrer">color</a> <a href="https://www.junkcharts.com/tag/legend/" rel="noreferrer">legend</a>, with yellow indicating better visibility. I shall reuse this color scheme.</p><hr><p>As a next step, I prepare readers for the shift from maps to a line chart. 
This step significantly ratchets up the abstraction.</p><p>I reduce the above map to a line chart for Rochester.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_rochester_cloud_cover_illustrative_chart.png" class="kg-image" alt="" loading="lazy" width="1430" height="966" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/junkcharts_rochester_cloud_cover_illustrative_chart.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/junkcharts_rochester_cloud_cover_illustrative_chart.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_rochester_cloud_cover_illustrative_chart.png 1430w" sizes="(min-width: 720px) 720px"></figure><p>The <a href="https://www.junkcharts.com/tag/color/" rel="noreferrer">color</a> scheme is retained (not quite, as I used 3 rather than 7 levels of color, but you get the drift). </p><p>The previously developed idea of looking only at the expected time of totality is repeated to focus attention on a single point on this line chart. 
This sets the reader up to recognize that Rochester will soon feature as a single dot in our final chart.</p><hr><p>Lastly, we arrive at the final graphic, which is the New York Times's line chart – with an added flourish of yellow inspired by that historical cloud cover map.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_nytsolarsclipsecloudlinechart.png" class="kg-image" alt="" loading="lazy" width="1032" height="1260" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/junkcharts_redo_nytsolarsclipsecloudlinechart.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/junkcharts_redo_nytsolarsclipsecloudlinechart.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_nytsolarsclipsecloudlinechart.png 1032w" sizes="(min-width: 720px) 720px"></figure><p>Instead of plunging readers directly into a complex graphic, I have used the ladder of abstraction to present important components, one at a time, building up to the final abstraction.</p><p>P.S. I haven't found an elegant way to explain turning the line chart sideways, so a small gap persists.</p>
          ]]></content:encoded>
          <description><![CDATA[ Explaining complexity one piece at a time ]]></description>
        </item>
        <item>
          <title><![CDATA[ Applying the band-aid, missing the wound ]]></title>
          <link>https://www.junkcharts.com/applying-the-band-aid-missing-the-wound/</link>
          <guid isPermaLink="false">68e68069a6f93c000172b1f2</guid>
          <category><![CDATA[ Aggregation ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Thu, 09 Oct 2025 08:27:28 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Long-time reader Chris P. sent me the featured <a href="https://www.junkcharts.com/tag/bar-chart/" rel="noreferrer">column chart</a>, through a tweet. The original was published in an Axios article (link).</p><p>The Axios author attributed the data to a CSIS report (<a href="https://www.csis.org/analysis/left-wing-terrorism-and-political-violence-united-states-what-data-tells-us?ref=junkcharts.com" rel="noreferrer">link</a>), which means the original original is this column chart:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/csis_leftwingrise.png" class="kg-image" alt="" loading="lazy" width="1360" height="1098" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/csis_leftwingrise.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/csis_leftwingrise.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/csis_leftwingrise.png 1360w" sizes="(min-width: 720px) 720px"></figure><p>The Axios chart is a subset of this CSIS chart, after dropping the last three categories of Jihadist, Ethnonationalist, and Other. This is an unorthodox distillation of a chart; typically, one would combine those three categories into "Others" and keep it on the plot. With those categories removed completely, the reader may mistakenly assume that the column heights represent the total count of "terrorist attacks and plots."</p><p>Amusingly, the CSIS report is headlined "Left-wing terrorism...." – the story being pushed is that if one looks at 20 years' worth of history, and predicts what might happen in the next few years, one should ignore all the data except the last 6 months. 
</p><hr><p>In a report centering on left-wing terrorism, the data in the stacked column chart are <a href="https://www.junkcharts.com/tag/sorting/" rel="noreferrer">ordered</a> in such a way that the left-wing components (dark blue) sit in the middle of the stack. This means that every such block starts at a different base level, making it difficult to compare heights.</p><p>The designer recognizes this difficulty, and uses an <a href="https://www.junkcharts.com/tag/interactive/" rel="noreferrer">interactive</a> element to overcome it. Clicking on one of the blocks pushes all the other blocks to the background:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_csis_leftwingrise_focus.png" class="kg-image" alt="" loading="lazy" width="1314" height="1110" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/junkcharts_csis_leftwingrise_focus.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/junkcharts_csis_leftwingrise_focus.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_csis_leftwingrise_focus.png 1314w" sizes="(min-width: 720px) 720px"></figure><p>I'd call this a "band-aid." It doesn't cure the malady but it is an improvement.</p><p>The Axios designer applies a different solution – reducing the number of categories plotted to two. With two categories, the subject of <a href="https://www.junkcharts.com/tag/sorting/" rel="noreferrer">order</a> becomes even more prominent: the category shown at the bottom has blocks that start on the horizontal axis while the category shown above is given floating blocks that start at different levels. 
</p><p>Amusingly, Axios then places the left-wing data in the floating blocks, which means it applied the band-aid but missed the wound!</p><p>This is what the same chart looks like when the order of the categories is reversed:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_axiosleftwing_order.png" class="kg-image" alt="" loading="lazy" width="1566" height="900" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/junkcharts_redo_axiosleftwing_order.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/junkcharts_redo_axiosleftwing_order.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_axiosleftwing_order.png 1566w" sizes="(min-width: 720px) 720px"></figure><p>The <a href="https://www.junkcharts.com/tag/times-series/" rel="noreferrer">trend</a> in left-wing attacks and plots (orange) is much clearer.</p><p>Here, I <a href="https://www.junkcharts.com/tag/aggregation/" rel="noreferrer">combined</a> the three omitted categories into "Others":</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_axiosleftwing_order_others.png" class="kg-image" alt="" loading="lazy" width="1518" height="960" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/junkcharts_redo_axiosleftwing_order_others.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/junkcharts_redo_axiosleftwing_order_others.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_axiosleftwing_order_others.png 1518w" 
sizes="(min-width: 720px) 720px"></figure><p></p>
          ]]></content:encoded>
          <description><![CDATA[ Common mistakes in stacked column charts ]]></description>
        </item>
        <item>
          <title><![CDATA[ Notes on vibe coding 6 ]]></title>
          <link>https://www.junkcharts.com/notes-on-vibe-coding-6/</link>
          <guid isPermaLink="false">68e12d8804c09e00018ee653</guid>
          <category><![CDATA[ vibe coding ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 07 Oct 2025 09:36:11 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>In the last note about my vibe coding experiment (<a href="https://www.junkcharts.com/notes-on-vibe-coding-5" rel="noreferrer">link</a>), I mentioned a spreadsheet that I had asked the AI to build, used to keep track of all the posts and images from the old Typepad blog. This little device proved pivotal to the project.</p><p>One piece of information the spreadsheet captures is the name of each image file. Typepad's practice was to assign a random string of characters to name each uploaded image, erasing the user's file names while doing so. I had to reverse the process and give each image file a unique name. Later, while fixing the images scraped as HTML files, I asked GPT5 to replace the *.html file names with the corrected *.jpg file names. </p><p>Further, because Ghost never overwrites an uploaded image file, I had to append "-1" to any image transferred twice, and I asked GPT5 to update the spreadsheet with the revised *-1.jpg file names. </p><p>So far, so good. I was quite proud of my migration strategy.</p><hr><p>Then, a flash of panic. Suddenly, I realized the spreadsheet's design had a serious flaw. </p><p>It dawned on me that there were duplicate file names. How was that possible, since I explicitly designed the index to be unique?</p><p>The reason is obvious in hindsight. </p><p>Long-time readers may remember that I published two separate blogs on Typepad - Junk Charts (data visualization), and Big Data, Plainly Spoken (book blog). On Ghost, the two blogs merged into one stream. All of the data coming out of Typepad are specific to each blog. In my workflow, I made one set of code and applied it to both, sequentially. </p><p>This means I actually have two spreadsheets, one for each blog. The index on each spreadsheet starts from 1, and as a result, some file names collide across the two spreadsheets.</p><p>Worse, this flaw had already broken some posts that had been migrated. 
Sadly, checking a few posts confirmed my hunch: these posts contain text from one blog and images from the other!</p><p>I had a big mess on my hands. </p><p>First, I had to re-index all the images on the book blog so that their names do not overlap with those of Junk Charts images. Next, I had to round up the file names that collided, and find the posts showing these wrong images.</p><p>So far (and for this reason), I have held off on a large-scale migration of posts, so the number of corrections was moderate. But the path out of this mess was riddled with potholes.</p><p>Ghost has this weird policy of not overwriting files with the same name. If a post is showing the wrong image, this implies that the particular file name is duplicated across the two blogs, and the wrong one has been uploaded. The correct image may or may not have been moved yet. If the correct one exists on Ghost's servers, then it must have been given a "-1" suffix because a file with the same name was already there.</p><p>If the correct image hasn't yet been given to Ghost, the outcome is the same. If I upload that image now, it won't replace the incorrect file but will instead be renamed *-1.jpg.</p><p>So, to fix this mess, I asked the AI coder to find the posts that contain the collided images, and edit the image links to point to the *-1.jpg, instead of the *.jpg. </p><p>What a relief when I visited those pages with incorrect images, and discovered that the fix was in.</p><hr><p>After all these twists and turns, the spreadsheet(s) of the posts and images are still happily tracking everything. It's been a life-saver many times over.</p><p></p><p></p><p></p>
          ]]></content:encoded>
          <description><![CDATA[ The fine line between smart and not smart ]]></description>
        </item>
        <item>
          <title><![CDATA[ An abstract chart only statisticians love ]]></title>
          <link>https://www.junkcharts.com/an-abstract-chart-only-statisticians-love/</link>
          <guid isPermaLink="false">68e1da0004c09e00018ee70c</guid>
          <category><![CDATA[ line chart ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 06 Oct 2025 08:47:35 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>In reviewing the New York Times's visual story anticipating the full solar eclipse in 2024 (<a href="https://www.junkcharts.com/there-is-a-time-and-a-place-for-every-shadow-and-cloud/" rel="noreferrer">link</a>), I skipped over their most intricate chart.</p><p>This chart has the handiwork of a statistician. I wonder if non-statisticians would appreciate it. </p><p>The <a href="https://www.junkcharts.com/tag/line-chart" rel="noreferrer">line chart</a> looks simple but is very abstract. It is many layers removed from reality. The chart can't really stand on its own feet. It's better that readers see something less abstract first. (This follows Andrew Gelman's advice in his post on the <a href="https://statmodeling.stat.columbia.edu/2025/05/31/the-ladder-of-abstraction-in-statistical-graphics/?ref=junkcharts.com" rel="noreferrer">ladder of abstraction.</a>)</p><p>Such as this <a href="https://www.junkcharts.com/tag/map" rel="noreferrer">map</a> from the <a href="https://www.junkcharts.com/tag/nyt" rel="noreferrer">NYT</a> article, which was featured in my previous review (<a href="https://www.junkcharts.com/there-is-a-time-and-a-place-for-every-shadow-and-cloud/" rel="noreferrer">link</a>):</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/nyt_solareclipse_pathoftotality-1.png" class="kg-image" alt="" loading="lazy" width="2000" height="1484" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/nyt_solareclipse_pathoftotality-1.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/nyt_solareclipse_pathoftotality-1.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/10/nyt_solareclipse_pathoftotality-1.png 1600w, 
https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/nyt_solareclipse_pathoftotality-1.png 2008w" sizes="(min-width: 720px) 720px"></figure><p>This <a href="https://www.junkcharts.com/tag/map" rel="noreferrer">map</a> is less abstract, showing the outline of North America, and the path of totality running from Mexico through the U.S. to Canada. It also shows the path's width. </p><p>In the statistician's chart, the path of totality has been straightened out and made vertical, running from top to bottom, instead of west to east. As the designer explains, the width has been simplified away, the vertical <a href="https://www.junkcharts.com/tag/axis" rel="noreferrer">axis</a> depicting only the center of the path.</p><p>The cloud cover data have also been stripped down. The interactive chart shows everything, including areas far away from the path of totality (both in space and in time). </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/nyt_solareclipse_clouds_rochester-2.png" class="kg-image" alt="" loading="lazy" width="2000" height="1162" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/nyt_solareclipse_clouds_rochester-2.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/nyt_solareclipse_clouds_rochester-2.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/10/nyt_solareclipse_clouds_rochester-2.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/nyt_solareclipse_clouds_rochester-2.png 2254w" sizes="(min-width: 720px) 720px"></figure><p>The statistician's chart, once again, pares things back to the bare minimum. At each location, there is a single number for cloud coverage. 
Presumably, this is the expected cloud cover at the best time for observation in the highlighted region. The shades of gray have become a single percentage.</p><p>The point of this <a href="https://www.junkcharts.com/tag/line-chart" rel="noreferrer">line chart</a> is to remove all extraneous information, and make a beeline for the key correlation between cloud cover and space. </p><p><a href="https://www.junkcharts.com/tag/time-series" rel="noreferrer">Time</a> is also present, insofar as it is coupled with the vertical <a href="https://www.junkcharts.com/tag/axis" rel="noreferrer">axis</a> of space.</p><hr><p>Statisticians like these types of simple charts that have been cleansed of extraneous information. But we often forget that simplicity begets abstraction.</p><p>The basic story of the data is that most of the places along the center of the path of totality would experience very cloudy conditions during the solar eclipse.</p><hr><p>That cough I'm hearing may be you hinting at me to get an eye checkup. Will the cloud cover really be that dense since there is still quite a bit of white space in the chart?</p><p>It turns out that the right <a href="https://www.junkcharts.com/tag/axis/" rel="noreferrer">axis</a> line has been curiously erased. As a result, readers may mistakenly take the right edge of the chart as its right axis. In the following chart, I restored the right axis, showing where the cloud cover hits 100%. 
</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_nytsolareclipse_linechart_axis.png" class="kg-image" alt="" loading="lazy" width="1128" height="1242" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/junkcharts_redo_nytsolareclipse_linechart_axis.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/junkcharts_redo_nytsolareclipse_linechart_axis.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_nytsolareclipse_linechart_axis.png 1128w" sizes="(min-width: 720px) 720px"></figure><p> You can now see that the viewing conditions aren't great towards the east coast. </p><hr><p>One last thing... if this line chart is your thing, one of the key design decisions is its orientation.</p><p>Here is what the chart looks like if we turn it on its side:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_nytsolareclipse_linechart_horiz.png" class="kg-image" alt="" loading="lazy" width="1542" height="1232" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/10/junkcharts_redo_nytsolareclipse_linechart_horiz.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/10/junkcharts_redo_nytsolareclipse_linechart_horiz.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/10/junkcharts_redo_nytsolareclipse_linechart_horiz.png 1542w" sizes="(min-width: 720px) 720px"></figure><p>In Western cultures, it's more natural to have time running left to right (as opposed to top down). 
The path of totality also happens to go west to east, which aligns with a left-to-right orientation. Showing higher levels of a key metric higher up the chart is also natural. </p><p>There is a reason why the New York Times printed this chart sideways. They would rather that we don't have to turn our heads when reading the names of places. </p><p>It's a tradeoff. Which orientation would you select?</p><p></p>
          ]]></content:encoded>
          <description><![CDATA[ The NYT made this abstract chart about clouds on full eclipse day ]]></description>
        </item>
        <item>
          <title><![CDATA[ Notes on vibe coding 5 ]]></title>
          <link>https://www.junkcharts.com/notes-on-vibe-coding-5/</link>
          <guid isPermaLink="false">68d6dd1b8ad60c00013669bd</guid>
          <category><![CDATA[ vibe coding ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 01 Oct 2025 09:01:24 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>I'm still refining my blog migration code, which is written entirely by <a href="https://www.junkcharts.com/tag/AI" rel="noreferrer">AI</a>. In this <a href="https://www.junkcharts.com/tag/vibe-coding" rel="noreferrer">vibe coding</a> experiment, I refrain from meddling with the AI-generated code. </p><p>In the previous update (<a href="https://www.junkcharts.com/notes-on-vibe-coding-4" rel="noreferrer">link</a>), the initial script hit several snags, and I covered the URL matching mystery.</p><p>The other big problem I had to contend with concerned the file type of scraped images. Recall from <a href="https://www.junkcharts.com/notes-on-vibe-coding-1" rel="noreferrer">Notes 1</a> that the Typepad export process does not save images, and so I asked GPT5 to produce a script that reads the Typepad export file, flags every image tag, catalogs and labels them, and then "clicks" on every link to download the images. </p><p>I was highly satisfied with the result. I had avoided a lot of tedious work, such as figuring out all the different ways Typepad may store images. </p><p>I was not that surprised to learn about edge cases that neither I nor the AI coder anticipated. In this case, Typepad sometimes serves up HTML pages with embedded image links instead of the image files themselves. I'm not sure why Typepad sometimes uses this method. It would have been transparent to readers because the images load automatically.</p><p>However, in such cases, the image-downloading script saved an HTML file, instead of an image file. I discovered this when some of the migrated posts displayed broken image links. Ghost does not expect to find HTML files in an image block, and can't handle them.</p><hr><p>In modern coding, we expect to be able to "roll back" changes. In this case, I was hoping to undo the upload of images that included those useless HTML files. This would then clear the way to redo the upload, after cleaning up the HTML files.</p><p>That would be too easy! 
Ghost does not provide a method to delete files. According to GPT, if a file is uploaded twice, the first file is still present while the second file is renamed "file-2".</p><p>Because the corrected images have different suffixes from the HTML files, I worked around this restriction. I could leave the HTML files stranded on the server, without ever using them.</p><hr><p>I still had to find the real image files associated with those HTML files. </p><p>A little foresight proved crucial at this stage of <a href="https://www.junkcharts.com/tag/vibe-coding" rel="noreferrer">vibe coding</a>. In my first ask, I wanted and received a spreadsheet that documents key information such as post index and title, image index and title, and so on. From this spreadsheet, it's quick to find all the posts that presented HTML files. The AI coder then implemented a number of ways to grab the underlying images, stripping off the HTML wrapping. </p><p>After retrieving these images, I packaged them up and uploaded them to Ghost. </p><p>As mentioned above, I'd have much preferred to undo and redo the upload. That would ensure that the number of image files on the Ghost server is exactly the total number of images in my blog posts. Doing what I just did, the server contains more files than the expected number, because there is a subset that has duplicates, with one HTML file and one image file. </p><p>If this were the only patch, the impact would have been light. The risk is in accumulating patches as more issues are discovered. The server becomes more and more bloated with "dead" files, which I'm not allowed to remove.</p><hr><p>Meanwhile, code and scripts are also piling up. All of the above steps were accomplished using AI-generated code. </p><p>The same principle of hygiene applies. A cleaner process would involve just one master script to which I add handlers for edge cases. My hands are tied because Ghost does not let me overwrite anything. It treats a post the same as an image. 
If I upload another post of the same name, the first one stays put while the second one is given a new name. Instead of regenerating everything, I end up repairing bits and parts.</p><p>My spreadsheet summarizing all the posts and images has been a life-saver throughout this process. At any time, it gives me a snapshot of everything.</p><p>But a serious flaw would soon emerge. Stay tuned.</p><p></p>
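            <p>A footnote on the HTML-versus-image mix-up described above. One way to catch such files before uploading is to inspect the first few bytes of each download. The sketch below is only an illustration under my own assumptions: the function name, the short list of signatures and the 512-byte window are hypothetical, and none of it is taken from the AI-generated script, which I haven't read.</p>

```python
# Hypothetical helper, not part of the AI-generated migration script.
# It classifies downloaded bytes by their leading "magic" bytes.
IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_kind(data: bytes) -> str:
    """Return an image type, 'html', or 'unknown' for the given bytes."""
    for signature, kind in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return kind
    # An HTML wrapper typically opens with a doctype or an html tag,
    # possibly preceded by whitespace. Only the first 512 bytes are checked.
    head = data[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"


# Example: a wrapper page that was saved under an image file name.
print(sniff_kind(b"<!DOCTYPE html><html><body><img src='chart.png'></body></html>"))
```

            <p>Anything flagged as "html" or "unknown" could then be set aside for a second pass, instead of being uploaded to a server that does not allow deletions.</p>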
          ]]></content:encoded>
          <description><![CDATA[ How I fixed a problem with my AI-generated code ]]></description>
        </item>
        <item>
          <title><![CDATA[ There is a time and a place for every shadow and cloud ]]></title>
          <link>https://www.junkcharts.com/there-is-a-time-and-a-place-for-every-shadow-and-cloud/</link>
          <guid isPermaLink="false">68dbfd8dccd1840001363755</guid>
          <category><![CDATA[ NYT ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 30 Sep 2025 12:38:52 -0400</pubDate>
          <content:encoded><![CDATA[
            <p>The <a href="https://www.junkcharts.com/tag/nyt" rel="noreferrer">New York Times</a> published several beautiful pieces related to the solar eclipse in April, 2024. Let's take a look at one of those articles (<a href="https://www.nytimes.com/interactive/2024/science/solar-eclipse-cloud-cover-forecast-map.html?ref=junkcharts.com" rel="noreferrer">link</a>).</p><p>They started with a simple <a href="https://www.junkcharts.com/tag/map" rel="noreferrer">map</a> that quite remarkably captures the main reason for intensive media interest at the time.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/nyt_solareclipse_clouds_topmap.png" class="kg-image" alt="" loading="lazy" width="912" height="708" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/09/nyt_solareclipse_clouds_topmap.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/nyt_solareclipse_clouds_topmap.png 912w" sizes="(min-width: 720px) 720px"></figure><p>This map draws a "path of totality" that sweeps across the U.S. from southwest to northeast. The residents of those regions will experience a total solar eclipse. Importantly, a <a href="https://www.junkcharts.com/tag/time-series" rel="noreferrer">time</a> element is lurking beneath, represented by the arrow. Several key cities along the path are shown.</p><p>The <a href="https://www.junkcharts.com/tag/map" rel="noreferrer">map</a> is simple but not too simple: it provides the minimal context for readers to interpret the much more complicated charts below.</p><p>***</p><p>For the moment, I switched my focus to Rochester, NY, which is right in the middle of the path of totality. People were flocking to these areas at the time in order to witness the rare astronomical event. One potential party spoiler is visibility, measured by cloud cover. 
This is the pivot of NYT's data visualization project: producing a graphic that merges where one is and how much cloud there will be.</p><p>I have already mentioned the crucial role of "<a href="https://www.junkcharts.com/tag/time-series" rel="noreferrer">time</a>". When a path of totality is depicted as a line through the map of the United States, it is as if everyone along the path sees the same sky. The following map, scanned from a different NYT article (<a href="https://www.nytimes.com/interactive/2024/science/total-solar-eclipse-maps-path.html?ref=junkcharts.com" rel="noreferrer">here</a>), adds the missing time element:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/nyt_solareclipse_pathoftotality.png" class="kg-image" alt="" loading="lazy" width="2000" height="1484" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/09/nyt_solareclipse_pathoftotality.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/09/nyt_solareclipse_pathoftotality.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/09/nyt_solareclipse_pathoftotality.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/nyt_solareclipse_pathoftotality.png 2008w" sizes="(min-width: 720px) 720px"></figure><p>Two things are happening at once. First, the time window for observation varies as the shadow of the moon shifts across space. Second, each location uses its own time zone. On the augmented map above, the key locations are also labeled with an observation time expressed in local time.</p><p>Now comes another <a href="https://www.junkcharts.com/tag/dynamic" rel="noreferrer">dynamic</a> element - cloud cover also moves. 
To address these issues, the <a href="https://www.junkcharts.com/tag/nyt" rel="noreferrer">NYT</a> team deploys a dynamic map.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/nyt_solareclipse_clouds_rochester.png" class="kg-image" alt="" loading="lazy" width="2000" height="1162" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/09/nyt_solareclipse_clouds_rochester.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/09/nyt_solareclipse_clouds_rochester.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/09/nyt_solareclipse_clouds_rochester.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/nyt_solareclipse_clouds_rochester.png 2254w" sizes="(min-width: 720px) 720px"></figure><p>The slider above establishes the location, as well as the relevant time window for observation, expressed in local time. The midpoint of the slider is the best time for viewing, and it recenters for each location (and so does the map).</p><p>As you move the slider back and forth, you can see the shadow of the moon shift across the map. That really is the whole game - what is the predicted cloud cover right when the shadow passes by one's location?</p><p>The slider controls two variables at once. Not only is the shadow of the moon moving but the cloud cover is also morphing.</p><p>The cloud cover is a choropleth, using darker <a href="https://www.junkcharts.com/tag/color" rel="noreferrer">colors</a> for denser clouds. 
That's intuitive.</p><p>All in all, I like this project a lot, and appreciate the fantastic effort that made it possible.</p><hr><p>A couple of decisions merit our revisiting.</p><p>Further down the article (<a href="https://www.nytimes.com/interactive/2024/science/solar-eclipse-cloud-cover-forecast-map.html?ref=junkcharts.com" rel="noreferrer">link</a>), I was really excited when I saw this <a href="https://www.junkcharts.com/tag/legend" rel="noreferrer">legend</a>:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/nyt_solareclipse_clouds_cloudlegend_1.png" class="kg-image" alt="" loading="lazy" width="2000" height="623" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/09/nyt_solareclipse_clouds_cloudlegend_1.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/09/nyt_solareclipse_clouds_cloudlegend_1.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/09/nyt_solareclipse_clouds_cloudlegend_1.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/nyt_solareclipse_clouds_cloudlegend_1.png 2208w" sizes="(min-width: 720px) 720px"></figure><p>This legend encapsulates "show, don't tell". I expected to find it on the dynamic map. It is one of those informative legends that not just describes a classification scheme but conveys even more information to the reader.</p><p>Alas, that was not to be. The actual legend looks like this:</p>
<!--kg-card-begin: html-->
<img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/nyt_solareclipse_clouds_cloudlegend_2.png" style="width: 300px; height: auto;">
<!--kg-card-end: html-->
<p>The plain-English legend labels of "Less clouds", "More clouds" work well. I'd prefer to use the same grouping of &lt;10%, 10-25%, 25-50%, 50-90% and 90-100%. Presumably, those bins are chosen to align with human perception.</p><hr><p>The following screenshot shows the view from Montreal.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/nyt_solareclipse_clouds_montreal.png" class="kg-image" alt="" loading="lazy" width="2000" height="1164" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/09/nyt_solareclipse_clouds_montreal.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/09/nyt_solareclipse_clouds_montreal.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/09/nyt_solareclipse_clouds_montreal.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/nyt_solareclipse_clouds_montreal.png 2238w" sizes="(min-width: 720px) 720px"></figure><p>Notice where the moon's shadow is relative to the slider's position. I believe they are showing the same standard time window for all locations (expressed in local time). As a result, at any location, much of the slider is nonfunctional - for those periods, the shadow is far away, and off the page.</p><p>Instead, an optimal viewing window can be established for each location, and the slider's labels customized to indicate it.</p><hr><p>Elsewhere in the article (<a href="https://www.nytimes.com/interactive/2024/science/solar-eclipse-cloud-cover-forecast-map.html?ref=junkcharts.com" rel="noreferrer">link</a>), they also show another slightly different design that replaces forecasted cloud cover with historical data. 
</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/nyt_solareclipse_clouds_historical.png" class="kg-image" alt="" loading="lazy" width="2000" height="1108" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/09/nyt_solareclipse_clouds_historical.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/09/nyt_solareclipse_clouds_historical.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/2025/09/nyt_solareclipse_clouds_historical.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/nyt_solareclipse_clouds_historical.png 2242w" sizes="(min-width: 720px) 720px"></figure><p>I suppose they produced it prior to when the latest forecasts became available. In reality, the real-time element is more decorative than substantive: the reason I say this is that any predictive <a href="https://www.junkcharts.com/tag/model" rel="noreferrer">model</a> of cloud cover must be based on historical data!</p><p>I prefer this two-<a href="https://www.junkcharts.com/tag/color" rel="noreferrer">color</a> <a href="https://www.junkcharts.com/tag/legend" rel="noreferrer">legend</a> to the grayscale above. The yellow parts stand out as locations where people have great visibility for the full solar eclipse. Through this, you can clearly see why a static map like this one has limits - the yellow visibility may not be there when the moon's shadow passes by, and on a static map, you can't show different places at different times.</p>
          ]]></content:encoded>
          <description><![CDATA[ A beautiful NYT project masterfully handling time and space dynamics ]]></description>
        </item>
        <item>
          <title><![CDATA[ Notes on vibe coding 4 ]]></title>
          <link>https://www.junkcharts.com/notes-on-vibe-coding-4/</link>
          <guid isPermaLink="false">68d5f4fc8ad60c00013668ba</guid>
          <category><![CDATA[ vibe coding ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Fri, 26 Sep 2025 14:21:15 -0400</pubDate>
          <content:encoded><![CDATA[
            <p>I wasn't expecting to be writing about <a href="https://www.junkcharts.com/tag/vibe-coding" rel="noreferrer">vibe coding</a> so soon again. But here I am.</p><p>Two blog posts ago (<a href="https://www.junkcharts.com/notes-on-vibe-coding-2" rel="noreferrer">here</a>), I felt quite satisfied that I had managed to migrate some blog posts from Typepad to Ghost using a piece of code written completely by <a href="https://www.junkcharts.com/tag/AI" rel="noreferrer">AI</a> (albeit with my steering). I haven't read the code itself.</p><p>As I geared up to move larger chunks of the blog over, I started to notice previously unknown problems. These issues all necessitated updates to the code. In my vibe-coding experiment, I'd feed the anomalies back to the AI coder, and when it got stuck, I'd steer it out of trouble.</p><p>It actually got stuck more often than I'd like it to. It seemed to perform better when given free rein to write on a blank slate, but it was quite ineffective when the starting point was a functional program and it was asked to make small tweaks to fix specific issues. </p><hr><p>A small side track to think about the nature of bug fixing. Since the previous code is functional, an important objective is "if it ain't broke, don't fix it". Touch it lightly, to reduce the chance of creating even more problems. So, I cringe every time the reasoning steps include unsolicited "optimization": stuff like "I see that your current code for doing X is not efficient, and I am going to fix it by..." makes my stomach churn.</p><p>A particularly apoplectic moment occurred when the AI coder decided to change the parameter of my main function from "force-tags" to "forced-tags." As a result, when I ran the corrected function with the previous set of parameters, it popped a syntax error. Why on earth - or in the multiverse - would it do that? 
(Ironically, when the original AI coder wrote "force-tags" instead of the more grammatical "forced-tags", I cringed but suppressed my urge to "fix" it, by which I mean, to break it.)</p><hr><p>The first big problem I encountered was missing posts. As it turned out, those posts weren't actually missing; they were lost in the crowd, so to speak.</p><p>I'm going to migrate 19 years of posts in stages. That's because I'm pretty sure there are unknown problems that would pop up so I am starting with small batches; at some point, when I have sufficient confidence, I will move large chunks of posts all at once.</p><p>Step 1 is to use some criteria to extract a subset of posts to migrate over. Today, I selected about 200 posts. Step 2 is to find all the images on those 200 posts, rename those images using my indexing strategy (covered <a href="https://www.junkcharts.com/notes-on-vibe-coding-2/" rel="noreferrer">here</a>), and upload these images to Ghost. </p><p>The heretofore satisfactory code failed to find all 200 blog posts. I was missing images from about 10 posts (after excluding those posts that did not contain images). That's very odd since when I look into the input files, I definitely see the 10 posts and the associated images, with their customized file names.  </p><p>I will spare you this journey because I wasted a few hours while GPT5 came up with seemingly endless, useless ideas, after which it still had not a clue what was going on. It is one inexhaustible fount of throw-at-the-wall stuff. Nothing sticks. </p><p>During this slow crawl, I discovered that OpenAI sneakily switched my model to "GPT5 not thinking". I'm calling them out here. I was using GPT5 Thinking from the start. I suppose they didn't like the amount of work I was throwing at it recently, and decided to quietly unburden themselves. This, I believe, explains some of the incompetency I encountered today, versus previous work.</p><p>At this point, the AI coder and I had become a team. 
I was running diagnostic tests on the side. What I discovered: if I pulled those unmatched URLs from the error log, putting them in their own file, and ran the same code on this much smaller set of links, the AI code managed to find those 10 posts, and pull the images out.</p><p>This is terribly confusing. But that little test gave me life. I abandoned the effort to fix the code. I just divided the posts into two groups, and processed them successively. Problem solved.</p><p>In the meantime, the AI coder wasn't giving up. It threw out even more suggestions for further fixes. Any bets on whether those fixes would work? </p><p>For giggles, these were GPT5's famous last words before I jumped off the ship. </p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/gpt5_tirelessworker.png" class="kg-image" alt="" loading="lazy" width="1288" height="1022" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/09/gpt5_tirelessworker.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/09/gpt5_tirelessworker.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/gpt5_tirelessworker.png 1288w" sizes="(min-width: 720px) 720px"></figure><p>It "smelled" a rat here, but it smelled raccoon, hyena and Labubu before, none of which was sighted.</p><hr><p>In the next post, I'll cover another unexpected problem.</p>
          ]]></content:encoded>
          <description><![CDATA[ Further adventures into the world of vibe coding ]]></description>
        </item>
        <item>
          <title><![CDATA[ Proof by absurdity ]]></title>
          <link>https://www.junkcharts.com/proof-by-absurdity/</link>
          <guid isPermaLink="false">68d2e4578ad60c0001365d42</guid>
          <category><![CDATA[ evidence ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 24 Sep 2025 08:25:46 -0400</pubDate>
          <content:encoded><![CDATA[
            <p>One of Andrew's readers ridiculed a paper published in JAMA, one of the top peer-reviewed journals of medical science in the world, that reported some disarming statistics (<a href="https://statmodeling.stat.columbia.edu/2025/09/22/its-jama-time-junk-science-presented-as-public-health-research/?ref=junkcharts.com#comment-2403779" rel="noreferrer">link</a>).</p><p>The authors claimed that 7% of American adults have been present at a mass shooting involving at least four victims. Further, they estimated that 2% of American adults have been <em>injured</em> at such a shooting.</p><p>Really?</p><p>There are roughly 200 million adults in the U.S. So they say with a straight face that 2% of 200 million = 4 million people have been <em>injured</em> during mass shootings involving 4 or more victims.</p><p>Last year, there were roughly 500 such shootings. If the average such event injured 100 people (follow along just for laughs, now), that's 50,000 injuries in a year.  We'd have to accumulate 80 years of numbers to reach 4 million.</p><p>This type of thinking helps data analysts get rid of fringe hypotheses quickly so that they can focus on more promising ones. I don't have a better name for this style of argument. A proof by absurdity? </p>
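            <p>The back-of-the-envelope arithmetic above fits in a few lines of Python. This is just a sketch that reuses the round numbers from this post; the function and variable names are my own, and the 100 injuries per event is the same deliberately inflated figure used above.</p>

```python
# Round figures taken from the post; the names are illustrative only.
ADULTS = 200_000_000        # rough number of U.S. adults used in the post
INJURED_PERCENT = 2         # percent of adults the paper says were injured
SHOOTINGS_PER_YEAR = 500    # rough number of such shootings last year
INJURED_PER_EVENT = 100     # deliberately inflated, "just for laughs"


def years_needed(adults, injured_percent, events_per_year, injured_per_event):
    """Years of shootings needed to accumulate the claimed number of injured."""
    claimed_injured = adults * injured_percent // 100
    injured_per_year = events_per_year * injured_per_event
    return claimed_injured / injured_per_year


print(years_needed(ADULTS, INJURED_PERCENT, SHOOTINGS_PER_YEAR, INJURED_PER_EVENT))
```

            <p>Swapping in a smaller, more plausible number of injuries per event only makes the required number of years larger, which is the point of the exercise.</p>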
          ]]></content:encoded>
          <description><![CDATA[ When top journals publish absurd data ]]></description>
        </item>
        <item>
          <title><![CDATA[ Notes on vibe coding 3 ]]></title>
          <link>https://www.junkcharts.com/notes-on-vibe-coding-3/</link>
          <guid isPermaLink="false">68c9bfedfa93ee000107112d</guid>
          <category><![CDATA[ vibe coding ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 22 Sep 2025 13:23:26 -0400</pubDate>
          <content:encoded><![CDATA[
            <p>In this third post of the series about vibe coding, I reflect on my experiment, and speculate about its future. (The previous two posts are <a href="https://www.junkcharts.com/numbersruleyourworld/2025/09/notes-on-vibe-coding-1.html" rel="noreferrer">here</a>, and <a href="https://www.junkcharts.com/numbersruleyourworld/2025/09/notes-on-vibe-coding-2.html" rel="noreferrer">here</a>.)</p><p>In particular, will non-coders be able to "write code"?</p><p>It's obvious that for certain projects, it is already possible for a non-coder to obtain functional code via an <a href="https://www.junkcharts.com/tag/AI" rel="noreferrer">AI</a>. The prerequisites are an ability to articulate what one wants, and ample patience, because for now, some degree of steering is required. For truly complicated applications, I'm not sure it's there yet.</p><p>It is indeed possible to imagine a world requiring less steering, which implies that the AI coders will have developed an even better sense of what the user might be looking to do. For example, there may come a day when the AI would devise an image indexing strategy on its own, obviating the need for me to prompt it.</p><p>Let's ponder what that world would look like. The user asks to archive all the blog posts at a website, informing the AI coder that the posts are filled with images, and it's important to match images to their respective posts. The AI coder figures out the image indexing strategy, plus the directory structure, plus the anti-blocking techniques, and produces functional code that requires no further steering.</p><p>This future world looks very familiar! It's the world of <a href="https://www.junkcharts.com/tag/software" rel="noreferrer">software</a> as we know it. When we execute a find and replace within a Word document, what happens? Behind the scenes, the application executes code that finds the word and replaces it, repeating these operations until the entire document is read through. 
The key words are "behind the scenes." When we use Word, we don't think about the code that forms Word.</p><p>All of software is code but most of the time, users don't see or notice any code.</p><p>I think that's the world we're heading towards. Right now, the framing of the issue is a bit off-kilter. Non-coders don't want to write code, read code, or think about code. They just want to get things done.</p><p>The ideal <a href="https://www.junkcharts.com/tag/UI" rel="noreferrer">interface</a> for this future is not a chatbot. It's something that accepts <a href="https://www.junkcharts.com/tag/natural-language" rel="noreferrer">natural language</a> prompts, and then delivers the results the user is seeking. This user experience is similar to running any command within an application like Word or Excel. It isn't one in which the user takes an action, expecting to receive a piece of code that the user then executes in order to obtain outputs.</p><p>***</p><p>This future world is also different from the world of Word, Excel, etc. in two fundamental ways.</p><p>First, the software is constructed in <a href="https://www.junkcharts.com/tag/real-time" rel="noreferrer">real time</a>. In the old world, Microsoft engineers have written the find-and-replace code once, and every time a user clicks the command, that same code runs. In the new world, when the user issues the prompt, the AI composes the code on demand, and then executes it.</p><p>This shift to real-time has major implications. Software becomes more flexible and customizable. In the old world, the find-and-replace function only admits minor variations, such as whether to match case or not. The user can't ask for some wrinkle that wasn't pre-conceived and suggested by the software developer. In this <a href="https://www.junkcharts.com/tag/AI" rel="noreferrer">AI</a> world that I imagine, the user can request a find-and-replace operation for "apple" that only applies if the "apple" in a sentence refers to the fruit. 
This is possible because the code is written in real time at the user's prompting.</p><p>My find-and-replace code will be different from yours because we issue different requirements.</p><p>This flexibility comes at a cost. The behavior of <a href="https://www.junkcharts.com/tag/software" rel="noreferrer">software</a> will become more variable. Even if we both want the same find-and-replace, the AI code will likely be somewhat different, which means there is a good chance that the outputs might vary. In the old world, by contrast, the outputs must be the same since it's the same piece of code. I suspect that the loss in <a href="https://www.junkcharts.com/tag/reliability" rel="noreferrer">reliability</a> will be tolerated in many applications.</p><p>Another change in this new world is how users communicate with their software. In the old world, it's all buttons, menus, and links. To accommodate customizable software, the new <a href="https://www.junkcharts.com/tag/UI" rel="noreferrer">interface</a> must let users articulate what they want to get done. A <a href="https://www.junkcharts.com/tag/natural-language" rel="noreferrer">natural-language</a> interface is the answer, and large-language models are perfect for this purpose.</p><hr><p>If the point of vibe coding is to let AI do all of the coding, then it's inevitable the AI has to take control over our computers. We would effectively have to make the AI a "super-user" on our computers, with rights to edit, create and delete files; install software; etc. This inevitably creates risks over <a href="https://www.junkcharts.com/tag/privacy" rel="noreferrer">privacy</a> and <a href="https://www.junkcharts.com/tag/cybersecurity" rel="noreferrer">security</a>.</p><p>In my experiment, the AI didn't directly run any code on my computer. I downloaded each script and ran it myself. Even in this mode, I assumed some risk because I didn't read the code. 
It'd have been better to pass the code first through some kind of malware detector. Besides, the potential harm could also come from bad code, rather than malice, which is even harder to prevent.</p><p>***</p><p>In conclusion, vibe coding places the attention on coding but what is really innovative about this new AI world of coding is that we are coming closer and closer to software that can be customized and written in real time, and then executed behind the scenes to deliver outputs to users. The key difference in user experience we'll feel is the ability to use natural language to describe what we want to get done, and because of the new flexibility, the scope of what can be done is vastly expanded.</p><p>Meanwhile, expect the software to be less reliable, and even more insecure.</p>
          ]]></content:encoded>
          <description><![CDATA[ Will non-coders be able to &quot;write code&quot;? ]]></description>
        </item>
        <item>
          <title><![CDATA[ Blog Migration Update Thread (ongoing) ]]></title>
          <link>https://www.junkcharts.com/blog-migration-update-thread/</link>
          <guid isPermaLink="false">68c97dd8d53a670008695fe0</guid>
          <category><![CDATA[ News ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 22 Sep 2025 11:10:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Yes! You have found the new location of <strong>Junk Charts</strong>, the long-running blog by <a href="https://www.kaiserfung.com/?ref=junkcharts.com" rel="noreferrer">Kaiser Fung</a> on data, graphs, and AI. Things will be up and running here shortly, but you can <a href="#/portal/">subscribe</a> in the meantime if you'd like to stay up to date and receive emails when new content is published.</p><p>Please use the comments section to report any issues you encounter.</p><p>If there are specific posts that you'd like to be migrated sooner, please either mention them below in a comment, or <a href="https://www.kaiserfung.com/kaiser-fung-contact?ref=junkcharts.com" rel="noreferrer">contact</a> me directly.</p><hr><p><em>List of known, not-yet-resolved issues (10/5)</em>: primary tag associated with each post, image size in RSS, migrating old comments, broken links and images inside posts because of partial migration.</p><hr><p><strong>Nov 6, 2025</strong></p><p>It appears that someone has now "parked" my old Typepad URL, and falsely put up a message saying Junk Charts is "closed for business." The old Typepad site is closed but I'm alive and well here.</p><p>Moved some blog posts for Ray Vella's class (<a href="https://www.junkcharts.com/tag/ray-vella/" rel="noreferrer">here</a>). </p><p>Belatedly realized that my own <a href="https://www.kaiserfung.com/?ref=junkcharts.com" rel="noreferrer">website</a> has outdated links to Typepad so I moved those blog posts as well. You can browse the collection <a href="https://www.kaiserfung.com/kaiser-fung-in-the-press?ref=junkcharts.com" rel="noreferrer">here</a>.</p><hr><p><strong>Oct 20, 2025</strong></p><p>Posts from 2024 have arrived. </p><hr><p><strong>Oct 8, 2025</strong></p><p>All 2025 posts on the book blog have now been replicated on Ghost. 
</p><hr><p><strong>Oct 7, 2025</strong></p><p>Created a new <a href="https://www.junkcharts.com/what-happened-to-junkcharts-typepad-com/" rel="noreferrer">page</a> that documents how to turn old Typepad links to new Ghost links. </p><hr><p><strong>Oct 5, 2025</strong></p><p>All Junk Charts posts from 2025 have now been replicated on Ghost. </p><p>Unlike Typepad, Ghost selects one of the tags on each post as the "primary" tag. The primary tag is set to the first tag on the post. As a result, on many posts, the primary tag may look fishy. Fixing this aspect will take a long time.</p><hr><p><strong>Oct 3, 2025</strong></p><p>A couple of bug fixes on the <a href="https://www.junkcharts.com/blogging-since-2006" rel="noreferrer">Archive</a> and <a href="https://www.junkcharts.com/posts-by-keywords" rel="noreferrer">Keywords</a> pages, including broken images, and the pagination footer.</p><p>Added a default image for posts without images. </p><p>Previously, links to Typepad posts were redirected to Typepad because only a small subset of posts had been replicated in Ghost. Now that Typepad pages no longer exist, those links are transformed into Ghost-style URLs. There should be no more links to Typepad posts. This means that there may be some broken links because some old blog posts have not yet been migrated.</p><hr><p><strong>Sept 27, 2025</strong></p><p>More posts migrated. About 220 posts on the new site now, still a lot more to come.</p><hr><p><strong>Sept 26, 2025</strong></p><p>Disabled infinite scrolling for non-mobile devices. It now shows six posts per page, and you must click to see other pages. On mobile, infinite scrolling is enabled, as per usual practice.</p><p>Launched <a href="https://www.junkcharts.com/posts-by-keywords/" rel="noreferrer">Posts by Keywords</a> collection. You can click on any keyword and see all posts about that topic. The link to it is on the top navigation, to the left of the blog name, and may be hidden in the ... 
menu.</p><hr><p><strong>Sept 25, 2025</strong></p><p>Migrated about 200 posts today. It's going at a deliberately slow pace because I'm still refining the migration code to make sure the regenerated posts require as little manual modification as possible. Some posts will have broken links if they point to other posts that haven't yet been posted. </p><p>While broken links may be expected, broken images indicate problems. If the original post contains images, and it has reappeared, the images should have followed the post. Please report any broken image under comments.</p><p>I'll blog about today's work in more detail. Some of the unanticipated issues that forced fixes to the blog migration code included: unexpected Typepad elements such as "PING" (which is an abandoned community feature from the past); scraped images that were html pages with embedded images; working around default author names; and the weaknesses of current AI models when asked to fix specific bits of a large piece of code.</p><p>If you've been here for a few days, you'll notice I'm also tweaking the top banner.</p><p>In addition, in the top nav bar, on the left side, possibly hidden under the ... menu, you will find a <a href="https://www.junkcharts.com/blogging-since-2006/" rel="noreferrer">page</a> of all blog posts arranged by year.</p><hr><p><strong>Sept. 22, 2025</strong></p><p>About a week till Typepad shuts down. Ghost site goes live. Only some test posts have migrated using the AI-generated code obtained from my vibe-coding experiment (documented <a href="https://www.junkcharts.com/numbersruleyourworld/2025/09/notes-on-vibe-coding-1.html" rel="noreferrer">here</a>). With the same code, I should be able to migrate all old posts in bulk once I get other issues sorted out. Am enjoying my experience working with Ghost so far. </p><p>For those readers who reached out, there is already an <a href="https://www.junkcharts.com/rss/" rel="noreferrer">RSS</a> feed. 
The sizing of the images in the feed is a known issue. If you see other problems, let me know in the comments.</p><p>The current routing scenarios: any link to Typepad is still served by Typepad; any link to www.junkcharts.com is served by Ghost. Any link to www.junkcharts.com that starts with /junk_charts/ or /numbersruleyourworld/ is re-directed from Ghost to Typepad (because most posts have not yet been moved). </p><p>If an old post has been migrated (e.g. <a href="https://junkcharts.typepad.com/junk_charts/2025/08/reflection-on-two-design-quirks.html?ref=junkcharts.com" rel="noreferrer">https://junkcharts.typepad.com/junk_charts/2025/08/reflection-on-two-design-quirks.html</a>), the corresponding post is found at <a href="https://www.junkcharts.com/reflection-on-two-design-quirks/" rel="noreferrer">https://www.junkcharts.com/reflection-on-two-design-quirks/</a>. So the rule is: take the name of the post (the part after the date and before .html) and add it to the end of www.junkcharts.com/.</p>
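            <p>The link rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name <code>typepad_to_ghost</code> and the pattern are hypothetical, and the sketch assumes the same rule holds for both the /junk_charts/ and /numbersruleyourworld/ prefixes, which the entry above shows only for the former.</p>
<pre><code class="language-python">import re
from urllib.parse import urlsplit

GHOST_BASE = "https://www.junkcharts.com"

# Assumed shape of an old post path: /junk_charts/2025/08/some-post-name.html
OLD_POST_PATH = re.compile(
    r"^/(?:junk_charts|numbersruleyourworld)/\d{4}/\d{2}/([^/]+)\.html$"
)

def typepad_to_ghost(old_url):
    """Return the Ghost-style URL for an old post link, or None if the path is not a post."""
    # urlsplit drops the query string (e.g. ?ref=...) from the path we match on.
    match = OLD_POST_PATH.match(urlsplit(old_url).path)
    if match is None:
        return None
    # The post name (between the date and .html) goes at the end of the new domain.
    return GHOST_BASE + "/" + match.group(1) + "/"
</code></pre>
            <p>Calling the function on the example link above should give back the Ghost address of the same post; anything that does not look like an old post path returns None rather than a guess.</p>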
          ]]></content:encoded>
          <description><![CDATA[ Updates on the status of blog migration ]]></description>
        </item>
        <item>
          <title><![CDATA[ A second look at that axis ]]></title>
          <link>https://www.junkcharts.com/a-second-look-at-that-axis/</link>
          <guid isPermaLink="false">68cd87398c1b4f00016aebd7</guid>
          <category><![CDATA[ Axis ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Fri, 19 Sep 2025 08:01:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>In the book blog (<a href="https://www.junkcharts.com/numbersruleyourworld/2025/09/startling-stat-of-the-day.html">link</a>), I wrote about this Bloomberg chart, found <a href="https://www.bloomberg.com/news/articles/2025-09-16/top-10-of-earners-drive-a-growing-share-of-us-consumer-spending?ref=junkcharts.com">here</a>.</p><p>At the end of that post, I noted the unusual labeling on the time <a href="https://www.junkcharts.com/junk_charts/axis">axis</a>. I had to take a second look.&nbsp;</p><p>What first unsettled me was the sudden gap in labels at the right side between Q3 2020 and Q2 2025. Then, I looked back on the rest of the labels, and at first glance, it didn't seem like those time intervals were even.</p><p>Now, I have pulled out my ruler, and measured everything - and phew, they spaced the tick marks appropriately. Not too surprising, given it's a Bloomberg graphics piece.</p><p>The graph has 13 ticks and 12 intervals between ticks. Half (6) of those intervals span 11 quarters, three of them 12, and one of them 10. That leaves the last and widest interval which accounts for 19 quarters.</p><p>I can't figure out why they wouldn't use evenly spaced labels. The wiggles in the line suggest that they have the data for every quarter.&nbsp;</p><p>I'm also mystified by the decision to omit labels between 2020 and now.</p><p>Maybe it's because there isn't enough space because 19 = 11 + 8. Why not extend the axis line a little, adding some whitespace so that the omitted label fits?</p><p>I like this theory - now I think the reason why some intervals are 11 quarters and some are 12 quarters is that tick placement is dictated by getting the labels to fit.&nbsp;</p><p>Do you have a better theory?</p>
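            <p>As a quick check on the arithmetic behind the widest interval, here is a small Python sketch. The function name <code>quarters_between</code> is hypothetical, and the sketch only counts quarters between two labeled ticks; it says nothing about how Bloomberg actually placed them.</p>
<pre><code class="language-python">def quarters_between(start_year, start_quarter, end_year, end_quarter):
    """Count the quarters from one quarterly tick to a later one."""
    return (end_year - start_year) * 4 + (end_quarter - start_quarter)

# The last interval on the chart runs from Q3 2020 to Q2 2025.
last_gap = quarters_between(2020, 3, 2025, 2)

# If one more label sat 11 quarters after Q3 2020, this many quarters would remain.
remainder = last_gap - 11
</code></pre>
            <p>Here <code>last_gap</code> works out to 19 and <code>remainder</code> to 8, which is the 19 = 11 + 8 split mentioned above.</p>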
          ]]></content:encoded>
          <description><![CDATA[ Why are there gaps in the labels? ]]></description>
        </item>
        <item>
          <title><![CDATA[ Notes on vibe coding 2 ]]></title>
          <link>https://www.junkcharts.com/notes-on-vibe-coding-2/</link>
          <guid isPermaLink="false">68c9a3c6421edc00015760b7</guid>
          <category><![CDATA[ AI ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 16 Sep 2025 14:52:17 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>This post continues the prior post (<a href="https://www.junkcharts.com/numbersruleyourworld/2025/09/notes-on-vibe-coding-1.html">link</a>) about my blog archive project.</p><p>In view of the impending shutdown of Typepad, I want to "scrape" my own blog so that I can keep a complete archive of several thousands of image-heavy blog posts going back almost 20 years. It seems like the right project to test "<strong>vibe coding</strong>," which is an AI hype of the week. Vibe coding promises to make it possible for businesses to replace human coders with AI coders, and also promises to make it possible for non-coders to write code.</p><p>At the end of my previous post (<a href="https://www.junkcharts.com/numbersruleyourworld/2025/09/notes-on-vibe-coding-1.html">link</a>), I ran the first piece of code written by GPT. I had read GPT's description of what its code does, and hadn't seen anything troubling. Notably, I did not read the code before running it. That, to me, is the essence of vibe coding.</p><hr><p>You have come across that AI magic story if you are on any social media. Someone writes down a prompt, and then magically, AI delivers a perfect piece of code, one that works out of the box.</p><p>Did my GPT5 code work just like that? Funny you asked.</p><p>The code ran without errors but it didn't produce anything useful. What does this mean? It created the entire file structure with one folder per blog post, as intended. All folders were found empty. Hmmm.</p><p>I relayed this discovery to the AI coder. It pinpointed the problem: it had mistakenly assumed that the Typepad export file references each post's URL as "URL" but in fact, the name of the reference is "UNIQUE URL". It then fixed its own code, and offered a revised file.</p><p>I ran the revised code; it finished without error, and this time, the folders were populated with data.</p><hr><p>At some point during the above process, I concocted a different way of organizing the data. 
Instead of having thousands of folders in the directory, I'd set up a single folder to hold all the images. The key is to assign a unique number to each image, and also to associate each image number with the pertinent blog post.</p><p>I sketched out how I'd like to set up the image indexing scheme and the new directory structure, and issued a new prompt. GPT responded with a new script that implements these ideas.</p><p>This script also ran without errors. Again, the first attempt was only partially successful. When I opened the process tracker, I found that only about half of the blog images were successfully captured.</p><p>I learned that some of the image links grabbed from the HTML code were not really what they appeared to be. For example, some links pointed to Amazon-generated pages for my books, which had expired, but in any case, not images that I want to keep. There were also other links that encountered various HTTP error codes.</p><p>At this point, I explicitly asked GPT to contend with blocking technology as indicated by the HTTP 403 errors (forbidden). Even though the AI knew from the start that 403s could be an issue, the initial code did not include any counter-measures. With each new report of blocked URLs, the AI coder now added another layer of code that executed a specific anti-blocking tactic.</p><p>Other refinements were necessary. At first, the AI coder ignored my instruction to set each image's name to the image index - it sometimes retained the original name. Next, when it switched the name to the image index, it dropped the suffix (.jpg, .png, etc.). The chatbot interface proves very convenient for steering the AI coder and fixing these minor issues.</p><hr><p>At one point, I jumped ship to another AI coder, Claude. That was when GPT got twisted around like a pretzel. I was then starting to encounter coding errors. As usual, I relayed the errors to GPT: it kept telling me it had fixed the problem when the offending code was still there. 
Now, I had two AIs running side by side. GPT is still the main code generator; I no longer took the GPT code and ran it directly - I passed it to Claude, which checked for the same coding error, and if present, fixed it.</p><p>It turns out that current AI coders may have a habit of falling into such traps. For a different project, for which I used Claude as the main code generator, it got stranded in a corner where it kept telling me an offending line of code has been removed, when the new file clearly still contains it. So I had to fire up GPT to get a lift out of that dark corner.</p><hr><p>I'm still amazed by how much working code was produced. At the end, I obtained code that ran through the process of setting up the directory structure, and populating it the way I wanted to. The image index worked as expected, tying each image to the blog post it belonged to.</p><p>And I haven't read a single line of code.</p><p>That's <strong>vibe coding</strong>. The user does have to steer the AI coder in the right direction, and correct the course as needed but as demonstrated here, I didn't have to rewrite any code myself.</p><p>In the next post, I'll discuss where I think this is all heading. Is it true that non-coders will use AI to write code?</p>
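            <p>For readers who want something concrete, the image indexing scheme described above might look roughly like the following Python sketch. It is not the AI-generated script from the experiment; the function name <code>build_image_index</code>, the six-digit zero padding and the row layout are all assumptions made for illustration.</p>
<pre><code class="language-python">import os
from urllib.parse import urlsplit

def build_image_index(post_images):
    """Give every image a unique running number and record the post it belongs to.

    post_images maps a post identifier to the list of image URLs found in that post.
    Returns a list of (index, post_id, new_filename, source_url) rows.
    """
    rows = []
    counter = 0
    for post_id, urls in post_images.items():
        for url in urls:
            counter += 1
            # Keep the original suffix (.jpg, .png, ...) when renaming to the index,
            # ignoring any query string that follows the file name.
            suffix = os.path.splitext(urlsplit(url).path)[1].lower()
            rows.append((counter, post_id, f"{counter:06d}{suffix}", url))
    return rows
</code></pre>
            <p>With rows like these, all images can sit in a single folder under their index-based names, while the table ties each one back to its blog post.</p>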
          ]]></content:encoded>
          <description><![CDATA[ I continue a vibe-coding experiment. Did the AI-written code run? ]]></description>
        </item>
        <item>
          <title><![CDATA[ Notes on vibe coding 1 ]]></title>
          <link>https://www.junkcharts.com/notes-on-vibe-coding-1/</link>
          <guid isPermaLink="false">68d30b2f8ad60c0001365dac</guid>
          <category><![CDATA[ vibe coding ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 10 Sep 2025 10:10:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Large language models (LLMs) have evolved to the point where they have become useful tools for writing computer code.</p><p>Recently, as a result of Typepad's impending shutdown (<a href="https://www.junkcharts.com/numbersruleyourworld/2025/09/important-announcement.html">link</a>), I found a perfect opportunity to explore "vibe coding" using LLMs. Vibe coding can mean many things to many people so let me just define what I mean by vibe coding. It is a hands-off process of coding, in which the human's role is limited to steering and guiding. The human coder is not actually writing code; in fact, the coder isn't even reading code. In the end, the code is entirely written by AI.</p><p>(The million-dollar question: does this mean vibe coding can be done by someone who knows no coding at all? I'll come back to this question at the end.)</p><p>What is vibe coding&nbsp;<u>not</u>? My definition deliberately excludes LLMs as "StackExchange on steroids". StackExchange has been a super-useful Q&amp;A website in which developers ask questions of other developers who supply answers, frequently filled with insights, extensions, code fragments, and commentary. Not surprisingly, StackExchange data were used to train LLMs (<a href="https://www.businessinsider.com/chatgpt-developers-stack-overflow-upset-knowledge-improve-chatbot-openai-altman-2024-5?ref=junkcharts.com">link</a>). Therefore, if one has a coding question today, one can ask the LLM, instead of searching for the answer on StackExchange. The AI has effectively "read" the relevant StackExchange posts, and responds with key information. Tools like Copilot make it possible to do the above without leaving the code editor.</p><p>That's not what I'm exploring in my vibe coding experiment. In this StackExchange on steroids mode, the coder is still in control of the code; the coder has probably written a good portion of it, and the coder most definitely has read everything through. 
While this path is viable, and valuable, it certainly won't lead to the promised land of enabling a non-coder to produce code.</p><hr><p>Now, let me define my experiment. In view of Typepad's imminent shutdown, I want to archive all my posts, stretching back nearly 20 years. There are several thousand image-heavy posts. The general idea is to "scrape" my own blog. In the Big Data era, scraping has become an everyday skill: this is how Google and ChatGPT collect data to power their search engine and AI chatbot, respectively. The process of scraping is the sequential loading of large numbers of webpages in order to extract and store the relevant data from each page.</p><p>Scraping code is annoying to write because it requires a deep understanding of the structure of these web pages. Website designs differ: consider where the navigation column is placed, how images are interspersed with text, whether there are buttons, forms, popups, etc. For example, if I want to save every image of every blog post, I'd need to delve into the HTML code to decipher how the image tags are organized, and then write custom code to navigate such structure. I'd also be hoping that Typepad hasn't altered this structure during the last 20 years, or else my scraping code has to know what these different structures are, and then try guessing which particular one applies to any given page.</p><p>Web scraping is also somewhat controversial. Many website operators try to block it. Scraping generates fake traffic to websites; scraped pages are loaded but aren't actually read by humans; thus, websites pay service providers, who deliver the pages to visitors across the networks, for fake traffic that does not produce any revenues (ads, product sales, etc.). For this reason, and also to protect their data, most websites impose limits on scraping; some operators even attempt to stop all scrapers by predicting whether a page load request comes from scrapers. 
(As a general rule, though, the louder Company X complains about other people scraping its website, the more likely Company X is actively scraping other websites, busily working around these blocking tactics. Looking at you: Google, Facebook, OpenAI, etc.)</p><p>Why do I think vibe coding might fit this project well? First, since AI models have shown great ability parsing the structure of human speech, they should be able to dissect the HTML code, which adheres to rules that are even more rigid than grammar. Second, AI coders probably have seen a lot of scraping code, since it's such a common activity, so they should know how to handle blocking adversaries. Both hypotheses would come true, but my journey is just getting started.</p><p>Here is the first prompt I sent to ChatGPT (at the start, I used the recently released GPT 5 Thinking model):</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/fung_gpt5_blogarchive_firstprompt.png" class="kg-image" alt="" loading="lazy" width="1438" height="1132" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/09/fung_gpt5_blogarchive_firstprompt.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/09/fung_gpt5_blogarchive_firstprompt.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/fung_gpt5_blogarchive_firstprompt.png 1438w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">First vibe-coding prompt</span></figcaption></figure><p>The prompt mostly describes the high-level objective of my project, offering key context that must not be missed (e.g. I run two blogs under one domain name). 
I didn't mention countering potential blocking or parsing HTML structure because I expect that any self-respecting AI coder knows about these challenges. I include context specific to my scraping request. One such detail is the need to associate images with each post. It would be a nightmare if I end up with an image folder containing thousands of files, untethered from the text, so I suggested in the prompt to make a new folder for each blog post. I'm curious whether the AI coder will heed this advice, or will it recommend a better way of linking up images and text after divining the motivation for this special request?</p><p>The response looks very promising. GPT returns a file with code inside (a python "script"), and also provides instructions for how to run the script.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/fung_gpt4_blogarchive_howtorun.png" class="kg-image" alt="" loading="lazy" width="1280" height="688" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/2025/09/fung_gpt4_blogarchive_howtorun.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/2025/09/fung_gpt4_blogarchive_howtorun.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/fung_gpt4_blogarchive_howtorun.png 1280w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">How to run code</span></figcaption></figure><p>Time out to think about where we are.</p><p>If you don't know any coding at all, you'd have some difficulty following those instructions. Where are you supposed to type in those commands? What is this "bash"? You probably don't have the "pip" program installed either. If the program succeeded, from where do you fetch the output? 
GPT actually tries to help by saying "./archive/junk_charts/..." but a non-coder would not be able to decipher those words! It's not hard to pick up these concepts but you do have to learn them.</p><hr><p>I'd like to return your attention to the first prompt shown above, in which I also clipped the top part of the AI's response. The section you can see explains how the scraper will navigate my blog. Remember that it has to visit every blog post sequentially. While researching my prompt, the AI visited my blog and discovered that old posts are grouped by month, with all posts published in a given month aggregated on the same monthly page. There is also a top-level index page, called archive.html, that contains links to each monthly page. Thus, the scraper first visits that index page, and using it as a map, it loads each monthly page, and on each monthly page, it extracts the required text and images. This scraping strategy makes sense to me.</p><p>Elsewhere in that GPT response, I noticed mention of "rate limiting" and a possible "retry" mechanism, so the AI is definitely "aware" of potential blocking. Therefore, both my hypotheses came true - I didn't have to include these "obvious" items in my prompt.</p><p>In the first prompt, I asked GPT to build a testing mode so I can run the code on one month's worth of posts before rolling it out to thousands of posts. GPT made this testing mode as requested.</p><p>After reading the rationale of the GPT response, I don't have any complaints. So I downloaded the script, and ran the code.</p><p>[to be continued]</p><p>P.S. [9/23/2025]&nbsp;<a href="https://www.junkcharts.com/notes-on-vibe-coding-2.html">Part 2</a>&nbsp;and&nbsp;<a href="https://www.junkcharts.com/notes-on-vibe-coding-3.html">Part 3</a>&nbsp;are now posted. Remember to change your bookmarks to https://www.junkcharts.com.</p>
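            <p>For readers curious what the crawl strategy described above might look like in code, here is a minimal Python sketch. It is not the script GPT produced. The <code>/archives/</code> link pattern, the function names, and the injected <code>fetch</code> callable are all assumptions made for illustration; only the overall flow (index page, then monthly pages, with a test mode and retries) comes from the description above.</p>
<pre><code class="language-python">import time
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every anchor tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def monthly_pages(index_html, base_url, test_mode=False):
    """Return the monthly archive URLs linked from the index page.

    The "/archives/" filter is a hypothetical URL pattern, not the blog's
    actual one. In test mode only the first monthly page is returned, so a
    trial run touches one month of posts.
    """
    parser = LinkCollector()
    parser.feed(index_html)
    urls = [urljoin(base_url, h) for h in parser.links if "/archives/" in h]
    return urls[:1] if test_mode else urls


def fetch_with_retry(fetch, url, retries=3, delay=1.0):
    """Call fetch(url), pausing a little longer after each failed attempt.

    `fetch` is any callable that returns page content or raises OSError;
    passing it in keeps this sketch free of real network calls.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError:
            time.sleep(delay * (attempt + 1))
    raise RuntimeError("giving up on " + url)
</code></pre>
            <p>A real scraper would pass in a function that downloads a page, and would then walk each monthly page to pull out post text and images into one folder per post.</p>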
          ]]></content:encoded>
          <description><![CDATA[ Kaiser tries vibe coding to archive his entire blog. How did it go? ]]></description>
        </item>
        <item>
          <title><![CDATA[ MTA lowers revenues while upping stress on some commuters ]]></title>
          <link>https://www.junkcharts.com/mta-lowers-revenues-while-upping-stress-on-some-commuters/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3d7</guid>
          <category><![CDATA[ Analytics-business interaction ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Thu, 28 Aug 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>In the previous <a href="https://www.junkcharts.com/omnys-math-problem/">post</a>, I looked into the new "fare-capping" scheme offered by OMNY for frequent public transit commuters in NYC. It's a mind-blowingly complicated solution to a math problem. The previous 7-day pass is much simpler.</p><p>I believe the switch from swipe cards to OMNY leaks revenues, while also incurring costs of implementation. Therefore, it's a weird decision on the MTA's part.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11284_001.jpg" class="kg-image" alt="Omny-mta-ny" loading="lazy" title="Omny-mta-ny" width="500" height="400"></figure><p>In this post, I explore the economics.</p><p>I will focus on the subgroup of frequent commuters. If you aren't going to do 12 or more rides per week, this issue is moot.</p><p>For the super-frequent commuters, who <em>reliably</em> make more than 12 trips every week, getting the 7-day pass is a no brainer. The switch to fare capping means they don't pay upfront. This isn't much of a benefit though since by the end of the week, they would have paid $34 in both cases. Because of the terrible user interface (see previous post <a href="https://www.junkcharts.com/omny-needs-a-facelift/">here</a>), these commuters would have to check their transaction logs to confirm that fares were capped at $34. In fact, verifying the cap requires an accounting degree, as it is hard to establish the start and end of each 7-day period (see the last <a href="https://www.junkcharts.com/one-solution-to-omnys-math-problem/">post</a>). I'd argue that the experience for these super-frequent commuters has worsened slightly.</p><p>A segment of these super-frequent commuters enjoys an unexpected pleasant surprise: they will see their commuting expenses decrease under fare-capping. 
They are those who didn't care enough to get the 7-day passes in the past; now, their weekly spend is automatically capped at $34. Thus, in the super-frequent commuters segment, the MTA collects less revenue after switching to fare-capping.</p><p>***</p><p>The most interesting group is the commuters whose average ride frequency is around 12. Under the 7-day pass system, they make a buy-or-not decision every seven days. After they purchase passes, they likely will adjust their behavior, using the bus or subway for shorter trips, in order to maximize the value of the passes. This optimizing behavior enhances the perceived value of the 7-day pass. In some weeks, if they fail to hit the 13-ride minimum, they may overpay relative to pay-per-ride.</p><p>Under fare capping, these commuters don't pay upfront. If they end up taking fewer than 13 rides, the total charge will be the same as pay-per-ride. If the ride frequency exceeds 12, the total is capped at $34. So, the risk of overpaying is eliminated. The other side of the coin is that the MTA is denied these overpayment revenues.</p><p>What they give on one hand, they take from the other. The anxiety over whether or not to buy a pass is replaced by the anxiety over whether or not the next ride is free. For someone who only occasionally exceeds 12 rides, it's hard to know when the cap has been exceeded, and if so, when the pertinent 7-day window ends. If commuters don't know for sure they have enough to get the free rides, they won't change their behavior and start taking extra short rides. (These extra rides only enhance the perceived value of the frequent commuter discounting; they don't represent incremental revenues for the MTA.)</p><p>These commuters don't have to make any decisions under fare capping. This can be described as "convenience" but it is served with a dose of poor customer experience. 
Even those commuters who have benefited are unaware when the cap has kicked in, nor do they get the satisfaction of benefits building up as they take more rides.</p><p>For the MTA, the collected revenues will certainly decline for two reasons: a) super-frequent commuters who didn't take advantage of 7-day passes are now given automatic fare caps; and b) the borderline 7-day pass users have their fares capped during those weeks when they unexpectedly take fewer than 13 trips.</p><p>I'm coming up empty when trying to think of a group of commuters from which the OMNY system generates incremental revenues.</p><p>***</p><p>Ironically, the old way is less stressful. After paying upfront, it is stress-free. Under fare capping, you have to constantly worry about whether you've hit the fare cap or not, and when the 7-day window resets. Even after you've hit the cap, you have to worry about when the current 7-day window ends.</p><p>That's without accounting for the money invested in the OMNY infrastructure. So, the MTA reduces its profits while making lives more complicated for the frequent commuters.</p>
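            <p>To make the revenue argument concrete, here is a rough Python sketch comparing what a commuter pays in a week under the old 7-day pass and under fare capping. The $34 figure comes from the discussion above; the $2.90 per-ride fare is an assumption that does not appear in this post, and the function names are made up for illustration.</p>
<pre><code class="language-python">from decimal import Decimal

FARE = Decimal("2.90")  # assumed base fare per ride; not stated in the post
CAP = Decimal("34.00")  # weekly cap, same as the 7-day pass price cited above


def pay_per_ride(rides):
    """Weekly spend with no pass and no cap."""
    return FARE * rides


def seven_day_pass(rides):
    """Weekly spend with a pass bought upfront: flat, however many rides."""
    return CAP


def fare_capped(rides):
    """Weekly spend under fare capping: pay per ride until the cap is hit."""
    return min(FARE * rides, CAP)


def pass_overpayment(rides):
    """Revenue the MTA kept from a pass buyer that it gives up under capping."""
    return seven_day_pass(rides) - fare_capped(rides)


for rides in (8, 11, 13, 20):
    print(rides, pay_per_ride(rides), fare_capped(rides), pass_overpayment(rides))
</code></pre>
            <p>Under these assumptions, the overpayment is positive only in weeks when a pass buyer rides too little, and it is never negative, which is the sense in which fare capping can only reduce what the MTA collects from this group.</p>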
          ]]></content:encoded>
          <description><![CDATA[ An economic analysis of MTA&#39;s switch to OMNY cards ]]></description>
        </item>
        <item>
          <title><![CDATA[ Reflection on two design quirks ]]></title>
          <link>https://www.junkcharts.com/reflection-on-two-design-quirks/</link>
          <guid isPermaLink="false">68ccb7a363f3a70001f6ac16</guid>
          <category><![CDATA[ Bar chart ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 25 Aug 2025 05:18:49 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>This post is for paid members only. Log in or subscribe to see it. Thank you members for supporting my work. </p><!--members-only--><p>When I first saw this slanted <a href="https://www.junkcharts.com/junk_charts/bar_chart">column chart</a> on Visual Capitalist (<a href="https://www.visualcapitalist.com/sp/ter01-the-rising-age-of-first-time-home-buyers/?ref=junkcharts.com">link</a>), I felt this may be another case of questionable design distorting data representation.</p><figure class="kg-card kg-image-card"><a href="https://www.junkcharts.com/.a/6a00d8341e992c53ef0303ee30f275200d-pi"><img src="https://www.junkcharts.com/.a/6a00d8341e992c53ef0303ee30f275200d-320wi" class="kg-image" alt="Visualcapitalist_First-Time-Home-Buyers_sm" loading="lazy" title="Visualcapitalist_First-Time-Home-Buyers_sm" width="320" height="420"></a></figure><p>It isn't so simple.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/2025/09/vc_homeowner_medianage_triangle.png" class="kg-image" alt="" loading="lazy" width="274" height="1192"></figure><figure class="kg-card kg-image-card"><a href="https://www.junkcharts.com/.a/6a00d8341e992c53ef02e860f1e390200b-pi"><img src="https://www.junkcharts.com/.a/6a00d8341e992c53ef02e860f1e390200b-120wi" class="kg-image" alt="Vc_homeowner_medianage_triangle" loading="lazy" title="Vc_homeowner_medianage_triangle" width="62" height="270"></a></figure><p>Slanting the columns does <em>not</em> actually distort the encoding of the data. Take a look at the last column on the right, where the designer drops a perpendicular from the top rung of the ladder to the "floor" of the chart. In so doing, a right-angled triangle has been outlined. The slanted side is the hypotenuse, the length of which is (height)/sin θ where θ is the angle of the slant at the floor. 
Thus, the ratio of lengths of two slanted sides (x1/x2) = (h1/h2) after the 1/sin θ factor cancels out as each column is given the same slant.</p><p>For this chart, readers are mostly interested in the year-on-year change: on a conventional column chart, this is reflected in the difference in heights between successive columns. Now, (x2-x1) = (h2-h1)/sin θ so the measured difference in the lengths of successive slanted sides is proportional to the measured difference in heights of the columns. The observed difference is a constant multiple of the actual difference in the data. For this usage, there is an absolute distortion but not a relative distortion.</p><p>In sum, data distortion is not a strong enough reason to disapprove of the slanting feature.</p><p>***</p><p>I'm also fascinated by the designer's end run around the <a href="https://www.junkcharts.com/junk_charts/axis">start-at-zero rule</a> for <a href="https://www.junkcharts.com/junk_charts/bar_chart">column charts</a>. While not explicitly stated, the floor of each ladder can be thought of as starting at zero. The use of the broken scale essentially resets the scale to start at 18 so the chart in reality starts at 18 rather than 0.</p><figure class="kg-card kg-image-card"><a href="https://www.junkcharts.com/.a/6a00d8341e992c53ef02c8d3db358e200c-pi"><img src="https://www.junkcharts.com/.a/6a00d8341e992c53ef02c8d3db358e200c-320wi" class="kg-image" alt="Vc_homeowner_medianage_brokenaxis" loading="lazy" title="Vc_homeowner_medianage_brokenaxis" width="266" height="191"></a></figure><p>(Such distortion of the data encoding impacts the calculation I did in the section above, because I have assumed starting at zero. But the culprit would then be the violation of the start-at-zero rule.)</p>
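            <p>The slanted-column geometry discussed earlier in this post can be checked numerically. The Python sketch below takes θ as the angle between the slanted column and the floor, so a column of vertical height h has slanted length h / sin θ. The slant angle and the two heights are hypothetical values chosen for illustration, not measurements from the chart, and the sketch assumes the columns start at zero.</p>
<pre><code class="language-python">import math


def slanted_length(height, theta_deg):
    """Length of a column of vertical height `height` leaning at theta_deg from the floor."""
    return height / math.sin(math.radians(theta_deg))


THETA = 70.0         # hypothetical slant angle in degrees
h1, h2 = 30.0, 38.0  # hypothetical column heights, not read off the chart

x1 = slanted_length(h1, THETA)
x2 = slanted_length(h2, THETA)

# Ratios of slanted lengths match ratios of heights: the 1/sin(theta) factor cancels.
ratio_preserved = math.isclose(x1 / x2, h1 / h2)

# Differences are stretched by the same constant, 1/sin(theta), for every pair of columns.
diff_scale = (x2 - x1) / (h2 - h1)

print(ratio_preserved, round(diff_scale, 4))
</code></pre>
            <p>Because the stretch factor depends only on the angle, any chart that gives every column the same slant preserves relative comparisons while inflating absolute lengths.</p>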
          ]]></content:encoded>
          <description><![CDATA[ Slanting and breaking columns ]]></description>
        </item>
        <item>
          <title><![CDATA[ They won&#x27;t tell you why they did it ]]></title>
          <link>https://www.junkcharts.com/they-wont-tell-you-why-they-did-it/</link>
          <guid isPermaLink="false">68e2cf4d04c09e00018ee8f3</guid>
          <category><![CDATA[ openai ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 19 Aug 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>My friend Alberto is cited in this Washington Post article about AI companies committing "chart crimes" (<a href="https://www.washingtonpost.com/technology/2025/08/12/gpt5-chart-crimes-claude-graphs/?ref=junkcharts.com">link</a>; paywalled).</p><p>Let's run through these examples from OpenAI's presentation, from when they launched their GPT5 foundational model.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1560_001.png" class="kg-image" alt="Wpost_openai_deceptionchart" loading="lazy" title="Wpost_openai_deceptionchart" width="1132" height="1202" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1560_001.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/1560_001.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1560_001.png 1132w" sizes="(min-width: 720px) 720px"></figure><p>Why is 50.0 lower than 47.4? The answer is simple!</p><p>It's because lower is better since the metric is "deception rate". (The pink columns represent the latest version of GPT while the white columns represent a prior generation.)</p><p>Our story is: the new GPT5 is much better than our older models, and that's what our chart shows. Is there anything wrong with that?</p><p>***</p><p>Seriously though, I don't buy the idea that this is a screwup by AI. 
I don't buy that this is vibe graphing.</p><p>To buy that official line, you'd have to accept that no staff member reviewed the slides before this huge announcement, that the CEO of the most famous AI company did not walk through the slides even once before going on camera, that there were no rehearsals for this event, that those people who are responsible for metrics did not double check what they put out to the public, and if anyone even flipped through these slides once, they failed to notice the howler(s).</p><p>That last point. What does it tell you when a company with a boatload of PhDs on staff cannot detect this howler when within seconds of it being shown to the rest of the world, people noticed and mocked it on social media?</p><p>(As far as I can tell, that event was a livestream that presented a scripted demo possibly delivered live but without a live audience.)</p><p>Is it hubris? Is it deliberate? Is it deceptive? I don't know but it's hard to believe it's innocent. It's also distracting as the conversation is focused on the design of the chart, rather than its contents.</p><p>***</p><p>The more notorious example from the same event is this one:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1560_002.png" class="kg-image" alt="Wpost_openai_accuracy_chart" loading="lazy" title="Wpost_openai_accuracy_chart" width="1150" height="1272" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1560_002.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/1560_002.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1560_002.png 1150w" sizes="(min-width: 720px) 720px"></figure><p>Don't worry, the pink parts are definitely higher than the white columns.</p><p>The corrected version is found on 
OpenAI's blog post <a href="https://openai.com/index/introducing-gpt-5/?ref=junkcharts.com">here</a>.</p><p>***</p><p>Why did they put those howlers out there?</p><p>My best guess? It's an extreme version of "tasting your own medicine". Extreme, in the sense that developers are forbidden from editing the vibe code that came out of GPT.</p>
          ]]></content:encoded>
          <description><![CDATA[ Why do they do the crime? ]]></description>
        </item>
        <item>
          <title><![CDATA[ Simple is not always easy ]]></title>
          <link>https://www.junkcharts.com/simple-is-not-always-easy/</link>
          <guid isPermaLink="false">68e2cf4d04c09e00018ee8f4</guid>
          <category><![CDATA[ Bar chart ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Thu, 14 Aug 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>This chart is as simple as it gets. It can't get any simpler.</p><p>It's a <a href="https://www.junkcharts.com/tag/bar-chart/">column chart</a> showing a single series of numbers (same store sales growth rate) over successive quarters.</p><p>Somehow, it's not as <em>easy</em> as it gets.</p><p>***</p><p>The designer did a great job channeling my attention to the far right column, which shows the most recent quarter of Q2, 2025. That's because the chart's trying to say something... something about the contrast of the tall gray column and the midget black column.</p><p>I don't see a legend. My first instinct is to think of the gray column as the expected value, and the black column as the realized value so that the gray part is the gap between expectation and reality. But this can't be true.</p><p>It's not true because after that one stellar quarter in Q2, 2021, all the subsequent values have been much lower. It's inconceivable that management would have predicted a return to that earlier performance level for the current quarter.</p><p>Is it possible that the black portion is a partial number while the gray part represents the excess yet to materialize? A common such situation is associated with part-year (realized) versus full-year (projected) values. This can't apply to our chart, either.</p><p>Then, I noticed that the gray column is level with the Q2, 2021 column, which represents the "high water mark" for Cava's historical same-store sales growth (at least for the time window of the chart). Perhaps the point is the comparison of the current quarter to the historical maximum. This theory is undermined when I pull out a ruler to discover that the top of the gray column is in fact a little higher than the Q2, 2021 column!</p><p>To show a reference level, I prefer a line or a symbol. 
For example:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1559_002.png" class="kg-image" alt="Junkcharts_cava_reference1" loading="lazy" title="Junkcharts_cava_reference1" width="355" height="224"></figure><p>or</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1559_003.png" class="kg-image" alt="Junkcharts_cava_reference2" loading="lazy" title="Junkcharts_cava_reference2" width="1734" height="1056" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1559_003.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/1559_003.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/archives/1559_003.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1559_003.png 1734w" sizes="(min-width: 720px) 720px"></figure><p>***</p><p>It's odd they chose Q2, 2021 as the reference point. The data from the last few years should have made clear that Cava isn't likely to replicate that level of growth. Indeed, the news that caused a crash in Cava's stock price the other day is that:</p><p>Shares of Cava Group crashed in premarket trading after the Mediterranean fast-casual restaurant chain slashed its full-year same-store sales growth forecast to a maximum of 6%, versus the previous estimate of 8%.</p><p>This line suggests a different reference level: the projected growth previously communicated by Cava's management. 
Another possibility is the average growth in the same quarter over the last few years.</p><p>***</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1559_004.png" class="kg-image" alt="Junkcharts_cava_axislabels" loading="lazy" title="Junkcharts_cava_axislabels" width="368" height="832"></figure><p>The inclusion of the "outlier" Q2, 2021 value of ~120% made it harder to differentiate the data for the other quarters, all of which were under 50%.</p><p>Also, note the axis labels being placed above, instead of next to, the tick marks. This small design flaw increases the reader's cognitive load significantly. Try figuring out the value of the two columns on the right.</p>
          ]]></content:encoded>
          <description><![CDATA[ Simple design is not always easy ]]></description>
        </item>
        <item>
          <title><![CDATA[ Story-first governing ]]></title>
          <link>https://www.junkcharts.com/story-first-governing/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3d8</guid>
          <category><![CDATA[ story-first ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 11 Aug 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>In the last decade, "data story-telling" first became a thing, then came the backlash. Some of my colleagues complain that data story-telling becomes more about stories than data. In other words, it morphs into what I've been calling the "<a href="https://www.junkcharts.com/numbersruleyourworld/story-first-1">story-first</a>" mentality towards data. The story-first people decide on a story, then find the data to support it. That's the opposite of "data-first". The world is over-run by story-firsters.</p><p>The current U.S. government is run by story-firsters. Maybe we shouldn't single them out. It seems like the U.S. government over the last decades has become increasingly story-first. The most recent actions announced by the President are the most extreme yet.</p><p>First, he fired the head of the Bureau of Labor Statistics (<a href="https://www.nbcnews.com/business/economy/trump-orders-firing-bls-commissioner-weak-jobs-report-rcna222531?ref=junkcharts.com">link</a>), the federal agency that collects and publishes various key official statistics, including the widely-disseminated inflation and unemployment rates. The gravest thing about this firing is the stated reason: the unproven accusation that she "manipulated" the data in order to make the current administration look bad.</p><p>With such reasoning, the next BLS head has to be someone who will publish only data that please the administration! Otherwise, his/her head is next on the chopping block.</p><p>(The story-firsters will say: since this administration's policies are self-evidently beneficial to the U.S. economy, any data not showing this result are flawed, and thus, the BLS head is incompetent! Or think of it this way - the firing is based on someone knowing what the "right" numbers are, and how do they know those?)</p><p>***</p><p>Second, the President announced his wishlist for "reforms" to the U.S. 
Census, in so doing disclosing that he has little other than surface knowledge about a census.</p><p>His biggest want is to stop counting "illegals". Every time someone wants to stop counting something, you know they have impure intentions, because for story-firsters, no fate is worse than having to face inconvenient data. (By contrast, for data-firsters, the worst fate is to have no data.)</p><p>The entire problem of illegal immigration goes away when there is no data measuring it. Similarly, if the government doesn't keep statistics on crime, we can be told there is no crime.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11282_002.jpg" class="kg-image" alt="_numbersense_bookcover" loading="lazy" title="_numbersense_bookcover" width="150" height="226"></figure><p>Here too, the story-firsters have gradually gained ground. Chapter 6 of <strong>Numbersense</strong> (<a href="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11282_002.jpg?ref=junkcharts.com">link</a>) covers details of how the U.S. government computes the unemployment rate. Since the time of Clinton, more and more citizens have entered the ranks of the uncounted: they can be neither employed nor unemployed, according to the BLS. Nevertheless, none of these dropouts have jobs; they are in fact unemployed (in the everyday sense), so by removing them from both the numerator and the denominator, the unemployment rate improves. It looks better, but it's not because those uncounted people found jobs.</p><p>Not counting illegals means we can't size the problem. We therefore can't properly allocate resources to deal with it, including hiring enough ICE agents to snatch people off the streets, if that is your desired policy.</p><p>***</p><p>There may also be specific metrics that the current U.S. government wants to modify. 
Any change to a long-running instrument, whether it's altering the underlying populations being measured or changing specifications, wreaks serious long-term damage. I discussed this issue in Chapter 2 of <strong>Numbersense (</strong><a href="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11282_002.jpg?ref=junkcharts.com"><strong>link</strong></a><strong>), </strong>as it relates to the movement to replace BMI as the obesity metric.</p><p>An immediate casualty is historical comparison. The power of the Census comes from its history. Because we have been measuring the same thing using the same method for a long time, we can describe trends and anomalies. A sudden shift in the definition of a metric literally and figuratively breaks the time series, effectively devaluing the currency of all prior work.</p><p>The inside joke is: the new metric is certainly not unbiased, nor above accusation of manipulation, because all metrics are built on top of assumptions, and those who disagree with the assumptions have grounds for bias complaints. It's like dropping everything you own to buy a new house only to discover, after you move in, that while the roof doesn't leak like the old house, the new house is infested with ants.</p><p>***</p><p>The reason for the surge of the story-firsters is covered in Chapter 2 of my book <strong>Numbersense</strong> (<a href="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11282_002.jpg?ref=junkcharts.com">link</a>): the perversity of measurement.</p><p>Statistics are great at reflecting the health of something, such as public health, public security, educational achievement, and the economy. It is then tempting to link statistics to performance. 
This is usually labeled pay-for-performance, and in some quarters, treated as an axiom, something so eminently reasonable that its adoption is beyond skepticism.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11282_003.jpg" class="kg-image" alt="_nryw_bookcover" loading="lazy" title="_nryw_bookcover" width="800" height="1288" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/11282_003.jpg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11282_003.jpg 800w" sizes="(min-width: 720px) 720px"></figure><p>Anyone who has experienced pay-for-performance knows the issue: there are many ways to "manipulate" the statistics without making real change. In Chapter 1 of <strong>Numbers Rule Your World (</strong><a href="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11282_003.jpg?ref=junkcharts.com"><strong>link</strong></a><strong>)</strong>, I reported a bottomless pit of methods used by enterprising university administrators to dress up their numbers, leading to better school rankings, without changing the quality of education (and in the worst cases, probably causing a drop in quality).</p><p>The "value-added" movement in education is a poster child for the perversion of measurement. When the salaries and bonuses of teachers and administrators are tied to standardized testing results, there are strong incentives to cheat. These policies are highly effective at spreading the cheating culture from students to staff.</p><p>Machines cheat too. If machines are told to maximize the number of clicks on a display ad, they learn to push a popup that interferes with what users are trying to do, thus generating many unintended clicks. The click metric dutifully reports these "fake" clicks as if they are real. 
Some humans notice this trickery, and seek to end it by requiring the user to remain on the ad for at least three seconds. The optimizing machines respond by withholding the "skip" button on the popup for three seconds.</p><p>...</p><p>Now, over the years, the Federal Reserve has drifted toward a "pay for performance" posture. Since the 1990s, when Alan Greenspan was Chair, the "dual mandate" of employment and stable prices has been managed by "targets". In recent years, the inflation target of 2% has been repeatedly mentioned.</p><p>The markets then interpret deviations from those targets as bad news. Lately, administrations have come to view stock prices as a barometer of their economic policies. The Fed is doing a lousy job, we are told, because inflation is higher than 2%, or because the market indices are reacting badly to the latest figures. And now, the President is saying the data collectors are failing when the statistics don't meet his expectations. In addition to firing the BLS head, he has been agitating to remove the Fed Chair.</p><p>I said earlier it's not just this administration. Once the "pay for performance" posture is adopted, the statistics are not just passive observers but active participants. The story-first instinct then rises to the top, encouraged by these incentives. It's a matter of time before the economic indicators point in an undesirable direction (if they never do, they aren't good metrics), and that's when the numbers get warped out of reality. It doesn't have to be blatant cheating. One can always come up with reasonable arguments to support changing assumptions or definitions. Somehow, these changes would always move the statistics in the favored direction!</p>
          ]]></content:encoded>
          <description><![CDATA[ Pay for performance, story-firsters, and trust in statistics ]]></description>
        </item>
        <item>
          <title><![CDATA[ Clear and confused states ]]></title>
          <link>https://www.junkcharts.com/clear-and-confused-states/</link>
          <guid isPermaLink="false">68e2cf4d04c09e00018ee8f5</guid>
          <category><![CDATA[ interpretation ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 04 Aug 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Long-time contributor Howard H. appreciates this data visualization project in the Washington Post about the emergency preparedness of counties in southern states as floods and hurricanes pound the region. (<a href="https://www.washingtonpost.com/climate-environment/2025/07/07/hurricane-helene-evacuation-north-carolina-warnings/?ref=junkcharts.com">link</a>)</p><p>He and I both like the first graph:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1558_001.png" class="kg-image" alt="Wp_ncflood_1" loading="lazy" title="Wp_ncflood_1" width="374" height="241"></figure><p>When the trend is clear, the graph does not need more embellishment.</p><p>Howard said: "Great title, clear visual, nice annotation to drive the point home.  The pink/blue shading isn’t strictly 'Tuftian,' but it emphasizes the chart’s main message, so while it’s a little extra I think I like it."</p><p>There are two minor visual issues:</p><ul><li>I'd have drawn the line in gray so it doesn't get associated with just the red shaded area.</li><li>Also, I'd have extended the negative side of the vertical axis to -8% so that the top and bottom halves have equal heights.</li></ul><p>***</p><p>The second chart in the series perplexes us both.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1558_002.png" class="kg-image" alt="Wp_ncflood_2" loading="lazy" title="Wp_ncflood_2" width="1088" height="1174" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1558_002.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/1558_002.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1558_002.png 1088w" sizes="(min-width: 720px) 
720px"></figure><p>Howard is concerned about the story behind the <a href="https://www.junkcharts.com/tag/map/">map</a>. Why did more residents evacuate in the green counties?</p><p>I'm confused about what the <a href="https://www.junkcharts.com/tag/color/">pink</a> part of the data encodes. In the <a href="https://www.junkcharts.com/tag/legend/">legend</a> labeling, pink represents counties in which "more [people] are staying in the storm's path". The legend's title suggests a comparison of the current week to "a normal week".</p><p>They have obtained cell phone data that tracked people's movements. For each county, let's assume they are able to compute both the in-flow and the out-flow of people in any given week. They define something called the "normal week": we assume this to mean the average week for some historical period of time. So for each county, they have the average in and average out. They also have the current week's in and out.</p><p>This is where things get murky. We are comparing the current week to the normal week. Because the in and out metrics are separate counts, they can go in opposite directions. If so, the data could not be tamed by a one-dimensional <a href="https://www.junkcharts.com/tag/color/">color</a> <a href="https://www.junkcharts.com/tag/scale/">scale</a>.</p><p>Maybe they don't track in- and out-flows separately, combining the two metrics to yield the net flow, defined as out minus in. Given the direction I've chosen, let's call it the net outflow.</p><p>If the net outflow is 0%, that means the same volume of movement was observed in the current week relative to normal. Notice that this does not mean the volume of in-flow equals the volume of out-flow. For example, if a net of 10,000 people move out of a county in a normal week, then a net outflow of 0% represents 10,000 people leaving the county in the current week. 
To make this point clearer, let's rename the metric the relative net outflow.</p><p>If the relative net outflow is a positive percentage, that means relatively more people moved out of the county during the current week than normal. If normally there is a net outflow, then the current week's net outflow is even larger. If normally there is a net inflow, then the current week's net inflow is smaller, or it could even flip from net in to net out.</p><p>In the <a href="https://www.junkcharts.com/tag/color/">color</a> <a href="https://www.junkcharts.com/tag/legend/">legend</a>, positive relative net outflow is shown in green and described as "more [people] leaving storm's path [than in a normal week]".</p><p>The pink part is described as "more [people] staying in the storm's path [than in a normal week]". This section of the <a href="https://www.junkcharts.com/tag/scale/">scale</a> corresponds to a negative relative net outflow, i.e. relatively fewer people than usual moved out of the county during the current week. In the first case, if normally there is a net outflow, then the current week's net outflow is smaller. To me, this is unexpected. If the county's residents chose to ignore the potential storms, they'd have gone about their business as usual, and I'd have expected the relative net outflow to stay within the normal range, rather than moving in a negative direction.</p><p>In the second case, if normally there is a net inflow, then the current week's net inflow is larger. This is counterintuitive in the same way. Could it be the case that residents from other counties who are evacuating decide to move to these counties?</p><p>In my discussion with Howard, we both feel that most counties probably experience neutral movements over a period of time, i.e. average net outflow is close to zero. 
This assumption doesn't help with the interpretation; it just suggests that the comparison to the "normal week" is a moot point.</p><p>***<br>Howard brings up an alternative <a href="https://www.junkcharts.com/tag/scale/">scaling</a> scheme: compute the evacuation rates of all counties, take the average evacuation rate as the midpoint, so that the scale represents a specific county's evacuation rate relative to the average county.</p><p>Since the map directly references a "normal week", it's probably not what they did.</p>
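            <p>To make the "relative net outflow" reading above concrete, here is a minimal sketch in Python. It is only my guess at a formula consistent with the discussion, not the Washington Post's method: the function name, its arguments, and the choice to divide by the absolute value of the normal-week net flow are all assumptions made for illustration.</p>

```python
def relative_net_outflow(cur_out, cur_in, norm_out, norm_in):
    """Percent change in net outflow (out minus in) versus a normal week.

    Hypothetical definition for illustration only. Dividing by the absolute
    value of the normal net flow keeps "positive" meaning "relatively more
    people left than usual", even when the normal week is a net inflow.
    """
    cur_net = cur_out - cur_in      # this week's net outflow
    norm_net = norm_out - norm_in   # normal week's net outflow
    if norm_net == 0:
        # A county with neutral normal movement has no baseline to compare to.
        raise ValueError("normal-week net flow is zero; the ratio is undefined")
    return 100.0 * (cur_net - norm_net) / abs(norm_net)


# Normal week: a net of 10,000 people leave the county.
same_as_normal = relative_net_outflow(30000, 20000, 30000, 20000)   # 0%
more_leaving = relative_net_outflow(32000, 20000, 30000, 20000)     # green
# Normal week: a net of 10,000 people arrive in the county.
smaller_inflow = relative_net_outflow(20000, 28000, 20000, 30000)   # green
larger_inflow = relative_net_outflow(20000, 32000, 20000, 30000)    # pink
```

            <p>Under this assumed formula, a county whose normal net flow is close to zero has no usable baseline: the ratio blows up or is undefined, which is one more reason to doubt that the map compares each county to its own "normal week" in exactly this way.</p>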
          ]]></content:encoded>
          <description><![CDATA[ What does pink mean on this map? ]]></description>
        </item>
        <item>
          <title><![CDATA[ One solution to OMNY&#x27;s math problem ]]></title>
          <link>https://www.junkcharts.com/one-solution-to-omnys-math-problem/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3d9</guid>
          <category><![CDATA[ Algorithms ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 30 Jul 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>In a prior <a href="https://www.junkcharts.com/omnys-math-problem/">post</a>, I attempted to divine how OMNY determines the 7-day window for doing fare capping. There just doesn't seem to be an easy way to do the math, if we take their promotional copy seriously, or even semi-seriously.</p><p>To recap, for commuters using the OMNY card, they should only see charges top out at $34 during a seven-day period, no matter how many rides were taken. For a ride to be free, it must be ride #13 or higher inside some 7-day counting window. But it's not clear, given a sequence of prior taps, which tap is the first tap of the currently active 7-day window?</p><p>I can't find further details on the OMNY <a href="https://omny.info/fares?ref=junkcharts.com">website</a>, though. The key issue with the official description is that the "first tap" is hard to nail down.</p><p>Given a series of taps, let's imagine allowing each tap to initiate its own 7-day window. The only windows that should concern us are those that overlap with the present time (shown in blue below).</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11280_002.png" class="kg-image" alt="Kfung_omny_firsttaps" loading="lazy" title="Kfung_omny_firsttaps" width="1496" height="1176" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/11280_002.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/11280_002.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11280_002.png 1496w" sizes="(min-width: 720px) 720px"></figure><p>But as you can see in the diagram, we expect multiple such windows to include the present moment. The count of prior rides in each window is different, and so is the sequence number of the next ride! 
This is why our heads explode if we try to process their imprecise description.</p><p>***</p><p>There is a way around this mess. Instead of counting forwards, we count backwards.</p><p>Imagine a series of ride times associated with a commuter. That's the dataset I'm working with. The first thing I do is drop most of this history; I only care about the rides that occurred within the last seven days.</p><p>To make things concrete, it's Monday 9 am sharp, and a commuter is tapping. Taking a 7-day backward window, I pull out this commuter's entire sequence of rides from last Monday 9:01 am up to and including 9 am today. My goal is to determine if this next ride (at 9 am) should be free.</p><p>If the number of rides in that counting window is 13 or more, then this ride, i.e. the 13th (or later) ride in a 7-day period, should be free. If it's 12 or fewer, this ride will be charged.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11280_003.png" class="kg-image" alt="Kfung_omny_countbackwards" loading="lazy" title="Kfung_omny_countbackwards" width="1218" height="1076" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/11280_003.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/11280_003.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11280_003.png 1218w" sizes="(min-width: 720px) 720px"></figure><p>Now, I will roll this window forward every time this commuter taps again. When the commuter rides again after work, say at 6 pm on Monday, I shift the counting window forward so that it starts at 6:01 pm last Monday and ends at 6 pm today.</p><p>How does the number of rides change in the counting window relative to the morning? 
The count decreases by the number of rides that happened from 9:01 am to 6 pm the previous Monday, as this time segment drops out of the counting window. Simultaneously, the count increases by the number of rides that happened from 9:01 am to 6 pm this Monday.</p><p>The change in the ride count is the net of those two values. If more rides are added than dropped, the ride count goes up; if more are dropped than added, the count goes down.</p><p>If the commuter does not leave the office during the work day, then there should be exactly one ride in the decrement window (occurring precisely at 6 pm the previous Monday), while there will be a single ride in the increment window, occurring at 6 pm today. The net change in the count is zero.</p><p>We still can't tell if the 6 pm ride today should be charged because we are missing information. We have to know how many rides were recorded in the prior counting window from 9:01 am previous Monday to 9 am this Monday. Let's say there were 13, meaning that the last ride, that is to say, the ride at 9 am today, should have been a free ride.</p><p>Free rides don't contribute to the fare cap. So, the increment window contains one ride, but zero paid rides. The net change in the count is -1. This commuter is now one ride shy of getting another free ride. Thus, the next ride will be charged.</p><p>***</p><p>This implementation of the 7-day fare capping does not square with the promotional language. For one thing, the "first tap" does not matter at all. We are counting backwards from the current time, not counting forwards from some "first tap".</p><p>In addition, each new tap refreshes the seven-day counting window. There is no such thing as "the rest of the 7-day period" because the window is continuously shifting forwards. 
Therefore, this is not what OMNY said they implemented, if we trust the promotional language.</p><p>My algorithm can be described simply and precisely: the next ride is free if it's <em>paid</em> ride #13 or higher within the last seven days. However, it's still not easily audited by commuters. You typically can't recall how many rides you've taken in the prior seven days, down to the minute. (It's even worse I imagine for those tapping their credit cards, as the OMNY transactions are dispersed among your other charges.)</p><p>You can be someone who just trusts authority. In that case, they can do whatever they want, because you aren't checking. You'd also praise whatever it is they do as effortless and convenient.</p><p>In reality, you've outsourced the auditing task to other commuters who care, or watchdogs. Your trust derives from people like me. What I'm finding out is there isn't even enough information out there to verify their implementation.</p>
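            <p>For readers who prefer code, the backward-counting rule described above can be sketched in a few lines of Python. This is a sketch of my proposal only, not OMNY's actual implementation, which is not public. The names <code>is_free</code> and <code>charge_taps</code> are made up for illustration, and I assume the window excludes a tap made exactly seven days earlier, in line with the "9:01 am" convention used in the example.</p>

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)  # length of the backward-looking counting window
FREE_FROM = 13              # a tap is free if it is paid ride #13 or higher


def is_free(prior_paid_taps, now):
    """Decide whether a tap at time `now` is free (hypothetical helper)."""
    window_start = now - WINDOW
    # Only paid rides count toward the cap; free rides are never recorded here.
    paid_in_window = sum(1 for t in prior_paid_taps if window_start < t <= now)
    # The current tap would be paid ride number paid_in_window + 1.
    return paid_in_window + 1 >= FREE_FROM


def charge_taps(taps):
    """Replay a commuter's tap times in order; return (time, is_free) pairs."""
    paid = []
    result = []
    for t in sorted(taps):
        free = is_free(paid, t)
        if not free:
            paid.append(t)
        result.append((t, free))
    return result


# A commuter who rides at 9 am and 6 pm every day, starting on a Monday.
start = datetime(2025, 7, 21, 9, 0)
taps = []
for day in range(8):
    taps.append(start + timedelta(days=day))
    taps.append(start + timedelta(days=day, hours=9))
free_flags = [free for _, free in charge_taps(taps)]
```

            <p>In this replay, the first twelve taps are charged, both Sunday taps are free, and the taps on the following Monday are charged again because the earliest paid rides have rolled out of the window. No "first tap" appears anywhere in the logic.</p>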
          ]]></content:encoded>
          <description><![CDATA[ Here&#39;s one way to implement fare capping ]]></description>
        </item>
        <item>
          <title><![CDATA[ OMNY&#x27;s mind-blowing solution to a math problem ]]></title>
          <link>https://www.junkcharts.com/omnys-math-problem/</link>
          <guid isPermaLink="false">68d85efb8ad60c0001366d27</guid>
          <category><![CDATA[ Algorithms ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 28 Jul 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>There is another baffling thing about the new OMNY system. It's their new approach to frequent-use discounting.</p><p>In the old swipe-card system (<a href="https://www.mta.info/fares-tolls/subway-bus/metrocard?ref=junkcharts.com">link</a>), frequent commuters buy weekly and monthly passes to save money.</p><p>The price for a single ride is currently $2.90. The seven-day pass costs $34 while the 30-day pass is $132.</p><p>The math is simple. You pay upfront for unlimited rides. After paying $34, you just hop on and off buses and subways without a care for the next 7 days. On the twelfth ride, the weekly pass pays off compared to the alternative of pay per ride because 12*$2.90 = $34.80 &gt; $34.</p><p>Since rides usually are taken in pairs (going out, coming back), if you expect to use the subway or bus once a day for six out of seven days, you should get the 7-day pass. To be sure, you can keep track of whether there is a day in which you didn't take a ride. More than one such day and your weekly pass will likely not pay out. Meanwhile, if there is a day with two round trips, it's almost certain that the pass will come good.</p><p>Similar math applies to the 30-day pass. You'd want to avoid more than seven days of no trips. (The monthly pass is not offered in the new OMNY system, so I'll focus on the 7-day pass from now on.)</p><p>***</p><p>OMNY changes everything, and tells commuters the new way is much easier. Don't believe it.</p><p>The new frequent commuter discount is promoted as: "You keep tapping--let us do the math!" This sounds like a great convenience to commuters, but only if you trust them with the math. This is especially so because the OMNY people make it strenuous for any commuter to follow the trail of charges. (See my prior <a href="https://www.junkcharts.com/omny-needs-a-facelift/">post</a> on their communications fiasco.)</p><p>Alright, they didn't really say let us do the math in those exact words. 
As the following subway ad shows, the actual words are "Start tapping any day and $34 is the most you'll pay in a week for unlimited rides."</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11279_002.jpg" class="kg-image" alt="Omny_7day_ad sm" loading="lazy" title="Omny_7day_ad sm" width="726" height="533" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/11279_002.jpg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11279_002.jpg 726w" sizes="(min-width: 720px) 720px"></figure><p>At first sight, this sounds simple. You just keep tapping, and when the amount exceeds $34, it should stop adding extra charges. Beyond that point, all rides are free. OMNY calls this "fare capping".</p><p>If you give it another moment, you'd realize that something doesn't click. Let's assume you've gone beyond the cap, and all future rides are free. Now ask: when will your rides stop being free? (The OMNY tablets never say how much is charged as you go through the turnstile. Hence this question.)</p><p>To answer that, you'd have to know when the clock started. Indeed, when did the seven-day window begin? In the official materials (<a href="https://omny.info/fares?ref=junkcharts.com">link</a>), this moment is called the "first tap".</p><p>The subway ad sweeps a not-so-little issue under the rug. It says you can start "any day" but other than the first time you ever use the OMNY card, which of your other taps is a first tap? Is the second lifetime tap the second tap of the same counting window, or is it the "first tap" of a new counting window?</p><p>You might think that is a silly question. So, let's walk through a scenario. Let the "first tap" start the clock, and they look at the seven days from that moment. Assume you did not ride enough to meet the cap. 
Supposedly, after seven days, the counting window resets--but it probably doesn't until your next tap. It's highly unlikely that you magically tap exactly seven days from the previous first tap. Therefore, if the window resets exactly seven days from the first tap, it would no longer start with a tap. For the next window to start on a "first tap", it has to wait till your next tap.</p><p>Is your head hurting as much as mine? This is cognitive overkill.</p><p>A literal interpretation of fare capping: the very first time you use the OMNY card, it establishes your personal first-tap time (say, Monday 9 am). This then divides your future into 7-day windows, all starting on Monday at 9 am. If you commit this time to memory, then your approach to using fare capping is similar to the previous 7-day pass: just tap as many times as possible in the next seven days.</p><p>I don't think that's the right interpretation since only the first counting window starts with a first tap; none of the others will.</p><p>If we follow the other interpretation, each new counting window starts with a first tap, thus after one counting window ends, the next one does not start till your next tap. From a commuter's perspective, this is mindblowingly complex. Imagine you are about to tap, and you want to know where you are within the current 7-day window. You'd have to start from your very first tap, and then work out each counting window, one at a time!</p><p>That's why I called this fare-capping program "Don't ask questions, just trust us."</p><p>This is not how you treat your customers. The next mayor should fix this, pronto.</p><p>***</p><p>In the next post, I'll discuss how I think this fare-capping scheme actually works. That is to say, how it would work if I were designing it.</p><p>P.S. It doesn't have to be this hard. I heard from someone familiar with one transit system in Australia. They have a daily fare cap. The day starts and ends the same way for everyone. 
Fares are capped to a maximum per day. It's really that simple.</p><p>Would love to hear how frequent commuter discounting works in your transit system!</p><p>[7/31/2025] The next post can be found <a href="https://www.junkcharts.com/one-solution-to-omnys-math-problem/">here</a>.</p>
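            <p>As a footnote to the pass arithmetic earlier in this post, here is a small Python sketch of the break-even calculation. The prices come from the post ($2.90 per ride, $34 for the 7-day pass, $132 for the 30-day pass); the function names are made up for illustration, and the "rides come in pairs" rule is the same simplification I used above.</p>

```python
FARE_CENTS = 290  # single ride, $2.90; cents avoid floating-point rounding


def break_even_rides(pass_cents, fare_cents=FARE_CENTS):
    """Smallest number of rides whose pay-per-ride total exceeds the pass price."""
    return pass_cents // fare_cents + 1


def spare_days(pass_cents, period_days, fare_cents=FARE_CENTS):
    """Days without a round trip you can afford before the pass stops paying off.

    Assumes rides come in pairs (one round trip per riding day).
    """
    rides = break_even_rides(pass_cents, fare_cents)
    riding_days = (rides + 1) // 2  # round up to whole round-trip days
    return period_days - riding_days


weekly_rides = break_even_rides(3400)    # 7-day pass at $34
monthly_rides = break_even_rides(13200)  # 30-day pass at $132
weekly_spare = spare_days(3400, 7)
monthly_spare = spare_days(13200, 30)
```

            <p>This gives 12 rides and one spare day for the weekly pass, and 46 rides and seven spare days for the monthly pass, matching the rules of thumb above.</p>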
          ]]></content:encoded>
          <description><![CDATA[ The unintuitive fare capping scheme for NYC commuters ]]></description>
        </item>
        <item>
          <title><![CDATA[ OMNY needs a facelift ]]></title>
          <link>https://www.junkcharts.com/omny-needs-a-facelift/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3da</guid>
          <category><![CDATA[ Behavior ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Sun, 27 Jul 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>New York City is in the process of transitioning commuters to a chip card called OMNY. Switching to chip is a good move but the new user interface is terrible.</p><p>Commuters are supposed to tap the OMNY card on the above tablet, and then walk through the turnstile when the light turns green. (Commuters can use their phone or chip credit card to pay as well. I'll be talking about OMNY card users but the content also applies to these others.)</p><p>It's as simple as that.</p><p>Sadly, the user communication is simple to the point of uselessness. In my months of tapping, the only responses I have seen are the green light, the red light, and, on a few occasions, the blue light. It's the four corner neons that switch colors.</p><p>When it's green, the interface does not inform me how much was charged, nor how much money I have left on my card. This information is provided to commuters in the swipe-card interface on every trip.</p><p>In addition, the swipe-card interface tells me I'm using a free transfer, e.g. when transferring from bus to subway. With OMNY, the screen once again lights up green but it shows the same green light whether the trip is charged or free!</p><p>It doesn't get better from here. When I do get a red light, it does not indicate why. Is it because I didn't hold the card long enough? Is it because the balance is too low? Is it because of software malfunction? The swipe-card display tells me whether it's out of money, and sometimes, it just tells me to swipe again. With OMNY, it's become the commuter's responsibility to figure out what went wrong (they got rid of most humans many years ago so there is usually no one around to ask, nor would they have the tools to diagnose the problem anyway).</p><p>Once in a while, the screen shows blue - I think it's a different hue of blue. It also does not explain itself. On the last occasion this happened to me, I was able to walk through the turnstile as if it were green. 
Who knows?</p><p>***</p><p>As a result, it takes a lot more effort to track and audit the charges under OMNY compared to swipe cards. The previous system is more honest. You pay for a service, and your payment is immediately acknowledged.</p><p>The OMNY tablet offers a much larger screen, but this real estate is wasted.</p>
          ]]></content:encoded>
          <description><![CDATA[ OMNY has a bigger screen but displays less information ]]></description>
        </item>
        <item>
          <title><![CDATA[ Our digital breadcrumbs ]]></title>
          <link>https://www.junkcharts.com/our-digital-breadcrumbs/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3db</guid>
          <category><![CDATA[ Big Data ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Thu, 24 Jul 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>The Coldplay jumbotron saga has been getting max play on my timelines the past week (for background, see <a href="https://apnews.com/article/jumbotron-coldplay-couple-privacy-de7dcc76a736d67b81c530d3186f5aed?ref=junkcharts.com">here</a>). To recap, during a recent Coldplay concert, a random couple was shown on the big screen being intimate. The couple and another friend noticed they were on camera, and instead of waving and smiling at it, the couple ducked out of view. The singer narrated the scene to the crowd, and speculated that the couple were caught having an affair or were just really shy.</p><p>In another world, one would just laugh at this awkward scene and move on. The reaction was worse than the act; they were touching but weren't making love or something.</p><p>We aren't in ordinary times. We have been thrust into the <a href="https://www.junkcharts.com/numbersruleyourworld/big_data">Big Data</a> era, like it or not.</p><p>***</p><p>So what happened next? Someone decided to "dox" them. "Dox" is the slang for putting out the identity of someone without their consent.</p><p>Doxing is super easy because our photos and images have been stockpiled by countless, mostly secretive, businesses. We put data into these databases every time we upload a photo of ourselves to the cloud, or compliantly upload a headshot to verify our identity, or use our face to log into a device. Even if one is very careful never to put one's own photo on a remote server, one can't stop one's friends from uploading a photo from the last outing, and then tagging all participants. The act of tagging a person on a photo is to create an entry in a database that connects a name to an image of a face.</p><p>Your phone may automatically identify faces in your camera roll; it may even create folders for specific people that have been detected in multiple photos - without your explicit tagging. 
Once there exists a folder of photos of the same person, it's simple to now put a name to the folder. If your phone isn't doing it, it's because some years ago, a certain business - I recall it being Facebook - made face recognition a feature of the camera, and users pushed back against it. But, these service providers, whether it's Apple, Facebook, or any number of other players, can easily put name to face. Today, my sense is that the resistance to such technology has mostly dissipated.</p><p>Because of this type of technology, it's straightforward for anyone (who's willing to pay a subscription fee) to "dox" someone from a photo. Thus, the couple at the Coldplay concert was quickly found to be the CEO and head of HR at an AI startup. Then, strangers found their way to all their social media, including LinkedIn profiles, and the world learned everything about anything that can be found publicly. Journalists are feasting on the situation too, which explains why my timelines won't stop pushing this content.</p><p>With AI, this type of content can be generated today without human intervention. The only possible barrier is the absence of a preexisting subscription to the doxing service. If this is in place, AI can dox the person, fetch all their social media content, and write any number of sensational articles.</p><p>***</p><p>This is yet another example of technologies that have useful applications but can be turned into something much more sinister. I refuse to believe that the disappearance of opposing voices means people accept these negative consequences.</p><p>The more salacious aspect of that embracing couple is that both the CEO and the head of HR are married - and not to each other. All the writers assume that they are cheating on their spouses. But do we really know? It's certainly possible that both couples are in open relationships. 
I have no idea, but neither do those who label them as cheaters.</p><p>We have seen similar crowd behavior before, but in a much graver setting. Remember the gruesome quadruple murders in a college town in Idaho. Incidentally, the PhD student in criminology recently accepted a plea deal to avoid potentially getting the death penalty (<a href="https://www.cnn.com/2025/06/30/us/bryan-kohberger-update-plea-deal?ref=junkcharts.com">link</a>). The convicted murderer wasn't apprehended for some time after the murders, and during this time, people obtained video footage from various places the victims visited that fateful night. The Internet sleuths doxed quite a few people, scoured their social media content, and pushed stories that cast them as likely quadruple-murder suspects. For example, this professor later filed a lawsuit against one of the TikTok influencers for defamation (<a href="https://www.nbcnews.com/news/us-news/idaho-professor-sues-tiktoker-allegations-killing-4-university-student-rcna63149?ref=junkcharts.com">link</a>).</p><p>The technologies and culture that drove these false accusations are the same as those that doxed the pair at the Coldplay concert. In the Idaho murders case, we can now say for sure that those doxed individuals were definitely falsely accused. If the faces of these individuals weren't readily found in databases, they would not have been dragged through the mud.</p><p>Even if the couple were engaging in extramarital affairs, implying they also were flouting workplace rules, and one holds moral values dearly, is this how we want society to handle such cases?</p>
          ]]></content:encoded>
          <description><![CDATA[ The Big Data phenomenon behind the Coldplay jumbotron saga ]]></description>
        </item>
        <item>
          <title><![CDATA[ Why is this chart confusing? ]]></title>
          <link>https://www.junkcharts.com/why-is-this-chart-confusing/</link>
          <guid isPermaLink="false">68e2cf4d04c09e00018ee8f6</guid>
          <category><![CDATA[ histogram ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Sun, 20 Jul 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Significance Magazine has a fascinating <a href="https://academic.oup.com/jrssig/article-abstract/22/4/6/8151666?redirectedFrom=fulltext&ref=junkcharts.com">article</a> about the success rate of Broadway productions. The authors conclude that Broadway investors have about a 20-25% chance of recouping their original investment. (Recoupment means breakeven, different from making tons of profits!)</p><p>That number is a bit higher than the folklore number of about 20%. The importance of their contribution is to put some data rigor behind their number.</p><p>Figure 4 in the article is the following chart:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1557_001.png" class="kg-image" alt="Significance_broadwayrecoupment_figure4_sm" loading="lazy" title="Significance_broadwayrecoupment_figure4_sm" width="418" height="271"></figure><p>The chart compares histograms for two groups of shows. The reference group (blue) are shows known to have recouped their investments with a known recoupment date, some weeks after opening. The histogram presents the time to recoupment. The comparison group (orange) are shows that closed but without public information that can be used to figure out if the investors recouped their money before closing. This histogram displays the time until closing. The chart was conceived in an effort to guess the "label" for the second group of shows; should they be called a success (recouped) or a failure in the analysis?</p><p>In the authors' own words, then:</p><blockquote>Figure 4 compares the total running times of these shows to the recoupment times of the 22 shows with known recoupment dates, suggesting that many likely did not last long enough to recoup.</blockquote><p>I'm confused as to how the histograms support this conclusion. 
The above statement suggests that the shows with unknown recoupment dates have generally closed earlier than those with known recoupment dates. In the chart, I counted 12 blue shows that closed within 30 weeks against 3 such orange shows. Considering the entire histograms, I also don't sense that the orange one is poised to the left of the blue one. (One possibility is the color labels were accidentally swapped.)</p><p>***</p><p>The above observation then leads me down the rabbit hole of investigating the source of confusion.</p><p>Since the authors clearly stated that there were 22 shows with known recoupment dates, I can see which columns sum up to 22. The blue columns: 7+5+1+4+1+4+2+2+1+1 = 28 shows while the orange columns: 2+1+4+5+3+1+1+2+2+1=22 shows. It'd seem that the orange histogram corresponds to the shows with known recoupment dates, confirming that the labels were swapped. I just have to check the number of shows that closed but with unknown recoupment status. In the article, they said "we were left with 28 final shows whose recoupment status had to be manually classified" so this made me feel better.</p><p>Here's a version of their chart with the right color <a href="https://www.junkcharts.com/tag/text/">labels</a>:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1557_002.png" class="kg-image" alt="Junkcharts_redo_sigbroadwayrecoup_v1" loading="lazy" title="Junkcharts_redo_sigbroadwayrecoup_v1" width="351" height="168"></figure><p>Note that I switched the <a href="https://www.junkcharts.com/tag/color/">colors</a> to blue and yellow so that the merged color is green, which is more easily understood than blue+orange = brown.</p><p>***</p><p>Back to the overlapping histograms, it's very confusing to have created three <a href="https://www.junkcharts.com/tag/color/">colors</a> for two groups.</p><p>It's clearer to stack them top and bottom:</p><figure 
class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1557_003.png" class="kg-image" alt="Junkcharts_redo_sigbroadwayrecoup_separate" loading="lazy" title="Junkcharts_redo_sigbroadwayrecoup_separate" width="349" height="341"></figure><p>Or just print the outline of the reference histogram:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1557_004.png" class="kg-image" alt="Junkcharts_redo_sigbroadwayrecoup_nofill" loading="lazy" title="Junkcharts_redo_sigbroadwayrecoup_nofill" width="518" height="247"></figure><p>***</p><p>They really should have used density histograms instead of count histograms, given that the two groups have different numbers of shows. Plotting proportions is fine too although density histograms have better statistical properties (as I explained <a href="https://www.junkcharts.com/what-is-plotted-on-a-histogram/">here</a>).</p>
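<p>As a rough sketch of that rescaling, the snippet below converts the column counts I read off the chart into proportions and densities. The group names follow my corrected labels (28 shows with unknown recoupment status, 22 with known recoupment dates). The bin width is a placeholder assumption, since I did not record the actual width used in the article.</p>

```python
# Column counts read off the chart, in left-to-right order
unknown_status = [7, 5, 1, 4, 1, 4, 2, 2, 1, 1]    # 28 shows
known_recoupment = [2, 1, 4, 5, 3, 1, 1, 2, 2, 1]  # 22 shows

BIN_WIDTH_WEEKS = 10.0  # hypothetical bin width, for illustration only


def proportions(counts):
    """Share of the group's shows falling in each bin (sums to 1)."""
    total = sum(counts)
    return [c / total for c in counts]


def densities(counts, bin_width):
    """Proportion per unit of the x-axis (bar areas sum to 1)."""
    total = sum(counts)
    return [c / (total * bin_width) for c in counts]


for name, counts in [("unknown status", unknown_status),
                     ("known recoupment", known_recoupment)]:
    props = [round(p, 3) for p in proportions(counts)]
    dens = [round(d, 4) for d in densities(counts, BIN_WIDTH_WEEKS)]
    print(name, sum(counts), props, dens)
```

<p>On the count scale, the first column compares 7 shows with 2. On the proportion scale, it compares 7/28 = 25% with 2/22, or about 9%, which accounts for the different group sizes.</p>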
          ]]></content:encoded>
          <description><![CDATA[ A confusing chart on Broadway shows ]]></description>
        </item>
        <item>
          <title><![CDATA[ Say goodbye to soccer ]]></title>
          <link>https://www.junkcharts.com/say-goodbye-to-soccer/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3dc</guid>
          <category><![CDATA[ Analytics-business interaction ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 15 Jul 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>The current U.S. government has another idea for an executive order: it may decide to join the rest of the world and rename "soccer" to "football" (<a href="https://www.msn.com/en-us/sports/soccer/trump-considers-changing-us-soccer-to-football-in-hosting-world-cup/ar-AA1IE5Vj?ref=junkcharts.com">link</a>).</p><p>That's a fairly pointless name change. How about something more impactful?</p><ul><li>Using Celsius instead of Fahrenheit for temperatures</li><li>Using grams instead of ounces</li><li>Using metres instead of feet</li></ul><p>There is a crucial but subtle difference between these actions, though.</p><p>Changing soccer to football creates a collision because there is a different sport called American football, and now "football" becomes imprecise.</p><p>Switching scientific units to align with the rest of the world does not lead to confusion, as there is only one temperature, weight or length.</p>
          ]]></content:encoded>
          <description><![CDATA[ Say goodbye to soccer, hello football ]]></description>
        </item>
        <item>
          <title><![CDATA[ Color bomb ]]></title>
          <link>https://www.junkcharts.com/color-bomb/</link>
          <guid isPermaLink="false">68e2cf4d04c09e00018ee8f7</guid>
          <category><![CDATA[ Bar chart ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Sun, 13 Jul 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>I found a snapshot of the following leaderboard (<a href="https://openrouter.ai/rankings?ref=junkcharts.com">link</a>) in a newsletter in my inbox.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1556_001.png" class="kg-image" alt="Openrouter_leaderboard_stackedcolumns" loading="lazy" title="Openrouter_leaderboard_stackedcolumns" width="469" height="205"></figure><p>This chart ranks different AIs (foundational models) by token usage (which is the unit by which AI companies charge users).</p><p>It's a standard stacked <a href="https://www.junkcharts.com/tag/bar-chart/">column chart</a>, with data <a href="https://www.junkcharts.com/tag/aggregation/">aggregated</a> by week. The <a href="https://www.junkcharts.com/tag/color/">colors</a> represent different foundational models.</p><p>In the original webpage, there is a <a href="https://www.junkcharts.com/tag/table/">table</a> printed below, listing the top 20 model names, ordered from the most tokens used.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1556_002.png" class="kg-image" alt="Openrouter_leaderboard_table" loading="lazy" title="Openrouter_leaderboard_table" width="800" height="478" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1556_002.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1556_002.png 800w" sizes="(min-width: 720px) 720px"></figure><p>Certain AI models have come and gone (e.g. the yellow and blue ones at the bottom of the chart in the first half). 
The model in pink has been the front runner through all weeks.</p><p>Total usage has been rising, although it might be flattening, which is the point made by the newsletter publisher.</p><p>***</p><p>A curiosity is the gray shaded section on the far right - it represents the projected total token usage for the days that have not yet passed during the current week. This is one of those additions that I'd like to see more often. If the developer had chosen to plot the raw data and nothing more, then they would have made the same chart except for the gray section. On that chart, the last column should not be compared to any other column as it is the only one that encodes a partial week.</p><p>This added gray section addresses the specific question: whether the total token usage for the current week is on pace with prior weeks, or faster or slower. (The accuracy of the projection is a different matter, which I won't discuss.)</p><p>This added gray section leaves another set of questions unanswered. The chart suggests that the total token usage is expected to exceed the values for the prior few weeks, at the time it was frozen. We naturally want to know which models are contributing to this projected growth (and which aren't). The current design cannot address this issue because the projected additional usage is <a href="https://www.junkcharts.com/tag/aggregation/">aggregated</a>, and not available at the model level.</p><p>While it "tops up" the weekly total usage using a projected value, the chart does not show how many days are remaining. That's an important piece of information for interpreting the projection.</p><p>***</p><p>Now, we come to the good part, for those of us who love details.</p><p>A major weakness of these stacked <a href="https://www.junkcharts.com/tag/bar-chart/">column charts</a> is of course the dizzying set of colors required, one for each model. Some of the shades are so similar it's hard to tell if they repeated colors. 
Are these two different blues or the same blue?</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1556_003.png" class="kg-image" alt="Openrouter_leaderboard_blues" loading="lazy" title="Openrouter_leaderboard_blues" width="191" height="215"></figure><p>Besides, the visualization software has a built-in feature that "softens" a color when it is clicked on. This feature introduces unpleasant surprises as that soft shade might have been used for another category.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1556_004.png" class="kg-image" alt="Openrouter_aimodels_ranking_mutedcolors" loading="lazy" title="Openrouter_aimodels_ranking_mutedcolors" width="1342" height="1082" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1556_004.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/1556_004.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1556_004.png 1342w" sizes="(min-width: 720px) 720px"></figure><p>It appears that the series is running sideways (following the superimposed gray line) when in fact the first section is a softened red associated with the series that went higher (following the white line).</p><p>It's near impossible to work with so many colors. If you extract the underlying data, you find that they show 10 values per day across 24 weeks. Because the AI companies are busy launching new models, the dataset contains 40 unique model names, which implies they needed 40 different shades on this one chart. (Double that to 80 shades if we add the on-click color variations.)</p><p>***</p><p>I hope some of you have noticed something else. 
Earlier, I mentioned the model in pink as the most popular AI model but if you take a closer look, this pink section actually represents a mostly useless catch-all category called "Others," that presumably aggregates the token usages of a range of less popular models. In this design, the Others category is catching an undeserved amount of attention.</p><p>It's unclear how the models are <a href="https://www.junkcharts.com/tag/sorting/">ordered</a> within each column. The developer did not group together different generations of models by the same developer. Anthropic Claude has many entries: Sonnet 4 [green], Sonnet 3.5 [blue], Sonnet 3.5 (self-moderated) [yellow], Sonnet 3.7 (thinking) [pink], Sonnet 3.7 [violet], Sonnet 3.7 (self-moderated) [cyan], etc. The same for OpenAI, Google, etc.</p><p>This graphical decision may reflect how users of large language models evaluate performance. Perhaps at this time, there is no brand loyalty, or lock-in effect, and users see all these different models as direct substitutes. Therefore, our attention is focused on the larger number of individual models, rather than the smaller set of AI developers.</p><p>***</p><p>Before ending the post, I must point out that the publisher of this set of rankings offers a platform that allows users to switch between models. They are visualizing their internal data. This means the dataset only describes what customers of Openrouter.ai do on this platform. There should be no expectation that this company's user base is representative of all users of LLMs.</p>
          ]]></content:encoded>
          <description><![CDATA[ Color bomb in AI analytics ]]></description>
        </item>
        <item>
          <title><![CDATA[ Will AI make cheaters of us all? ]]></title>
          <link>https://www.junkcharts.com/will-ai-make-cheaters-of-us-all/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3dd</guid>
          <category><![CDATA[ artificial intelligence ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 08 Jul 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Andrew wrote an amusing <a href="https://statmodeling.stat.columbia.edu/2025/07/07/chatbot-prompts?ref=junkcharts.com">post</a> about mischief using AI in peer reviewing for academic journals.</p><p>It emerged that authors of scientific papers have resorted to embedding secret prompts inside their text to instruct large language models (LLMs) to give their papers positive reviews. These prompts may be printed in white, or tiny font, so they are intended to evade humans. Some prompts are quite elaborate, carrying instructions for what to say about strengths as well as what to say about weaknesses. For example:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11275_001.png" class="kg-image" alt="Llm_reviewer_prompt_via_agelman" loading="lazy" title="Llm_reviewer_prompt_via_agelman" width="411" height="226"></figure><p>Be sure to check out the comments section, as readers fuss over which group is worse: the authors who instruct LLMs to give positive reviews only; or the reviewers who rely on LLMs to submit their reports. As Andrew told the story, one author who admitted to inserting these prompts argued that they did it only to deal with cheating reviewers who deploy LLMs. So, we are witnessing the classic two kids in a playground scenario - he's the one who started it!</p><p>We can take this blame game one step further. The cheating reviewers should blame it on the authors because some authors are using LLMs to write bogus papers!</p><p>***</p><p>Unfortunately, this is the world of AI we find ourselves in. At an event recently, I chatted with an instructor who is throwing his hands up, complaining that he is spending time correcting code submitted by his students who are obviously using AI to do the work. Meanwhile, there are students complaining that their instructors use AI to set or mark assignments. 
They can of course blame each other.</p><p>Would one begrudge instructors who ask AI to mark assignments if the work were generated by AI? Would one judge the students who use AI to do their homework if said assignments were created by AI?</p><p>Is this a race to the bottom? Eventually, will humans do any work?</p>
          ]]></content:encoded>
          <description><![CDATA[ Fake reviews by fake reviewers of fake papers by fake authors ]]></description>
        </item>
        <item>
          <title><![CDATA[ Know your data 46: using our data to set pricing ]]></title>
          <link>https://www.junkcharts.com/know-your-data-46-using-our-data-to-set-pricing/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3de</guid>
          <category><![CDATA[ Algorithms ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Sun, 06 Jul 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Google's former CEO Eric Schmidt infamously said something along the lines of "If you don't want others to know, you shouldn't be doing it in the first place". It showed the hubris of Silicon Valley at the time, and a certain deceitfulness. Because the truth is if they have your data, they can use the data to harm you, even if you haven't done anything wrong!</p><p>Finally, we have some evidence of what's been going on behind closed doors for a long time. They use our data to price discriminate. Same product, different prices, based on analyzing our data.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11274_001.jpg" class="kg-image" alt="Carl-tronders-singapore-cameras-sm" loading="lazy" title="Carl-tronders-singapore-cameras-sm" width="340" height="234"></figure><p>This practice, known as "surveillance pricing", caught notice because an industry body is suing New York State about a new law that requires companies to disclose that they are using algorithms (and data) to set different prices for different people (<a href="https://www.reuters.com/legal/litigation/new-york-sued-by-national-retail-federation-over-surveillance-pricing-law-2025-07-02/?ref=junkcharts.com">link</a>). Look, the state is not banning surveillance pricing; they are requiring notification.</p><p>The industry doesn't want us to know.</p><p>The pushback from industry follows the usual script:</p><p>They bring up alternative scenarios of potential benefit to dismiss scenarios of harm. The state alleges that prices are raised on those who can afford them. In this case, they claim that the same algorithms are used to lower prices by offering discounts to selected customers.</p><p>I don't doubt that algos target special deals at specific customers. 
But I fail to understand why customers who love receiving coupons would object to the required disclosure of surveillance pricing - in fact, it would be great marketing to inform these customers that algos found them good deals; they might learn to like algorithms!</p><p>Surely, no consumers should object to such disclosure.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11274_002.jpg" class="kg-image" alt="_numbersense_bookcover" loading="lazy" title="_numbersense_bookcover" width="150" height="226"></figure><p><br>The industry apparently wants us to believe that the primary objective of surveillance pricing is to deliver discounts to customers. Anyone who read Chapters 3 to 5 (marketing data) of <strong>Numbersense</strong> (<a href="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11274_002.jpg?ref=junkcharts.com">link</a>) should recognize it as hogwash. Discounts result in lower revenues, unless the business can somehow prevent existing customers from using them. Whatever "leakage" happens, the business has to make up for these "lost" revenues. Thus, the same algos are likely to raise prices on other customers. With the amount of data at their disposal, it's not hard to figure out which customers are less price-sensitive, or have higher "willingness to pay".</p><p>I have worked on such algos. What kind of reception do you think data scientists would get from the business teams if we present to them an algorithm that delivers discounts to selected customers, leading to the predicted outcome of lower total expected revenues (and lower profitability)?</p><p>***</p><p>The usual script from the industry bodies also includes the false claim that telling the truth is "misleading". This is the same script used to oppose non-GMO labels. 
In this instance, they assert that customers will interpret the mandated disclosure as evidence of "price gouging".</p><p>These industry honchos aren't alarmed when consumers falsely believe that prices are fixed for everyone - when such disclosure isn't required!</p><p>I must digress to complain about another industry practice that is gaining popularity by the day, at least in the U.S. Many stores don't even bother putting up price tags. Some restaurants and coffee shops put up menus without prices. I just walked into a Vietnamese diner this afternoon, hoping to get an iced coffee to combat the oppressive heat in New York - well, the menu of banh mi and side dishes is reasonably priced, but the beverage and dessert sections have no prices! I walked out, disgusted. Judging by the redacted prices, the iced coffee is probably outrageously expensive, either compared to their past pricing, or compared to their peers. I'm guessing at least $6 (plus tax and the delightful... tip screen).</p><p>In their war against disclosures, some businesses won't even put up their prices.</p><p>And yet, we should believe that they won't use our data to maximize their profits.</p>
          ]]></content:encoded>
          <description><![CDATA[ Know your data 46: how they manipulate prices based on analyzing your surveillance data ]]></description>
        </item>
        <item>
          <title><![CDATA[ Light entertainment: Acid Images ]]></title>
          <link>https://www.junkcharts.com/light-entertainment-acid-images/</link>
          <guid isPermaLink="false">68d5ee518ad60c00013660c5</guid>
          <category><![CDATA[ Food ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 02 Jul 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>A contact commented on the following chart circulating on LinkedIn to promote Portugal:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1555_001.jpg" class="kg-image" alt="Linkedin_portugal_processedfood" loading="lazy" title="Linkedin_portugal_processedfood" width="800" height="800" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1555_001.jpg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1555_001.jpg 800w" sizes="(min-width: 720px) 720px"></figure><p>His main complaint: the flag of Portugal is wrong!</p><p>Imagine.</p><p>***</p><p>A couple of things to note about this image.</p><p>I clicked on the "CR" logo in the top left corner, and learned about something called Content Credentials. It tells me that the image was generated by ChatGPT.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1555_002.png" class="kg-image" alt="Linkedin_portugal_processedfood_contentcredentials" loading="lazy" title="Linkedin_portugal_processedfood_contentcredentials" width="1164" height="1148" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1555_002.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/1555_002.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1555_002.png 1164w" sizes="(min-width: 720px) 720px"></figure><p>I applaud this effort. Will it stop fraud? Probably not, but at least it gives honest people a way to label their work.</p><p>***</p><p>The second thing is, there are many errors throughout this <a href="https://www.junkcharts.com/tag/map/">map</a>. 
Let's make a list...</p><p>I'll get us started.</p><p>There are two French flags: one is linked to the second highest value while the other one is linked to the second lowest value.</p>
          ]]></content:encoded>
          <description><![CDATA[ Light entertainment: Acid Images ]]></description>
        </item>
        <item>
          <title><![CDATA[ Students demonstrate how analytics underlie strong dataviz ]]></title>
          <link>https://www.junkcharts.com/students-demonstrate-how-analytics-underlie-strong-dataviz/</link>
          <guid isPermaLink="false">68e2cf4d04c09e00018ee8f8</guid>
          <category><![CDATA[ ray vella ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 30 Jun 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>In today's post, I'm delighted to feature work by several students of <a href="https://www.linkedin.com/posts/rayvella_data-visualization-for-business-data1-ce9006-activity-7082482821520347136-Fk0C/?ref=junkcharts.com">Ray Vella</a>'s data visualization class at NYU. They have been asked to improve the following Economist chart entitled "The Rich Get Richer".</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1554_001.png" class="kg-image" alt="Economist_richgetricher" loading="lazy" title="Economist_richgetricher" width="210" height="271"></figure><p>In my guest lecture to the class, I emphasized the importance of upfront analytics when constructing data visualizations.</p><p>One of the key messages is pay attention to definitions. How does the Economist define "rich" and "poor"? (it's not what you think). Instead of using percentiles (e.g. top 1% of the income distribution), they define "rich" as people living in the richest region by average GDP, and "poor" as people living in the poorest region by average GDP. Thus, the "gap" between the rich and the poor is measured by the difference in GDP between the average persons in those two regions.</p><p>I don't like this metric at all but we'll just have to accept that that's the data available for the class assignment.</p><p>***</p><p>Shulin Huang's work is notable in how she clarifies the underlying algebra.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1554_002.png" class="kg-image" alt="Shulin_rvella_economist_richpoorgap" loading="lazy" title="Shulin_rvella_economist_richpoorgap" width="415" height="284"></figure><p>The middle section classifies the countries into two groups, those with widening vs narrowing gaps. The side panels show the two components of the gap change. 
The gap change is the change in the richest region minus the change in the poorest region.</p><p>If we take the U.S. as an example, the gap increased by 1976 units. This is because the richest region gained 1777 while the poorest region lost 199: 1777 - (-199) = 1976. Germany had a very different experience: the richest region regressed by 2215 while the poorest region improved by 424, leading to the gap narrowing by 2638 (the two components give 2639; the one-unit difference is presumably due to rounding).</p><p>Note how important it is to keep the <a href="https://www.junkcharts.com/tag/sorting/">order</a> of the countries fixed across all three panels. I'm not sure how she decided the order of these countries, which is a small oversight in an otherwise excellent effort.</p><p>Shulin's <a href="https://www.junkcharts.com/tag/text/">text</a> is very thoughtful throughout. The chart title clearly states "rich regions" rather than "the rich". Take a look at the bottom of the side panels. The label "national AVG" shows that the zero level is the national average. Then, the label "regions pulled further ahead" perfectly captures the positive direction.</p><p>Compared to the original, this chart is much more easily understood. The secret is the clarity of thought, the deep understanding of the nature of the data.</p><p>***</p><p>Michael Unger focuses his work on elucidating the indexing strategy employed by the Economist. In the original, each value of regional average GDP is indexed to the national average of the relevant year. A number like 150 means the region has an average GDP for the given year that is 50% higher than the national average. It's tough to explain how such indices work.</p><p>Michael's revision goes back to the raw data. He presents them in two panels. 
On the left, the absolute change over time in average GDP is presented for the richest and poorest regions, while on the right, the relative change is shown.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1554_003.png" class="kg-image" alt="Mungar_rvella_economist_richpoorgap" loading="lazy" title="Mungar_rvella_economist_richpoorgap" width="341" height="240"></figure><p>(Some of the country labels are incorrect. I'll replace it with a corrected version when I receive one.)</p><p>Presenting both sides is not redundant. In France, for example, the richest region improved by 17K while the poorest region went up by not quite 6K. But 6K on a much lower base represents a much higher proportional jump, as the right side shows.</p><p>***</p><p>Related to Michael's work, but even simpler, is Debbie Hsieh's effort.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1554_004.png" class="kg-image" alt="Debbiehsieh_rayvella_economist_richpoorgap" loading="lazy" title="Debbiehsieh_rayvella_economist_richpoorgap" width="1134" height="918" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1554_004.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/1554_004.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1554_004.png 1134w" sizes="(min-width: 720px) 720px"></figure><p>Debbie reduces the entire exercise to one message - the relative change over time in average GDP of the richest and poorest regions in each country. 
In this simplest presentation, if both columns point up, then both the richest and the poorest region increased their average GDP; if both point down, then both regions suffered GDP drops.</p><p>If the GDP increased in the richest region while it decreased in the poorest region, then the gap widened by the most. This is represented by the blue column pointing up and the red column pointing down.</p><p>In some countries (e.g. Sweden), the poorest region (orange) got worse while the richest region (blue) improved slightly. In Italy and Spain, both the best and worst regions gained in average GDPs although the richest region attained a greater relative gain.</p><p>While Debbie's chart is simpler, it hides something that Michael's work shows more clearly. If both the richest and poorest regions increased GDP by the same percentage amount, the average person in the richest region actually experienced a higher absolute increase because the base of the percentage is higher.</p><p>***</p><p>The numbers across these charts aren't necessarily well aligned. That's actually one of the challenges of this dataset. There are many ways to process the data, and small differences in how each student handles the data lead to differences in the derived values, resulting in differences in the visual effects.</p>
          ]]></content:encoded>
          <description><![CDATA[ Students demonstrate the value of analytics to data visualization ]]></description>
        </item>
        <item>
          <title><![CDATA[ Decluttering charts ]]></title>
          <link>https://www.junkcharts.com/decluttering-charts/</link>
          <guid isPermaLink="false">68e2cf4d04c09e00018ee8f9</guid>
          <category><![CDATA[ Clustering ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 23 Jun 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p><a href="https://x.com/filwd?ref=junkcharts.com">Enrico</a> posted about the following chart, addressing the current assault on scientific research funding, and he's worried that poor communication skills are hurting the cause.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1553_001.jpg" class="kg-image" alt="Bertini_tiretracks" loading="lazy" title="Bertini_tiretracks" width="396" height="305"></figure><p>He's right. You need half an hour to figure out what's going on here.</p><p>Let me write down what I have learned so far.</p><p>The designer only cares about eight research areas - all within the IT field - listed across the bottom.</p><p>Paired with each named research area are those bolded blue labels that run across the top (but not quite). I think they represent the crowning achievement within each field but I'm just guessing here.</p><p>It appears that each field experiences a sequence of development stages. Typically, universities get things going, then industry R&amp;D teams enter the game, and eventually, products appear in the market. The orange, blue and black lines show this progression. The black line morphs into green, and may even expand in thickness - indicating progressive market adoption and growth.</p><p>For example, the first field from the left, digital communications, is shown to have begun in 1965 at universities. Then, in the early 1980s, industry started investing in this area. It was not until the 1990s that products became available, and not until the mid-2000s that the market exceeded $10 billion.</p><p>Even now, I haven't resolved all its mysteries. The difference between a solid black line and a dotted black line is not explained. 
Further, it appears possible to bypass $1 billion and hit $10 billion right away.</p><p>***</p><p>Next, we must decipher the strange web of gray little arrows.</p><p>It appears that the arrows can go from orange to blue, blue to orange, blue to black, orange to black. Under digital communications, I don't see black or green back to blue or orange. However, under computer architecture, I see green to orange; under parallel &amp; distributed systems, I see green to blue. I don't see any black to orange or black to blue, so black is a kind of trapping state (things go in but don't come out). Sometimes, it's better to say which direction is not possible - in this case, I think that, apart from nothing coming out of black, every other direction is possible.</p><p>It remains unclear what sort of entity each arrow depicts. Each arrow has a specific start and end time. I'm guessing it has to do with a specific research item. Taking the bottom-most arrow for digital communications, I suppose something began in academia in 1980 and then attracted industry investment around 1982. An arrow that points backwards from industry to academia indicates that universities pick up new research ideas from industry. The arrows under digital communications tend to be short, suggesting that it takes only a few years to bring a product to market.</p><p>To add to this mess, some arrows cross research areas. These are shown as curved arrows, rather than straight arrows. For these curved arrows, the "slope" of the arrow no longer holds any meaning.</p><p>The set of gray arrows is trying too hard. It is overstuffed with purposes. On the one hand, the web of arrows - and I'm referring to those between research areas - portrays the synergies between different research areas. On the other hand, the arrows within each research area show the development trajectories of anonymized subjects. 
The arrows going back and forth between the orange and blue bars show the interplay between universities and industry research groups.</p><p>***</p><p>Lastly, we look at those gray text labels at the very top of the page. That's a grab-bag of corporate names (Motorola, Intel, ...) and product names (iPhone, iRobot, ...). Some companies span several research areas. I'm amused and impressed that apparently a linear sequence can be found for the eight research areas such that every single company has investments in only contiguous areas, precluding the need to "leapfrog" certain research areas!</p><p>Actually, no, that's wrong. I do notice Nvidia and HP appearing twice. But why is Google not part of digital communications next to iPhone?</p><p>Given that no universities are listed, the company and product labels are related to only the blue, black or green lines below. It might be only related to black and/or green. I'm not sure.</p><p>***</p><p>So far, <em>I've expended energy only to tease out the structure of the underlying dataset. I haven't actually learned anything about the data!</em></p><p><em>***</em></p><p>The designer has to make some decisions because the different potential questions that the dataset can address impose conflicting graphical requirements.</p><p>If the goal is to surface a general development process that repeats for every research area, then the chart should highlight commonality, rather than difference. By contrast, if one's objective is to illustrate how certain research areas have experiences unique to themselves, one should choose a graphical form that brings out the differences.</p><p>If the focus is on larger research areas, then the relevant key dates are really the front ends of each vertical line; nothing else matters. 
By contrast, if one wants to show individual research items, then many more dates become pertinent.</p><p>A linear arrangement of the research areas will not perform if one's goal is to uncover connections between research areas. By contrast, if one attempts to minimize crossovers in a network design, it would be impossible to keep all elements belonging to each research area in close proximity.</p><p>A layering approach that involves multiple charts to tell the whole story may be the solution. See, for example, Gelman's <a href="https://statmodeling.stat.columbia.edu/2025/05/31/the-ladder-of-abstraction-in-statistical-graphics/?ref=junkcharts.com">post</a> on the ladder of abstraction.</p>
          ]]></content:encoded>
          <description><![CDATA[ Decluttering charts ]]></description>
        </item>
        <item>
          <title><![CDATA[ Nonlinear thinking in marketing ]]></title>
          <link>https://www.junkcharts.com/nonlinear-thinking-in-marketing/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3df</guid>
          <category><![CDATA[ marketing analytics ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Sun, 15 Jun 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>I recently had the pleasure of listening to <a href="https://www.kaushik.net/avinash/?ref=junkcharts.com">Avinash</a> live at an event sponsored by <a href="https://www.precise.tv/?ref=junkcharts.com">Precise TV</a>. Avinash is someone who gets marketing analytics, and he is also a great communicator.</p><p>The talk is centered on his "See, Think, Do, Care" framework, which is posited as a challenger to the dominant schematic of a "marketing funnel".</p><p>For readers unfamiliar with how marketers think: they think <em>linearly</em>, not unlike the rest of the world. A marketing funnel is a classical way of organizing the marketing function.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11273_001.png" class="kg-image" alt="Amazonadvertising_marketingfunnel_final._TTW_" loading="lazy" title="Amazonadvertising_marketingfunnel_final._TTW_" width="2000" height="1723" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/11273_001.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/11273_001.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/archives/11273_001.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w2400/archives/11273_001.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>In the example above, the universe of the potential customers of a business is divided into four groups: the first group consists of people who are aware of the company's products but not yet considering a purchase; the second group are those thinking about buying; the third group are those who decide to buy for the first time; and the last group are those who have purchased more than once.</p><p>In this funnel setup, a marketing team can be split into 
four sub-teams. The first team focuses on driving awareness; the second team's goal is to get people from aware to interested; the third team - which is the core of marketing - wants to "convert" interested prospects into first-time customers; lastly, the loyalty team's job is customer retention, indicated by return purchases.</p><p>The funnel describes a linearized world. Each person enters the top of the funnel, and marketing's job is to push them down the funnel as far as possible, as quickly as possible, and keep them there.</p><p>Avinash opposes this linear view of the world. In his "See, Think, Do, Care" framework, he also sets up four groups but people can move from any to any. He calls these groups "audience intent clusters".</p><p>The "See" group consists of people who are just looking around. The "Think" group are those who have expressed some interest - in the digital world, interest is evidenced by specific behaviors (such as clicking on some link). The "Do" group are those who are close to buying, for example, those who have moved an item to their shopping cart. The "Care" group are the return customers. Unlike traditional funnel users, Avinash wisely sets the bar higher. Someone who has made just one purchase isn't a target; someone who has made two or more purchases is worth cultivating. It's common sense, yet he's right - most marketers see anyone who has bought something as potentially "loyal". The problem with such an approach is that most of the loyalty marketing dollars would be wasted on people with no intent of returning. Why not focus the spending on those with a higher chance of future business?</p><p>Avinash points out the lack of "care" in how many businesses deal with the "care" segment. This is particularly true of technology companies. 
Tech support FAQs and a support phone number that's hidden from view show return customers not love but indifference.</p><p>The key idea of the talk: a person is not trapped in one of four stages until the marketers shove them one step down. A "loyal" customer might be browsing the brand's Instagram channel, and her intent might be "see", not "do". So, the content shown to her should incite curiosity rather than push a hard sell.</p><p>There's more here (<a href="https://www.kaushik.net/avinash/see-think-do-care-win-content-marketing-measurement/?ref=junkcharts.com">link</a>) in Avinash's own words.</p>
          ]]></content:encoded>
          <description><![CDATA[ Reporting from Avinash's talk on "See, Think, Do, Care" ]]></description>
        </item>
        <item>
          <title><![CDATA[ Out of line ]]></title>
          <link>https://www.junkcharts.com/out-of-line/</link>
          <guid isPermaLink="false">68e2cf4d04c09e00018ee8fa</guid>
          <category><![CDATA[ Dot plot ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 10 Jun 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>This simple chart showing life expectancies in 10 countries raises one's eyebrows.</p><p>The first curiosity is the deliberate placement of Pakistan behind India and China. Every nation is <a href="https://www.junkcharts.com/tag/sorting/">sorted</a> from lowest to highest, except for Pakistan. Is the reason politics? I have no idea. If you have an explanation, please leave a comment.</p><p>***<br>This graphic is an example of <strong>data visualization that does not actually show the data</strong>.</p><p>The positions of the flags do not in fact encode the data! For example, the Indian flag is closer to the Chinese flag than to the Pakistani flag even though the gap between India and China (7) is more than double the gap between India and Pakistan (3).</p><p>Here is what it looks like if the gaps encode the data. With this selection of countries, Pakistan and India are separated from the rest.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1552_002.png" class="kg-image" alt="Junkcharts_redo_indiatvlifeexpectancy" loading="lazy" title="Junkcharts_redo_indiatvlifeexpectancy" width="1440" height="1160" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1552_002.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/1552_002.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1552_002.png 1440w" sizes="(min-width: 720px) 720px"></figure><p>In the original chart, the readers must read the data labels to understand it, and resist interpreting the visual elements.</p><p>I removed the flag poles because they have the unintended consequence of establishing a zero level (where the cartoon characters stand) but the positions of the flags don't reflect a start-at-zero 
posture.</p><hr><p>Returning to our first topic for a second. If the message of the chart is to single out Pakistan, it actually works! If all other countries are sorted by value, with Pakistan inserted out of order, it draws our attention.</p><p>In a conventional layout, Pakistan is shoved to the left side in the bottom corner. See below:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1552_003.png" class="kg-image" alt="Junkcharts_redo_indiatvlifeexpectancy_2" loading="lazy" title="Junkcharts_redo_indiatvlifeexpectancy_2" width="359" height="288"></figure>
          ]]></content:encoded>
          <description><![CDATA[ Out of line ]]></description>
        </item>
        <item>
          <title><![CDATA[ Interpreting margins of error in tennis calls ]]></title>
          <link>https://www.junkcharts.com/interpreting-margins-of-error-in-tennis-calls/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3e0</guid>
          <category><![CDATA[ sports analytics ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Sun, 08 Jun 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>A commentator of the French Open recently complained that the human line judge made a mistake: "Hawkeye's error is 3 cm and the ball was out by 4 cm. So the line judge is wrong to call it in."</p><hr><p>The commentator got this all wrong. The divergence of opinion should reduce one's confidence in Hawkeye's estimate. Let me explain why.</p><p>Hawkeye's goal when it comes to judging line calls can be stylistically described as determining the center of the landing spot of the tennis ball. It's helpful to first look at what happens without a margin of error. From this estimated location, we draw a ball given the diameter of a tennis ball, and figure out if the ball overlaps with the line on the court. If it doesn't overlap, then the computer decides that the ball is outside the line.</p><p>The idea of a margin of error of 3 cm is visualized as drawing a circle of radius 3 cm around the estimated location. Now, from any point inside this circle, we draw tennis balls as before; and if none of these balls overlap with the line on the court, then the computer decides that the ball is outside the line. By involving the margin of error, we explicitly embrace the uncertainty of estimation. For sure, fewer out calls will be issued relative to the case when we just use a single location estimate.</p><p>The fundamental problem in statistics is that Hawkeye gets one chance (one sample) to get this right. If Hawkeye used a deterministic process, then given the same inputs (videos, etc.), it would always generate the same estimated ball location. In real-world systems, whether it's because of noise in the system, or some stochastic element in Hawkeye's process, the same inputs lead to different estimates. The margin of error describes how much these estimates vary.</p><p>The reported margin of error holds that Hawkeye's estimate is unlikely to be off by more than 3 cm. 
In other words, that expanded circle of radius 3 cm is expected to capture the "true" center of the ball's landing location.</p><p>The word is "unlikely" rather than "impossible". All margins of error come with a confidence level; usually it is 95% confidence. This means there is a 5 percent chance that Hawkeye may be off by more than 3 cm from the true location.</p><p>Hawkeye's estimate is not error-free as the commentators assumed, even after allowing for the margin of error.</p><p>I'm curious about the margin of error associated with humans inspecting ball marks on the clay - I suspect it's small. (The error of judging the balls in flight, by contrast, is certainly much higher.)</p><hr><p>At tournaments that use Hawkeye, the players are forbidden from challenging calls. Let's subvert the process and exchange the roles of humans and machines.</p><p>Assume Hawkeye makes the first call, in or out. If the player disagrees with the call, he or she raises a challenge, and the umpire (and/or line judge) goes to inspect the mark on the clay. Now, the umpire's word is final, no complaints allowed.</p><p>As an example, Hawkeye determines that the ball is 2.5 cm outside the line, which is less than 3 cm, thus the machine rules it "in". A player protests. The umpire decides that the mark on the ground is wholly outside the line, and changes the call to out. How will the commentators react?</p><p>If their reaction is not colored by a preference for machines over humans, they will say that the machine has made a mistake - and to accord with their current behavior (in reverse), they should then recommend that the tournament remove line-calling machines because they are not accurate.</p><p>This is an instance in which reversing the roles makes one's biases clear.</p><hr><p>If we take a Bayesian view of this, we should combine the evidence. In the first step, we have one estimate. Now, if the second estimate conforms with the first, then the evidence becomes stronger. 
But if the second estimate contradicts the first, then the evidence weakens. This is why I said at the start that the divergence of opinion causes me to lower my confidence in Hawkeye's estimate.</p><p>Even more, I believe that the human estimate derived from the mark on the ground is more accurate anyway so I'd give that even more weight.</p><p>P.S. Outside of clay courts, the situation is more complicated as there are no ball marks to look at. I'm not against the technology. I'm against the illusion of perfection, and I'm against black-box technology that stifles dissent. Both these issues can be addressed by how technology is applied.</p>
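<p>The idea of combining the two estimates can be made concrete with a small sketch. It is only an illustration under loud assumptions: both errors are treated as independent and Gaussian, the 3 cm margin is read as a 95% interval (hence the 1.96), the prior is flat, and the numbers for the human reading of the mark (ball touching the line by 0.5 cm, with a 1 cm margin) are hypothetical values chosen for the example, not measurements.</p>

```python
import math

def normal_cdf(x, mean, sd):
    """Probability that a Normal(mean, sd) variable is at most x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def combine(mean_a, sd_a, mean_b, sd_b):
    """Precision-weighted combination of two independent Gaussian estimates."""
    w_a = 1.0 / sd_a ** 2
    w_b = 1.0 / sd_b ** 2
    mean = (w_a * mean_a + w_b * mean_b) / (w_a + w_b)
    sd = math.sqrt(1.0 / (w_a + w_b))
    return mean, sd

# Signed gap between ball and line in cm: positive = out, negative = touching.
Z95 = 1.96  # assumes the margin of error is a 95% interval

# From the post: Hawkeye says 4 cm out, with a 3 cm margin of error.
hawkeye_mean, hawkeye_sd = 4.0, 3.0 / Z95

# ASSUMED for illustration: the mark shows the ball touching the line
# by 0.5 cm, and the human reading has a 1 cm margin of error.
human_mean, human_sd = -0.5, 1.0 / Z95

p_out_hawkeye = 1.0 - normal_cdf(0.0, hawkeye_mean, hawkeye_sd)
combined_mean, combined_sd = combine(hawkeye_mean, hawkeye_sd,
                                     human_mean, human_sd)
p_out_combined = 1.0 - normal_cdf(0.0, combined_mean, combined_sd)

print(f"P(out), Hawkeye alone:  {p_out_hawkeye:.3f}")
print(f"P(out), both estimates: {p_out_combined:.3f}")
```

<p>With these made-up inputs, the probability that the ball is out falls from above 0.99 to a little under one half once the contradicting human estimate is folded in. The more precise source gets the larger weight, which is why the assumed small human margin pulls the combined estimate so strongly toward the line.</p>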
          ]]></content:encoded>
          <description><![CDATA[ When two sources of evidence disagree: pick a winner or combine ]]></description>
        </item>
        <item>
          <title><![CDATA[ Electronic line calling vs ground truth in tennis ]]></title>
          <link>https://www.junkcharts.com/electronic-line-calling-vs-ground-truth-in-tennis/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3e1</guid>
          <category><![CDATA[ Big Data ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Thu, 05 Jun 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>The American commentators at the French Open have been making a fuss over the tournament's decision to favor human line judges over electronic line calling. Their whining centers on two arguments:</p><ul><li>Certain cases in which the electronic call differed from the umpire's decision, for which they claim one of the players was robbed of a point</li><li>The process of letting players dispute close calls wastes too much time</li></ul><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11271_001.png" class="kg-image" alt="Tennis_electroniclinecalling" loading="lazy" title="Tennis_electroniclinecalling" width="600" height="381" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11271_001.png 600w"></figure><hr><p>These commentators treat computers as infallible. The attitude is apparent when they say things like "The umpire ruled the ball in even though Hawkeye [a brand name of such technology] says it's out by 4 cm."</p><p>Here, we have two sources of opinion about where the ball landed. Hawkeye issues its opinion, based on collating video images, and some modeling. The umpire's opinion comes primarily from the evidence on the ground (on a clay court, the ball leaves a mark on the ground), aided by the prior call of the line judge. In the view of these commentators, when the two opinions differ, the computer wins.</p><p>In effect, the computer animation is taken as ground truth against which the umpire's calls are evaluated. The computer animation is never wrong. So, it's not the evidence that favors the computer, but a presumption of its superiority.</p><p>And yet, on a clay court, the truth is literally on the ground. The ball leaves a mark on striking the surface. Before the computer age, a player could dispute a close call, and the umpire would inspect the mark and confirm or overturn the line judge's call. 
I don't see any problems with this arrangement. In fact, it uses the best evidence available - the ground truth.</p><p>Instead of the actual evidence, the commentators prefer a "modeled" truth - the abstract reconstruction of the ball's landing, and when they chastise the umpire for making the wrong call, they effectively invalidate the ground truth.</p><hr><p>What about the time saving argument? The current practice of the umpire inspecting the ground truth takes little time, almost always less than a minute. It may take a little longer if the player makes a scene, even though no umpire is going to take the player's word over their own eyes.</p><p>It's not that computer technology doesn't take time. It would have taken a similar amount of time to watch the animated video of the ball hitting the ground. The reason electronic calls save time is that electronic line judges are designated as dictators - players are disallowed from contesting any call.</p><p>The time saving comes not from checking the call but from banning challenges! No player can make a scene since no challenges are allowed.</p><p>But, they could have achieved the same result by making line judges dictators. Or, if they want to allow the ground truth to be inspected, then make the umpires dictators. The umpire can be called to inspect the marks, but his/her decision is final, and any player making a scene would get a demerit. That would move the game along, as these commentators seem to want.</p><hr><p>See my previous <a href="https://www.junkcharts.com/illusion-of-perfection/">post</a> about the illusion of perfection in automated line calling.</p>
          ]]></content:encoded>
          <description><![CDATA[ Electronic line calling vs ground truth in tennis ]]></description>
        </item>
        <item>
          <title><![CDATA[ Metrics for evaluating passkeys ]]></title>
          <link>https://www.junkcharts.com/metrics-for-evaluating-passkeys/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3e2</guid>
          <category><![CDATA[ Bias ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 02 Jun 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Long-time reader Antonio sent me an <a href="https://www.ghacks.net/2025/05/02/microsoft-68-percent-of-users-who-sign-in-with-passwords-fail/?ref=junkcharts.com">article</a> that has the following intriguing line:</p><blockquote>Microsoft revealed today that 68 percent of all password sign ins fail. In other words, only 32 percent of all Microsoft users manage to sign in when they are prompted to do so when they use passwords.</blockquote><p>This statistic is shocking, but for me the line raises more questions than it answers.</p><p>The claim that two out of three attempts to use a password failed seems unrealistically high. (The author does not reveal the basis of this number.)</p><p>In my experience, most people don't log out after they log in. They think it's more convenient. Developers also think it's more convenient, and thus many application interfaces actively hide the log-out buttons. These users will only attempt log-ins if they unexpectedly got kicked off. They are unlikely to remember any password.</p><p>Even though I try to log off frequently, I still can't retain all passwords. I tend to forget passwords for those accounts I don't use often. For those who don't log off, it's a miracle if they could remember their passwords. For this segment of users, the password success rate is very low.</p><p>For users like me, who log in and log out constantly, the success rate should be much higher. But I suspect we are a small minority.</p><p>So the 32% password success rate may be explained by selection bias - most of the attempts are made by people who don't care to remember passwords.</p><hr><p>What other group might contribute log-in attempts?</p><p>Of course, malicious actors. I have no idea whether the data analysts filtered those out, or, indeed, whether they are able to differentiate between a legitimate user mistyping a password, or a bad actor trying to guess the password. Let's assume they can't tell those apart. 
Then, it's not true that legit users failed the password test 68 percent of the time. The real percentage depends on how much malicious activity there is.</p><hr><p>The article pitches an alternative to passwords, known as passkeys. It makes a further claim:</p><blockquote>Users who sign in with passkeys manage to do so successfully 98 percent of the time.</blockquote><p>Ironically, this statistic makes me nervous. The purpose of user authentication is to stop imposters from entering one's account. Analogously, we put locks on our front doors to prevent strangers from entering our homes.</p><p>If the door lock salesperson boasts that their lock lets in 98% of entry attempts, do you feel convenienced or insecure?</p><p>I feel insecure, because I believe all services face a healthy amount of malicious activity, so I expect a lower success rate.</p><hr><p>I was chatting with Perplexity.ai about passkeys, and it offered another statistic - that attacks via stolen passwords have plunged as more users switch to passkeys. No kidding. Since passkeys don't use passwords, bad actors aren't going to need stolen passwords, so password stealing has crashed.</p><p>To properly measure this, the analysts must figure out how malicious actors would adjust their tactics. They can certainly try to steal passkeys, or session tokens. If these developers switched back to passwords, passkey theft would also collapse!</p><hr><p>The point is that it's important to define the right metrics. Passwords and passkeys are security measures, and the highlighted metric concerns convenience, which may be negatively correlated with security.</p>
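<p>The earlier point about malicious traffic can be put into a few lines of arithmetic. This is a rough sketch, not anything the article or Microsoft describes: it assumes the reported 32% blends legitimate and malicious attempts, and the malicious shares and the attackers' success rate (zero by default) are hypothetical inputs picked for illustration.</p>

```python
def legit_success_rate(observed_rate, malicious_share, malicious_success=0.0):
    """Back out the success rate of legitimate users from a blended rate.

    Blend: observed = (1 - m) * legit + m * malicious_success,
    where m is the ASSUMED share of attempts made by bad actors.
    """
    if not 1.0 > malicious_share >= 0.0:
        raise ValueError("malicious_share must be in [0, 1)")
    return ((observed_rate - malicious_share * malicious_success)
            / (1.0 - malicious_share))

# 0.32 is the success rate quoted in the article; the shares are hypothetical.
for share in (0.0, 0.25, 0.5, 0.6):
    rate = legit_success_rate(0.32, share)
    print(f"malicious share {share:.0%}: legit success rate {rate:.0%}")
```

<p>If, say, half of all attempts came from bad actors who never succeed, the same 32% headline would correspond to legitimate users getting in 64% of the time. Without knowing the malicious share, the headline number cannot be read as a statement about real users.</p>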
          ]]></content:encoded>
          <description><![CDATA[ Metrics for evaluating passkeys ]]></description>
        </item>
        <item>
          <title><![CDATA[ The line-angle illusion ]]></title>
          <link>https://www.junkcharts.com/the-line-angle-illusion/</link>
          <guid isPermaLink="false">68e2cf4d04c09e00018ee8fb</guid>
          <category><![CDATA[ Area chart ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 28 May 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>In a recent presentation, Prof. Matthias Schonlau explains his "hammock plot." I wrote about it <a href="https://www.junkcharts.com/hammock-plots/">here</a>. During the talk, he used the hammock plot to illustrate an optical illusion found in plots requiring users to compare angular lines, known as the line-angle illusion. (Others prefer the name "sine" illusion.)</p><p>Here is a simple demonstration of the line-angle illusion, extracted from this <a href="https://www.tandfonline.com/doi/figure/10.1080/10618600.2014.951547?ref=junkcharts.com">paper</a>.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1551_001.png" class="kg-image" alt="Vanderplas_hofmann_sineillusion" loading="lazy" title="Vanderplas_hofmann_sineillusion" width="884" height="756" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1551_001.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1551_001.png 884w" sizes="(min-width: 720px) 720px"></figure><p>Think of the two sine curves as time series, and we're comparing the differences between them. This requires us to assess trend in the vertical distances between the two lines. 
Weirdly, we perceive the vertical lines on the above chart to have varying lengths, even though they have equal lengths.</p><p>***<br>The link <a href="https://ieeevis.org/year/2024/program/paper_v-short-1081.html?ref=junkcharts.com">here</a> contains an example of how the line-angle illusion can lead to misreading of trends on line charts:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1551_002.png" class="kg-image" alt="Sineillusion_twolines" loading="lazy" title="Sineillusion_twolines" width="1200" height="675" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1551_002.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/1551_002.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1551_002.png 1200w" sizes="(min-width: 720px) 720px"></figure><p>Is there a bigger difference in revenue at Time 1 than Time 2? Many of us will think so but on careful judgment, I think all of us can agree that the difference at Time 2 is in fact larger.</p><p>***</p><p>Much of the interest in a hammock plot lies in the links between the vertical blocks, and this is where the line-angle illusion can distort our perception. Studies have shown that humans tend to read not the vertical gaps but the angular gaps. 
Again, this issue is illustrated in the first mentioned paper:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1551_003.png" class="kg-image" alt="Vanderplas_hofmann_sineillusion_distances" loading="lazy" title="Vanderplas_hofmann_sineillusion_distances" width="980" height="810" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1551_003.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1551_003.png 980w" sizes="(min-width: 720px) 720px"></figure><p>Matthias explained that their implementation of the hammock plot uses a strategy to counteract this line-angle illusion.</p><p>I take this to mean they distort the data in such a way that after readers apply the line-angle illusion, the resulting view would correctly convey the trend. A kind of double negative strategy. The paper linked above offers one such counter-illusion strategy.</p><p>I imagine this is a bit controversial as we are introducing deliberate distortion to counteract an expected perceptual illusion.</p><p>I'm not aware of any software that offers built-in functions that perform this type of illusion-busting adjustment. Do you know any?</p><p>P.S. [5-30-2025] Andrew Gelman has some comments on this topic on his <a href="https://statmodeling.stat.columbia.edu/2025/05/30/statistical-graphics-when-does-it-make-sense-to-introduce-deliberate-distortion-to-counteract-an-expected-perceptual-illusion/?ref=junkcharts.com#comment-2398043">blog</a>. He said:</p><p>But to get closer to what Kaiser is asking: the analogy I’ve given is, suppose you’re building a wooden chair but using boards that are warped. In this case, the right thing to do is to incorporate the warp into the design, i.e. 
cut some pieces shorter than others and at different angles, etc., so that they fit together as is, rather than trying to go all rectilinear and then glue/nail everything together. The trouble with the latter strategy is that the wood will exert pressure on the joints and eventually the chair will break or distort itself in some way.</p>
          ]]></content:encoded>
          <description><![CDATA[ We fail to judge the distance between two lines on a chart. ]]></description>
        </item>
        <item>
          <title><![CDATA[ Hammock plots ]]></title>
          <link>https://www.junkcharts.com/hammock-plots/</link>
          <guid isPermaLink="false">68e2cf4d04c09e00018ee8fc</guid>
          <category><![CDATA[ hammock plot ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 26 May 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Prof. Matthias Schonlau gave a presentation about "hammock plots" in New York <a href="https://www.meetup.com/DataVisualization/?ref=junkcharts.com">recently</a>.</p><p>Here is an example of a hammock plot that shows the progression of different rounds of voting during the 1903 papal conclave. (These photos were taken at the event and thus are a little askew.)</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1550_001.jpg" class="kg-image" alt="Hammockplot_conclave" loading="lazy" title="Hammockplot_conclave" width="600" height="509" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1550_001.jpg 600w"></figure><p>The chart shows how Cardinal Sarto beat the early favorite Rampolla during later rounds of voting. The chart traces the movement of votes from one round to the next. The Vatican destroys voting records, and apparently, records were unexpectedly retained for this particular conclave.</p><p>The dataset has several features that bring out the strengths of such a plot.</p><p>There is a fixed number of votes, and a fixed number of candidates. At each stage, the votes are distributed across the subset of candidates. From stage to stage, the support levels for the candidates shift. The chart brings out the evolution of the vote.</p><p>From the "marginals", i.e. the stacked columns shown at each time point, we learn the relative strengths of the candidates, as they evolve from vote to vote.</p><p>The links between the column blocks display the evolution of support from one vote to the next. 
We can see which candidate received more votes, as well as where the additional votes came from (or, to whom some voters have drifted).</p><p>The data are neatly arranged in successive stages, resulting in discrete time steps.</p><p>Because the total number of votes is fixed, the relative sizes of the marginals are nicely constrained.</p><p>The chart is made much more readable because of <a href="https://www.junkcharts.com/tag/aggregation/">binning</a>. Only the top three candidates are shown individually with all the others combined into a single category. This chart would have been quite a mess if it showed, say, 10 candidates.</p><p>How precisely we can show the intra-stage movement depends on how the data records were kept. If we have the votes for each person in each round, then it should be simple to execute the above! If we only have the marginals (the vote distribution by candidate) at each round, then we are forced to make some assumptions about which voters switched their votes. We'd likely have to rule out unlikely scenarios, such as that in which all of the previous voters for candidate X switched to other candidates while another set of voters switched their votes to candidate X.</p><p>***</p><p>Matthias also showed examples of hammock plots applied to different types of datasets.</p><p>The following chart displays data from course evaluations. Unlike the conclave example, the variables tied to questions on the survey are neither ordered nor sequential. 
Therefore, there is no natural sorting available for the vertical <a href="https://www.junkcharts.com/tag/axis/">axes</a>.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1550_002.jpg" class="kg-image" alt="Hammockplot_evals" loading="lazy" title="Hammockplot_evals" width="700" height="428" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1550_002.jpg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1550_002.jpg 700w"></figure><p><a href="https://www.junkcharts.com/tag/time-series/">Time</a> is a highly useful organizing element for this type of chart. Without such an organizing element, the designer manually customizes an <a href="https://www.junkcharts.com/tag/sorting/">order</a>.</p><p>The vertical <a href="https://www.junkcharts.com/tag/axis/">axes</a> correspond to specific questions on the course evaluation. Students are <a href="https://www.junkcharts.com/tag/aggregation/">aggregated</a> into groups based on the "profile" of grades given for the whole set of questions. It's quite easy to see that opinions are most aligned on the "workload" question while most of the scores are skewed high.</p><p>Missing values are handled by plotting them as a new category at the bottom of each vertical axis.</p><p>This example is similar to the conclave example in that each survey response is categorical, one of five values (plus missing). Matthias also showed examples of hammock plots in which some or all of the variables are numeric data.</p><p>***</p><p>Some of you will see a resemblance between the hammock plot and various similar charts, such as the profile chart, the alluvial chart, the parallel coordinates chart, and Sankey diagrams. 
Matthias discussed all those as well.</p><p>Matthias has a book out called "Applied Statistical Learning" (<a href="https://www.amazon.com/dp/3031333896/ref=nosim?tag=numrulyouwor-20&ref=junkcharts.com">link</a>).</p><p>Also, there is a Python package for the hammock plot on <a href="https://github.com/TianchengY/hammock_plot?ref=junkcharts.com">github</a>.</p>
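            <p>To make the "votes for each person in each round" case concrete, here is a minimal sketch of how the marginals and the links could be tallied from per-voter records. The ballots below are invented purely for illustration (they are not the 1903 conclave records), the function names are hypothetical, and this is not the API of the hammock plot package linked above.</p>

```python
from collections import Counter

# Hypothetical ballots: each row is one voter's choice in rounds 1, 2 and 3.
# These values are made up for illustration only.
ballots = [
    ("Rampolla", "Rampolla", "Sarto"),
    ("Rampolla", "Rampolla", "Rampolla"),
    ("Sarto", "Sarto", "Sarto"),
    ("Gotti", "Sarto", "Sarto"),
    ("Other", "Gotti", "Sarto"),
]


def marginals(rows, round_index):
    """Votes per candidate in one round (the stacked column)."""
    return Counter(row[round_index] for row in rows)


def flows(rows, round_index):
    """Voters moving from a candidate in one round to a candidate in the next (the links)."""
    return Counter((row[round_index], row[round_index + 1]) for row in rows)


print(marginals(ballots, 0))
print(flows(ballots, 0))
```

            <p>Each link count is what sets the width of a band between two adjacent columns. With only the marginals in hand, the second function cannot be computed, which is why assumptions about vote switching become necessary.</p>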
          ]]></content:encoded>
          <description><![CDATA[ Matthias Schonlau introduces the hammock plot. ]]></description>
        </item>
        <item>
          <title><![CDATA[ Causal skimming ]]></title>
          <link>https://www.junkcharts.com/causal-skimming/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3e3</guid>
          <category><![CDATA[ Assumptions ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 21 May 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11269_001.jpg" class="kg-image" alt="Bia-frenkel-high-line-sm" loading="lazy" title="Bia-frenkel-high-line-sm" width="500" height="346"></figure><p>The Wall Street Journal reports that all the rich people in New York City are moving downtown. The headline claims "New York's Wealthiest All Want to Live Downtown Now". (<a href="https://www.msn.com/en-us/money/real-estate/new-york-s-wealthiest-all-want-to-live-downtown-now/ar-AA1Fa2D9?ref=junkcharts.com">link</a>) I'm not surprised by this statement, I want to believe it, but as a data scientist, I'd like to know how they prove it.</p><p>I suppose they can run a survey and interview a bunch of rich people. I imagine that might be a tall order, as rich people might not be very responsive. Perhaps they found their way to a wonderful dataset that shows the movement of rich people. That's not out of the realm of the possible. We know that someone was able to trace private planes around the world (<a href="https://www.junkcharts.com/know-your-data-30-everythings-ok-until-it-happens-to-you/">link</a>). I imagine the dataset would be quite small, as uptown/downtown NYC are not the only desirable addresses for rich people. Maybe they have good data on housing inventory. Let's find out.</p><p>There are two classes of relevant evidence. First, I'm looking for evidence of the trend. Second, for any causal explanation they provide, I'd like to see evidence of those linkages.</p><p>***</p><p>Let's skim through this <a href="https://www.msn.com/en-us/money/real-estate/new-york-s-wealthiest-all-want-to-live-downtown-now/ar-AA1Fa2D9?ref=junkcharts.com">article</a>, and I'll mark down each piece of evidence I encounter.</p><blockquote>A financier sold a West Village apartment for $60 million recently, which the couple had purchased for $29 million nearly a decade ago. 
This is a "record" for downtown Manhattan.</blockquote><p>This is presented as evidence for the trend. What kind of evidence is this?</p><p>It's an anecdote. It's an extreme value of an anecdote. This anecdote uses house prices as a proxy for people movement - it assumes that if house price increases, then a wealthy person has moved from uptown to downtown. In fact, the $60 million record doesn't mean anything unless we also know that the $29 million was not extreme.</p><p>Let's keep going.</p><blockquote>The buyer worked at Jane Street Capital, a quant trading firm, which has recently expanded its headquarters located nearby.</blockquote><p>Already, the journalist has made a causal claim - that the downtown migration trend is due to rich folks wanting to live close to their work. The evidence for that is another anecdote. It'd have been more convincing had Jane Street moved its HQ from uptown but that's not what happened.</p><blockquote>Downtown real estate prices are starting to "rival the city's most expensive uptown enclaves".</blockquote><p>This assertion is not supported by any statistic. We don't know how wide the gap was, and how much it changed. Should there be data, we'd have an idea which districts are considered uptown and which downtown.</p><blockquote>"As finance and tech firms have migrated downtown—alongside new retail, parks, and cultural institutions—a new wave of luxury condo development along the West Side Highway is luring wealthy buyers to neighborhoods like the West Village, Tribeca and Chelsea."</blockquote><p>I left the above as a direct quote. This one citation contains a few more causal assertions, with zero data. Notably, the anecdote of Jane Street Capital should not even count because Jane Street did not "migrate" downtown. 
In subsequent paragraphs, they mention the names of some of these new developments but they never named any new retail, parks and cultural institutions.</p><p>Next:</p><p>A real-estate agent says that wealthy people are "buying up the West Side Highway".</p><p>Expert testimony. It's possible that these experts know what they are talking about, I'll give them that. Still no data or statistic in sight.</p><p>"There were more $30 million-plus home sales below 34th Street in the past five years than in the previous decade, according to data from Corcoran Sunshine Marketing Group. Since 2023, the area has seen more than $1 billion worth of home sales above $20 million."</p><p>Now arrives the first bit of data. The first sentence compares downtown sales across two periods of time. This can't prove a preference for downtown since we don't know the trend in uptown neighborhoods for those time periods - it's totally reasonable to believe that uptown prices may also have inflated a lot during the last 15 years. The second sentence gives a snapshot value of home sales in one time period, which does little to support the claimed downtown migration trend.</p><p>Since we are talking about economic value stretching back 15 years, those figures ought to have been inflation-adjusted. Another problem is the varying definition of downtown - two paragraphs back, a different sentence mentions "sales and listings below 14th Street".</p><p>Three more sales transactions are cited, two with just the current sales price, and one with both current and prior sales prices.</p><p>This raises the count of anecdotes to four. Since each transaction has a buyer as well as a seller, we'd also need to know where the seller moved to. 
If buying a home suggests preference for the neighborhood, does selling a home indicate preference for some other district?</p><p>A set of current listings with sky-high listing prices is mentioned.</p><p>Assuming these homes eventually find buyers at the listing prices, they would add to the anecdotal evidence.</p><p>"Downtown has long drawn wealthy buyers".</p><p>I'd label this anti-evidence. This statement undermines the claim of a recent trend.</p><p>The next few paragraphs run down a list of financial and tech firms that have set up shop in Hudson Yards.</p><p>The alleged link between employment and residence is questionable because we are talking about $50 million homes, not $5 million homes. Sure, Google is "one of the wealthiest corporations" in the world but how many Google employees can afford a $50 million home?</p><blockquote>The supply of homes for the ultrawealthy is limited downtown. New developments are making larger homes with luxe amenities comparable to those uptown.</blockquote><p>This also works as counter-evidence. Are prices downtown increasing because of shifting patterns of demand, or is it because these homes are larger and have better amenities? This section further muddies the causal picture.</p><p>***</p><p>Let's summarize. The entire article contains one paragraph that cites two statistics, a bunch of anecdotes, and a lot of unsupported storytelling.</p><p>Regarding the existence of a downtown migration trend, all the included evidence is anecdotal. We only learned that some rich people paid a lot of money to buy downtown homes. We don't know where they moved from. We don't know where the sellers moved to. We don't know the trend in real-estate prices uptown, with the unspoken assumption that they did not rise, or did not move as drastically. 
In the end, we're asked to trust the experts.</p><p>Regarding why rich people are preferring downtown, the main causal explanation is tech and financial firms relocating downtown. It would have been useful to know, for example, what proportion of these workers own $50 million homes. Several other causes are also mentioned, such as larger homes, homes with better amenities, new parks and cultural institutions. All evidence is anecdotal, and we again must just trust the experts.</p>
          ]]></content:encoded>
          <description><![CDATA[ Wealthy people are moving downtown, says WSJ ]]></description>
        </item>
        <item>
          <title><![CDATA[ Scrambled egg ]]></title>
          <link>https://www.junkcharts.com/scrambled-egg/</link>
          <guid isPermaLink="false">68e2cf4d04c09e00018ee8fd</guid>
          <category><![CDATA[ Bar chart ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Thu, 15 May 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Let's take a look at the central message this chart is aiming to convey: "U.S. egg prices hit a 10-year high in 2025 after avian flu killed 30 million egg-laying birds." (The original is found on <a href="https://www.visualcapitalist.com/charted-what-the-worlds-paying-for-eggs/?ref=junkcharts.com">Visual Capitalist</a>.)</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1548_001.jpg" class="kg-image" alt="Visualcapitalist_eggs" loading="lazy" title="Visualcapitalist_eggs" width="1280" height="1600" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1548_001.jpg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/1548_001.jpg 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1548_001.jpg 1280w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1548_002.png" class="kg-image" alt="_trifectacheckup_image" loading="lazy" title="_trifectacheckup_image" width="241" height="209"></figure><p><br><br><a href="https://www.junkcharts.com/.a/6a00d8341e992c53ef02e8610144e6200d-pi">Using the Trifecta Checkup framework (<a href="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1548_002.png?ref=junkcharts.com">link</a>), we ask how the data are aligned with this question. What do the data say?</a></p><p>The data give the average egg prices in 41 countries, sorted from highest to lowest, and arranged in a clockwise manner starting from the top.</p><p>The dataset does not address the question posed by the central message.</p><ul><li>With no history, it cannot show that U.S. 
egg prices are at a 10-year high.</li><li>With no explanatory variables, it cannot say why egg prices have increased in 2025.</li><li>Without context, it cannot address the avian flu.</li><li>The U.S. does not even stand out.</li><li>It also does not show the extreme magnitude of the recent increase in egg price in the U.S.</li></ul><p>Because of this mismatch, the graphic fails to deliver the intended message.</p><p>Notably, the dataset introduces the country dimension, which is unrelated to the central message, but nevertheless interesting. Yet the question of interest isn't the point-in-time comparison. I'd like to know if egg price inflation is a global trend, or an American exclusive. At some point, the inflation will flatten out, although the price of eggs would probably not return to the pre-inflation level. An international comparison across time would bring this insight out clearly.</p><p>***</p><p>Before ending, we'll make a quick stop at the Visual corner of the Trifecta Checkup. Since the designer uses an ellipse to represent the egg, the <a href="https://www.junkcharts.com/tag/bar-chart/">bars</a> sticking out of the ellipse are somewhat distorted. Do the bar lengths encode the data accurately?</p><p>I looked at Brazil vs Italy. The price in Italy ($3.97) is basically twice that in Brazil ($1.99). But the BRA bar is only 40% as long as the ITA bar.</p><p>Italy and Belgium, shown side by side, have the same egg price to the second decimal place. The bar lengths are not the same.</p><p>This observation suggests that the chart fails my <a href="https://www.junkcharts.com/tag/sufficiency/">self-sufficiency</a> test. If the entire dataset were not printed on the chart, the reader couldn't interpret the bars.</p>
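            <p>The Brazil vs Italy check is simple arithmetic, sketched below. The two prices come from the chart; the 40% figure is my eyeballed reading of the bar lengths, so treat it as an approximate input rather than a measurement. The variable names are just for illustration.</p>

```python
# Prices printed on the chart (US dollars).
price_italy = 3.97
price_brazil = 1.99

# Length ratio the two bars would have if length were proportional to price.
expected_ratio = price_brazil / price_italy

# Length ratio read off the chart by eye (an approximate, assumed input).
observed_ratio = 0.40

# How much shorter the drawn Brazil bar is than a proportional bar would be.
shortfall = 1 - observed_ratio / expected_ratio

print(f"expected {expected_ratio:.2f}, observed {observed_ratio:.2f}, shortfall {shortfall:.0%}")
```

            <p>A proportional encoding would put the Brazil bar at about half the length of the Italy bar, so the drawn bar falls short by roughly a fifth.</p>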
          ]]></content:encoded>
          <description><![CDATA[ Scrambled egg ]]></description>
        </item>
        <item>
          <title><![CDATA[ Who pays the tariff? ]]></title>
          <link>https://www.junkcharts.com/who-pays-the-tariff/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3e4</guid>
          <category><![CDATA[ Business ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 07 May 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>In a previous <a href="https://www.junkcharts.com/doing-tariff-math-right/">post</a>, I covered how tariffs affect the cost of goods sold, and how they should flow into prices. The myth that a 10% tariff should increase prices by 10% continues to spread in the media. (For example, <a href="https://www.businessinsider.com/republican-business-owner-tariff-tax-line-item-price-tags-transparency-2025-5?ref=junkcharts.com">this article</a>.)</p><p>I hate to say it, but any vendor raising prices by 10% because of a 10% tariff is doing its customers dirty.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11268_001.jpg" class="kg-image" alt="Drew-beamer-plane-up-sm" loading="lazy" title="Drew-beamer-plane-up-sm" width="251" height="167"></figure><p>Let's review the tariff math again. We were talking about a $1,600 iPhone, for which Apple pays manufacturers $700. The other $900 covers Apple's profits but also a host of other costs like staff, marketing, etc.</p><p>Tariffs are levied on the cost of goods sold so a 10% tariff on $700 adds $70 to Apple's cost. If Apple completely absorbs the tariffs, it would keep the iPhone price at $1,600. In this case, the tariff reduces Apple's profit per phone sold by $70; in effect, Uncle Sam took $70 from Apple for each phone sold. Apple's stock price should fall in order to reflect this transfer of wealth from shareholders to the government, which is why Apple's management isn't going to like it. The net profit margin per phone expressed as a percentage of the selling price is reduced. The total profits earned from all phones sold are also reduced while the number of phones sold should not be affected.</p><p>In case Apple decides to increase the list price by 10% (the rate of tariff), the new selling price is $1,760. 
Of the extra revenue of $160 Apple gets from selling each phone, only $70 goes to paying the tariff to Uncle Sam. The other $90 represents additional profit extracted from customers. This is because the tariff is levied not on the selling price but on the cost of goods sold.</p><p>What's the impact on Apple's financial metrics? The net profit margin per phone expressed as a percentage of the selling price remains the same as pre-tariff since each component rises proportionally. The total profits earned from all phones sold should also increase because Apple effectively extracts more from each phone - however, this increase is not assured because higher list prices may cause demand to fall. If demand falls, the total profits may drop even if the profit margin is maintained.</p><p>If Apple wanted to pass the full cost of the tariff to the customer, it should increase the price from $1,600 to $1,670, which is a 4.4% increase, not 10%. In this case, the profit per phone that the vendor earns remains the same since the customers pay the tariffs. However, the net profit margin expressed as a percent of the selling price is reduced because the selling price has increased. The total profits may drop if the increased price causes demand to fall; if demand does not fall, then the total profits stay the same as pre-tariff.</p><p>Why would the vendor want to increase the list price by 10% instead of 4.4%? One possibility is general inflation. The vendor might argue that all other costs like staff, marketing, etc. will also rise as an indirect effect of tariffs. Another possibility is the supply-demand curve. If the 4.4% hike in list price causes a drop in demand, the vendor may try to increase prices further to maintain total profit. This move can trigger demand to drop further, so it's a delicate balance.</p><p>In the end, the key question is if the vendor is willing to share in the suffering. 
If the vendor insists on earning the same amount of profits post-tariff, then the list price would surely go up beyond the direct impact of the tariff.</p><p>P.S. [5/8/2025] Minor rewriting of a few sentences for clarity.</p>
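            <p>For readers who want to check the arithmetic, here is a minimal sketch of the three pricing scenarios discussed above. The $1,600 price, $700 cost and 10% tariff rate come from the post; the function and variable names are made up for illustration. Note that the "margin" here is the selling price minus cost of goods and tariff, so it still includes staff, marketing and other costs.</p>

```python
def scenario(price, cost, tariff_rate, new_price):
    """Per-phone figures after a tariff levied on the cost of goods sold."""
    tariff = cost * tariff_rate
    margin_before = price - cost               # profit plus other costs, pre-tariff
    margin_after = new_price - cost - tariff   # same quantity, post-tariff
    return {
        "tariff": tariff,
        "price_increase_pct": 100 * (new_price - price) / price,
        "change_in_margin": margin_after - margin_before,
    }


PRICE, COST, RATE = 1600, 700, 0.10

absorb = scenario(PRICE, COST, RATE, 1600)        # vendor eats the tariff
pass_through = scenario(PRICE, COST, RATE, 1670)  # customer pays exactly the tariff
raise_10pct = scenario(PRICE, COST, RATE, 1760)   # price rises by the tariff rate

for name, result in [("absorb", absorb), ("pass-through", pass_through), ("raise 10%", raise_10pct)]:
    print(name, result)
```

            <p>The pass-through case works out to a 4.375% price increase, which rounds to the 4.4% quoted above, and leaves the per-phone margin unchanged. The 10% case adds $90 per phone on top of covering the $70 tariff.</p>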
          ]]></content:encoded>
          <description><![CDATA[ Who pays the tariff? The seller or the customer? ]]></description>
        </item>
        <item>
          <title><![CDATA[ On the interpretability of log-scaled charts ]]></title>
          <link>https://www.junkcharts.com/on-the-interpretability-of-log-scaled-charts/</link>
          <guid isPermaLink="false">68e2cf4d04c09e00018ee8fe</guid>
          <category><![CDATA[ Axis ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Sun, 04 May 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>A previous <a href="https://www.junkcharts.com/logging-a-sleight-of-hand/">post</a> featured the following chart showing stock returns over <a href="https://www.junkcharts.com/tag/time-series/">time</a>:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1547_001.png" class="kg-image" alt="Gelman_overnightreturns_tsla" loading="lazy" title="Gelman_overnightreturns_tsla" width="257" height="259"></figure><p>Unbeknownst to readers, the chart plots one thing but labels it something else.</p><p>The designer of the chart explains how to read the chart in a separate note, which I included in my previous post (<a href="https://www.junkcharts.com/logging-a-sleight-of-hand/">link</a>). It's a crucial piece of information. Before reading his explanation, I didn't realize the sleight of hand: he made a chart with one time series, then substituted the y-axis labels with another set of values.</p><p>As I explored this design choice further, I realized that it has been widely adopted in a common chart form, without fanfare. I'll get to it in due course.</p><p>***</p><p>Let's start our journey with as simple a chart as possible. Here is a <a href="https://www.junkcharts.com/tag/line-chart/">line chart</a> showing constant growth in the revenues of a small business:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1547_002.png" class="kg-image" alt="Junkcharts_dollarchart_origvalues" loading="lazy" title="Junkcharts_dollarchart_origvalues" width="275" height="265"></figure><p>For all the charts in this post, the horizontal <a href="https://www.junkcharts.com/tag/axis/">axis</a> depicts time (x = 0, 1, 2, ...). 
To simplify further, I describe discrete time steps although nothing changes if time is treated as continuous.</p><p>The vertical <a href="https://www.junkcharts.com/tag/scale/">scale</a> is in dollars, the original units. It's conventional to modify the scale to units of thousands of dollars, like this:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1547_003.png" class="kg-image" alt="Junkcharts_dollarchart_thousands" loading="lazy" title="Junkcharts_dollarchart_thousands" width="263" height="247"></figure><p>No controversy arises if we treat these two charts as identical. Here I put them onto the same plot, using dual <a href="https://www.junkcharts.com/tag/axis/">axes</a>, emphasizing the one-to-one correspondence between the two scales.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1547_004.png" class="kg-image" alt="Junkcharts_dollarchart_dualaxes" loading="lazy" title="Junkcharts_dollarchart_dualaxes" width="280" height="256"></figure><p>We can do the same thing for two <a href="https://www.junkcharts.com/tag/time-series/">time series</a> that are linearly related. 
The following chart shows constant growth in temperature using both Celsius and Fahrenheit <a href="https://www.junkcharts.com/tag/scale/">scales</a>:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1547_010.png" class="kg-image" alt="Junkcharts_tempchart_dualaxes" loading="lazy" title="Junkcharts_tempchart_dualaxes" width="272" height="268"></figure><p>Here is the chart displaying only the Fahrenheit axis:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1547_006.png" class="kg-image" alt="Junkcharts_tempchart_fahrenheit" loading="lazy" title="Junkcharts_tempchart_fahrenheit" width="278" height="268"></figure><p>This chart admits two interpretations: (A) it is a chart constructed using F values directly and (B) it is a chart created using C values, after which the axis labels were replaced by F values. Interpretation B implements the sleight of hand of the log-returns plot. The issue I'm wrestling with in this post is the utility of interpretation B.</p><p>Before we move to our next stop, let's stipulate that if we are exposed to that Fahrenheit-scaled chart, either interpretation can apply; readers can't tell them apart.</p><p>***</p><p>Next, we look at the following <a href="https://www.junkcharts.com/tag/line-chart/">line chart</a>:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1547_007.png" class="kg-image" alt="Junkcharts_trendchart_y" loading="lazy" title="Junkcharts_trendchart_y" width="247" height="237"></figure><p>Notice that the vertical <a href="https://www.junkcharts.com/tag/axis/">axis</a> uses a log10 <a href="https://www.junkcharts.com/tag/scale/">scale</a>. 
We know it's a log scale because the equally-spaced tickmarks represent different jumps in value: the first jump is from 1 to 10, the next jump is from 10, not to 20, but to 100.</p><p>Just like before, I make a dual-<a href="https://www.junkcharts.com/tag/axis/">axes</a> version of the chart, putting the log Y values on the left axis, and the original Y values on the right axis.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1547_008.png" class="kg-image" alt="Junkcharts_trendchart_dualaxes" loading="lazy" title="Junkcharts_trendchart_dualaxes" width="253" height="227"></figure><p><br>By convention, we often print the original values as the axis labels of a log chart. Can you recognize that sleight of hand? We make the chart using the log values, after which we replace the log value labels with the original value labels. We adopt this graphical trick because humans don't think in log units; thus, the log value labels are less "interpretable".</p><p>As with the temperature chart, we will attempt to interpret the chart two ways. I've already covered interpretation B. For interpretation A, we regard the line chart as a straightforward plot of the values shown on the right axis (i.e., the original values). Alas, this viewpoint fails for the log chart.</p><p>If the original data are plotted directly, the chart should look like this:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1547_009.png" class="kg-image" alt="Junkcharts_trendchart_y_origvalues" loading="lazy" title="Junkcharts_trendchart_y_origvalues" width="260" height="245"></figure><p>It's not a straight line but a curve.</p><p>What have I just shown? 
That, after using the sleight of hand, we cannot interpret the chart <em>as if</em> it were directly plotting the data expressed in the original scale.</p><p>To nail down this idea, we ask a basic question of any chart showing trendlines. What's the rate of change of Y?</p><p>Using the transformed log scale (left axis), we find that the rate of change is 1 unit per unit time. Using the original scale, the rate of change from t=1 to t=2 is (100-10)/1 = 90 units per unit time; from t=2 to t=3, it is (1000-100)/1 = 900 units per unit time. Even though the rate of change varies by time step, the log chart using original value labels sends the misleading picture that the rate of change is constant over time (thus a straight line). The decision to substitute the log value labels backfires!</p><p>This is one reason why I use log charts sparingly. (I do like them a lot for exploratory analyses, but I avoid using them as presentation graphics.) This issue of interpretation is why I dislike the sleight of hand used to produce those log stock returns charts, even if the designer offers a note of explanation.</p><p>Do we gain or lose "interpretability" when we substitute those axis labels?</p><p>***</p><p>Let's re-examine the dual-axes temperature chart, building on what we just learned.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1547_010.png" class="kg-image" alt="Junkcharts_tempchart_dualaxes" loading="lazy" title="Junkcharts_tempchart_dualaxes" width="272" height="268"></figure><p>The above chart suggests that whichever scale (axis) is chosen, we get the same line, with the same steepness. Thus, the rate of change is the same regardless of scale. This turns out to be an illusion.</p><p>Using the left axis, the slope of the line is 10 degrees Celsius per unit time. Using the right axis, the slope is 18 degrees Fahrenheit per unit time. 
18 F is different from 10 C; thus, the slopes are not really the same! The rate of change of the temperature is given algebraically by the slope, and visually by the steepness of the line. Since two different slopes result in the same line steepness, the visualization conveys a lie.</p><p>The situation here is a bit better than that in the log chart. Here, in either scale, the rate of change is constant over time. Differentiating the temperature conversion formula, we find that the slope of the Fahrenheit line is always 9/5 times the slope of the Celsius line. So a rate of 10 Celsius per unit time corresponds to 18 Fahrenheit per unit time.</p><p>What if the chart is presented with only the Fahrenheit axis labels although it is built using Celsius data? Since readers only see the F labels, the observed slope is in Fahrenheit units. Meanwhile, the chart creator uses Celsius units. This discrepancy is harmless for the temperature chart but it is egregious for the log chart. The underlying reason is the nonlinearity of the log transform - the slope of log Y vs time is not proportional to the slope of Y vs time; in fact, it depends on the value of Y.</p><p>***</p><p>The log chart is a sacred cow of scientists, a symbol of our sophistication. Is it as potent as we think? In particular, when we put original data values on the log chart, are we making it more interpretable, or less?</p><p>P.S. I want to tie this discussion back to my <a href="https://bitly.com/trifectacheckup?ref=junkcharts.com">Trifecta Checkup</a> framework. The design decision to substitute those axis labels is an example of an act that moves the visual (V) away from the data (D). If the log units were printed, the visual makes sense; when the original units were dropped in, the visual no longer conveys features of the data - the reader must ignore what the eyes are seeing, and focus instead on the brain's perspective.</p>
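<p>For readers who want to check the slope arithmetic, here is a minimal Python sketch. The two series are made-up values chosen to match the numbers quoted in this post, and the helper name <code>step_changes</code> is my own invention for illustration, not something used to produce the charts.</p>

```python
import math


def step_changes(values):
    """Change between consecutive points, i.e. the slope per unit time step."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical series matching the numbers quoted in the post.
y = [10.0, 100.0, 1000.0]              # original values at t = 1, 2, 3
log_y = [math.log10(v) for v in y]     # what the log chart actually plots

celsius = [0.0, 10.0, 20.0, 30.0]      # constant growth of 10 C per step
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

log_slopes = [round(s, 6) for s in step_changes(log_y)]
orig_slopes = step_changes(y)
c_slopes = step_changes(celsius)
f_slopes = step_changes(fahrenheit)

print("log10 scale:   ", log_slopes)    # constant from step to step
print("original scale:", orig_slopes)   # grows tenfold each step
print("Celsius:       ", c_slopes)
print("Fahrenheit:    ", f_slopes)      # always 9/5 times the Celsius slopes
```

<p>Under the linear conversion, the two sets of slopes differ only by a fixed factor, so one straight line can stand in for both. Under the log transform, the slopes on the original scale change at every step, and no single straight line can represent them.</p>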
          ]]></content:encoded>
          <description><![CDATA[ Are log charts using original value labels more interpretable? ]]></description>
        </item>
        <item>
          <title><![CDATA[ Illusion of perfection ]]></title>
          <link>https://www.junkcharts.com/illusion-of-perfection/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3e5</guid>
          <category><![CDATA[ Data ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 30 Apr 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Automated detection technology has been invading the sports world. From the "out" call in tennis to the "offside" call in football (soccer), technology is gaining traction and replacing human judges who have traditionally made these calls.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11267_001.jpg" class="kg-image" alt="Hawk-eye-tennis" loading="lazy" title="Hawk-eye-tennis" width="103" height="73"></figure><p>Here is an example of an "out" call in tennis. The technology collates video footage to reconstruct the landing location of the ball. The simulation is animated and shown to spectators as evidence of an "out" call.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11267_002.jpg" class="kg-image" alt="Skysports-jesse-lingard-offside" loading="lazy" title="Skysports-jesse-lingard-offside" width="182" height="103"></figure><p>In football, a scoring player is judged "offside" if s/he is positioned behind the last defender (excluding the other team's goalie) when s/he receives the ball from a teammate. A goal is annulled if it is scored from an "offside" position. The spirit of the offside rule is to discourage teams from parking an attacker in front of goal at all times. The "offside" call has an outsized impact on match results since football is a low-scoring game. It's a hard call, requiring accounting for the positions of three players - the attacker who scored the goal, the player from whom s/he received the ball, and the last defender. 
We freeze the frame at the moment the pass to the attacker is made; then a "calibrated line" is drawn to show the edge position of the last defender; finally, the attacker is called offside if any part of his/her body is over the line.</p><p>Defenders often move up and down as a unit, and so the positions of multiple defenders may need to be examined to find the last defender. Unlike the tennis line calls, in which the ball is found next to the line, there may be considerable distance separating the scorer from the last defender.</p><p>***</p><p>The advantages of using technology are clear. Technology applies a consistent, repeatable, testable process to come to decisions; and thus can be considered fairer. Vendors suggest that technology is also more "accurate". For example, the creator of Hawk Eye technology claims that a tennis ball's location can be determined to within 4 millimetres (the ball is about 67 millimetres in diameter). [link]</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11267_003.png" class="kg-image" alt="Zverev-hawkeye" loading="lazy" title="Zverev-hawkeye" width="250" height="285"></figure><p>Nonetheless, some players have doubts about its accuracy. A few days ago, the German tennis star, Alexander Zverev, questioned a ball judged to be "in" by the automated detection technology. He pulled out his phone, mid-game, to take a picture of the mark on the ground, and later posted it to social media.</p><p>Because the tournament is played on clay, each ball leaves a mark on the ground. So in theory, every call made by the technology can be checked against ground truth. Zverev offered up an example of an error as the mark showed clearly that the ball landed out.</p><p>This type of reality check is not possible on surfaces other than clay.</p><p>***</p><p>How does the margin of error factor into these calls?</p><p>The short answer appears to be that it doesn't. 
The margin of error is treated like it is in a poll, and not like in air travel.</p><p>The animation shows a single ball location, without indicating the margin of error. Think of this circle as the technology's guesstimate for the true landing spot of the ball (such as that given by Zverev's snapshot). The "out" decision is based on the estimate while the margin of error adds information without threatening the decision. That's also how poll results are reported in the news media. Trump's approval rating would be reported as under 50%, even if it were at 49% with a margin of error of 3%.</p><p>In other scenarios, decision-makers let the information about the margin of error change their decision. Think about how airlines show the flight time to fly from one city to another. We all know it's heavily "padded". An hour-long flight may be shown as requiring two hours. One can use the margin of error around the average flight time to decide how much extra time to add. One would add some multiple of that margin, calibrated to achieve a certain probability of coverage.</p><p>When applied to the tennis calls, one can add some multiple of the margin of error as another circle that envelops the inferred ball location. If any part of the augmented ball touches the line, the ball is considered "in".</p><p>***</p><p>How to accommodate measurement uncertainty also troubles offside calls in football. 
Watching a bit of football lately, I am amazed at how many goals are overturned because the VAR says a finger or a toe was in an "offside" position.</p><p>Here is an example:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11267_004.png" class="kg-image" alt="Gol-annullato-a-kean-per-fuorigioco-millimetrico" loading="lazy" title="Gol-annullato-a-kean-per-fuorigioco-millimetrico" width="228" height="173"></figure><p>A goal was scored, but the striker was judged offside because according to the VAR, part of his ankle was beyond the last defender.</p><p>It's hard for me to believe that the VAR technology is capable of making such fine distinctions. Remember we are looking at a reconstruction of the scene, a composite image formed by fusing many images from different cameras. There are no sensors pasted to the fingers or toes of these players.</p><p>For me, these toetip decisions don't align with the spirit of the offside rule. Some type of margin of error should be incorporated into the decision rule. How much depends on whether they want more scoring or less.</p>
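<p>To make the padding idea concrete for the tennis case, here is a rough Python sketch. The function <code>call_ball</code>, its parameters, and the example gaps are all hypothetical; only the 4 mm margin comes from the vendor claim cited above. This is one way such a rule could be written, not a description of how any officiating system actually works.</p>

```python
def call_ball(gap_mm, margin_mm=4.0, multiple=1.0):
    """Return "in" or "out" for a ball near the line.

    gap_mm: estimated distance from the edge of the ball mark to the line
            (positive when the estimate puts the ball outside the line).
    margin_mm: margin of error of the estimate; 4 mm is the vendor claim
               quoted in the post.
    multiple: hypothetical padding factor applied to the margin.
    """
    # Enlarge the ball by the padded margin; if the enlarged ball reaches
    # the line, give the benefit of the doubt and call it in.
    padded_gap = gap_mm - multiple * margin_mm
    return "out" if padded_gap > 0 else "in"


# multiple=0 reproduces the current practice of ignoring the margin.
print(call_ball(3.0, multiple=0.0))    # estimate alone says out
print(call_ball(3.0, multiple=1.0))    # within the margin, so called in
print(call_ball(10.0, multiple=2.0))   # clear of the padded margin, still out
```

<p>The choice of <code>multiple</code> is the policy lever: a larger value gives more benefit of the doubt to the ball being in. A similar padding could in principle be applied to the offside line, with the sign chosen according to whether one wants more scoring or less.</p>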
          ]]></content:encoded>
          <description><![CDATA[ The illusion of perfection ignores statistical uncertainty ]]></description>
        </item>
        <item>
          <title><![CDATA[ Charging more for less, and less for more ]]></title>
          <link>https://www.junkcharts.com/charging-more-for-less/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3e6</guid>
          <category><![CDATA[ Analytics-business interaction ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Tue, 29 Apr 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11266_001.jpg" class="kg-image" alt="Tonia-kraakman-ice-sm" loading="lazy" title="Tonia-kraakman-ice-sm" width="330" height="186"></figure><p>There is this coffee shop near me that makes good coffee and provides a nice environment for doing some work. The other day I ordered a small iced latte with less ice.</p><p>Oops, I forgot about their $0.75 fee for "less ice". The small drink rings up to $6.80. It's $5.50 for the hot latte, $0.50 extra for ice, and $0.75 extra for less ice.</p><p>One time, I asked them why they charge more for less ice. They said less ice means they have to give me more coffee.</p><p>I don't know why that sounded sensible at the time. Because it doesn't make sense.</p><p>I should have asked them why they charge $0.50 for ice in the first place. By their own logic, the iced coffee has less coffee than the hot coffee, thus shouldn't there be a deduction for making it iced?</p>
          ]]></content:encoded>
          <description><![CDATA[ Charging more for less, and less for more ]]></description>
        </item>
        <item>
          <title><![CDATA[ Logging a sleight of hand ]]></title>
          <link>https://www.junkcharts.com/logging-a-sleight-of-hand/</link>
          <guid isPermaLink="false">68e2cf4d04c09e00018ee8ff</guid>
          <category><![CDATA[ Axis ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Sun, 20 Apr 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Andrew puts up an interesting chart submitted by one of his readers (<a href="https://statmodeling.stat.columbia.edu/2025/04/19/for-15-years-tesla-stock-has-been-edging-down-during-the-day-and-shooting-up-overnight/?ref=junkcharts.com#comment-2396042">link</a>):</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1546_001.png" class="kg-image" alt="Gelman_overnightreturns_tsla" loading="lazy" title="Gelman_overnightreturns_tsla" width="1016" height="1024" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1546_001.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/1546_001.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1546_001.png 1016w" sizes="(min-width: 720px) 720px"></figure><p>Bruce Knuteson, who created this chart, is pursuing a theory that there is something fishy going on in the stock markets overnight (i.e. between the close of one day and the open of the next day). He split the price data into two interleaving parts: the blue <a href="https://www.junkcharts.com/tag/line-chart/">line</a> represents returns overnight and the green line represents returns intraday (from open of one day to the close of the same day). In this example related to Tesla's stock, the overnight "return" is an eye-popping 36850% while the intraday "return" is -46%.</p><p>This is an example of an average masking interesting details in the data. One typically looks at the entire sequence of values at once, while this analysis breaks it up into two subsequences. I'll write more about the data analysis at a later point. 
This post will be purely about the visualization.</p><p>***</p><p>It turns out that while the chart looks like a standard <a href="https://www.junkcharts.com/tag/time-series/">time series</a>, it isn't. Bruce wrote out the following essential explanation:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1546_002.png" class="kg-image" alt="Gelman_overnightreturns" loading="lazy" title="Gelman_overnightreturns" width="1536" height="827" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1546_002.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/1546_002.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1546_002.png 1536w" sizes="(min-width: 720px) 720px"></figure><p>The chart can't be interpreted without first reading this note.</p><p>The left chart (a) is the standard time-series chart we're thinking about. It plots the relative cumulative percentage change in the value of the investment over time. Imagine one buys $1 of Apple stock on day 1. It shows the cumulative return on day X, expressed as a percent relative to the initial investment amount. As mentioned above, the data series was split into two: the intraday return series (green) is dwarfed by the overnight return series (blue), and is barely visible hugging the horizontal axis.</p><p>Almost without thinking, a graphics designer applies a log transform to the vertical <a href="https://www.junkcharts.com/tag/axis/">axis</a>. This has the effect of "taming" the extreme values in the blue line. This is the key design change in the middle chart (b). The other change is to switch back to absolute values. 
The day 1 number is now $1 so the day X number shows the cumulative value of the investment on day X if one started with $1 on day 1.</p><p>There's a reason why I emphasized the log transform over the switch to absolute values. That's because the relationship between absolute and relative values here is a linear one. If y(t) is the absolute cumulative value of $1 at time t, then the percent change r(t) = 100(y(t) - 1). (Note that y(0) = 1 by definition.) The shape of the middle chart is primarily conditioned by the log transform.</p><p>In the right chart (c), which is the design that Bruce features in all his work, the visual elements of chart (b) are retained while he replaced the vertical <a href="https://www.junkcharts.com/tag/text/">axis labels</a> with those from chart (a). In other words, the lines show the cumulative absolute values while the labels show the relative cumulative percent returns.</p><p>I left this note on Gelman's blog (corrected a mislabeling of the chart indices):</p><p>I'm interested in the sleight of hand related to the plots, also tying this back to the recent post about log scales. In plot (b) [middle of the panel], he transformed the data to show the cumulative value of the investment assuming one puts $1 in the stock on day 1. He applied a log scale on the vertical axis. This is fine. Then in plot (c), he retained the chart but changed the vertical axis labels so instead of absolute value of the investment, he shows percent changes relative to the initial value.</p><p>Why didn't he just plot the relative percent changes? Let y(t) be the absolute values and r(t) = the percent change = 100*(y(t) - 1) is a simple linear transformation of y(t). This is where the log transform creates problems! The y(t) series is guaranteed to be positive since hitting y(t) = 0 means the entire investment is lost. However, the r(t) series can hit negative values and also cross over zero many times over time. Thus, log r(t) is inoperable. 
The problem is using the log transform for data that are not always positive, and the sleight of hand does not fix it!</p><p>Just pick any day in which the absolute return fell below $1, e.g. the last day of the plot in which the absolute value of the investment was down to $0.80. In the middle plot (b), the value depicted is ln(0.8) = -0.22. Note that the plot is in log <a href="https://www.junkcharts.com/tag/scale/">scale</a>, so what is labeled as $1 is really ln(1) = 0. If we instead try to plot the relative percent changes, then the day 1 number should be ln(0) which is undefined while the last number should be ln(-20%) which is also undefined.</p><p>This is another example of something uncomfortable about using log scales which I pointed out in this <a href="https://www.junkcharts.com/swarmed-by-ants/">post</a>. It's this idea that when we do log plots, we can freely substitute axis labels which are not directly proportional to the actual labels. It's plotting one thing, and labelling it something else. These labels are then disconnected from the visual encoding. It's against the goal of visualizing data.</p>
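<p>The point about undefined logs can be verified with a few lines of Python. This is only an illustrative sketch: the two values stand in for the day 1 and final-day numbers mentioned above, and the helper names <code>percent_change</code> and <code>safe_log</code> are hypothetical.</p>

```python
import math


def percent_change(y):
    """Relative percent change r(t) = 100 * (y(t) - 1) for a $1 start."""
    return 100 * (y - 1)


def safe_log(x):
    """Natural log, or None when x is zero or negative (log undefined)."""
    try:
        return math.log(x)
    except ValueError:
        return None


# Hypothetical cumulative values: day 1, and a final day down to $0.80.
for y in [1.0, 0.8]:
    r = percent_change(y)
    print(
        f"y = {y:.2f}  ln(y) = {math.log(y):.2f}  "
        f"r = {r:.0f}%  ln(r) = {safe_log(r)}"
    )
```

<p>The log of the absolute value is defined on both days, while the log of the percent change is undefined on both, which is why the percent-change series cannot be placed on a log scale directly.</p>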
          ]]></content:encoded>
          <description><![CDATA[ Breaking down a sleight of hand when using log transforms ]]></description>
        </item>
        <item>
          <title><![CDATA[ Doing tariff math right ]]></title>
          <link>https://www.junkcharts.com/doing-tariff-math-right/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3e7</guid>
          <category><![CDATA[ Analytics-business interaction ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 14 Apr 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>The Trump tariff mayhem has spotlighted innumeracy in the U.S. media.</p><p>I'm not talking about the numbers coming from the U.S. administration - I didn't care to write about those because they are intended for laughs.</p><p>Unfortunately, the math emanating from the other side is also riddled with errors. I'll focus on one particular talking point that's been bugging me.</p><p>***</p><p>Many journalists want to tell us the dire effect of Trump's 25%, 125%, 145%, etc. tariffs on the prices U.S. consumers pay for common purchases. Here is a typical example from CNET (<a href="https://www.cnet.com/personal-finance/taxes/how-much-could-tariffs-increase-iphone-prices-we-do-the-math/?ref=junkcharts.com">link</a>), which focuses on Apple's iPhones. The key sentence is this:</p><p>If Apple passed the China tariff costs on to customers, the iPhone 16 Pro Max with 1TB of storage could increase from $1,599 to nearly $3,600 -- assuming that the previously imposed 20% tariff was already incorporated into the current price.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11265_001.jpg" class="kg-image" alt="Sam-grozyan-iphone16-sm" loading="lazy" title="Sam-grozyan-iphone16-sm" width="439" height="329"></figure><p><br></p><p>3600/1599 is 2.25 so they added 125% on top of $1,599, suggesting that Apple pays about $2,000 of tariff for each iPhone. This calculation sounds reasonable but that's not how the math works.</p><p>A tariff is an import tax that importers are required to pay to the U.S. government in order to accept products from overseas. Apple imports the iPhone from a Chinese manufacturer, and pays any tariffs. The tariff is based on the value of the imported good, in this case, it is the cost that the manufacturer charges Apple for each iPhone. 
And that cost is much lower than the retail price of the iPhone.</p><p>The people at this website (<a href="https://www.simplymac.com/iphone/how-much-does-it-cost-to-make-an-iphone?ref=junkcharts.com">link</a>) estimate that Apple pays the manufacturer about $700 for each iPhone 16 Pro Max. Thus, a 125% tariff on this cost is $875, much smaller than the $2,000 from the prior calculation.</p><p>If Apple passed the entire tariff on to the customer, the new retail price would be $1,599 + $875 = $2,474, which is a 55% price increase, much lower than 125%.</p><p>***</p><p>What would Apple do?</p><p>Passing the entire tariff on to the customer preserves Apple's profit per unit sold. Before the additional tariffs, the (gross) profit per unit was $1,599 - $700 = $899. After the price hike, the profit per unit would be $2,474 - ($700 + $875) = $899. This just embodies the idea of passing the pain to the customer.</p><p>It's hard to imagine that Apple would do such a thing. The problem is that a 55% price hike might cause demand to collapse. So while Apple would still earn $899 per unit, the total profit would plunge as the number of units sold would drop sharply. (For those paying attention, the gross profit margin, expressed as a percent of revenues, would be hit hard even though the profit per unit is unchanged.)</p><p>At the other extreme, Apple might absorb the entire tariff. In this case, the new profit per unit would be $1,599 - ($700 + $875) = $24. iPhones would turn overnight from a driver of huge profits into a money-losing product. (Apple's share price would crater if this materialized.)</p><p>The $24 is not pure profit for Apple. It goes into paying for all of the overhead costs, like marketing, brand advertising, engineering, R&amp;D, administrative costs, etc. There surely would be nothing left for pure profit.</p><p>This shows the insanity of tariffs of this magnitude, and why the business community was so alarmed by them. 
(Apple's profit margin is very fat by industry standards; imagine you're selling trinkets for razor-thin profits.)</p><p>The number of units sold would at best stay constant, but more likely, it would also drop because the general economic outlook has soured. The drop in this case would be much less severe because the retail price did not change.</p><p>***<br>Realistically, Apple would have to pass some of the tariffs on to consumers while also suffering a decline in profits.</p><p>What happened to the $875 tariff? It went to the U.S. government. So effectively, the tariff functions as a massive corporate tax hike. Apple has been forced to transfer a huge chunk of profits from shareholders to the U.S. government.</p><p>The tariff rate is so high that Apple won't be able to pay it in full from existing profits, so part of the tariff is also a tax on Apple customers through price hikes. The price hike would not be as high as the tariff rate. It's definitely not going to be 125%.</p>
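            <p>For readers who want to replay the arithmetic, here is a minimal Python sketch using the figures quoted above ($1,599 retail price, an estimated $700 import cost, a 125% tariff). The function name <code>outcome</code> and the 50% pass-through scenario are illustrative assumptions of mine, not something Apple or CNET has stated.</p>

```python
# Figures quoted in the post; the $700 import cost is a third-party estimate.
retail_price = 1599.0
import_cost = 700.0
tariff_rate = 1.25

# The media-style calculation: apply the tariff rate to the retail price.
naive_price = retail_price * (1 + tariff_rate)

# A tariff is levied on the import value, not on the retail price.
tariff = import_cost * tariff_rate


def outcome(pass_through):
    """Return (new retail price, gross profit per unit) for a pass-through share between 0 and 1."""
    new_price = retail_price + pass_through * tariff
    profit = new_price - (import_cost + tariff)
    return new_price, profit


print(f"naive price: ${naive_price:,.2f}, tariff per unit: ${tariff:,.2f}")

# 100% = customer pays it all, 0% = Apple absorbs it all, 50% = hypothetical middle case.
for share in (1.0, 0.5, 0.0):
    price, profit = outcome(share)
    print(f"pass-through {share:.0%}: price ${price:,.2f}, gross profit per unit ${profit:,.2f}")
```

            <p>The two extremes reproduce the $2,474 and $24 figures from the post; any realistic outcome sits somewhere in between, and none of them comes close to the naive price.</p>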
          ]]></content:encoded>
          <description><![CDATA[ The media can&#39;t get the tariff math right ]]></description>
        </item>
        <item>
          <title><![CDATA[ The new NYC composting law ]]></title>
          <link>https://www.junkcharts.com/the-new-nyc-composting-law/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3e8</guid>
          <category><![CDATA[ Behavior ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Thu, 10 Apr 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11264_001.jpg" class="kg-image" alt="Nycomposting" loading="lazy" title="Nycomposting" width="1692" height="1288" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/11264_001.jpg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/11264_001.jpg 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/archives/11264_001.jpg 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11264_001.jpg 1692w" sizes="(min-width: 720px) 720px"></figure><p>On April 1, New York City rolled out a composting law for residential buildings. Residents must separate compostables such as soiled paper and food remnants, and place them in special compost bins. Building owners will get fined if they don't follow the law.</p><p>Doing my part, I started putting things in a compost bag. I didn't expect the learning curve. 
For each piece of soiled material, I have to figure out if it's paper or plastic, or both; and whether it belongs in the compost bag or not.</p><p>Then I wondered how the inspectors decide what constitutes a violation.</p><p>Do they consider both of the following offenses?</p><ul><li>Putting non-compostable items into the compost bag (false positives)</li><li>Putting compostable items into common trash (false negatives)</li></ul><p>***</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11264_002.jpg" class="kg-image" alt="_nryw_bookcover" loading="lazy" title="_nryw_bookcover" width="800" height="1288" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/11264_002.jpg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11264_002.jpg 800w" sizes="(min-width: 720px) 720px"></figure><p>This decision problem is similar to those covered in Chapter 4 of <strong>Numbers Rule Your World (<a href="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11264_002.jpg?ref=junkcharts.com">link</a>)</strong>.</p><p>One consideration is the potential harm of each type of error. If having non-compostables ruins the compost process, then the cost of false positives is high. The cost of false negatives is a smaller amount of compost, which feels like a lesser harm, given the prior state of no composting.</p><p>Another consideration is prevalence. That in turn depends on what proportion of household waste is compostable.</p><p>Textbook analysis then takes these two factors and creates an expected value of harm. In my book, I bring up other important considerations.</p><p>Yet another consideration is the cost of inspection (thus, the discoverability of the error). 
Looking for false positives involves digging into compost bags while looking for false negatives requires searching through the common trash. The latter seems far more onerous; I doubt they would spend the time to do it.</p><p>If the inspectors focus on uncovering false positives, then those errors become more visible. In my book, I illustrate this using steroid testing of elite athletes. A false negative is when a doper receives a clean test. Will we ever find out about a false negative? Is the doper going to announce that the anti-doping lab screwed up?</p><p>Because the testers don't get into trouble for missing dopers but do get a lot of negative publicity for falsely accusing an athlete of doping, one should expect them to err on the side of minimizing false positives, which leads to more false negatives. In the case of composting, if those inspectors are more concerned about false positives, then we may get away with throwing some compostable items into the common trash.</p><p>Given these considerations, I'd err on the side of more false negatives. That is to say, I should put things into the compost bag only if I'm highly certain they are compostable.</p><p>I'm not really sure how the inspection works. Anyone who knows, please leave a comment.</p><p>***</p><p>In the meantime, the city has already issued thousands of violations (<a href="https://nypost.com/2025/04/10/us-news/city-issues-2-5k-compost-tickets-in-first-10-days-of-new-law-as-landlords-gripe/?ref=junkcharts.com">link</a>), but the penalty is laughably small ($25-100 per ticket).</p>
          ]]></content:encoded>
          <description><![CDATA[ The NYC composting law went into effect ]]></description>
        </item>
        <item>
          <title><![CDATA[ The message left the visual ]]></title>
          <link>https://www.junkcharts.com/the-message-left-the-visual/</link>
          <guid isPermaLink="false">68e2cf4d04c09e00018ee900</guid>
          <category><![CDATA[ Bar chart ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Mon, 07 Apr 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>The following chart showed up in Princeton Alumni Weekly, in a report about China's population:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1545_001.png" class="kg-image" alt="Sciam_chinapop_19802020" loading="lazy" title="Sciam_chinapop_19802020" width="403" height="267"></figure><p>This chart was one of several that appeared in a related Scientific American <a href="https://www.scientificamerican.com/article/chinas-population-could-shrink-to-half-by-2100/?ref=junkcharts.com">article</a>.</p><p>The story itself is not surprising. As China develops, its birth rate declines while the death rate also falls; as a result, the population ages. The same story has played out in all advanced economies.</p><p>***</p><p>From a <a href="https://bitly.com/trifectacheckup?ref=junkcharts.com">Trifecta Checkup</a> perspective, this chart suffers from several problems.</p><p>The text annotation on the top right suggests what message the authors intended to deliver. Pointing to the group of people aged between 30 and 59 in 2020, they remarked that this large cohort would likely cause "a crisis" when they age. There would be fewer youngsters to support them.</p><p>Unfortunately, the data and visual elements of the chart do not align with this message. Instead of looking forward in time, the chart compares the 2020 population pyramid with that from 1980, looking back 40 years. The chart shows an insight from the data, just not the right one.</p><p>A major feature of a population pyramid is the split by gender. The trouble is that gender isn't part of the story here.</p><p>In terms of age groups, the chart treats each subgroup "fairly". As a result, the reader isn't shown which of the 22 subgroups to focus on. 
There are really 44 subgroups if we count each gender separately, and 88 subgroups if we include the year split.</p><p>***</p><p>The following redesign traces the "crisis" subgroup (those who were 30-59 in 2020) both backwards and forwards.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1545_002.png" class="kg-image" alt="Junkcharts_redo_chinapopulationpyramids" loading="lazy" title="Junkcharts_redo_chinapopulationpyramids" width="1834" height="1248" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1545_002.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/1545_002.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/archives/1545_002.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1545_002.png 1834w" sizes="(min-width: 720px) 720px"></figure><p>The gender split has been removed; here, the columns show the total population. <a href="https://www.junkcharts.com/tag/color/">Color</a> is used to focus attention on one cohort as it moves through <a href="https://www.junkcharts.com/tag/time-series/">time</a>.</p><p>Notice I switched up the sample times. I pulled the population data for 1990 and 2060 (from this <a href="https://www.populationpyramid.net/china/1990/?ref=junkcharts.com">website</a>). The original design used the population data from 1980 instead of 1990. However, this choice is at odds with the message. People who were 30 in 2020 were not yet born in 1980! They started showing up in the 1990 dataset.</p><p>At the other end of the "crisis" cohort, the oldest (59 years old in 2020) would be deceased by 2100, as 59+80 = 139. 
Even the youngest (30 in 2020) would be 110 by 2100 so almost everyone in the pink section of the 2020 chart would have fallen off the right side of the chart by 2100.</p><p>These design decisions insert a gap between the visual and the message.</p>
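            <p>The cohort arithmetic behind the choice of sample years can be checked with a few lines of Python. This is only a sketch: the helper name <code>ages_in</code> is mine, and it assumes the "crisis" cohort is exactly the people aged 30 to 59 in 2020, as stated above.</p>

```python
# The "crisis" cohort: people aged 30 to 59 in 2020.
reference_year = 2020
youngest_age, oldest_age = 30, 59

# Birth years implied by those ages.
latest_birth = reference_year - youngest_age    # youngest member
earliest_birth = reference_year - oldest_age    # oldest member


def ages_in(year):
    """Return (age of youngest member, age of oldest member) in the given year."""
    return year - latest_birth, year - earliest_birth


# A negative age means that member was not yet born in that year.
for year in (1980, 1990, 2020, 2060, 2100):
    young, old = ages_in(year)
    print(f"{year}: youngest {young}, oldest {old}")
```

            <p>The 1980 row contains a negative age, which is the formal version of "not yet born in 1980", while 1990 is the first of these years in which the whole cohort exists.</p>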
          ]]></content:encoded>
          <description><![CDATA[ The message ran off the chart ]]></description>
        </item>
        <item>
          <title><![CDATA[ Swarmed by ants ]]></title>
          <link>https://www.junkcharts.com/swarmed-by-ants/</link>
          <guid isPermaLink="false">68e2cf4d04c09e00018ee901</guid>
          <category><![CDATA[ Bubble chart ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Wed, 02 Apr 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>Andrew discussed the following chart in a recent <a href="https://statmodeling.stat.columbia.edu/2025/03/26/on-that-claim-about-how-does-energy-impact-economic-growth?ref=junkcharts.com">blog</a> post:</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1544_001.jpg" class="kg-image" alt="Agelmanblog_gdpel-logscale" loading="lazy" title="Agelmanblog_gdpel-logscale" width="640" height="360" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1544_001.jpg 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1544_001.jpg 640w"></figure><p><br><br></p><p>Alert! A swarm of ants has marched onto a <a href="https://www.junkcharts.com/tag/bubble-chart/">bubble chart</a>.</p><p>These overlapping long <a href="https://www.junkcharts.com/tag/data-labels/">text labels</a> are dominating the chart; the length of these labels encodes the length of country names, which has nothing to do with the data.</p><p>We're waiting - hoping - for the ants to march off the page.</p><p>***<br>Andrew's blog post is about something else, the use of log <a href="https://www.junkcharts.com/tag/scale/">scales</a>. The chart above is a log-log plot. Both axes have log scales.</p><p>Andrew's correspondent doesn't like log scales. Andrew does.</p><p>One problem we encounter in practice with log scales is that people without a science background can't read them. Andrew's correspondent said as much, while also misinterpreting the log-log chart. He says the log-log chart "visually creates a much stronger correlation than there actually is".</p><p>But that's not what happened. It's more appropriate to say that the log transformations allow us to see the correlation that exists. 
The correlation is not linear, which is why the usual scatter plot does not reveal it.</p><p>Nevertheless, I agree with the correspondent on avoiding log scales in data displays because most readers don't get them.</p><p>***</p><p>Consider the following pair of plots.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1544_002.png" class="kg-image" alt="Junkcharts_loglog_sample" loading="lazy" title="Junkcharts_loglog_sample" width="2000" height="750" srcset="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w600/archives/1544_002.png 600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1000/archives/1544_002.png 1000w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/size/w1600/archives/1544_002.png 1600w, https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/1544_002.png 2084w" sizes="(min-width: 720px) 720px"></figure><p>The underlying data follow the pattern Y = 0.003 * X^2.5 but for what we're talking about, the specific pattern doesn't matter so long as X and Y have a "power" relationship.</p><p>The left plot directly shows the relationship between X and Y using regular scales. Readers see that Y is running away from X. The slope of the line increases as X increases. The speed of growth of Y exceeds that of X. This relationship is curved, which can't be described in words succinctly.</p><p>The right plot visually shows a linear relationship between X and Y but it's not really between X and Y. It's between log(X) and log(Y). Note that log(Y) = log(0.003*X^2.5) = log(0.003) + 2.5*log(X), which is a straight line with slope 2.5 and intercept log(0.003). The gap between gridlines now represents a 10-fold jump in value (of X or of Y). 
The linear relationship is between X and Y in log scale; in linear scale, it's a power relationship, not linear.</p><p>The practice of printing axis labels in the original scale, rather than log scale, adds to the confusion. On the right plot, the points labeled 5,000 and 50,000 do not actually lie on the line; what falls on the line are the points log(5,000) and log(50,000). The reason for this confusing practice is that humans have trouble understanding data in log scale. For example, if $50,000 is the GDP per capita for some country, then log($50,000) = $4.7, which can't be interpreted.</p><p>Whether we are talking about the gaps between gridlines or about specific points on the line, what readers see on the log-log chart is only part of the story. Readers must also recognize that for the log-log chart to work, equal gaps between gridlines do not signify equal gaps in the data, while the linear relationship is between the logs of the axis labels, not the labels themselves.</p><p>The X-Y plot can be interpreted visually in a direct way while the log-log plot requires the reader to transcend the visual representation, entering an abstract realm.</p>
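            <p>The algebra above can be checked numerically. The Python sketch below uses the same pattern, Y = 0.003 * X^2.5, with base-10 logs (matching the 10-fold gridlines); the particular X values are hypothetical picks of mine, one per gridline step, not the data behind the plots.</p>

```python
import math

# Pattern used for the sample plots in the post: Y = a * X^b
a, b = 0.003, 2.5

# Hypothetical X values, each 10 times the previous one.
xs = [10.0, 100.0, 1000.0, 10000.0]
ys = [a * x ** b for x in xs]

log_xs = [math.log10(x) for x in xs]
log_ys = [math.log10(y) for y in ys]

steps = range(len(xs) - 1)

# Slope between consecutive points, in log scale and in the original scale.
log_slopes = [(log_ys[i + 1] - log_ys[i]) / (log_xs[i + 1] - log_xs[i]) for i in steps]
raw_slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in steps]

# Intercept of the straight line in log scale.
intercept = log_ys[0] - b * log_xs[0]

print("log-scale slopes:", [round(s, 6) for s in log_slopes])
print("original-scale slopes:", [round(s, 3) for s in raw_slopes])
print("intercept:", round(intercept, 4), "log10(a):", round(math.log10(a), 4))
```

            <p>In log scale the slope stays at 2.5 from one step to the next, while in the original scale it keeps growing: the same data look like a straight line on one plot and a runaway curve on the other.</p>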
          ]]></content:encoded>
          <description><![CDATA[ To log-log or not ]]></description>
        </item>
        <item>
          <title><![CDATA[ Know your data 45: permanence ]]></title>
          <link>https://www.junkcharts.com/know-your-data-45-permanence/</link>
          <guid isPermaLink="false">68e697c1a6f93c000172b3e9</guid>
          <category><![CDATA[ Big Data ]]></category>
          <dc:creator><![CDATA[ Kaiser Fung ]]></dc:creator>
          <pubDate>Thu, 27 Mar 2025 20:00:00 -0400</pubDate>
          <content:encoded><![CDATA[
            <style type="text/css">
              * { box-sizing: border-box !important; }
              img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                display: block !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              figure {
                margin: 0 !important;
                padding: 0 !important;
                max-width: 100% !important;
                width: 100% !important;
                overflow: hidden !important;
              }
              .kg-image, .kg-image-card img, .kg-gallery-image img {
                max-width: 100% !important;
                width: 100% !important;
                height: auto !important;
                object-fit: contain !important;
              }
              .kg-image-card, .kg-gallery-card {
                max-width: 100% !important;
                width: 100% !important;
                margin: 0 !important;
                padding: 0 !important;
              }
              .post-content, .gh-content {
                max-width: 100% !important;
                overflow-x: hidden !important;
              }
              div, p, span {
                max-width: 100% !important;
              }
            </style>
            <p>With the bankruptcy announcement of the once-highflying genetic testing company, 23andme, its customers are scrambling to delete their DNA data (<a href="https://www.msn.com/en-us/money/other/23andme-site-went-down-as-customers-struggled-to-delete-data/ar-AA1BA9vi?ref=junkcharts.com">link</a>).</p><p>These people have little understanding of how the Internet, and especially modern, cloud-based services, work.</p><p>In most, perhaps all, cases, a user who deletes data has severed oneself from the data. The data persist at the company, in the cloud, at the company's various business partners, etc.</p><p>It is very hard to delete anything in the digital age.</p><figure class="kg-card kg-image-card"><img src="https://storage.ghost.io/c/00/37/003715d2-9957-4ad9-99ad-f1deb54be59a/content/images/archives/11262_001.jpg" class="kg-image" alt="Yusuf-onuk-infinity-sm" loading="lazy" title="Yusuf-onuk-infinity-sm" width="500" height="314"></figure><p>***</p><p>Even before the arrival of computers with effectively infinite storage, when we deleted a file from a PC, we merely severed the "pointer" to the locations on the hard drive where the file was stored. There is recovery software that allows us to retrieve the "deleted" file. The existence of such software proves that the file wasn't truly deleted.</p><p>Advanced software exists to supposedly truly delete files from computers. It would not be necessary if the files were truly deleted when users execute the "delete" function. Even this advanced software has blind spots. That's why some experts recommend physically destroying hard drives before disposal.</p><p>The cloud is a network of computers outside of the user's control. This network provides resiliency and efficiency by making copies of the data and strategically scattering them around the network. 
Thus, if someone wants to delete a file, one would have to find every copy of the file, and then thoroughly delete them from each computer using advanced software.</p><p>A business might use a cloud that is managed by some other entity (e.g. Amazon Web Services). Then, this business may not even know how many copies of a file have been created, and where they are stored. The business pays someone else to manage all of that, and the point is to wash their hands of those details.</p><p>A business like 23andme makes money by selling users' data. Once the data change hands, they are now replicated in computers that are not controlled by 23andme. Those other businesses may also store their data in the cloud, which means the files are duplicated and distributed to yet another network of computers. 23andme has no ability to verify that its customers have thoroughly removed data from all of their computers using advanced software. It probably doesn't want to either.</p><p>Even if a business doesn't base its business model on selling customer data, it may still share customer data with business "partners." For example, if a hospital hires a third party to analyze CT scans, the CT scans of its patients will find their way to the computers of that third party. That third party may also send the images to its business partners, for exactly the same reason. If a patient asks the hospital to delete a file, the hospital would have to remove all of its copies, plus ask the third party to delete all of its copies, and so on down the tree of partnerships. The chance that all copies of that file are removed from all computers of all entities involved is effectively zero.</p><p>***</p><p>The above is not merely speculation. In your everyday usage of the Internet, you may inadvertently discover that data uploaded to some app are permanent.</p><p>Many years ago, I used an online email provider to send a single email to a list of people. 
This required uploading the names and emails of those people. After sending the email, I deleted the contact information and closed my account. I specifically did this in the hope of preventing the vendor from taking the private data and using it for other purposes without my knowledge. The people on the list consented to receiving an email from me, but not more than that.</p><p>After I closed my account, surprise, surprise, I got lots of marketing emails imploring me to return to the service.</p><p>One such email said that if I reactivated my account that day, I'd be able to recover all the contacts in my previous account. I could see how this would be a great convenience if I indeed wanted to continue from where I stopped.</p><p>However, it also shows that when I pushed the button to "delete" the contact list, it wasn't deleted at all! Even after I closed my account, the "deleted" contact list was still there.</p><p>***</p><p>In other words, data are permanent. It's delusional to think that going to the 23andme website and clicking on the "delete" button will remove one's DNA data from prying eyes.</p><p>Sure, doing it is better than not doing it. But it's most likely a placebo: it makes one feel better but, in reality, does not make a difference.</p><p>Pressing the delete button certainly detaches you from your data but there is no way to verify that your files have been permanently and thoroughly removed from all of 23andme's computers (not forgetting all the computers of employees who downloaded your files as part of some larger analyses of their database). The DNA file would already have been sold many times over during your time as a customer, and will live on forever in many computers not owned by or accessible to 23andme. It would also have been shared with business partners who provided relevant services to 23andme. 
The most valuable asset that 23andme could sell during its bankruptcy proceedings is the DNA database, so you can bet that the leadership team has made sure that the data are transferred to the eventual buyer.</p><p>And remember, DNA data are themselves immutable. They belong to the class of data (like date of birth, social security number) that only need to be stolen once. (See this previous <a href="https://www.junkcharts.com/know-your-data-29-shadow-databases/">post</a>.)</p><p>The only effective way to ensure your DNA data don't fall into the wrong hands is to not have them stored at 23andme in the first place, i.e. don't be their customer.</p>
          ]]></content:encoded>
          <description><![CDATA[ With the bankruptcy announcement of the once-highflying genetic testing company, 23andme, its customers are scrambling to delete their DNA data (link).

These people have little understanding of how the Internet, and especially modern, cloud-based services, work.

In most, perhaps all, cases, a user who deletes data has severed oneself from ]]></description>
        </item>
  </channel>
</rss>
