Wednesday, November 26, 2025

With Grokipedia, Top-Down Control of Knowledge Is New Again by Ryan McGrady


Grokipedia, the AI-generated encyclopedia owned by Elon Musk's xAI, went live on October 27. It is positioned as, first and foremost, an ideological foil to Wikipedia, which for years has been the subject of escalating criticism by right-wing media in general and Musk in particular. With Grokipedia, Musk wants to produce something he sees as more neutral.

Much has already been written about the character of Grokipedia’s content. This essay aims to explore the nature of the project and its version of neutrality, as compared to Wikipedia. Technologically, it is one of many experiments designed to replace human-generated writing with LLMs; conceptually, it is less a successor to Wikipedia than a return to an older model of producing officially sanctioned knowledge.

Wikipedia and neutrality

Nearly every encyclopedia asserts some version of "neutrality." Wikipedia's definition is unusual: its "neutral point of view" policy aims not at some Platonic ideal of balance or objectivity, but at a faithful, proportional summary of what the best available sources say about a subject. Original ideas, reporting, and analysis on the part of its contributors are not allowed. Casting volunteers as "editors" rather than "authors" is part of how "an encyclopedia that anyone can edit" is possible — it moves the locus of dispute from truth itself to which sources to use and how to incorporate them. As with the rest of Wikipedia, neutrality is less a perfect state than a continuously negotiated process in which disputes are expected and common. While neutrality and sourcing debates are often deeply fraught, with complicated histories that blur lines of reliability, they're also constructive — a 2019 study in Nature found that articles with many such conflicts tended to be of higher quality overall.


On which sources to use, Wikipedia's guideline on identifying "reliable sources" spells out its priorities: a reputation for fact-checking, accuracy, issuing corrections, editorial oversight, separation of facts from opinions, no compromising connection to the subject, and other traditional markers of information literacy that librarians have taught to students and researchers for more than a century. Secondary and tertiary sources are preferred, deferring to them for the task of vetting and interpreting primary sources. Independent sources are also preferred for any non-trivial claim, since article subjects have a hard time writing about themselves objectively. Ideological orientation is not a factor except insofar as it affects this list of priorities. Both of the following statements can align with Wikipedia's definition of a "reliable source," even though they're opposed: "unicorns aren't real but I wish they were;" "unicorns aren't real and I'm glad they aren't." Either source would take priority over one that claims "unicorns are real," regardless of the author's pro- or anti-unicorn sentiment.

Primarying Wikipedia

However, sourcing is also at the center, implicitly or explicitly, of many allegations that Wikipedia is not actually neutral. Some of these claims focus on Wikipedia's "perennial sources list," which catalogs dozens of sources whose reliability is frequently discussed, annotated according to the outcomes of those discussions. The idea is to have a central page where someone can find links to and summaries of past discussions, rather than have volunteers explain for the umpteenth time why, for example, InfoWars is not a reliable source.

I agree with criticism of this page to the extent that it has given rise to a genre of source-classification discussion applied not just to extreme cases like InfoWars but to sources that require some nuance, indirectly short-circuiting debates that should take place case by case. But even if the list were deleted altogether, that wouldn't turn unreliable sources (according to the guideline) into reliable ones; it would just require more of those debates to play out rather than letting someone point to a line in a table. There's an optics argument to be had, too: the list does flag more right-wing sources than left-wing ones, but largely because people try to use unreliable right-wing sources in Wikipedia articles more frequently, so they come up for discussion more often.

But in large part, allegations of bias are a straightforward extension of a decades-old argument: that academia, science, mainstream media, etc. are broadly biased towards the left and/or untrustworthy. Whether through Rush Limbaugh's "four corners of deceit" (government was the fourth corner) or some other articulation, the frame is well established. The extent to which it is true is outside the scope of this essay, but anyone who holds this view will inevitably see that bias in Wikipedia, which summarizes academia, science, and media. Musk made this point earlier this year when he called Wikipedia "an extension of legacy media propaganda."

It should not be surprising, then, that the sourcing used by Grokipedia is often radically different from Wikipedia's. It's not clear how reliably Grok will explain its own internal processes, but its answer should at least communicate the way its developers want Grokipedia to be seen. So I asked it to explain how it prioritizes sources for different kinds of content, and it provided a table worth including here; see below.

The most obvious trend is its preference, on most topics, for primary, self-published, and official sources like verified X users' social-media posts and government documents. These are put on par with, or at higher priority than, peer-reviewed journal articles, depending on the category. The only examples it offers among high-priority sources, apart from X users, are arXiv (itself contending with an influx of LLM-generated content) and PubMed for scientific/technical topics, and Kremlin.ru for historical events.

Some of Wikipedia's fiercest critics contend that its version of neutrality unfairly endorses "Establishment" views on issues like vaccines, climate change, or the results of the 2020 US presidential election, omitting minority positions or describing them in unfavorable terms. If many people hold a view, the argument goes, it is worth presenting on its own terms rather than deciding one set of sources is better than another. Grokipedia appears to align with this perspective: its criteria for low-priority sources flag "emotional bias," labels like "pseudoscience," and anything that doesn't present alternative perspectives.

There is another characteristic of the sourcing that will be immediately apparent to anyone who has tried to do a literature review on a subject using a chatbot: it relies on sources available on the open web (or sources widely described by sources available on the open web). Commercial sites with good search engine optimization, apparent content farms, and personal blogs appear alongside traditional media sources. Grok can find extant text on the web faster than Wikipedia's human editors, but does it have access to the books and articles that aren't internet-accessible?
