Wikipedia:Request a query

From Wikipedia, the free encyclopedia

This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.

You may also be interested in the following:

  • If you are interested in writing SQL queries or helping out here, visit our tips page.
  • If you need to obtain a list of article titles that meet certain criteria, consider using PetScan (user manual) or the default search. Petscan can generate list of articles in subcategories, articles which transclude some template, etc.
  • If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
  • For long-term review and checking, database reports are available.

Quarry does not have access to page content, so queries which require checking wikitext cannot be answered with Quarry. However, someone may be able to assist by using Quarry in another way (e.g. checking the table of category links rather than the "Category:" text) or suggest an alternative tool.

Edit protected-move unprotected pages[edit]

Do any exist? Can they be made? Why might they be made? Geardona (talk to me?) 06:57, 15 March 2024 (UTC)[reply]

Respectively: Lots. Clearly. I suspect mostly carelessness. —Cryptic 07:16, 15 March 2024 (UTC)[reply]
Wow, thanks for the fast response! Geardona (talk to me?) 17:29, 15 March 2024 (UTC)[reply]

Category:Green Bay Packers Hall of Fame by size and WikiProject rating[edit]

I kindly requested a query of Category:Green Bay Packers Hall of Fame to produce a wikitable with three columns:

  • Article name: linked name of the article
  • Article size: by bytes
  • WikiProject rating: stub, start, C, etc

The table would look something like (but for all 160+ articles in the cat):

Name Size (bytes) Rating
Bob Monnett 2,347 Stub

This is to help me prioritize important articles under WP:PACKERS. Thank you! « Gonzo fan2007 (talk) @ 18:39, 21 March 2024 (UTC)[reply]

@Gonzo fan2007: Does User:Certes/Reports/Green Bay Packers do what you need? Certes (talk) 19:33, 21 March 2024 (UTC)[reply]
Certes, perfect! Thank you so much. « Gonzo fan2007 (talk) @ 19:52, 21 March 2024 (UTC)[reply]
There's a link top right to update the report if the situation changes (e.g. you do a lot of reassessments). Certes (talk) 20:06, 21 March 2024 (UTC)[reply]
This is great, I appreciate it! « Gonzo fan2007 (talk) @ 20:27, 21 March 2024 (UTC)[reply]
Thanks also from me - I can make use of the code for showing the rating. I had to take out "{{!}}^[FG]A-Class_" from the REGEXP string to make it work, though. I'm not sure what that part does. Stefen Towers among the rest! GabGruntwerk 06:12, 22 March 2024 (UTC)[reply]
That bit picks up the FA and GA categories which are not biographies, like Talk:Wilner Burke which is only in GA-Class Wisconsin articles. Without it, the rating comes out blank. The {{!}} produces a "|" for regexp alternation; the template prevents it from delimiting the |sql= parameter. If you're copying the SQL into Quarry or some other SQL client that doesn't parse wikitext, you'll need to change {{!}} to | manually. Certes (talk) 10:34, 22 March 2024 (UTC)[reply]
<nowiki> should work in {{database report}} now, which would make things clearer. —Cryptic 16:35, 22 March 2024 (UTC)[reply]
Thanks. I already knew about {{!}} from all the template work I've done and alternation from all the RegEx I've written. It was the rest that confused me, esp. why there would be interest in just FA/GA outside of bios. At any rate, that part wasn't necessary for my purposes for listing articles that are definitely within one wikiproject. Stefen Towers among the rest! GabGruntwerk 16:39, 22 March 2024 (UTC)[reply]

Women in Red[edit]

I'm not exactly requesting a query, but help with one we already have: Wikipedia:WikiProject_Women_in_Red/Redlinks/C2. This version ran in 37 seconds. Some time between 20 and 25 February, it stopped working, instead throwing SQL Error: ER_DATA_TOO_LONG: Data too long for column 'subcat' at row 2. This is because it initially populates a table with one row, a fairly short category name, then attempts to add longer names. This used to work but something in the database or engine must have changed, as the same SQL now fails. I've fixed it by seeding the table with a long dummy value which doesn't affect the results, but that version takes nearly two hours to run. Does anyone know what changed? Is part of the forthcoming categorylinks "upgrade" causing the failure and wrecking performance? (A simple search for a link to details of that change timed out too – perhaps there is a more general temporary performance problem but VPT has nothing, and performance was just as bad last night.) Certes (talk) 15:24, 24 March 2024 (UTC)[reply]

I poked at this a little when you first asked it, and some more today, and I wasn't able to come up with a version that avoided a full table scan on pagelinks - taken in isolation, the CTE completes instantly, and I can do something like WITH names AS ( /* the long cte */ ) SELECT COUNT(*) FROM names JOIN pagelinks ON pl_namespace = 0 AND pl_title = name GROUP BY name, but every variant of "pl_title starts with name" I could come up with - name=SUBSTRING_INDEX like you used, pl_title LIKE CONCAT(name, '%'), LEFT(pl_title, LENGTH(name)) = name, and so on - was unindexed. Which is why your query taken as a whole is looking through all of page first, when I suspect it was able to make use of the Cn..D range limit before. Irritatingly, putting the CTE results directly in the query instead like quarry:query/81913 does work, but I don't know a workaround to force use of the index when we don't have access to the real pagelinks table, just a view. (And no, I don't know what changed.)
Something like SELECT CAST('Feminine_given_names' AS VARCHAR(256)), 0 would be cleaner than your long 'This category does not exist but...' dummy value, but obviously that's not the real problem here.
What, precisely, are you trying to do? Maybe we can find another way to do it? —Cryptic 17:50, 10 April 2024 (UTC)[reply]
I was listing frequently used redlinks which resemble women's names, so that the editors at Women in Red can consider writing articles on them. Of course, this is heuristic and contains false positives such as Ms May Rebellion, but they found it helpful and have created many wanted articles from it. I split it by initial letter to prevent timeouts by allowing index use, and to avoid overwhelming the authors by requesting hundreds of articles at once. Further details: Wikipedia:WikiProject Women in Red/Redlist index#Alphabetical (most frequently redlinked), User talk:Certes#WIR Redlist problem, User talk:Certes/Archive 9#Frequently redlinked women. Certes (talk) 19:05, 10 April 2024 (UTC)[reply]
Is WHERE pl_from_namespace IN (0, 10) ... HAVING MAX(pl_from_namespace) = 0 meant only to exclude titles linked from any template? The query completes quickly if the first half of that is changed to WHERE pl_from_namespace = 0. Explain output for query 81446 as written (sql-optimizer seems to choke on queries with ctes):
+------+-----------------+---------------+--------+------------------------------------------------------------------------------+-----------------+---------+---------------------------------+----------+-----------------------------------------------------------+
| id   | select_type     | table         | type   | possible_keys                                                                | key             | key_len | ref                             | rows     | Extra                                                     |
+------+-----------------+---------------+--------+------------------------------------------------------------------------------+-----------------+---------+---------------------------------+----------+-----------------------------------------------------------+
|    1 | PRIMARY         | page          | index  | PRIMARY,page_name_title                                                      | page_name_title | 261     | NULL                            | 57680411 | Using where; Using index; Using temporary; Using filesort |
|    1 | PRIMARY         | pagelinks     | ref    | PRIMARY,pl_namespace,pl_backlinks_namespace,pl_backlinks_namespace_target_id | PRIMARY         | 8       | enwiki.page.page_id,const       |        4 | Using where                                               |
|    1 | PRIMARY         | <derived5>    | ref    | key0                                                                         | key0            | 258     | func                            |        1 | Using where                                               |
|    1 | PRIMARY         | page          | eq_ref | page_name_title                                                              | page_name_title | 261     | const,enwiki.pagelinks.pl_title |        1 | Using index                                               |
|    5 | DERIVED         | <derived2>    | ALL    | NULL                                                                         | NULL            | NULL    | NULL                            |        2 | Using where; Using temporary                              |
|    5 | DERIVED         | categorylinks | ref    | PRIMARY,cl_timestamp,cl_sortkey                                              | cl_sortkey      | 258     | deepcat.subcat,const            |       19 | Using where; Using index                                  |
|    5 | DERIVED         | page          | eq_ref | PRIMARY,page_name_title                                                      | PRIMARY         | 4       | enwiki.categorylinks.cl_from    |        1 | Using where                                               |
|    2 | DERIVED         | NULL          | NULL   | NULL                                                                         | NULL            | NULL    | NULL                            |     NULL | No tables used                                            |
|    3 | UNION           | NULL          | NULL   | NULL                                                                         | NULL            | NULL    | NULL                            |     NULL | No tables used                                            |
|    4 | RECURSIVE UNION | <derived2>    | ALL    | NULL                                                                         | NULL            | NULL    | NULL                            |        2 | Using where                                               |
|    4 | RECURSIVE UNION | categorylinks | ref    | PRIMARY,cl_timestamp,cl_sortkey                                              | cl_sortkey      | 258     | deepcat.subcat,const            |       19 | Using where; Using index                                  |
|    4 | RECURSIVE UNION | page          | eq_ref | PRIMARY,page_name_title                                                      | PRIMARY         | 4       | enwiki.categorylinks.cl_from    |        1 | Using where                                               |
| NULL | UNION RESULT    | <union2,3,4>  | ALL    | NULL                                                                         | NULL            | NULL    | NULL                            |     NULL |                                                           |
+------+-----------------+---------------+--------+------------------------------------------------------------------------------+-----------------+---------+---------------------------------+----------+-----------------------------------------------------------+

and the same for WHERE pl_from_namespace = 0:

+------+-----------------+---------------+--------+------------------------------------------------------------------------------+------------------------+---------+---------------------------------+----------+-----------------------------------------------------------+
| id   | select_type     | table         | type   | possible_keys                                                                | key                    | key_len | ref                             | rows     | Extra                                                      |
+------+-----------------+---------------+--------+------------------------------------------------------------------------------+------------------------+---------+---------------------------------+----------+-----------------------------------------------------------+
|    1 | PRIMARY         | pagelinks     | range  | PRIMARY,pl_namespace,pl_backlinks_namespace,pl_backlinks_namespace_target_id | pl_backlinks_namespace | 265     | NULL                            | 50732398 | Using where; Using index; Using temporary; Using filesort |
|    1 | PRIMARY         | <derived5>    | ref    | key0                                                                         | key0                   | 258     | func                            |        1 | Using where                                               |
|    1 | PRIMARY         | page          | eq_ref | page_name_title                                                              | page_name_title        | 261     | const,enwiki.pagelinks.pl_title |        1 | Using index                                               |
|    1 | PRIMARY         | page          | eq_ref | PRIMARY,page_name_title                                                      | PRIMARY                | 4       | enwiki.pagelinks.pl_from        |        1 | Using where                                               |
|    5 | DERIVED         | <derived2>    | ALL    | NULL                                                                         | NULL                   | NULL    | NULL                            |        2 | Using where; Using temporary                              |
|    5 | DERIVED         | categorylinks | ref    | PRIMARY,cl_timestamp,cl_sortkey                                              | cl_sortkey             | 258     | deepcat.subcat,const            |       19 | Using where; Using index                                  |
|    5 | DERIVED         | page          | eq_ref | PRIMARY,page_name_title                                                      | PRIMARY                | 4       | enwiki.categorylinks.cl_from    |        1 | Using where                                               |
|    2 | DERIVED         | NULL          | NULL   | NULL                                                                         | NULL                   | NULL    | NULL                            |     NULL | No tables used                                            |
|    3 | UNION           | NULL          | NULL   | NULL                                                                         | NULL                   | NULL    | NULL                            |     NULL | No tables used                                            |
|    4 | RECURSIVE UNION | <derived2>    | ALL    | NULL                                                                         | NULL                   | NULL    | NULL                            |        2 | Using where                                               |
|    4 | RECURSIVE UNION | categorylinks | ref    | PRIMARY,cl_timestamp,cl_sortkey                                              | cl_sortkey             | 258     | deepcat.subcat,const            |       19 | Using where; Using index                                  |
|    4 | RECURSIVE UNION | page          | eq_ref | PRIMARY,page_name_title                                                      | PRIMARY                | 4       | enwiki.categorylinks.cl_from    |        1 | Using where                                               |
| NULL | UNION RESULT    | <union2,3,4>  | ALL    | NULL                                                                         | NULL                   | NULL    | NULL                            |     NULL |                                                           |
+------+-----------------+---------------+--------+------------------------------------------------------------------------------+------------------------+---------+---------------------------------+----------+-----------------------------------------------------------+

The first doesn't look like it'll look through so many more rows, but it's definitely taking forever, and the table order makes less sense. —Cryptic 20:03, 10 April 2024 (UTC)[reply]

quarry:query/81916 if my reading of your intention above was correct. —Cryptic 20:13, 10 April 2024 (UTC)[reply]
And quarry:query/81918 is a more (ahem) straightforward fix. —Cryptic 20:29, 10 April 2024 (UTC)[reply]
 Fixed @Cryptic: Thank you so much. So it needs a STRAIGHT_JOIN rather than a JOIN, to force the tables to be processed in the suggested order. I've put that into C2 and it works, so I'll update the other queries.
I've also neatened the table creation as you suggest. I had something similar originally but replaced it by a simpler kludge, because the CAST was the only code I'd changed when the timing went off and I suspected it was causing the problem, perhaps by making columns to be joined have incompatible types. Certes (talk) 20:54, 10 April 2024 (UTC)[reply]
...and yes, HAVING is to exclude people in templates, who might be linked from hundreds of articles just because their name appears in one navbox. Certes (talk) 20:56, 10 April 2024 (UTC)[reply]
I'd bet money that what changed is just that pagelinks is incrementally bigger. The optimizer's wildly overestimating the cost of scanning it, so that looking at page AS Pf became more attractive. Removing namespace 10 helped not because the database horrifically couldn't use two disjoint ranges anymore, but because it lowered the estimated cost just enough to scan it first. What confused me last month is that I thought it was looking at page AS Pt instead, and that it was primarily the cte that was meant to cut the number of rows down. —Cryptic 21:06, 10 April 2024 (UTC)[reply]
That sounds very plausible. I remember from my SQL-writing days decades ago that query plans would suddenly tip over from the desired order to something unhelpful when data sizes changed. I've applied the fix to the similar reports with other initials, rerun the broken ones and everything's now working. Thanks again. Certes (talk) 21:20, 10 April 2024 (UTC)[reply]
I've also updated the first chapter of Men in Red, though currently no one seems interested in its output. There's a biologist there with over 100 links (from actual articles, not navboxes) and articles in French, German and Spanish. Certes (talk) 15:21, 11 April 2024 (UTC)[reply]

Fetch a subset of log comments from the last year[edit]

quarry:query/81887. The part of my query that limits things to the last year (log_id > subquery) is running too slow and is causing it to timeout. Any suggestions to speed it up? The query works fine when I hard-code a log_id like I did in quarry:query/81844. But I'd prefer the query to be more dynamic than just hard-coding a log_id. Thanks. –Novem Linguae (talk) 13:55, 9 April 2024 (UTC)[reply]

Why not just do "and log_timestamp > TIMESTAMP" directly? That's what I do at Wikipedia:Database reports/Possibly out-of-process deletions and it seems to work. * Pppery * it has begun... 14:04, 9 April 2024 (UTC)[reply]
quarry:query/81893. Good suggestion, thank you. That's good for code readability. But the query is still timing out if anyone has any suggestions. –Novem Linguae (talk) 23:50, 9 April 2024 (UTC)[reply]
quarry:query/81896. Using the indexed tables logging_logindex and actor_logging sped things up a lot. As a side note, since text fields in the replicas are either VARBINARY or BLOB type, the LIKE clause is in fact case-sensitive. If you first convert the fields to UTF-8 then LIKE works as expected. Also, unless you don't want the oldest results to appear first, there's no reason to sort by the log_id ascending since that's already the clustered index of the table. Uhai (talk) 03:25, 10 April 2024 (UTC)[reply]
Bah. I always forget to add those pesky alternative views. Thank you very much for catching that, and for the other tips. –Novem Linguae (talk) 06:10, 10 April 2024 (UTC)[reply]
The specialized views helped a little, but what made the real difference here was adding "log_type = 'block'". log_action is a subindex of that; trying to filter by that without log_type is like searching a paper dictionary for words where the second letter is "r". Compare EXPLAIN output without and with the change, and see Database index#Column order. —Cryptic 16:07, 10 April 2024 (UTC)[reply]