From Wikipedia, the free encyclopedia
Status and updates for Task 17
List of params
- Doing
- Possible
- CNDID
- WT.ec_id
- cid
- sp_mid
- sp_rid
- Necessary to keep
Regex updates
because these things are boring
Original
\??(?:&?utm_[^=]*?=[^&\s\]\|]*)+(?=]|\s|\|)|(?<=\?)(?:&?utm_[^=]*?=[^&\s\]\|]*)+&
27 May (BRFA trial) - add code to catch utm_ params in the middle of the query string, and catch more end-of-URL possibilities
\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&
7 June (catch ref tags) - add < to end-of-check exceptions
\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&
8 June (catch malformed utm_ params) - utm_ must be followed by text and an =
\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&
10 June (avoid web archive links)
(?<!https://web.archive.org[\S]+)(\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&)
1 July (avoid utm_ params just hanging out in text)
(?<!https://web.archive.org[\S]+|\||\s)(\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&)
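To make the history above easier to check, here is a minimal Python sketch that applies the 8 June pattern, copied verbatim, to a few made-up inputs. It is an illustration only: the regex flavor the task actually runs under is not stated here, the helper name `strip_utm` and the example.com URLs are hypothetical, and replacing each match with an empty string is an assumption about how the pattern is meant to be used.

```python
import re

# The 8 June pattern, copied verbatim from the history above. It is the last
# revision whose lookbehinds are all fixed-width, which is what Python's
# built-in `re` module requires.
UTM_8_JUNE = re.compile(
    r"\??(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"  # utm_ run at the end of the URL
    r"|(?<=\?)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"            # utm_ run right after the ?
    r"|(?<=&)(?:&?utm_[^=\s\|<}\]]*?=[^&\s\]\|]*)+&"             # utm_ run in the middle
)


def strip_utm(wikitext: str) -> str:
    """Hypothetical helper: delete every match of the 8 June pattern."""
    return UTM_8_JUNE.sub("", wikitext)


# Made-up sample inputs, one per case named in the history.
samples = [
    # whole query string is utm_ params, ended by whitespace
    "[https://example.com/a?utm_source=x&utm_medium=y Title]",
    # utm_ param first, a real param after it
    "https://example.com/a?utm_source=x&id=5 ",
    # utm_ param in the middle (27 May)
    "https://example.com/a?id=5&utm_source=x&ref=1 ",
    # URL ended by a ref tag (7 June)
    "<ref>https://example.com/a?utm_campaign=z</ref>",
    # URL ended by a template pipe
    "{{cite web|url=https://example.com/a?utm_source=x|title=T}}",
    # malformed: utm_ name runs into a pipe before any = (8 June), left alone
    "https://example.com/a?utm_x|title=A ",
]

for sample in samples:
    print(repr(sample), "->", repr(strip_utm(sample)))
```

The 10 June and 1 July revisions wrap the same three alternatives in a variable-length negative lookbehind, `(?<!https://web.archive.org[\S]+...)`. Python's built-in `re` rejects that with a "look-behind requires fixed-width pattern" error, so trying those two revisions needs an engine that allows variable-length lookbehind (the third-party `regex` module is one option, as far as I know).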