Wikimedia

Bug fix: ISBNs getting chunked incorrectly due to greedy regex

Bug report here: https://phabricator.wikimedia.org/T116056

Citoid is a Node.js service that fetches metadata from URLs and turns it into citations for Wikipedia. For this task you will need to install both citoid and Zotero, another JS service that citoid uses to generate citations; instructions for installing both are here: https://www.mediawiki.org/wiki/Citoid#Install_from_scratch

In cases where Zotero returns multiple ISBNs in one string, the fixISBN function in lib/Exporter.js sometimes chunks them incorrectly when converting the string into a list. The regex is too greedy: it consumes the first 13 characters even when the first ISBN is only 10 digits long.

For example, a request for https://www.worldcat.org/title/pieter-bruegel/oclc/49531157&referer=brief_results, i.e.

http://localhost:1970/api?format=mediawiki&search=https%3A%2F%2Fwww.worldcat.org%2Ftitle%2Fpieter-bruegel%2Foclc%2F49531157%26referer%3Dbrief_result, gives the wrong result.

The correct value for ISBN should be ["0810935317", "9780810935310"], not ["0810935317 97", "8081093531"]. The original value from Zotero is "ISBN": "0810935317 9780810935310".
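The bad split can be reproduced with a greedy pattern along the lines shown below (this is an illustration of the failure mode, not necessarily the exact regex in lib/Exporter.js), and avoided by anchoring the 13-character branch on the 978/979 ISBN-13 prefix:

```javascript
// Reproduction sketch: a greedy alternation that tries 13 characters
// first swallows the separator and part of the second ISBN.
var input = '0810935317 9780810935310';

var greedy = input.match(/[\dX ]{13}|[\dX]{10}/g);
console.log(greedy);    // [ '0810935317 97', '8081093531' ]

// Fix sketch: only accept a 13-digit match when it starts with the
// 978/979 ISBN-13 prefix; otherwise take a 10-character ISBN-10,
// whose final character may be the X check digit.
var fixed = input.match(/97[89]\d{10}|\d{9}[\dX]/g);
console.log(fixed);     // [ '0810935317', '9780810935310' ]
```

Because the first alternative can no longer start mid-string on arbitrary digits, each ISBN is matched whole and the space between them is never consumed.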

To fix this, you should:

  • Export the fixISBN function in lib/Exporter.js at the bottom of the file, so you can use the function in unit tests
  • Add unit tests for the fixISBN function in test/unit/features/exporter.js
  • Fix the fixISBN function in lib/Exporter.js so it no longer chunks the ISBNs here incorrectly

Task tags

  • regex
  • node.js
  • javascript
  • isbn

Students who completed this task

Geoffrey Mon

Task type

  • Code

2015