Q&A

How exactly does PredictiveBIB support creation & exchange of public-domain bibliographic records between libraries?

CRESS is an experimental public-domain cloud repository available to PredictiveBIB users who optionally want to search, import, or export public-domain bibliographic records. Libraries are welcome to host mirrors of CRESS bibliographic content to ensure data replication, distributed access, and non-monopolization.

PredictiveBIB enables libraries to:

  • Generate bibliographic records without a CC0 statement. Records are private to the authoring library and are never saved in the cloud.
  • Add a CC0 statement to generated original records but keep them private. Records are private to the authoring library and are never saved in the cloud.
  • Add a CC0 statement to generated original records and upload them to a public-domain cloud repository (CRESS) for free sharing with all other libraries (i.e. shared publicly). This has the potential to become a popular model because it can operate on pennies per record, stimulates innovation, is equitable through equal access and distributed responsibility, and taps into a global need for open information exchange.
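
The three options above amount to a small decision table. The Python sketch below is illustrative only: the names SharingMode and GeneratedRecord are hypothetical and are not taken from PredictiveBIB itself.

```python
from dataclasses import dataclass
from enum import Enum


class SharingMode(Enum):
    # Hypothetical labels for the three options listed above.
    PRIVATE_NO_CC0 = "no CC0 statement, kept private"
    PRIVATE_CC0 = "CC0 statement, kept private"
    PUBLIC_CC0 = "CC0 statement, uploaded to CRESS"


@dataclass
class GeneratedRecord:
    title: str
    mode: SharingMode

    @property
    def has_cc0_statement(self) -> bool:
        # Only the first option omits the CC0 statement.
        return self.mode is not SharingMode.PRIVATE_NO_CC0

    @property
    def uploaded_to_cress(self) -> bool:
        # Only the third option ever leaves the authoring library.
        return self.mode is SharingMode.PUBLIC_CC0
```

In this sketch a record with a CC0 statement is not automatically shared: the upload is a separate, explicit choice.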


Have you studied how your software-driven subject headings compare to those already assigned by the Library of Congress?

Some background information: predictive capability occurs when there is overlap between previously crowdsourced metadata and the book currently being cataloged. When I catalog in a new subject area (e.g. interracial relationships), I try to select a rich set of Tags & LCSH in order to build up predictive capability in that area. Take, for example, the picture book “I’m in love with a big blue frog”. The subject headings were informed by the Tags: Folk music, Romance, Tolerance, Equality, Discrimination, Animal, & Amphibian. Using PredictiveBIB is like having perfect recall of all the LCSH you (and other community catalogers) have ever used in a particular subject area, but it still relies on a cataloger making good choices, which only humans can do.

For the book below, if a cataloger is unaware that interracial dating and tolerance are relevant themes, then even with PredictiveBIB they probably won’t select the Tags Romance & Tolerance, and PredictiveBIB won’t auto-suggest Interracial dating and Toleration as LCSH. In other words, the “software-driven subject headings” depend on accurate Tag assignment by the cataloger and on good judgment in selecting from the predictive list, which might very well mean going out to LC Authorities when an appropriate heading doesn’t yet exist in the crowdsourced metadata.
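
To make the dependence on Tags concrete, here is a minimal Python sketch of tag-driven suggestion. It assumes a simple co-occurrence count between Tags and LCSH; the TagIndex class, its methods, and the Tag-to-heading pairings are hypothetical and are not PredictiveBIB’s actual algorithm or data.

```python
from collections import Counter, defaultdict


class TagIndex:
    """Hypothetical index counting which LCSH were used with which Tags."""

    def __init__(self):
        self._by_tag = defaultdict(Counter)

    def record(self, tags, headings):
        # Each completed record makes its LCSH candidates for its Tags.
        for tag in tags:
            self._by_tag[tag].update(headings)

    def suggest(self, tags, limit=10):
        # Score each heading by how often it co-occurred with the chosen Tags.
        scores = Counter()
        for tag in tags:
            scores.update(self._by_tag.get(tag, Counter()))
        return [heading for heading, _ in scores.most_common(limit)]


index = TagIndex()
# Assumed earlier cataloging work that built up the subject area.
index.record(["Romance", "Tolerance"], ["Interracial dating", "Toleration"])
index.record(["Animal", "Amphibian"], ["Frogs"])

# Without the Tags Romance & Tolerance, those headings are never offered.
without_themes = index.suggest(["Animal", "Amphibian"])
with_themes = index.suggest(["Romance", "Tolerance", "Animal", "Amphibian"])
```

Here without_themes contains only Frogs, while with_themes also offers Interracial dating and Toleration. The cataloger still decides which suggestions to accept.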

At this time PredictiveBIB does not support Children’s Subject Headings (that is on my to-do list), which explains the different second indicator.

I’m in love with a big blue frog (ISBN 9781936140374):

Library of Congress subject headings:

650_0 |a Folk songs, English |z United States |v Texts.
650_1 |a Folk songs.
650_1 |a Frogs |z Songs and music.

PredictiveBIB subject headings:

650 _0 ǂa Folk songs, English ǂz United States ǂv Texts.
650 _0 ǂa Toleration ǂv Juvenile literature.
650 _0 ǂa Equality ǂv Juvenile literature.
650 _0 ǂa Frogs ǂv Juvenile literature.
650 _0 ǂa Interracial dating ǂv Juvenile literature.
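
For anyone who wants to compare the two sets of headings programmatically, the following Python sketch parses the 650 lines exactly as displayed above (both the "|" and "ǂ" subfield delimiters) and compares the topical terms in subfield a. The function names are my own for illustration, and the parser only handles this simple display format, not full MARC.

```python
import re

FIELD = re.compile(r"^650\s*(.)(.)\s*(.*)$")
SUBFIELD = re.compile(r"[|ǂ]([a-z0-9])\s*([^|ǂ]*)")


def parse_650(line):
    """Return (second_indicator, ((code, value), ...)) for one 650 line."""
    match = FIELD.match(line.strip())
    if match is None:
        raise ValueError(f"not a 650 field: {line!r}")
    _ind1, ind2, rest = match.groups()
    subfields = tuple(
        (code, value.strip().rstrip("."))  # drop trailing punctuation
        for code, value in SUBFIELD.findall(rest)
    )
    return ind2, subfields


def topical_terms(lines):
    """Collect the subfield a value (topical term) of each heading."""
    return {
        value
        for line in lines
        for code, value in parse_650(line)[1]
        if code == "a"
    }


LC = [
    "650_0 |a Folk songs, English |z United States |v Texts.",
    "650_1 |a Folk songs.",
    "650_1 |a Frogs |z Songs and music.",
]
PREDICTIVEBIB = [
    "650 _0 ǂa Folk songs, English ǂz United States ǂv Texts.",
    "650 _0 ǂa Toleration ǂv Juvenile literature.",
    "650 _0 ǂa Equality ǂv Juvenile literature.",
    "650 _0 ǂa Frogs ǂv Juvenile literature.",
    "650 _0 ǂa Interracial dating ǂv Juvenile literature.",
]

shared = topical_terms(LC) & topical_terms(PREDICTIVEBIB)
only_predictivebib = topical_terms(PREDICTIVEBIB) - topical_terms(LC)
print(sorted(shared))
print(sorted(only_predictivebib))
```

The shared topical terms are Folk songs, English and Frogs; Toleration, Equality, and Interracial dating appear only in the PredictiveBIB record, which reflects the Tags the cataloger chose.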

This answer is intended to clarify the nature of “software-driven subject headings” and emphasize that a comparison of completed records is more likely to speak to cataloger familiarity with the item being cataloged. Independent beta-testing by libraries provides an opportunity for evaluation of the software’s auto-suggestions with respect to cataloger-selected Tags.


Do you have plans to include a Z39.50 or API connection to OCLC?

Exporting records from PredictiveBIB to OCLC: This depends on whether OCLC will permit an export connection from PredictiveBIB into WorldCat, OCLC’s policy on export of public-domain records to WorldCat, and whether OCLC member libraries may elect to share their original public-domain records publicly prior to export to WorldCat.

Importing records from OCLC into PredictiveBIB: Since PredictiveBIB extracts metadata from the bibliographic records it generates, I’ll need OCLC’s confirmation that there will be no problematic restrictions on using metadata extracted from OCLC records for prediction purposes before making plans to support this. The risk is contaminating open metadata with usage restrictions that limit predictive functionality or innovation.

OCLC members interested in predictive-cataloging may wish to explore these issues with OCLC.

These webpages/articles seem relevant:

Z39.50 access to other databases: The issues are similar to those mentioned above. It depends on the usage & licensing terms of a particular database and its contents.


Is your software open-source?

Since the project is not publicly funded and is very new, I plan to take a considered approach to open-sourcing. Open-sourcing supports transparency and community contribution/innovation. On the other hand, due to the size & complexity of the project software, and the expertise needed to manage the distributed cloud cataloging platform, open-sourcing the software is unlikely to benefit libraries directly. Also, closed-source software has more commercial licensing potential, which could subsidize ongoing research, operating costs, and library usage.

I welcome suggestions from the developer/library community about ways to support community contribution/innovation whilst also funding labor and infrastructure costs.


Do you think a public library could employ someone like Index Data or ByWater to work with implementing your tool?

That’s not necessary. The desktop app can be downloaded, installed, and ready for cataloging in about 5 minutes, since the cloud infrastructure is already in place and managed by me. If a library sends me an email asking to beta-test PredictiveBIB, I’ll provide a download link & logon credentials, and the library may begin cataloging with PredictiveBIB almost immediately. Beta testing is of course free of charge and can extend for as long as a library needs, up to 1 year.

I’d welcome the opportunity to work directly with libraries (as a volunteer) to implement this tool.


What is your plan for BIBFRAME workflows or might you talk about how you are testing MARC2BIBFRAME conversion workflows?

Experimentation with the Library of Congress marc2bibframe2 conversion utility is ongoing, and my goal is to support linked data by mid-2021.
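
For libraries that want to experiment on their own in the meantime: marc2bibframe2 is distributed as XSLT stylesheets that are applied to MARCXML input, for example with xsltproc. The Python sketch below only wraps that command line. The file paths and the base URI are placeholders, xsltproc must be installed separately, and this is not a description of how PredictiveBIB itself will perform the conversion.

```python
import subprocess


def marc2bibframe2_command(stylesheet, marcxml, output,
                           base_uri="http://example.org/"):
    """Build the xsltproc call that applies marc2bibframe2 to a MARCXML file."""
    return [
        "xsltproc",
        "--stringparam", "baseuri", base_uri,  # base URI for generated resources
        "--output", str(output),               # where to write the BIBFRAME RDF/XML
        str(stylesheet),                       # e.g. xsl/marc2bibframe2.xsl
        str(marcxml),                          # MARCXML, not binary MARC
    ]


def convert(stylesheet, marcxml, output, base_uri="http://example.org/"):
    """Run the conversion; raises CalledProcessError if xsltproc fails."""
    command = marc2bibframe2_command(stylesheet, marcxml, output, base_uri)
    subprocess.run(command, check=True)


# Placeholder paths for illustration only.
command = marc2bibframe2_command(
    "xsl/marc2bibframe2.xsl", "records.xml", "records.rdf"
)
```

Records exported as binary MARC would first need to be converted to MARCXML with a MARC editor or library before this step.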


Do you think the platform might also be valuable for academic libraries?

The PredictiveBIB software platform integrates several distinct components, and this question needs to be answered with respect to each:

  • Predictive cataloging (predicts subject headings, etc. through metadata mining): Since predictive capability evolves in a subject area as cataloging progresses in that area, a library's catalogers in specialized subject areas may initially end up training the prediction component in those areas. This is not as slow as it sounds, because once LCSH are assigned and a record is generated, those LCSH become candidates for similar books cataloged by the user community.
  • Aggregated metadata (metadata collected over time from one or more catalogers): this component is independent of library type.
  • Record creation logic (generates records in multiple formats based on cataloger-entered bibliographic data): This is where some extra work might be needed. The MARC fields/subfields currently generated are tailored to support mainstream public library materials. Please take a look at the sample records to see how they differ from your library's requirements. ModMARC or another MARC editor can be used to add/modify MARC fields on your end, or potentially the record creation logic can be expanded on my side.
  • Assistive cataloger-interface (minimizes bibliographic data-entry requirements and augments cataloger effort): This is also where some extra work might be needed, since the UI design is driven by bibliographic data input requirements, which may change if record creation logic is expanded.

Academic libraries, law libraries, medical libraries, museums, & historical societies are on my support roadmap. Academic libraries/researchers interested in beta testing are welcome to reach out.