A cooperative for big data in scholarly publishing

Kevin S. Hawkins


Both for-profit and not-for-profit organizations increasingly use big data not only to study what has happened (data analytics) but also to predict future trends (predictive analytics).  With certain notable exceptions (student recruitment in US institutions and compulsive evaluations of research productivity in the UK and Australia), academia has generally lagged behind other sectors in its use of big data.

One domain that has moved halfway into collecting and analyzing big data is scholarly publishing, whose stakeholders of varying size include libraries and other research institutions, learned societies, for-profit publishers, and not-for-profit publishers.  These stakeholders generate and collect various types of data, especially relating to content usage and sales, but often lack both resources to explore the data and ways to compare their data with that of other stakeholders.  The situation is not one where a market participant tries to acquire competitive intelligence to help it compete against others; rather, because the stakeholders are so tightly related, nearly all of them hold some sort of data that would help the others function more efficiently.  Unfortunately, the challenges associated with gathering, integrating, interpreting, and reporting usage data limit the ability of individual publishers, libraries, and other stakeholders to identify—much less predict—important usage trends and opportunities through which these organizations might extend their impact.

At the same time, there are real concerns about ownership of, access to, and analysis of data for "predictive bibliometrics".  Furthermore, while all stakeholders would like to have rich data and be able to carry out predictive analytics of some sort, the high cost of providing or purchasing data-related services risks reinforcing inequalities in the landscape of scholarly publishing.

This paper will present a vision for a cooperative of stakeholder institutions called the Publishing Analytics Data Alliance.  Member institutions will contribute data that they gather about scholarly publishing to the cooperative, which will normalize and aggregate data for exploration by its members, who will be able to see their data in the context of their peers.  The cooperative's members, through a system of shared governance, will also establish an ethical framework governing the functionality of the cooperative's data services and, more generally, the use of data by members.

Beyond shared governance, the pooling of resources by the cooperative offers a way for members to achieve more than they would be able to on their own -- namely, to explore and analyze data about scholarly publishing.  It is hoped that the cooperative will lead to increased cooperation and efficiency in the scholarly publishing ecosystem, all the while addressing ethical concerns raised by the power of data.
