© Hoffman Construction Company

What can we learn from 128 million checkouts at the Seattle Public Library?

By Devin Smith, November 2021.

Earlier this year, I was asked to give a 30-minute data analytics presentation for a job interview with a news tech company. My prior data work is primarily visualization and web design, so I decided to put something together from scratch. While I didn't land the job — meaning you could hire me! — this was a great exercise, and I ended up learning some interesting things about library usage in Seattle.

What's In The Data Set?

The Seattle Public Library's continually-updated "Monthly Checkouts By Title" contains checkout totals for each item checked out during the month. It covers both physical and digital platforms, and extends all the way back to 2005. In total, it contains 36.6 million rows of data, and comes as an 8gb CSV file when downloaded. (h/t to Data Is Plural for surfacing this.) In a single row of data, we get:

...but it's kinda messy, too. Each of the digital platforms reports its data in a slightly different way. And while the formatting between physical items is fairly consistent, a movie on DVD and its streaming counterpart may list completely different information; for example, whether a film's director or star is listed in the "creator" field.

Seattle's Open Data program provides a graphical front-end for users, called Socrata, which is like an open source version of Tableau. It's a little clunky and limited... but it's free! so whatever. It also includes a graphing utility, which is fairly decent, if a little limited. Importantly, it also allows you to filter the data set and then download the results, which is great for folks like me with limited bandwidth or older computers (I'm on a 2011 Mac Mini). For SQL, I'm running Postgress in the Terminal.

So let's get into it! We'll start by examining global trends in physical vs. digital checkouts over the years. Then we'll drill down into a single year, 2017, and take a look at 2017's most-checked-out items. Lastly, we'll briefly look at the effects of COVID-19 on checkouts in 2020.

1. Global Trends

Total checkouts, Physical v. Digital

\copy checkouts_full
	FROM 'Checkouts_by_Title.csv'
	WITH csv header;

-- Full monthly rollup

CREATE TABLE monthly_rollup_full
AS SELECT
	checkoutyear,
	checkoutmonth,
	usageclass,
	checkouttype,
	materialtype,
	SUM(checkouts) AS "monthlycheckouts"
FROM checkouts_full
GROUP BY 
	checkoutyear,
	checkoutmonth,
	usageclass,
	checkouttype,
	materialtype
ORDER BY
	checkoutyear ASC,
	checkoutmonth ASC,
	usageclass ASC,
	checkouttype ASC,
	materialtype ASC;

\copy monthly_rollup_full
	TO 'monthly_rollup_full.csv' 
	WITH CSV HEADER;

LINK: monthly_rollup_full.csv


-- Physical vs digital

CREATE TABLE monthly_rollup_phsical_digital
AS SELECT
	checkoutyear,
	checkoutmonth,
	usageclass,
	SUM(checkouts) AS "monthlycheckouts"
FROM checkouts_full
GROUP BY 
	checkoutyear,
	checkoutmonth,
	usageclass
ORDER BY
	checkoutyear ASC,
	checkoutmonth ASC,
	usageclass ASC;

\copy monthly_rollup_phsical_digital
	TO 'monthly_rollup_physical_digital.csv' 
	WITH CSV HEADER;

LINK: monthly_rollup_physical_digital.csv

While the total checkouts per year have remained relatively static over the last decade at around 9 million, the share of total checkouts from digital platforms has steadily increased. Prior to COVID-19, the trends indicate that digital checkouts would have surpassed physical ones some time in the next few years.

Digital Platforms, AV Split

Note: physical checkouts are labeled "Horizon" (the physical system's software package) in these graphs.

-- Monthly rollup w/ Platform Splits

CREATE TABLE monthly_rollup_full_platforms
AS SELECT
	checkoutyear,
	checkoutmonth,
	usageclass,
	checkouttype,
	SUM(monthlycheckouts) AS "monthlycheckoutsplatform"
	FROM monthly_rollup_full
GROUP BY 
	checkoutyear,
	checkoutmonth,
	usageclass,
	checkouttype
ORDER BY
	checkoutyear ASC,
	checkoutmonth ASC,
	usageclass ASC,
	checkouttype ASC;

\copy monthly_rollup_full_platforms
	TO 'monthly_rollup_full_platforms.csv' 
	WITH CSV HEADER;

LINK: monthly_rollup_full_platforms.csv

-- Digital platforms only

CREATE TABLE monthly_rollup_digital
	AS SELECT *
	FROM monthly_rollup_full_platforms
	WHERE usageclass = 'Digital';

\copy monthly_rollup_digital
	TO 'monthly_rollup_digital.csv' 
	WITH CSV HEADER;

LINK: monthly_rollup_digital.csv

[TK Media types]

When we zoom in on the digital checkouts, we can see that Overdrive is the dominant platform. And when we split the digital checkouts by material type, we find that ebooks rather than AV (movies and music, primarily) account for the vast majority of the digital checkouts.

-- List of material types for reference

DROP TABLE material_list;
CREATE TABLE material_list
	AS SELECT DISTINCT
	UNNEST(STRING_TO_ARRAY(materialtype, ', '))
		AS "material type list",
	usageclass
	FROM checkouts_full
	ORDER BY "material type list" ASC;

\copy material_list
	TO 'material_list.csv' 
	WITH CSV HEADER;

LINK: material_list.csv


CREATE TABLE av_formats_only
AS SELECT
	checkoutyear,
	checkoutmonth,
	checkouttype,
	SUM(checkouts) AS "monthlycheckouts"
FROM checkouts_full
	WHERE materialtype IN
	('AUDIOBOOK','MOVIE','MUSIC',
	'MUSICSNDREC','SONG','SOUNDCASS',
	'SOUNDDISC','SOUNDREC','TELEVISION',
	'VIDEO','VIDEOCART','VIDEOCASS',
	'VIDEODISC','VIDEOREC')
GROUP BY 
	checkoutyear,
	checkoutmonth,
	checkouttype
ORDER BY
	checkoutyear ASC,
	checkoutmonth ASC,
	checkouttype ASC;

\copy av_formats_only
	TO 'av_formats.csv'
	WITH CSV HEADER;

LINK: av_formats.csv

...But when we add back in the physical checkouts to the AV formats, we find that physical formats like DVDs, BluRays, CDs (etc) count for the solid majority of checkouts. However, checkout data alone is unclear on why: There might be a better selection of titles available physically, the patrons might be unfamiliar with the library's streaming services (which, in my experience, is common), or the affordability of DVD/Blu-Ray players when compared with computers and streaming subscriptions could be a factor.

2. Single-Year Drilldown: 2017

Monthly checkout totals, platform split

-- 2017 Rollup

CREATE TABLE monthly_rollup_2017
AS SELECT
	checkoutyear,
	checkoutmonth,
	usageclass,
	checkouttype,
	materialtype,
	SUM(checkouts) AS "monthlycheckouts"
FROM checkouts_full
WHERE checkoutyear = 2017 
GROUP BY 
	checkoutyear,
	checkoutmonth,
	usageclass,
	checkouttype,
	materialtype
ORDER BY
	checkoutyear ASC,
	checkoutmonth ASC,
	usageclass ASC,
	checkouttype ASC,
	materialtype ASC;

\copy monthly_rollup_2017
	TO 'monthly_rollup_2017.csv'
	WITH CSV HEADER;

LINK: monthly_rollup_2017.csv

Month-to-month, checkout totals remained fairly consistent at around 770k, with a slight uptick during the summer. The platform percentage splits were also fairly consistent, with physical checkouts hovering around 70%. There's a small increase in Overdrive checkouts in December, which I'm guessing could be from people checking out digital material for their holiday travels.

How many times is an item typically checked out per month?

CREATE TABLE checkouts_2017
AS SELECT *
	FROM checkouts_full
	WHERE checkoutyear = 2017 
ORDER BY
	checkoutyear ASC,
	checkoutmonth ASC,
	usageclass ASC;

WITH c1 (count) AS (
	SELECT COUNT(checkouts)
	FROM checkouts_2017
	WHERE checkouts = 1),
c2 (count) AS (
	SELECT COUNT(checkouts)
	FROM checkouts_2017
	WHERE checkouts = 2),
c3 (count) AS (
	SELECT COUNT(checkouts)
	FROM checkouts_2017
	WHERE checkouts = 3),
c4 (count) AS (
	SELECT COUNT(checkouts)
	FROM checkouts_2017
	WHERE checkouts = 4),
c5 (count) AS (
	SELECT COUNT(checkouts)
	FROM checkouts_2017
	WHERE checkouts = 5),
c6 (count) AS (
	SELECT COUNT(checkouts)
	FROM checkouts_2017
	WHERE checkouts > 5)
SELECT * FROM
	c1 AS "1 Checkout", 
	c2 AS "2 Checkouts",
	c3 AS "3 Checkouts",
	c4 AS "4 Checkouts",
	c5 AS "5 Checkouts",
	c6 AS ">5 Checkouts";
	

I was also curious to see how often an particular title was typically checked out per month. The results of this cut are interesting: About half of all the titles in 2017 were checked out only once, with each increasing checkout decreasing in percentage. One takeaway here could be the breadth of a library's collection is a significant factor in its usage; patrons are just as interested in the "deep cuts" as the "hits."

But there's also some red flags here! For example, we don't know how many copies of each item the SPL stocks: If they only have one copy, it's not going to be checked out more than once. Furthermore, when you look at this at the checkout-level rather than per-item, it's clear that single-monthly-checkouts account for only around 12% of total checkouts, and the "hits" are indeed driving most of the total checkout count.

3. 2017's Hot Content 🔥

Finally, let's take a look at 2017's 1,000 most-checked-out items for the year. We'll start by listing the Top 20 titles by total checkouts (drumroll please):

Check.
Title
◼︎ Ph. / ◼︎ Dig.
Media
7072
SPL HotSpot
◼︎ Phys.
Mixed
6028
In-Building Device
◼︎ Phys.
Mixed
© Geekwire

Reality check! The top 2 items are mobile HotSpots and device checkouts. For context: in 2005, Google granted $225k to the SPL to purchase 150 mobile hotspots to lend to patrons. This program has been expanded over the years and replicated at other libraries. On the device side, the SPL provides laptops and iPads. This is an important reminder that, beyond just having stuff, libraries provide critical services to citizens without access to technology.

But moving on to the top 20 titles:

Check.
Title
◼︎ Ph. / ◼︎ Dig.
Media
4711
The Economist
◼︎ Dig.
Magazine
4048
Arrival
◼︎ Phys.
Videodisc
3790
The New Yorker
◼︎ Dig.
Magazine
3525
The Underground Railroad
◼︎ Phys.
Book
3458
Hillbilly Elegy
◼︎ Phys.
Book
3238
Hidden Figures
◼︎ Phys.
Videodisc
3184
Girl on the Train
◼︎ Dig.
Ebook
3184
Moonlight
◼︎ Phys.
Videodisc
3082
The Goldfinch
◼︎ Dig.
Ebook
3065
Manchester by the Sea
◼︎ Phys.
Videodisc
2965
La La Land
◼︎ Phys.
Videodisc
2944
Girl on the Train
◼︎ Phys.
Videodisc
2853
Moana
◼︎ Phys.
Videodisc
2778
The Accountant
◼︎ Phys.
Videodisc
2766
The Underground Railroad
◼︎ Dig.
Ebook
2693
Sully
◼︎ Phys.
Videodisc
2679
Hillbilly Elegy
◼︎ Dig.
Ebook
2672
Hacksaw Ridge
◼︎ Dig.
Ebook
2654
Passengers
◼︎ Phys.
Videodisc
2631
Lion
◼︎ Phys.
Videodisc
-- 2017 Top 20

CREATE VIEW top_20_2017 AS
SELECT title, usageclass, materialtype, 
	SUM(checkouts) AS total_checkouts
	FROM checkouts_2017
	GROUP BY title, usageclass, materialtype
	ORDER BY total_checkouts DESC
	LIMIT 22;

\copy (SELECT * FROM top_20_2017)
	TO 'top20.csv' WITH csv header;

LINK: top20.csv

There are some interesting findings here:

In fact, if you split 2017's top 1,000 items by format, you can see that 73% are physical.

...but about 2/3 of those physical checkouts are DVDs/Blu-Rays. Another interesting thing here is that it appears audiobook usage has transitioned primarily to digital. Digital audiobook checkouts account for 5.1% of the top 1,000, about a quarter of the digital total, compared to a measly 0.2% for the physical "Sounddisc" media type, which includes both music and audiobooks. (Note: The "Physical: MIXED" slice represents the HotSpot and device checkouts referenced earlier.)

And if we zoom in on books specifically, we find some interesting trends. Let's look at the top 10 most-checked-out authors (across all their titles) in both physical and digital formats.

Top Authors, Physical & Digital

-- 2017 top authors

CREATE VIEW authors_2017 AS
SELECT creator, usageclass,
	SUM(checkouts) AS total_checkouts
	FROM checkouts_2017
	GROUP BY creator, usageclass
	ORDER BY total_checkouts DESC;

\copy (SELECT * FROM authors_2017)
	TO 'authors_2017.csv' WITH csv header;

LINK: authors_2017.csv

On the physical side, 9 of the top 10 are children's authors, with Mo Willem (author of the "Pigeon" and "Elephant & Piggie" series) with a significant lead at #1. The lone adult author on the physical list is James Patterson...

...who tops the list of digital checkouts. The authors on the digital list write primarily long-running mystery/thriller series, with Stephen King and Neil Gaimen working in horror and sci-fi. Two of the three women authors on this list, Nora Roberts and Janet Evanovitch, were both prolific romance writers before crossing over into mystery/thrillers.

Putting the top digital and physical authors side-by-side, I noticed that Patterson's digital total was higher than his physical one. Curious about the other top digital authors, I checked out their physical totals, and found this was true of all 10.

Going back to a question from the beginning, about when the percentage of digital checkouts might surpass physical ones, this finding suggests that this crossover may have already happened for particular book genres or readership demographics.

4. 2020: COVID-19 & Beyond

Any smooth trends in physical vs. digital checkouts were drastically interrupted by the effects of COVID-19; as the SPL closed in mid-March, physical checkouts dropped almost to zero. While digital checkouts have increased during the library's closure, these gains don't come close to making up for the lost physical checkouts. It's difficult to predict how much of this increase in digital traffic will continue after the SPL can safely re-open.

Anecdotally: At Green Apple, we've averaged around 2,000 web orders per year over the last decade. From March to December of 2020, we've had around 50,000, and the majority of customers who place orders on our web site are doing so for the first time. I imagine this general increase in awareness of digital offerings from outlets beyond Amazon, Apple, Netflix, (etc) is being experienced by many other organizations as well.

But the unfortunate wrinkle with libraries relates back to the Top 2 checked-out items of 2017: They're are one of the few places which provide free access to technology for individuals who otherwise lack it. With budget cuts already a common experience among American libraries, putting more resources toward digital offerings could result in decreased resources for these critical social services in the future. ♦︎