PFCH - Videos

These videos are lectures and demos from the Programming for Cultural Heritage course. The class focuses on building skills in the Python programming language. We often use resources from cultural heritage (museums, libraries), but the course is not limited to that domain. Lecture videos each cover a general area of interest and are cumulative, building on previous videos. Demo videos are shorter and show a specific task using skills learned in the lectures.

Created by Matt Miller
This work is licensed under a Creative Commons Attribution 4.0 International License.

Lectures


  1. Command Line
  2. Installing Python
  3. Introduction to Python
  4. Reading CSV files
  5. Reading JSON files
  6. Writing CSV
  7. Writing JSON
  8. Working with XML
  9. Installing Modules with PIP
  10. Working with APIs
  11. Regular Expressions
  12. Web Scraping
  13. Git & Github

Command Line

A short lecture on using the command line on Mac and PC, with a focus on using tools to parse and extract data from CSV files. The goal of this video is to get comfortable using the command line. We use data from the NYPL What’s on the Menu project and Open NYC Data.

Installing Python

How to get started installing Python on your computer, on both Mac and Windows. We also configure Visual Studio Code as an option and show Python notebooks.

Introduction to Python programming language

We look at the basics of Python and go over many facets of the language, like variables, if statements, for loops, etc.

Reading & Writing CSV files

We look at reading and writing files using the open function, and then reading CSV files using the csv module.
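
As a rough illustration of reading with the csv module, here is a minimal sketch. The sample data and the `read_rows` helper are made up for this example and are not taken from the video; with a real file you would pass the handle returned by `open("yourfile.csv", newline="", encoding="utf-8")` instead of the in-memory text.

```python
import csv
import io

# Hypothetical sample data standing in for a CSV file on disk.
sample = io.StringIO(
    "id,name,price\n"
    "1,Consomme,0.40\n"
    '2,"Chicken, roasted",1.25\n'
)


def read_rows(handle):
    """Return every CSV row as a dict keyed by the header line."""
    reader = csv.DictReader(handle)
    return list(reader)


rows = read_rows(sample)
for row in rows:
    # Values are always strings; convert them yourself if you need numbers.
    print(row["name"], float(row["price"]))
```

Using `csv.DictReader` rather than splitting lines on commas means quoted fields that contain commas, like the second row here, are handled correctly.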

Reading & Writing JSON files

We look at using the json module to load large JSON files and loop through them.
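
A small sketch of the load-and-loop pattern follows. The records and the `titles_after` helper are invented for illustration; for a file on disk you would call `json.load` on an open file handle rather than `json.loads` on a string.

```python
import json

# Hypothetical JSON text standing in for the contents of a file.
raw = '[{"title": "Portrait", "year": 1890}, {"title": "Landscape", "year": 1902}]'

# json.loads turns the text into ordinary Python lists and dicts.
records = json.loads(raw)


def titles_after(items, year):
    """Collect the titles of records dated after the given year."""
    return [item["title"] for item in items if item["year"] > year]


recent = titles_after(records, 1900)
print(recent)
```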

Writing CSV

We write a CSV file in this video, specifically reducing a very large 2GB CSV to a smaller CSV file.
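
One way to shrink a large CSV, sketched here under assumptions, is to stream it row by row and write out only the rows you want, so the whole file never has to fit in memory. The column names, the sample rows and the `filter_csv` helper are hypothetical and may differ from what the video does.

```python
import csv
import io


def filter_csv(src, dest, column, wanted):
    """Copy only the rows where row[column] == wanted, one row at a time."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dest, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        if row[column] == wanted:
            writer.writerow(row)
            kept += 1
    return kept


# In-memory stand-ins for the large input file and the smaller output file.
source = io.StringIO(
    "id,borough,complaint\n"
    "1,BROOKLYN,Noise\n"
    "2,QUEENS,Heat\n"
    "3,BROOKLYN,Heat\n"
)
output = io.StringIO()

kept = filter_csv(source, output, "borough", "BROOKLYN")
print(kept, "rows kept")
```

With real files, both handles would come from `open(..., newline="", encoding="utf-8")`, the output one opened in `"w"` mode.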

Writing JSON

A quick look at using the json module to write out to JSON files.
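
A minimal sketch of writing JSON and reading it back is shown below. The records and the temporary output path are made up for the example.

```python
import json
import os
import tempfile

# Hypothetical records to save.
records = [{"id": 1, "title": "Café menu"}, {"id": 2, "title": "Dinner menu"}]

# Write into a temporary directory so the example leaves your own files alone.
out_path = os.path.join(tempfile.mkdtemp(), "records.json")

with open(out_path, "w", encoding="utf-8") as f:
    # indent makes the file human readable; ensure_ascii=False keeps "é" as-is.
    json.dump(records, f, indent=2, ensure_ascii=False)

# Read the file back to confirm that what was written is valid JSON.
with open(out_path, encoding="utf-8") as f:
    round_trip = json.load(f)

print(round_trip)
```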

Working with XML

We go through what XML is and how to read and write it in Python, then do a challenge involving an ETL (extract, transform, load) script in which we transform a CSV into an EAD XML finding aid.
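
The CSV-to-XML idea can be sketched with the standard library alone. This is a heavily simplified illustration, not the script from the video: the CSV columns are invented, and the output is only a fragment built from EAD-style element names (`dsc`, `c01`, `did`, `unittitle`, `unitdate`), not a complete or valid finding aid.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical CSV input: one row per archival item.
csv_text = "title,date\nLetters to the editor,1921\nPhotograph album,1935\n"


def csv_to_ead(handle):
    """Extract rows from a CSV and transform each into a <c01> component."""
    dsc = ET.Element("dsc")
    for row in csv.DictReader(handle):
        component = ET.SubElement(dsc, "c01", level="item")
        did = ET.SubElement(component, "did")
        ET.SubElement(did, "unittitle").text = row["title"]
        ET.SubElement(did, "unitdate").text = row["date"]
    return dsc


dsc = csv_to_ead(io.StringIO(csv_text))

# "Load" step: serialize the tree to text, which could then be written to a file.
xml_text = ET.tostring(dsc, encoding="unicode")
print(xml_text)
```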

Installing Modules with PIP

To use external modules (libraries) in Python you need to install them. This video shows how to use pip to install them on Mac, Windows, Visual Studio Code and Google Colab notebooks.

Working with APIs

We dive into interacting with APIs via python and the requests module. We look at working with the Smithsonian Museums API and try querying it.

Regular Expressions

An introduction to writing regular expressions in Python.
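
As a small taste of the re module, the sketch below pulls four-digit years out of a sentence. The sample text and the pattern are illustrative choices, not examples from the video.

```python
import re

# Hypothetical sample text, such as a biographical note in a catalog record.
text = "Born 1853 in Zundert; moved to Paris in 1886 and died 1890."

# \b marks a word boundary, so digits inside longer numbers are not matched.
# The pattern accepts years from 1000 to 2099.
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

# With a single group in the pattern, findall returns the group's text.
years = [int(match) for match in YEAR.findall(text)]
print(years)
```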

Web Scraping

We use Python to scrape two websites, the Frick museum and the Milwaukee Art Museum, with lots of detail on the problems you can encounter while web scraping.

Git & Github

We look at using Git and Github on Mac and PC.

Demos


  1. Programmatic Wikidata
  2. GPT2 with Python on Mac
  3. Twitter API Part 1 - Searching
  4. Twitter API Part 2 - Posting
  5. Twitter API Part 3 - Bots
  6. Google Places API
  7. Working with Genius.com API
  8. Working with Google Sheets API
  9. RAWG Game DB API
  10. MARC XML
  11. Simple Functions
  12. GLOB File Management
  13. Github Pages Hosting
You can request a demo video by filling out this request form. This is open to anyone who would like to see a demo of how to do something in Python.

Programmatic Wikidata

We look at using Python to interact with Wikidata. We run SPARQL queries from Python, both general queries and lookups by Q ID. We then access the results and retrieve entity data from the Special:EntityData endpoint.
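
The two kinds of request described above can be sketched by building their URLs. The helper names are hypothetical, the example query (items that are an instance of Q146, house cat) is just a common demonstration query, and the actual HTTP call is left out so the example runs offline. In practice you would pass these URLs to an HTTP client such as the requests module, ideally with a descriptive User-Agent header.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def sparql_url(query):
    """Build a GET URL that asks the endpoint for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def entity_data_url(qid):
    """Build the Special:EntityData URL for one entity, e.g. 'Q42'."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


# Example query: five items that are an instance of (P31) house cat (Q146).
query = "SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5"

query_url = sparql_url(query)
entity_url = entity_data_url("Q42")
print(query_url)
print(entity_url)
```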

GPT2 with Python on Mac

We use the Python module gpt-2-simple (https://github.com/minimaxir/gpt-2-si...) to generate text using the 124M model. We run into some problems with needing to run a different version of Python, so we install pyenv to run an older version of Python with an older version of TensorFlow.

Twitter API Part 1 - Searching

We look at using the Twitter Search V2 API.

Twitter API Part 2 - Posting

We look at posting to Twitter using the Tweepy Python module.

Twitter API Part 3 - Bots

In this video we create 2(!) twitter bots, a random taco bot and a Niles and Frasier dialog bot. We also build an AWS Lambda for them to run on. This is a long video but there are many sections:

Intro - About the different types of twitter bots and their requirements
08:00 - Start building the data for our TACO BOT
35:00 - Start building the Twitter posting and AWS Lambda setup
1:04:45 - Start building the Niles and Frasier dialog bot using Beautiful Soup to do web scraping to build the data.

Google Places API

We use the Google Places API to look up information about restaurants in NYC in a specific geographical area.

Working with Genius.com API

We retrieve data from the Genius API using the LyricsGenius Python module.

Working with Google Sheets API

We look at using Python to work with Google Sheets, both reading and writing data. We access the sheet as a JSON file and use the pygsheets module to update data in a sheet.

RAWG Game DB API

A quick look at using the RAWG Game DB API via a python script.

MARC XML

A quick look at parsing MARC XML with pymarc, using data from https://data.nls.uk/.

Functions

A brief intro to writing functions. We write a wrapper for the Brooklyn Museum API using our own functions.

GLOB File Management

We use the Harvard Art Museums API to download data locally and use glob to parse the records.
We also look at https://americanarchive.org to download XML PBCore files using web scraping and glob to manage the files.
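
To show the glob part on its own, the sketch below first creates a few record files in a temporary directory and then uses a wildcard pattern to find and parse them. The file names and record contents are invented; in the video the files come from the APIs and sites mentioned above.

```python
import glob
import json
import os
import tempfile

# Create a temporary directory holding a few hypothetical downloaded records.
data_dir = tempfile.mkdtemp()
for number, title in enumerate(["Vase", "Bowl", "Print"], start=1):
    record_path = os.path.join(data_dir, f"record_{number}.json")
    with open(record_path, "w", encoding="utf-8") as f:
        json.dump({"id": number, "title": title}, f)

# A non-JSON file that the pattern below should skip.
with open(os.path.join(data_dir, "notes.txt"), "w", encoding="utf-8") as f:
    f.write("not a record")

# glob returns matching paths in arbitrary order, so sort them for stable output.
titles = []
for path in sorted(glob.glob(os.path.join(data_dir, "*.json"))):
    with open(path, encoding="utf-8") as f:
        titles.append(json.load(f)["title"])

print(titles)
```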

Github Pages hosting with custom domain

We register a domain and set up Github Pages hosting.