SqlAlchemy Tutorial - Basic Model, database Creation and Data Load

**Larz60+**

Introduction

This tutorial is a very basic Introduction to SqlAlchemy.

Although written for sqlite, the same model can be used for other DBMS with a one line code change.
The tutorial will use the city data from the Connecticut State Library website at https://ctstatelibrary.org/cttowns/counties
to create a database containing just one table 'ConnecticutCity', which is defined as:

Output:'ConnecticutCity' (
	`id`                        INTEGER NOT NULL,
	`TownName`                  VARCHAR ( 40 ),
	`County`                    VARCHAR ( 40 ),
	`YearEstablished`           VARCHAR ( 10 ),
	`ParentTown`                VARCHAR ( 40 ),
	`HistoryOfIncorporation`    VARCHAR ( 500 ),
	PRIMARY KEY(`id`)
);

Will cover the following:

Download the data to a CSV file
Define a 'paths' module named SqlPaths.py which will create the necessary data directory structure
Define a database model (which for this tutorial will only include a single table 'ConnecticutCity')
Create initial database from the Model
Populate the table from created CSV file
Query 'ConnecticutCity' table

To set up the project environment, please perform the following operations (internet access is required):

create a new directory named SqlAlchemyExample
run following commands:
- Open a terminal window
- cd SqlAlchemyExample
- python -m venv venv
- . ./venv/bin/activate
  if entered correctly you should see (venv) before prompt
- pip install --upgrade pip
- Copy the requirements.txt data below to requirements.txt in the new SqlAlchemyExample directory
- pip install -r requirements.txt
- pip list
  compare installed results to requirements.txt
- mkdir utils
- download latest version of gekodriver from here: https://github.com/mozilla/geckodriver/releases to utils directory
  make sure you select the 64 bit version (unless your computer is 32 bit)
- Extract the tar file and copy file named gekodriver to venv/bin
- if you use chrome, download the latest version of chromedriver to utils from: https://chromedriver.chromium.org/downloads
  make sure version matches your chrome installation
- Extract the driver and copy to venv/bin
- mkdir src

Path definitions

The first module will define and create (if not already available) the data directory structure using pathlib. This script also contains filenames and URL's that are used for the data collection.

cut and paste the SqlPaths.py code below to the src directory:

from pathlib import Path
import os


class SqlPaths:
    def __init__(self):
        os.chdir(os.path.abspath(os.path.dirname(__file__)))
        homepath = Path('.')
        rootpath = homepath / '..'

        self.datapath = rootpath / 'data'
        self.datapath.mkdir(exist_ok=True)

        self.cachepath = self.datapath / 'cache'
        self.cachepath.mkdir(exist_ok=True)

        self.csvpath = self.datapath / 'csv'
        self.csvpath.mkdir(exist_ok=True)

        self.dbpath = self.datapath / 'database'
        self.dbpath.mkdir(exist_ok=True)

        self.tmppath = self.datapath / 'tmp'
        self.tmppath.mkdir(exist_ok=True)
        
        self.sample_db = self.dbpath / 'sample.db'

        self.city_list_url = 'https://ctstatelibrary.org/cttowns/counties'
        self.citycsv = self.csvpath / 'city.csv'


if __name__ == '__main__':
    SqlPaths()

when done run the code from command line: using following commands

cd .../SqlAlchemyExample
replace ... with your path
python src/SqlPaths.py

this will create a new folder in the SqlAlchemyExample directory and add all sub directories needed for the project.
project structure should now look like:

SqlAlchemyExample/
├── data
│   ├── cache
│   ├── csv
│   ├── database
│   └── tmp
├── requirements.txt
├── src
│   └── SqlPaths.py
├── utils
│   ├── geckodriver
│   └── geckodriver-v0.26.0-linux64.tar.g
└── venv
...

Capture Connecticut Data

next in src directory create a program that will use selenium to download the data from ctstatelibrary.org and create a .csv file in the data/csv directory. This will be used to populate the database table that will be created later.

This code used selenium to scrape the data. I will not get into details as it does not apply to SqlAlchemy.
If you would like to know more about how selenium works, i refer you to the following tutorials (which were written by Snippsat, and on this forum)

web scraping part 1
web scraping part 2

Cut and paste CreateCityFile.py to src directory

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
from SqlPaths import SqlPaths
import csv
import time
import sys


class CreateCityFile:
    def __init__(self):
        self.spath = SqlPaths()
        self.get_city_info()

    def start_browser(self):
        caps = webdriver.DesiredCapabilities().FIREFOX
        caps["marionette"] = True
        self.browser = webdriver.Firefox(capabilities=caps)

    def stop_browser(self):
        self.browser.close()

    def get_city_info(self, remove_savefile=False):
        savefile = self.spath.cachepath / "citypage.html"
        if remove_savefile:
            savefile.unlink(missing_ok=True)

        if savefile.exists():
            with savefile.open() as fp:
                page = fp.read()
        else:
            self.start_browser()
            self.browser.get(self.spath.city_list_url)
            time.sleep(2)
            page = self.browser.page_source
            with savefile.open("w") as fp:
                fp.write(page)
            self.stop_browser()

        with self.spath.citycsv.open("w") as fp:
            cwrt = csv.writer(fp, delimiter="|")
            soup = BeautifulSoup(page, "lxml")
            table = soup.find(
                "table",
                {
                    "summary": "This table displays Connecticut towns and the year of their establishment."
                },
            )
            trs = table.tbody.find_all("tr")
            for n, tr in enumerate(trs):
                contigname = "Unspecified"
                if n == 0:
                    header = []
                    for td in self.get_td(tr):
                        header.append(td.p.b.i.text.strip())
                    cwrt.writerow(header)
                else:
                    row = []
                    for n1, td in enumerate(self.get_td(tr)):
                        if td.p and len(td.p):
                            value = td.p.text.strip()
                            if value == '—-':
                                value = 'Unspecified'
                        else:
                            value = 'Unspecified'
                        value = value.replace('"', "'")
                        row.append(value)
                    cwrt.writerow(row)

    def get_td(self, tr):
        tds = tr.find_all("td")
        for td in tds:
            yield td


if __name__ == "__main__":
    CreateCityFile()

once done:

Make sure you have internet access
change directory to SqlAlchemyExample
python src/CreateCityFile.py
Check /data/csv for file named: city.csv

Database Model

In SqlAlchemy ORM databases are defined in a Model.
Within the module, the database is described with a create_engine statement.
The following description is taken from sqlalchemy.org:

The Engine is the starting point for any SQLAlchemy application. It’s “home base” for the actual database and its DBAPI, delivered to the SQLAlchemy application through a connection pool and a Dialect, which describes how to talk to a specific kind of database/DBAPI combination.

(click to expand)

Creating an engine is just a matter of issuing a single call, create_engine()

Please refer to https://docs.sqlalchemy.org/en/13/core/engines.html for a complete description.

in this application, the engine is created below on line 17.
tables will be created using declarative_base

The new base class will be given a metaclass that produces appropriate Table objects and makes the appropriate mapper() calls based on the information provided declaratively in the class and any subclasses of the class.

For complete description see see: https://docs.sqlalchemy.org/en/13/orm/ex...e/api.html

Each table is constructed using an individual class.

This model will only have one table (with several indexes) but more tables can be added by
adding additional classes, one for each table.

Things to be included in each class:

the ORM definition of the table itself.
this is done on lines 23 through 31

always starts with the __tablename__ definition
for this example table will be named 'ConnecticutCity'
line 23: __tablename__ = "ConnecticutCity"
Next is an id used internally by SqlAlchemy:
id = Column(Integer, primary_key=True)
And then the definitions of each column within the table.
see: https://docs.sqlalchemy.org/en/13/orm/ma...lumns.html
NOTE Indexes are added to column definitions see script.
Further reading: https://docs.sqlalchemy.org/en/13/orm/ex...c_use.html
Next you need an __init__ method (lines 33-39) which defined each field for external interface.
Though not necessary, I usually include an __repr__ method for printing the class
any modules for common table tasks (vary by application)

the statement on line 51: Base.metadata.create_all(engine)
creates the new database

Save the following code in the src directory as DatabaseModel.py:

from sqlalchemy import (
    Column,
    String,
    Integer,
    Date,
    create_engine,
    DateTime,
    ForeignKey,
)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, backref
from datetime import datetime
from SqlPaths import SqlPaths


spath = SqlPaths()
engine = create_engine(f"sqlite:///{spath.sample_db}")
Base = declarative_base()


class ConnecticutCity(Base):

    __tablename__ = "ConnecticutCity"

    id = Column(Integer, primary_key=True)

    TownName = Column(String(40), index=True, unique=True)
    County = Column(String(40), index=True)
    YearEstablished = Column(String(10), index=True)
    ParentTown = Column(String(40), index=True)
    HistoryOfIncorporation = Column(String(500))

    def __init__(self, TownName, County, YearEstablished, 
        ParentTown, HistoryOfIncorporation):
        self.TownName = TownName
        self.County = County
        self.YearEstablished = YearEstablished
        self.ParentTown = ParentTown
        self.HistoryOfIncorporation = HistoryOfIncorporation


    def __repr__(self):
        return (
            f"<ConnecticutCity {self.TownName}, {self.County}, " \
            f"{self.YearEstablished}, {self.ParentTown}, " \
            f"{self.HistoryOfIncorporation}>"
        )


def main():
    Base.metadata.create_all(engine)


if __name__ == "__main__":
    main()

Once you have created this module, run as follows to create the database in data/database

cd SqlAlchemyExample
python src/DatabaseModel.py
Check data/database you should find sample.db

The database should look like (click to expand)

Populate Table

The following code will add the data created by CreateCityFile.py (city.csv) to the database.

The steps required are quite simple:

import SqlPaths
defines where files are located
from sqlalchemy.orm import sessionmaker
used to create session with database
import DatabaseModel
defines the database table(s) only one table for this example but usually will have many table classes.
import csv
to process city.csv
Define database model
link model to sqlalchmy engine
create database session
read csv data and add to database
commit session
close session

Save the following code in the src directory as LoadTables.py

from SqlPaths import SqlPaths
from sqlalchemy.orm import sessionmaker
import DatabaseModel
import csv
import sys
 
 
class LoadTables:
    def __init__(self):
        self.spath = SqlPaths()
        self.dmap = DatabaseModel
        self.db = self.dmap.engine
         
    def load_db(self):
        db = self.db
        Session = sessionmaker(bind=db)
        Session.configure(bind=db)
        session = Session()
 
        with self.spath.citycsv.open() as fp:
            crdr = csv.reader(fp, delimiter='|')
            next(crdr) # Skip header
            for row in crdr:
                cityrec = self.dmap.ConnecticutCity(
                    TownName = row[0],
                    County = row[1],
                    YearEstablished = row[2],
                    ParentTown = row[3],
                    HistoryOfIncorporation = row[4])
                session.add(cityrec)
        session.commit()
        session.close()
 
 
def main():
    lt = LoadTables()
    lt.load_db()
 
 
if __name__ == '__main__':
    main()

Once you have created this module, run as follows to add the city data to database

cd SqlAlchemyExample
python src/LoadTables.py
with dbbrowser or other sqlite database tool, examine sample.db
from 'Execute SQL' tab run query: select * from ConnecticutCity;
results should be like image below.

Query
The following script uses SqlAlchemy ORM for two simple queries
The first is a simple list of all cities and towns in Connecticut
The second extracts all cities and towns in any county that begins with 'New' (2 in Connecticut) ordered by County, TownName

import DatabaseModel
from sqlalchemy.orm import sessionmaker
from SqlPaths import SqlPaths


class QueryCityTable:
    def __init__(self):
        self.spath = SqlPaths()
        self.dmap = DatabaseModel
        db = self.dmap.engine
        self.Session = sessionmaker(bind=db)
        self.Session.configure(bind=db)
        self.session = self.Session()
        self.city_table = self.dmap.ConnecticutCity

    def all_towns(self):
        citieslist = self.session.query(self.city_table).all()
        print(f"\n=================================")
        print(f"List of all cities in Connecticut")
        for row in citieslist:
            print(f"{row.TownName}")

    def filter_by_county(self):
        ctab = self.city_table

        citylist = self.session.query(ctab).filter(ctab.County.like('New%')).order_by(ctab.County).all()
        print(f"\n===================================================================================")
        print(f"List of all cities in all counties where county name starts with New alphabetically")
        for element in citylist:
            print(element.County, element.TownName)

def main():
    qct = QueryCityTable()
    qct.all_towns()
    qct.filter_by_county()

if __name__ == '__main__':
    main()

Partial results:

Output:=================================
List of all cities in Connecticut
Andover
Ansonia
Ashford
Avon
Barkhamsted
Beacon Falls
Berlin
Bethany
...

===================================================================================
List of all cities in all counties where county name starts with New alphabetically
New Haven Ansonia
New Haven Beacon Falls
New Haven Bethany
New Haven Branford
New Haven Cheshire
New Haven Derby
New Haven East Haven
New Haven Guilford
New Haven Hamden
New Haven Madison
New Haven Meriden
New Haven Middlebury
New Haven Milford
New Haven Naugatuck
New Haven New Haven
New Haven North Branford
New Haven North Haven
New Haven Orange
New Haven Oxford
New Haven Prospect
New Haven Seymour
New Haven Southbury
New Haven Wallingford
New Haven Waterbury
New Haven West Haven
New Haven Wolcott
New Haven Woodbridge
New London Bozrah
New London Colchester
New London East Lyme
New London Franklin
New London Griswold
New London Groton
New London Lebanon
New London Ledyard
New London Lisbon
New London Lyme
New London Montville
New London New London
New London North Stonington
New London Norwich
New London Old Lyme
New London Preston
New London Salem
New London Sprague
New London Stonington
New London Voluntown
New London Waterford

Requirements

Python 3.6 or newer
Tutorial was created on Linux, not tested on other Operating Systems, but should work with possibly some minor adjustments.

requirements.txt:

Output:appdirs==1.4.3
attrs==19.3.0
beautifulsoup4==4.8.2
certifi==2019.11.28
chardet==3.0.4
Click==7.0
idna==2.8
lxml==4.5.0
pathspec==0.7.0
regex==2020.1.8
requests==2.22.0
selenium==3.141.0
soupsieve==1.9.5
SQLAlchemy==1.3.13
toml==0.10.0
typed-ast==1.4.1
urllib3==1.25.8

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	SQLAlchemy Dynamic model generation and Database load directly from CSV	Larz60+	0	5,433	Jun-11-2021, 09:02 AM Last Post: Larz60+

SqlAlchemy Tutorial - Basic Model, database Creation and Data Load

User Panel Messages

Announcements