Feb-02-2020, 03:32 AM
(This post was last modified: Jun-19-2021, 02:57 AM by Larz60+.
Edit Reason: Fixed font size
)
Introduction
This tutorial is a very basic Introduction to SqlAlchemy.
Although written for sqlite, the same model can be used for other DBMS with a one line code change.
The tutorial will use the city data from the Connecticut State Library website at https://ctstatelibrary.org/cttowns/counties
to create a database containing just one table 'ConnecticutCity', which is defined as:
To set up the project environment, please perform the following operations (internet access is required):
Path definitions
The first module will define and create (if not already available) the data directory structure using pathlib. This script also contains filenames and URL's that are used for the data collection.
cut and paste the SqlPaths.py code below to the src directory:
project structure should now look like:
SqlAlchemyExample/
├── data
│ ├── cache
│ ├── csv
│ ├── database
│ └── tmp
├── requirements.txt
├── src
│ └── SqlPaths.py
├── utils
│ ├── geckodriver
│ └── geckodriver-v0.26.0-linux64.tar.g
└── venv
...
Capture Connecticut Data
next in src directory create a program that will use selenium to download the data from ctstatelibrary.org and create a .csv file in the data/csv directory. This will be used to populate the database table that will be created later.
This code used selenium to scrape the data. I will not get into details as it does not apply to SqlAlchemy.
If you would like to know more about how selenium works, i refer you to the following tutorials (which were written by Snippsat, and on this forum)
web scraping part 1
web scraping part 2
Cut and paste CreateCityFile.py to src directory
Database Model
In SqlAlchemy ORM databases are defined in a Model.
Within the module, the database is described with a create_engine statement.
The following description is taken from sqlalchemy.org:
The Engine is the starting point for any SQLAlchemy application. It’s “home base” for the actual database and its DBAPI, delivered to the SQLAlchemy application through a connection pool and a Dialect, which describes how to talk to a specific kind of database/DBAPI combination.
(click to expand)
Creating an engine is just a matter of issuing a single call, create_engine()
Please refer to https://docs.sqlalchemy.org/en/13/core/engines.html for a complete description.
in this application, the engine is created below on line 17.
tables will be created using declarative_base
The new base class will be given a metaclass that produces appropriate Table objects and makes the appropriate mapper() calls based on the information provided declaratively in the class and any subclasses of the class.
For complete description see see: https://docs.sqlalchemy.org/en/13/orm/ex...e/api.html
Each table is constructed using an individual class.
This model will only have one table (with several indexes) but more tables can be added by
adding additional classes, one for each table.
Things to be included in each class:
This tutorial is a very basic Introduction to SqlAlchemy.
Although written for sqlite, the same model can be used for other DBMS with a one line code change.
The tutorial will use the city data from the Connecticut State Library website at https://ctstatelibrary.org/cttowns/counties
to create a database containing just one table 'ConnecticutCity', which is defined as:
Output:'ConnecticutCity' (
`id` INTEGER NOT NULL,
`TownName` VARCHAR ( 40 ),
`County` VARCHAR ( 40 ),
`YearEstablished` VARCHAR ( 10 ),
`ParentTown` VARCHAR ( 40 ),
`HistoryOfIncorporation` VARCHAR ( 500 ),
PRIMARY KEY(`id`)
);
Will cover the following:- Download the data to a CSV file
- Define a 'paths' module named SqlPaths.py which will create the necessary data directory structure
- Define a database model (which for this tutorial will only include a single table 'ConnecticutCity')
- Create initial database from the Model
- Populate the table from created CSV file
- Query 'ConnecticutCity' table
To set up the project environment, please perform the following operations (internet access is required):
- create a new directory named SqlAlchemyExample
- run following commands:
- Open a terminal window
- cd SqlAlchemyExample
- python -m venv venv
- . ./venv/bin/activate
if entered correctly you should see (venv) before prompt
- pip install --upgrade pip
- Copy the requirements.txt data below to requirements.txt in the new SqlAlchemyExample directory
- pip install -r requirements.txt
- pip list
compare installed results to requirements.txt
- mkdir utils
- download latest version of gekodriver from here: https://github.com/mozilla/geckodriver/releases to utils directory
make sure you select the 64 bit version (unless your computer is 32 bit)
- Extract the tar file and copy file named gekodriver to venv/bin
- if you use chrome, download the latest version of chromedriver to utils from: https://chromedriver.chromium.org/downloads
make sure version matches your chrome installation
- Extract the driver and copy to venv/bin
- mkdir src
- Open a terminal window
Path definitions
The first module will define and create (if not already available) the data directory structure using pathlib. This script also contains filenames and URL's that are used for the data collection.
cut and paste the SqlPaths.py code below to the src directory:
from pathlib import Path import os class SqlPaths: def __init__(self): os.chdir(os.path.abspath(os.path.dirname(__file__))) homepath = Path('.') rootpath = homepath / '..' self.datapath = rootpath / 'data' self.datapath.mkdir(exist_ok=True) self.cachepath = self.datapath / 'cache' self.cachepath.mkdir(exist_ok=True) self.csvpath = self.datapath / 'csv' self.csvpath.mkdir(exist_ok=True) self.dbpath = self.datapath / 'database' self.dbpath.mkdir(exist_ok=True) self.tmppath = self.datapath / 'tmp' self.tmppath.mkdir(exist_ok=True) self.sample_db = self.dbpath / 'sample.db' self.city_list_url = 'https://ctstatelibrary.org/cttowns/counties' self.citycsv = self.csvpath / 'city.csv' if __name__ == '__main__': SqlPaths()when done run the code from command line: using following commands
- cd .../SqlAlchemyExample
replace ... with your path
- python src/SqlPaths.py
project structure should now look like:
SqlAlchemyExample/
├── data
│ ├── cache
│ ├── csv
│ ├── database
│ └── tmp
├── requirements.txt
├── src
│ └── SqlPaths.py
├── utils
│ ├── geckodriver
│ └── geckodriver-v0.26.0-linux64.tar.g
└── venv
...
Capture Connecticut Data
next in src directory create a program that will use selenium to download the data from ctstatelibrary.org and create a .csv file in the data/csv directory. This will be used to populate the database table that will be created later.
This code used selenium to scrape the data. I will not get into details as it does not apply to SqlAlchemy.
If you would like to know more about how selenium works, i refer you to the following tutorials (which were written by Snippsat, and on this forum)
web scraping part 1
web scraping part 2
Cut and paste CreateCityFile.py to src directory
from selenium import webdriver from selenium.webdriver.common.by import By from bs4 import BeautifulSoup from SqlPaths import SqlPaths import csv import time import sys class CreateCityFile: def __init__(self): self.spath = SqlPaths() self.get_city_info() def start_browser(self): caps = webdriver.DesiredCapabilities().FIREFOX caps["marionette"] = True self.browser = webdriver.Firefox(capabilities=caps) def stop_browser(self): self.browser.close() def get_city_info(self, remove_savefile=False): savefile = self.spath.cachepath / "citypage.html" if remove_savefile: savefile.unlink(missing_ok=True) if savefile.exists(): with savefile.open() as fp: page = fp.read() else: self.start_browser() self.browser.get(self.spath.city_list_url) time.sleep(2) page = self.browser.page_source with savefile.open("w") as fp: fp.write(page) self.stop_browser() with self.spath.citycsv.open("w") as fp: cwrt = csv.writer(fp, delimiter="|") soup = BeautifulSoup(page, "lxml") table = soup.find( "table", { "summary": "This table displays Connecticut towns and the year of their establishment." }, ) trs = table.tbody.find_all("tr") for n, tr in enumerate(trs): contigname = "Unspecified" if n == 0: header = [] for td in self.get_td(tr): header.append(td.p.b.i.text.strip()) cwrt.writerow(header) else: row = [] for n1, td in enumerate(self.get_td(tr)): if td.p and len(td.p): value = td.p.text.strip() if value == '—-': value = 'Unspecified' else: value = 'Unspecified' value = value.replace('"', "'") row.append(value) cwrt.writerow(row) def get_td(self, tr): tds = tr.find_all("td") for td in tds: yield td if __name__ == "__main__": CreateCityFile()once done:
- Make sure you have internet access
- change directory to SqlAlchemyExample
- python src/CreateCityFile.py
- Check /data/csv for file named: city.csv
Database Model
In SqlAlchemy ORM databases are defined in a Model.
Within the module, the database is described with a create_engine statement.
The following description is taken from sqlalchemy.org:
The Engine is the starting point for any SQLAlchemy application. It’s “home base” for the actual database and its DBAPI, delivered to the SQLAlchemy application through a connection pool and a Dialect, which describes how to talk to a specific kind of database/DBAPI combination.
(click to expand)
Creating an engine is just a matter of issuing a single call, create_engine()
Please refer to https://docs.sqlalchemy.org/en/13/core/engines.html for a complete description.
in this application, the engine is created below on line 17.
tables will be created using declarative_base
The new base class will be given a metaclass that produces appropriate Table objects and makes the appropriate mapper() calls based on the information provided declaratively in the class and any subclasses of the class.
For complete description see see: https://docs.sqlalchemy.org/en/13/orm/ex...e/api.html
Each table is constructed using an individual class.
This model will only have one table (with several indexes) but more tables can be added by
adding additional classes, one for each table.
Things to be included in each class:
- the ORM definition of the table itself.
this is done on lines 23 through 31- always starts with the __tablename__ definition
for this example table will be named 'ConnecticutCity'
line 23:__tablename__ = "ConnecticutCity"
- Next is an id used internally by SqlAlchemy:
id = Column(Integer, primary_key=True)
- And then the definitions of each column within the table.
see: https://docs.sqlalchemy.org/en/13/orm/ma...lumns.html
NOTE Indexes are added to column definitions see script.
- Further reading: https://docs.sqlalchemy.org/en/13/orm/ex...c_use.html
- Next you need an __init__ method (lines 33-39) which defined each field for external interface.
- Though not necessary, I usually include an __repr__ method for printing the class
- any modules for common table tasks (vary by application)
the statement on line 51:Base.metadata.create_all(engine)
creates the new database
Save the following code in the src directory as DatabaseModel.py:
from sqlalchemy import ( Column, String, Integer, Date, create_engine, DateTime, ForeignKey, ) from sqlalchemy.ext.declarative import declarative_base from sqlalchemy.orm import relationship, backref from datetime import datetime from SqlPaths import SqlPaths spath = SqlPaths() engine = create_engine(f"sqlite:///{spath.sample_db}") Base = declarative_base() class ConnecticutCity(Base): __tablename__ = "ConnecticutCity" id = Column(Integer, primary_key=True) TownName = Column(String(40), index=True, unique=True) County = Column(String(40), index=True) YearEstablished = Column(String(10), index=True) ParentTown = Column(String(40), index=True) HistoryOfIncorporation = Column(String(500)) def __init__(self, TownName, County, YearEstablished, ParentTown, HistoryOfIncorporation): self.TownName = TownName self.County = County self.YearEstablished = YearEstablished self.ParentTown = ParentTown self.HistoryOfIncorporation = HistoryOfIncorporation def __repr__(self): return ( f"<ConnecticutCity {self.TownName}, {self.County}, " \ f"{self.YearEstablished}, {self.ParentTown}, " \ f"{self.HistoryOfIncorporation}>" ) def main(): Base.metadata.create_all(engine) if __name__ == "__main__": main()
Once you have created this module, run as follows to create the database in data/database- cd SqlAlchemyExample
- python src/DatabaseModel.py
- Check data/database you should find sample.db
The database should look like (click to expand)
Populate Table
The following code will add the data created by CreateCityFile.py (city.csv) to the database.
The steps required are quite simple:- import SqlPaths
defines where files are located
- from sqlalchemy.orm import sessionmaker
used to create session with database
- import DatabaseModel
defines the database table(s) only one table for this example but usually will have many table classes.
- import csv
to process city.csv
- Define database model
- link model to sqlalchmy engine
- create database session
- read csv data and add to database
- commit session
- close session
Save the following code in the src directory as LoadTables.py
from SqlPaths import SqlPaths from sqlalchemy.orm import sessionmaker import DatabaseModel import csv import sys class LoadTables: def __init__(self): self.spath = SqlPaths() self.dmap = DatabaseModel self.db = self.dmap.engine def load_db(self): db = self.db Session = sessionmaker(bind=db) Session.configure(bind=db) session = Session() with self.spath.citycsv.open() as fp: crdr = csv.reader(fp, delimiter='|') next(crdr) # Skip header for row in crdr: cityrec = self.dmap.ConnecticutCity( TownName = row[0], County = row[1], YearEstablished = row[2], ParentTown = row[3], HistoryOfIncorporation = row[4]) session.add(cityrec) session.commit() session.close() def main(): lt = LoadTables() lt.load_db() if __name__ == '__main__': main()
Once you have created this module, run as follows to add the city data to database- cd SqlAlchemyExample
- python src/LoadTables.py
- with dbbrowser or other sqlite database tool, examine sample.db
from 'Execute SQL' tab run query:select * from ConnecticutCity;
results should be like image below.
Query
The following script uses SqlAlchemy ORM for two simple queries
The first is a simple list of all cities and towns in Connecticut
The second extracts all cities and towns in any county that begins with 'New' (2 in Connecticut) ordered by County, TownName
import DatabaseModel from sqlalchemy.orm import sessionmaker from SqlPaths import SqlPaths class QueryCityTable: def __init__(self): self.spath = SqlPaths() self.dmap = DatabaseModel db = self.dmap.engine self.Session = sessionmaker(bind=db) self.Session.configure(bind=db) self.session = self.Session() self.city_table = self.dmap.ConnecticutCity def all_towns(self): citieslist = self.session.query(self.city_table).all() print(f"\n=================================") print(f"List of all cities in Connecticut") for row in citieslist: print(f"{row.TownName}") def filter_by_county(self): ctab = self.city_table citylist = self.session.query(ctab).filter(ctab.County.like('New%')).order_by(ctab.County).all() print(f"\n===================================================================================") print(f"List of all cities in all counties where county name starts with New alphabetically") for element in citylist: print(element.County, element.TownName) def main(): qct = QueryCityTable() qct.all_towns() qct.filter_by_county() if __name__ == '__main__': main()
Partial results:
RequirementsOutput:================================= List of all cities in Connecticut Andover Ansonia Ashford Avon Barkhamsted Beacon Falls Berlin Bethany ... =================================================================================== List of all cities in all counties where county name starts with New alphabetically New Haven Ansonia New Haven Beacon Falls New Haven Bethany New Haven Branford New Haven Cheshire New Haven Derby New Haven East Haven New Haven Guilford New Haven Hamden New Haven Madison New Haven Meriden New Haven Middlebury New Haven Milford New Haven Naugatuck New Haven New Haven New Haven North Branford New Haven North Haven New Haven Orange New Haven Oxford New Haven Prospect New Haven Seymour New Haven Southbury New Haven Wallingford New Haven Waterbury New Haven West Haven New Haven Wolcott New Haven Woodbridge New London Bozrah New London Colchester New London East Lyme New London Franklin New London Griswold New London Groton New London Lebanon New London Ledyard New London Lisbon New London Lyme New London Montville New London New London New London North Stonington New London Norwich New London Old Lyme New London Preston New London Salem New London Sprague New London Stonington New London Voluntown New London Waterford
Python 3.6 or newer
Tutorial was created on Linux, not tested on other Operating Systems, but should work with possibly some minor adjustments.
requirements.txt:
Output:appdirs==1.4.3 attrs==19.3.0 beautifulsoup4==4.8.2 certifi==2019.11.28 chardet==3.0.4 Click==7.0 idna==2.8 lxml==4.5.0 pathspec==0.7.0 regex==2020.1.8 requests==2.22.0 selenium==3.141.0 soupsieve==1.9.5 SQLAlchemy==1.3.13 toml==0.10.0 typed-ast==1.4.1 urllib3==1.25.8 - always starts with the __tablename__ definition