学习如何使用Python的内置sqlite3模块处理SQLite数据库。 还将学习如何创建索引以加快查询速度。
假设你正在查阅一本书的内容。你希望能够更快地找到你正在寻找的信息。你会怎么做呢?你可能会查找术语索引,然后跳转到引用特定术语的页面。SQL中的索引的工作方式与书籍中的索引类似。
在Python中连接到数据库
cursor()
import sqlite3
# connect to the db
db_conn = sqlite3.connect('people_db.db')
db_cursor = db_conn.cursor()
# main.py
...
# create table
db_cursor.execute('''CREATE TABLE people (
id INTEGER PRIMARY KEY,
name TEXT,
email TEXT,
job TEXT)''')
...
# commit the transaction and close the cursor and db connection
db_conn.commit()
db_cursor.close()
db_conn.close()
使用Faker生成合成数据
$ pip install faker
# main.py
...
from faker import Faker
...
# create and insert records
fake = Faker() # be sure to import: from faker import Faker
Faker.seed(42)
num_records = 100000
for _ in range(num_records):
first = fake.first_name()
last = fake.last_name()
name = f"{first} {last}"
domain = fake.domain_name()
email = f"{first}.{last}@{domain}"
job = fake.job()
db_cursor.execute('INSERT INTO people (name, email, job) VALUES (?,?,?)', (name,email,job))
# commit the transaction and close the cursor and db connection
db_conn.commit()
db_cursor.close()
db_conn.close()
现在,main.py 文件具有以下代码:
# main.py
# imports
import sqlite3
from faker import Faker
# connect to the db
db_conn = sqlite3.connect('people_db.db')
db_cursor = db_conn.cursor()
# create table
db_cursor.execute('''CREATE TABLE people (
id INTEGER PRIMARY KEY,
name TEXT,
email TEXT,
job TEXT)''')
# create and insert records
fake = Faker()
Faker.seed(42)
num_records = 100000
for _ in range(num_records):
first = fake.first_name()
last = fake.last_name()
name = f"{first} {last}"
domain = fake.domain_name()
email = f"{first}.{last}@{domain}"
job = fake.job()
db_cursor.execute('INSERT INTO people (name, email, job) VALUES (?,?,?)', (name,email,job))
# commit the transaction and close the cursor and db connection
db_conn.commit()
db_cursor.close()
db_conn.close()
运行此脚本一次,以使用记录数填充表。num_records
查询数据库
# sample_query.py
import sqlite3
import time
db_conn = sqlite3.connect("people_db.db")
db_cursor = db_conn.cursor()
t1 = time.perf_counter_ns()
db_cursor.execute("SELECT name, email FROM people WHERE job='Product manager' LIMIT 10;")
res = db_cursor.fetchall()
t2 = time.perf_counter_ns()
print(res)
print(f"Query time without index: {(t2-t1)/1000} us")
Output >>
[
("Tina Woods", "Tina.Woods@smith.com"),
("Toni Jackson", "Toni.Jackson@underwood.com"),
("Lisa Miller", "Lisa.Miller@solis-west.info"),
("Katherine Guerrero", "Katherine.Guerrero@schmidt-price.org"),
("Michelle Lane", "Michelle.Lane@carr-hardy.com"),
("Jane Johnson", "Jane.Johnson@graham.com"),
("Matthew Odom", "Matthew.Odom@willis.biz"),
("Isaac Daniel", "Isaac.Daniel@peck.com"),
("Jay Byrd", "Jay.Byrd@bailey.info"),
("Thomas Kirby", "Thomas.Kirby@west.com"),
]
Query time without index: 448.275 us
$ sqlite3 people_db.db
SQLite version 3.37.2 2022-01-06 13:25:41
Enter ".help" for usage hints.
要获取索引列表,你可以运行:.index
sqlite> .index
由于当前没有索引,因此不会列出任何索引。
你还可以像这样检查查询计划:
sqlite> EXPLAIN QUERY PLAN SELECT name, email FROM people WHERE job='Product Manager' LIMIT 10;
QUERY PLAN
`--SCAN people
这里的查询计划是扫描所有效率低下的行。
在特定列上创建索引
若要在特定列上创建数据库索引,可以使用以下语法:
CREATE INDEX index-name on table (column(s))
假设我们需要经常查找具有特定职位的个人的记录。在作业列上创建索引会有所帮助:people_job_index
# create_index.py
import time
import sqlite3
db_conn = sqlite3.connect('people_db.db')
db_cursor =db_conn.cursor()
t1 = time.perf_counter_ns()
db_cursor.execute("CREATE INDEX people_job_index ON people (job)")
t2 = time.perf_counter_ns()
db_conn.commit()
print(f"Time to create index: {(t2 - t1)/1000} us")
Output >>
Time to create index: 338298.6 us
尽管创建索引需要这么长时间,但这是一次性操作。运行多个查询时,你仍将获得显著的加速。
现在,如果你在 SQLite 命令行客户端上运行,你将获得:.index
sqlite> .index
people_job_index
使用索引查询数据库
sqlite> EXPLAIN QUERY PLAN SELECT name, email FROM people WHERE job='Product manager' LIMIT 10;
QUERY PLAN
`--SEARCH people USING INDEX people_job_index (job=?)
你可以重新运行sample_query.py。仅修改语句并查看查询现在运行需要多长时间:print()
# sample_query.py
import sqlite3
import time
db_conn = sqlite3.connect("people_db.db")
db_cursor = db_conn.cursor()
t1 = time.perf_counter_ns()
db_cursor.execute("SELECT name, email FROM people WHERE job='Product manager' LIMIT 10;")
res = db_cursor.fetchall()
t2 = time.perf_counter_ns()
print(res)
print(f"Query time with index: {(t2-t1)/1000} us")
下面是输出:
Output >>
[
("Tina Woods", "Tina.Woods@smith.com"),
("Toni Jackson", "Toni.Jackson@underwood.com"),
("Lisa Miller", "Lisa.Miller@solis-west.info"),
("Katherine Guerrero", "Katherine.Guerrero@schmidt-price.org"),
("Michelle Lane", "Michelle.Lane@carr-hardy.com"),
("Jane Johnson", "Jane.Johnson@graham.com"),
("Matthew Odom", "Matthew.Odom@willis.biz"),
("Isaac Daniel", "Isaac.Daniel@peck.com"),
("Jay Byrd", "Jay.Byrd@bailey.info"),
("Thomas Kirby", "Thomas.Kirby@west.com"),
]
Query time with index: 167.179 us
我们看到查询现在大约需要167.179微秒来执行。
性能改进
对于我们的示例查询,使用索引查询的速度约为2.68倍。 在执行时间上,我们得到了62.71%的百分比加速。
你也可以尝试运行更多查询:查询,并查看性能改进。
还要注意:因为我们只在作业列上创建了索引,所以如果你正在运行涉及其他列的查询,则查询的运行速度不会比没有索引的情况快。