Can you come up with an example of how to use duckdb in python to upload data in a multi-threaded manner?
To upload data to DuckDB in a multi-threaded manner in Python, you can use the following example. It creates writer threads that insert rows into a DuckDB table. Each thread calls `cursor()` on the shared connection to get its own cursor, so the threads don't all share a single connection object. The steps are:

1. Setup: Create a DuckDB connection and table.
2. Define Writer Function: Each thread will execute this function to insert data.
3. Create and Run Threads: Initialize writer threads, start them, and wait for them to finish.

```
import duckdb
from threading import Thread, current_thread

# Connect to DuckDB and create a table
duckdb_con = duckdb.connect('my_persistent_db.duckdb')
duckdb_con.execute("""
    CREATE OR REPLACE TABLE my_inserts (
        thread_name VARCHAR,
        insert_time TIMESTAMP DEFAULT current_timestamp
    )
""")

# Function to be executed by each writer thread
def write_from_thread(duckdb_con):
    # Per-thread cursor: a duplicate connection to the same database
    local_con = duckdb_con.cursor()
    thread_name = str(current_thread().name)
    local_con.execute("""
        INSERT INTO my_inserts (thread_name)
        VALUES (?)
    """, (thread_name,))
    local_con.close()  # release the per-thread cursor when done

# Create the writer threads
threads = []
write_thread_count = 50
for i in range(write_thread_count):
    thread = Thread(target=write_from_thread,
                    args=(duckdb_con,),
                    name='write_thread_' + str(i))
    threads.append(thread)

# Start all writer threads
for thread in threads:
    thread.start()

# Wait for all threads to complete
for thread in threads:
    thread.join()

# Display the inserted data (.df() requires pandas to be installed)
print(duckdb_con.execute("SELECT * FROM my_inserts ORDER BY insert_time").df())
```

This code snippet demonstrates how to use DuckDB with multiple Python threads to insert data concurrently. Each thread operates on its own cursor to avoid conflicts.

Sources:
- Multiple Python Threads
How have folks handled upserts esp if there are no primary keys defined?