pandas + tqdm
Overview
pandas + tqdm is a combination of the tqdm library, which provides progress bars, and the pandas library, which is widely used for data manipulation in Python. Using tqdm with pandas allows tracking the progress of operations such as apply(), groupby(), and iterrows() in a more user-friendly way.
Why Use tqdm with pandas?
When working with large datasets, operations on pandas DataFrames can take a long time. The tqdm library helps visualize progress, making it easier to estimate completion time and identify bottlenecks.
Key Benefits:
- Displays real-time progress updates for long-running operations.
- Shows elapsed time, iteration rate, and estimated time remaining for loops and function calls.
- Simple integration with pandas DataFrames.
Installing tqdm for pandas
To use tqdm with pandas, install it using:
pip install tqdm
If pandas is not installed, you can install both together:
pip install pandas tqdm
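To confirm that both packages can be imported after installation, a quick check along these lines can help. It is only a sketch; the version numbers it prints depend on your environment.

```python
import pandas as pd
import tqdm

# Both packages expose their installed version as a string
print("pandas", pd.__version__)
print("tqdm", tqdm.__version__)
```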
Using tqdm with pandas
1. Applying tqdm to apply()
The apply() method in pandas is commonly used for row-wise or column-wise operations. To add a progress bar, call tqdm.pandas() once and then use progress_apply() in place of apply():
import pandas as pd
from tqdm import tqdm
# Enable tqdm with pandas
tqdm.pandas()
# Sample DataFrame
df = pd.DataFrame({'A': range(1, 10001)})
# Apply function with progress bar
df['B'] = df['A'].progress_apply(lambda x: x * 2)
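The example above works on a single column. For the row-wise case, a minimal sketch might look like the following. The two-column frame and the column names "A", "C", and "total" are made up for illustration.

```python
import pandas as pd
from tqdm import tqdm

# Register progress_apply on pandas objects
tqdm.pandas()

# Hypothetical two-column frame, used only for illustration
df = pd.DataFrame({"A": range(1, 1001), "C": range(1001, 2001)})

# axis=1 passes each row to the function, so the bar counts rows
df["total"] = df.progress_apply(lambda row: row["A"] + row["C"], axis=1)
```

For simple arithmetic like this, a vectorized expression such as df["A"] + df["C"] is much faster. Row-wise progress_apply is mainly worth it when the per-row function is slow and cannot be vectorized.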
2. Using tqdm in groupby() Operations
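When processing grouped data, progress_apply() is also available on the groupby object once tqdm.pandas() has been called, so the bar tracks how many groups have been processed: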
df.groupby('A').progress_apply(lambda x: x.sum())
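A self-contained variant with a categorical grouping key may make the behaviour easier to see. The "sales" data and its column names are hypothetical; this is a sketch rather than a recommended aggregation pattern.

```python
import pandas as pd
from tqdm import tqdm

# Register progress_apply on pandas and groupby objects
tqdm.pandas()

# Hypothetical sales data, used only for illustration
sales = pd.DataFrame({
    "region": ["north", "south", "north", "east", "south", "east"],
    "amount": [10, 20, 30, 40, 50, 60],
})

# Selecting the column first means the function receives one Series per group
totals = sales.groupby("region")["amount"].progress_apply(lambda s: s.sum())
```

Here the bar has three steps, one per region. For a plain sum, the built-in sales.groupby("region")["amount"].sum() is faster; the progress bar is more useful when the per-group function is expensive.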
3. tqdm with iterrows()
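Although iterrows() is generally not recommended for performance reasons, adding a progress bar can help when iterating over large datasets. Because iterrows() returns a generator of unknown length, pass total=len(df) so tqdm can show a percentage and a time estimate: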
for index, row in tqdm(df.iterrows(), total=len(df)):
    pass  # Process each row
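If row-by-row iteration is unavoidable, itertuples() is usually faster than iterrows() and can be wrapped in tqdm the same way. The following is a minimal sketch; the running total is only a stand-in for real per-row work.

```python
import pandas as pd
from tqdm import tqdm

df = pd.DataFrame({"A": range(1, 1001)})

running_total = 0
# itertuples() yields namedtuples; total= gives tqdm the length it cannot infer
for row in tqdm(df.itertuples(index=False), total=len(df)):
    running_total += row.A  # Placeholder for real per-row processing
```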
Customizing tqdm in pandas
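The tqdm progress bar can be customized by passing tqdm parameters to tqdm.pandas() before calling progress_apply(). Keyword arguments given to progress_apply() itself are forwarded to the underlying apply() call rather than to the progress bar:
tqdm.pandas(desc="Processing", position=0, leave=True)
df['B'] = df['A'].progress_apply(lambda x: x * 2)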
Key Customization Options:
- desc: Adds a description to the progress bar.
- position: Adjusts the display position of the bar.
- leave: Keeps the progress bar visible after completion.
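As a sketch of how these options can be combined, the snippet below configures the bar through tqdm.pandas() and then reconfigures it for a second operation. The descriptions "Doubling" and "Squaring" and the column names are illustrative.

```python
import pandas as pd
from tqdm import tqdm

df = pd.DataFrame({"A": range(1, 1001)})

# Options given to tqdm.pandas() are used for the bars created afterwards
tqdm.pandas(desc="Doubling", position=0, leave=True)
df["B"] = df["A"].progress_apply(lambda x: x * 2)

# Calling tqdm.pandas() again replaces the options for later calls
tqdm.pandas(desc="Squaring", leave=False)
df["C"] = df["A"].progress_apply(lambda x: x ** 2)
```

With leave=False the second bar is cleared once it finishes, which keeps the output short when many operations run in sequence.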
Conclusion
Using tqdm with pandas significantly improves the user experience when handling large datasets by providing real-time feedback on processing progress. By integrating it into apply(), groupby(), and iterrows(), developers can monitor and optimize their data workflows effectively.
