Introduction to Polars
Polars is a fast, multi-threaded DataFrame library written in Rust, with APIs for both Rust and Python. It covers the common data manipulation tasks (selecting, filtering, grouping, joining) and is built to work efficiently with large datasets.
Key Features and APIs
The examples below use the Python API and cover the most common operations to help you get started:
Creating a DataFrame
import polars as pl
# Create a DataFrame
df = pl.DataFrame({
    "name": ["John", "Alice", "Bob"],
    "age": [28, 24, 19],
    "city": ["New York", "Los Angeles", "Chicago"],
})
print(df)
Reading a CSV File
df = pl.read_csv("example.csv")
print(df)
Column Operations
# Select a column
ages = df["age"]
# Add a new column
df = df.with_columns(pl.Series("score", [85, 90, 75]))
# Rename a column (stored under a new name so later examples can still use "age")
df_renamed = df.rename({"age": "years"})
Filtering Data
# Filter rows where age is greater than 20
df_filtered = df.filter(pl.col("age") > 20)
print(df_filtered)
Group By Operations
# Group by city and calculate mean age
# group_by is the current method name; older Polars releases spelled it groupby
df_grouped = df.group_by("city").agg([pl.col("age").mean()])
print(df_grouped)
Joining DataFrames
# Another DataFrame for joining
df_scores = pl.DataFrame({
    "name": ["John", "Alice", "Bob"],
    "score": [85, 90, 75],
})
# Join on 'name' column
df_joined = df.join(df_scores, on="name", how="left")
print(df_joined)
Missing Data Handling
# Fill missing values in 'score' with the column mean
df_filled = df.with_columns(pl.col("score").fill_null(strategy="mean"))
print(df_filled)
Merging DataFrames
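# Stack the columns of two DataFrames side by side.
# hstack requires unique column names, so rename the incoming columns first.
df_merged = df.hstack(df_scores.rename({"name": "name_right", "score": "score_right"}))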
print(df_merged)
Example Data Processing Application
In this example, we will read a CSV file, filter rows, aggregate data, and export the results to a new CSV file.
import polars as pl
# Step 1: Read the CSV file
df = pl.read_csv("data.csv")
# Step 2: Filter rows where score is above 50
df_filtered = df.filter(pl.col("score") > 50)
# Step 3: Group by city and calculate average score
df_grouped = df_filtered.group_by("city").agg([pl.col("score").mean()])
# Step 4: Write the result to a new CSV file
df_grouped.write_csv("filtered_data.csv")
print("Data processing completed!")
Polars is a versatile library that makes data manipulation faster and more efficient, providing essential tools for both Rust and Python developers.