Explore Polars: A Powerful DataFrame Library for Rust and Python

Introduction to Polars

Polars is a fast, multi-threaded DataFrame library written in Rust and usable from both Rust and Python. It stores data in the Apache Arrow columnar format and provides an expression-based API for selecting, filtering, aggregating, and joining data, which lets it process large datasets efficiently.

Key Features and APIs

Polars provides a rich API for data operations. The examples below use the Python API to cover the most common tasks:

Creating a DataFrame

import polars as pl

# Create a DataFrame
df = pl.DataFrame({
    "name": ["John", "Alice", "Bob"],
    "age": [28, 24, 19],
    "city": ["New York", "Los Angeles", "Chicago"]
})

print(df)

Reading a CSV File

# Load the file into its own variable so the df created above is not overwritten
df_csv = pl.read_csv("example.csv")
print(df_csv)

Column Operations

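# Select a column (returns a Series)
ages = df["age"]

# Add a new column (with_columns is the current name of the older with_column)
df_with_score = df.with_columns(pl.Series("score", [85, 90, 75]))

# Rename a column (returns a new DataFrame; df itself keeps the "age" column)
df_renamed = df.rename({"age": "years"})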

Filtering Data

# Filter rows where age is greater than 20
df_filtered = df.filter(pl.col("age") > 20)
print(df_filtered)

Group By Operations

# Group by city and calculate mean age (group_by was named groupby before Polars 0.19)
df_grouped = df.group_by("city").agg(pl.col("age").mean())
print(df_grouped)

Joining DataFrames

# Another DataFrame for joining
df_scores = pl.DataFrame({
    "name": ["John", "Alice", "Bob"],
    "score": [85, 90, 75]
})

# Join on 'name' column
df_joined = df.join(df_scores, on="name", how="left")
print(df_joined)

Missing Data Handling

# Fill missing values in the joined 'score' column with that column's mean
df_filled = df_joined.with_columns(pl.col("score").fill_null(pl.col("score").mean()))
print(df_filled)

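Stacking DataFrames Horizontally

# Add df_scores' score column next to df's columns (hstack needs equal row counts and unique column names)
df_merged = df.hstack(df_scores.select("score"))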
print(df_merged)

Example Data Processing Application

In this example, we will read a CSV file, filter rows, aggregate data, and export the results to a new CSV file.

import polars as pl

# Step 1: Read the CSV file
df = pl.read_csv("data.csv")

# Step 2: Filter rows where score is above 50
df_filtered = df.filter(pl.col("score") > 50)

# Step 3: Group by city and calculate average score
df_grouped = df_filtered.group_by("city").agg(pl.col("score").mean())

# Step 4: Write the result to a new CSV file
df_grouped.write_csv("filtered_data.csv")

print("Data processing completed!")

Polars is a versatile library: its expression-based, multi-threaded engine makes everyday data manipulation fast, and the same core is available to both Rust and Python developers.
