Introduction to Keras Preprocessing
Keras Preprocessing is a toolkit provided by Keras to facilitate data preprocessing for deep learning models. It includes APIs for preparing image and text data, turning raw files and strings into the consistent numeric arrays your models can actually train on.
Image Data Augmentation
This section covers how to use Keras Preprocessing for image data augmentation, which can help improve the performance of your computer vision models.
from keras.preprocessing.image import ImageDataGenerator
# Create an instance of ImageDataGenerator
datagen = ImageDataGenerator(
    rotation_range=40,        # random rotations of up to 40 degrees
    width_shift_range=0.2,    # horizontal shifts of up to 20% of the width
    height_shift_range=0.2,   # vertical shifts of up to 20% of the height
    shear_range=0.2,          # random shearing transformations
    zoom_range=0.2,           # random zoom in/out of up to 20%
    horizontal_flip=True,     # randomly flip images horizontally
    fill_mode='nearest'       # fill newly exposed pixels with the nearest value
)
# Example: Loading an image and applying data augmentation
from keras.preprocessing.image import img_to_array, load_img
img = load_img('path_to_image.jpg')  # Load the image from disk as a PIL image
x = img_to_array(img)                # Convert to a NumPy array of shape (height, width, channels)
x = x.reshape((1,) + x.shape)        # Add a batch dimension: (1, height, width, channels)
# Generate batches of augmented images
for batch in datagen.flow(x, batch_size=1):
    # Do something with the batch (e.g., display or save it)
    break
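If you want to inspect what the generator produces, flow() can also write each augmented batch to disk through its save_to_dir, save_prefix and save_format arguments. The sketch below reuses the datagen and x objects from above; the 'preview' directory name and the count of 20 images are arbitrary choices for illustration.
import os

os.makedirs('preview', exist_ok=True)  # flow() saves into this directory; it must already exist

# Save 20 augmented variations of the image as JPEG files named preview/aug_*.jpeg
i = 0
for batch in datagen.flow(x, batch_size=1,
                          save_to_dir='preview', save_prefix='aug', save_format='jpeg'):
    i += 1
    if i >= 20:
        break  # the generator loops forever, so stop it explicitly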
Text Data Tokenization
Keras Preprocessing also offers tools for text data tokenization. Tokenization is a critical step in preparing text data for neural network models.
from keras.preprocessing.text import Tokenizer
texts = [
    'Keras is an API designed for human beings, not machines.',
    'Keras follows best practices for reducing cognitive load.'
]
# Create a Tokenizer that keeps only the 1,000 most frequent words
tokenizer = Tokenizer(num_words=1000)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))
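Note that texts_to_sequences silently drops any word the tokenizer has never seen. If you expect out-of-vocabulary words at inference time, you can reserve a placeholder index for them with the oov_token argument. A minimal sketch, where the '<OOV>' marker and the example sentence are arbitrary:
# Reserve an index for out-of-vocabulary words instead of dropping them
tokenizer = Tokenizer(num_words=1000, oov_token='<OOV>')
tokenizer.fit_on_texts(texts)

# Words not present in the fitted vocabulary map to the '<OOV>' index
print(tokenizer.texts_to_sequences(['Keras is designed for rapid experimentation.']))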
Padding Text Sequences
When working with sequences of text data, it’s often necessary to ensure that all sequences have the same length. This can be achieved with the pad_sequences function in Keras Preprocessing.
from keras.preprocessing.sequence import pad_sequences
data = pad_sequences(sequences, maxlen=10)
print(data)
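By default, pad_sequences pads and truncates at the beginning of each sequence. Both the side that gets padded and the side that gets truncated can be controlled with the padding and truncating arguments. A small sketch using the sequences from above:
# Pad and truncate at the end of each sequence instead of the beginning
data_post = pad_sequences(sequences, maxlen=10, padding='post', truncating='post')
print(data_post)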
Application Example: Sentiment Analysis
Let’s build a simple sentiment analysis model using Keras Preprocessing to prepare our data.
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
# Sample data
texts = [
    'I love this product!',
    'This is the worst thing I ever bought.'
]
labels = np.array([1, 0])  # 1 = positive sentiment, 0 = negative sentiment
# Tokenize and pad the sequences
tokenizer = Tokenizer(num_words=1000)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
data = pad_sequences(sequences, maxlen=20)
# Build the model
model = Sequential()
model.add(Embedding(input_dim=1000, output_dim=64, input_length=20))  # 64-dimensional word embeddings
model.add(LSTM(128))                                                  # 128-unit LSTM over the sequence
model.add(Dense(1, activation='sigmoid'))                             # single probability output
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(data, labels, epochs=10, batch_size=2)
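Once the model is trained, any new text has to go through the same tokenizer and the same maxlen before it can be scored. A minimal sketch; the example sentences are made up, and with only two training samples the predicted probabilities are illustrative rather than meaningful:
# Score new reviews with the tokenizer and padding settings used for training
new_texts = ['I love it', 'Worst purchase ever']
new_data = pad_sequences(tokenizer.texts_to_sequences(new_texts), maxlen=20)

predictions = model.predict(new_data)
for text, score in zip(new_texts, predictions):
    print('%s -> %.2f' % (text, score[0]))  # values near 1 lean positive, near 0 lean negative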
With this setup, you can preprocess image and text data effectively using the Keras Preprocessing APIs, giving your deep learning models cleaner and more consistent inputs to learn from.