Anomaly Detection [Keras]

断水流大师兄 2018-06-06 11:35

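The day before yesterday I came across an article recommended by O'Reilly that uses an autoencoder for anomaly detection, implemented in MXNet, on data collected by a motion sensor. Today I reimplement that autoencoder in Keras and use it for anomaly detection.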

Import Data

import pandas as pd
import numpy as np
# Training data
train_data = pd.read_csv('normal.csv')

train_data.shape
(6613, 26)
# Validation data
valid_data = pd.read_csv('verify.csv')

valid_data.shape
(5200, 26)

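The data was collected with a sensor made by LP-Research. The sensor can record 21 feature values; the CSV files used here have 26 columns, which also include identifier fields such as SensorId, TimeStamp and FrameNumber.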

# Feature list
train_data.columns
Index([u'SensorId', u' TimeStamp (s)', u' FrameNumber', u' AccX (g)',
       u' AccY (g)', u' AccZ (g)', u' GyroX (deg/s)', u' GyroY (deg/s)',
       u' GyroZ (deg/s)', u' MagX (uT)', u' MagY (uT)', u' MagZ (uT)',
       u' EulerX (deg)', u' EulerY (deg)', u' EulerZ (deg)', u' QuatW',
       u' QuatX', u' QuatY', u' QuatZ', u' LinAccX (g)', u' LinAccY (g)',
       u' LinAccZ (g)', u' Pressure (kPa)', u' Altitude (m)',
       u' Temperature (degC)', u' HeaveMotion (m)'],
      dtype='object')
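# Select a single feature to train the model on: linear acceleration along the X axis
# (.values replaces .as_matrix(), which was removed in pandas 1.0)
train_data_selected = train_data[[' LinAccX (g)']].values.astype(np.float32)
valid_data_selected = valid_data[[' LinAccX (g)']].values.astype(np.float32)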
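
The column names in these files carry a leading space (for example ' LinAccX (g)'), which is easy to mistype. A minimal sketch of one way around that, assuming a small made-up DataFrame in place of normal.csv: strip the names once, then select the feature as a float32 column vector. The toy values and variable names are illustrative only and do not come from the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv('normal.csv'); the values are made up.
toy = pd.DataFrame({
    "SensorId": [1, 1, 1],
    " LinAccX (g)": [0.01, -0.02, 0.03],
    " LinAccY (g)": [0.00, 0.01, -0.01],
})

# Strip leading/trailing whitespace from every column name.
toy.columns = toy.columns.str.strip()

# Double brackets keep a 2-D shape (n_samples, 1), as in the selection code above.
selected = toy[["LinAccX (g)"]].to_numpy(dtype=np.float32)

print(selected.shape)  # (3, 1)
print(selected.dtype)  # float32
```

Whether to rename the columns is a matter of taste; keeping the original names, as the post does, works just as well as long as the leading space is typed consistently.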

Data visualization

Normal LinAccX

%matplotlib inline
import matplotlib.pyplot as plt
# LinAccX
plt.figure(figsize=(16,7))
plt.plot(range(len(train_data_selected)), train_data_selected, 'go')
plt.title('Normal')
plt.ylabel('LinAccX')
plt.xlabel('Time')
plt.grid()

[Figure: LinAccX of the normal (training) data plotted against sample index]

plt.figure(figsize=(16,7))
plt.plot(range(len(valid_data_selected)), valid_data_selected, 'go')
plt.title('Anomalous')
plt.ylabel('LinAccX')
plt.xlabel('Time')
plt.grid()

[Figure: LinAccX of the validation data plotted against sample index]

MLP Autoencoder

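An MLP autoencoder needs at least three layers: an input layer, a hidden layer, and an output layer, with the same number of neurons in the input and output layers.
An MLP is not well suited to time-series data: it treats every input as an independent sample, while the dependencies between points in a time series are very important. For that reason the series is cut into fixed-length windows below, so that each input carries some temporal context.
A spike in the data is not necessarily an anomaly; it may simply be part of the data's periodic pattern.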

Data preprocessing

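def prepare_dataset(dataset, seq_len):
    # Turn a 1-D series into len(data) rows of seq_len consecutive values
    seq_data = []
    data = dataset.flatten()
    # First seq_len rows: left-pad with zeros while fewer than seq_len samples are available
    for i in range(seq_len):
        padd_zeros = np.zeros(seq_len)
        padd_zeros[-(i+1):] = data[:seq_len][:i+1]
        seq_data.append(padd_zeros)
    # Remaining rows: sliding windows with a stride of 1
    for i in range(0, len(data)-seq_len, 1):
        seq_data.append(data[i: i+seq_len])
    return np.array(seq_data)

# Use a window length of 25 samples
data_train = prepare_dataset(train_data_selected, 25)
data_valid = prepare_dataset(valid_data_selected, 25)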
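
To see what a windowing step like this produces, here is a small self-contained sketch on a six-element series. It is a variant of prepare_dataset rather than a copy: it assumes NumPy 1.20 or later (for sliding_window_view) and left-pads with seq_len - 1 zeros so that row t always ends at sample t. The helper name make_windows is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(series, seq_len):
    """Return one window per sample; row t ends at series[t], zero-padded on the left."""
    data = np.asarray(series, dtype=np.float32).ravel()
    # Prepend seq_len - 1 zeros so the first sample already has a full-length window.
    padded = np.concatenate([np.zeros(seq_len - 1, dtype=np.float32), data])
    # sliding_window_view returns a read-only view; copy to get an independent array.
    return sliding_window_view(padded, seq_len).copy()

windows = make_windows([1, 2, 3, 4, 5, 6], seq_len=3)
print(windows.shape)  # (6, 3)
print(windows)
```

The first two rows are zero-padded ([0, 0, 1] and [0, 1, 2]) and the last row is [4, 5, 6], so the number of windows equals the number of samples and each point can later be coloured by the score of its own window.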

Split data into training and validation sets

from sklearn.model_selection import train_test_split
train_x , val_x , train_y, val_y = train_test_split(data_train, data_train.copy(),
                                                    test_size=0.2, random_state=3)

Build Model

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras import losses
model = Sequential()
model.add(Dense(16, activation='tanh', input_dim=25))
model.add(Dropout(0.25))
model.add(Dense(16, activation='tanh'))
model.add(Dropout(0.2))
model.add(Dense(16, activation='tanh'))
model.add(Dropout(0.25))
model.add(Dense(25, activation='tanh'))
# Mean squared error between each input window and its reconstruction
model.compile(loss=losses.mse, optimizer='adam')
history = model.fit(train_x, train_y, batch_size=256, epochs=200,
                    validation_data=(val_x, val_y), verbose=2)
Train on 5290 samples, validate on 1323 samples
Epoch 1/200
 - 1s - loss: 0.0026 - val_loss: 0.0012
Epoch 2/200
 - 0s - loss: 0.0017 - val_loss: 0.0010
Epoch 3/200
 - 0s - loss: 0.0014 - val_loss: 9.0889e-04
Epoch 4/200
 - 0s - loss: 0.0012 - val_loss: 8.4728e-04
Epoch 5/200
 - 0s - loss: 0.0011 - val_loss: 8.0012e-04
Epoch 6/200
 - 0s - loss: 0.0010 - val_loss: 7.6292e-04
Epoch 7/200
 - 0s - loss: 9.4862e-04 - val_loss: 7.2764e-04
Epoch 8/200
 - 0s - loss: 8.9246e-04 - val_loss: 6.9592e-04
Epoch 9/200
 - 0s - loss: 8.6560e-04 - val_loss: 6.6588e-04

....................................................................................

Epoch 172/200
 - 0s - loss: 5.0720e-04 - val_loss: 3.4436e-04
Epoch 173/200
 - 0s - loss: 5.0580e-04 - val_loss: 3.4301e-04
Epoch 174/200
 - 0s - loss: 5.1182e-04 - val_loss: 3.4183e-04
Epoch 175/200
 - 0s - loss: 4.9710e-04 - val_loss: 3.4733e-04
Epoch 176/200
 - 0s - loss: 5.0847e-04 - val_loss: 3.4290e-04
Epoch 177/200
 - 0s - loss: 5.0791e-04 - val_loss: 3.4510e-04
Epoch 178/200
 - 0s - loss: 5.0291e-04 - val_loss: 3.4397e-04
Epoch 179/200
 - 0s - loss: 5.0073e-04 - val_loss: 3.4405e-04
Epoch 180/200
 - 0s - loss: 5.0530e-04 - val_loss: 3.4269e-04
Epoch 181/200
 - 0s - loss: 5.0060e-04 - val_loss: 3.4447e-04
Epoch 182/200
 - 0s - loss: 5.0487e-04 - val_loss: 3.4401e-04
Epoch 183/200
 - 0s - loss: 5.0181e-04 - val_loss: 3.4068e-04
Epoch 184/200
 - 0s - loss: 5.0707e-04 - val_loss: 3.4075e-04
Epoch 185/200
 - 0s - loss: 4.9829e-04 - val_loss: 3.4117e-04
Epoch 186/200
 - 0s - loss: 5.0557e-04 - val_loss: 3.4093e-04
Epoch 187/200
 - 0s - loss: 5.0193e-04 - val_loss: 3.3929e-04
Epoch 188/200
 - 0s - loss: 5.0516e-04 - val_loss: 3.4222e-04
Epoch 189/200
 - 0s - loss: 5.0814e-04 - val_loss: 3.4438e-04
Epoch 190/200
 - 0s - loss: 5.0533e-04 - val_loss: 3.4306e-04
Epoch 191/200
 - 0s - loss: 4.9998e-04 - val_loss: 3.4417e-04
Epoch 192/200
 - 0s - loss: 5.0657e-04 - val_loss: 3.4660e-04
Epoch 193/200
 - 0s - loss: 5.0803e-04 - val_loss: 3.4325e-04
Epoch 194/200
 - 0s - loss: 5.0309e-04 - val_loss: 3.4618e-04
Epoch 195/200
 - 0s - loss: 5.0879e-04 - val_loss: 3.4109e-04
Epoch 196/200
 - 0s - loss: 5.0460e-04 - val_loss: 3.4525e-04
Epoch 197/200
 - 0s - loss: 5.0905e-04 - val_loss: 3.4116e-04
Epoch 198/200
 - 0s - loss: 5.0503e-04 - val_loss: 3.4351e-04
Epoch 199/200
 - 0s - loss: 5.0153e-04 - val_loss: 3.4494e-04
Epoch 200/200
 - 0s - loss: 5.0527e-04 - val_loss: 3.3873e-04

Loss curve

plt.figure(figsize=(16,7))
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='val')
plt.grid()
plt.legend()

[Figure: training and validation loss per epoch]

Threshold

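train_predictions = model.predict(train_x)
# Threshold: mean plus 1.25 standard deviations of the per-window mean of train_x
# (train_predictions is not used in this formula)
threshold = np.mean(train_x.mean(axis=1)) + 1.25 * np.std(train_x.mean(axis=1))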
threshold
0.01215754349469141
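
The threshold above is built from the per-window means of the inputs. A common alternative for autoencoders, shown here only as a sketch and not as the method used in this post, is to threshold the reconstruction error instead: compute the mean squared error of each window against its reconstruction, then flag windows whose error exceeds the mean plus k standard deviations of the errors seen on normal data. The function names and the toy arrays are assumptions made for illustration; in practice the reconstructions would come from model.predict.

```python
import numpy as np

def reconstruction_errors(inputs, reconstructions):
    """Mean squared error per window (one value per row)."""
    inputs = np.asarray(inputs, dtype=np.float64)
    reconstructions = np.asarray(reconstructions, dtype=np.float64)
    return np.mean((inputs - reconstructions) ** 2, axis=1)

def fit_threshold(errors, k=1.25):
    """Mean plus k standard deviations of the errors measured on normal data."""
    return float(np.mean(errors) + k * np.std(errors))

# Made-up "normal" windows and their (imperfect) reconstructions.
normal_x = np.zeros((4, 3))
normal_hat = np.array([[0.1] * 3, [0.1] * 3, [0.2] * 3, [0.2] * 3])
err_threshold = fit_threshold(reconstruction_errors(normal_x, normal_hat))

# Made-up new windows: the second one is reconstructed badly.
new_x = np.zeros((2, 3))
new_hat = np.array([[0.1] * 3, [0.5] * 3])
flags = reconstruction_errors(new_x, new_hat) > err_threshold

print(round(err_threshold, 5))  # 0.04375
print(flags.tolist())           # [False, True]
```

The multiplier k plays the same role as the 1.25 in the post: a larger value flags fewer windows, a smaller one flags more.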

Detection results

Validation data

valid_predictions = model.predict(data_valid)
anomaly = list(map(lambda v: v > threshold, valid_predictions.mean(axis=1)))

# Result visualization
anomaly = list(map(lambda v: "red" if v else "green", anomaly))
plt.figure(figsize=(16, 7))
plt.scatter(list(range(len(valid_data_selected))),valid_data_selected,c=anomaly)
plt.ylabel('LinAccX')
plt.show()

[Figure: validation data, with points flagged as anomalous in red and the rest in green]
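
The two map/lambda passes used for flagging and colouring can also be written with NumPy array operations. The sketch below uses made-up scores and a made-up threshold in place of valid_predictions.mean(axis=1) and the threshold computed earlier; it is meant as an equivalent formulation, not a change to the method.

```python
import numpy as np

scores = np.array([0.001, 0.020, 0.005, 0.030])  # made-up per-window scores
toy_threshold = 0.012                            # made-up threshold

flags = scores > toy_threshold                # boolean mask, one entry per window
colors = np.where(flags, "red", "green")      # colour name for each point

print(colors.tolist())  # ['green', 'red', 'green', 'red']
print(flags.mean())     # 0.5, the fraction of flagged windows
```

The resulting list (colors.tolist()) can be passed to plt.scatter through the c argument in the same way as the anomaly list above, and flags.mean() gives a quick summary of how much of the series was flagged.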

Training data

train_predictions = model.predict(data_train)
anomaly = list(map(lambda v: v > threshold, train_predictions.mean(axis=1)))

# Result visualization
anomaly = list(map(lambda v: "red" if v else "green", anomaly))
plt.figure(figsize=(16, 7))
plt.scatter(list(range(len(train_data_selected))),train_data_selected,c=anomaly)
plt.ylabel('LinAccX')
plt.show()

[Figure: training data, with points flagged as anomalous in red and the rest in green]
