AVI decoding (AVI decoder for mobile)

Further reading:

AVI decoding

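Stuttering during preview is a hardware problem. By home-PC standards your machine is quite good, but for Pr (Premiere) it is not high-end, and the graphics card in particular falls short: video editing wants a better card. You should also have a dedicated video editing card; the cheaper ones are called video compression cards, and they are usually referred to as full-length cards. Your graphics card is just a good consumer card, which is why playback stutters.

The video decoder is usually the one bundled with the video editing card.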

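Essential PREMIERE plug-ins

These include some common plug-ins used by PREMIERE templates; installing them before using a template is strongly recommended.
1. The AVI file decoder MJPEG VIDEO CODEC from the JPG company
2. PREMIERE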

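3.2.1 Algorithm principle

C3D comes from the paper "Learning Spatiotemporal Features with 3D Convolutional Networks". For an introduction to the algorithm, see the paper notes "Notes on Video Behavior Recognition / Action Recognition Algorithms (3)".

3.2.2 Network architecture

Figure 3. The C3D network

3.2.3 Build and run

The Caffe implementation of C3D is fairly old (caffe git commit b80fc86). A fork based on a newer Caffe exists, and there are also TensorFlow and Keras implementations. This article analyzes the TensorFlow implementation.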

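After cloning the Git repository, the contents are:

train_c3d_ucf101.py: trains on the UCF101 dataset

predict_c3d_ucf101.py: runs prediction with UCF101 as the test set

c3d_model.py: the C3D network model

input_data.py: input data preprocessing

list: in this folder, train.list and test.list are the lists of files to train on and to predict on, respectively; the scripts there handle video decoding and list generation. See Readme.md for details.

Running python train_c3d_ucf101.py or python predict_c3d_ucf101.py directly starts training or prediction.

The TensorFlow version needs relatively little code because the TensorFlow API already includes 3-D convolution and pooling, so they do not have to be implemented separately.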

tf.nn.conv3d(input, filter, strides, padding, name=None)

Computes a 3-D convolution given 5-D input and filter tensors. In signal processing, cross-correlation is a measure of similarity of two waveforms as a function of a time-lag applied to one of them. This is also known as a sliding dot product or sliding inner-product. Our Conv3D implements a form of cross-correlation.

tf.nn.avg_pool3d(input, ksize, strides, padding, name=None)

Performs 3D average pooling on the input.

tf.nn.max_pool3d(input, ksize, strides, padding, name=None)

Performs 3D max pooling on the input.
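To make the cross-correlation description above concrete, here is a minimal pure-Python sketch for a single channel, stride 1 and no padding. The function name conv3d_valid and the toy shapes are illustrative assumptions, not part of the TensorFlow API; tf.nn.conv3d additionally handles batches, channels, strides and padding.

```python
def conv3d_valid(volume, kernel):
    """Naive single-channel 3-D cross-correlation, stride 1, no padding."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for d in range(depth - kd + 1):
        plane = []
        for h in range(height - kh + 1):
            row = []
            for w in range(width - kw + 1):
                # sliding dot product between the kernel and the window at (d, h, w)
                acc = 0.0
                for i in range(kd):
                    for j in range(kh):
                        for k in range(kw):
                            acc += volume[d + i][h + j][w + k] * kernel[i][j][k]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# toy data: a 4x4x4 volume of ones and a 2x2x2 kernel of ones
volume = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
kernel = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]
result = conv3d_valid(volume, kernel)
```

Each output entry sums the eight values under the kernel, and the output shrinks to 3x3x3 because no padding is applied.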

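3.2.4 Source code analysis

(1). Data preprocessing

a. Data loading

Done in get_frames_data. It reads num_frames_per_clip consecutive images, starting at a random position, from the directory filename (the temporal depth is usually 16, i.e. num_frames_per_clip=16) and converts them to np.array. For example, the directory v_ApplyEyeMakeup_g01_c01 holds 165 consecutive images of that action (one AVI video decoded into JPG frames), and a random run such as frames 32 up to 48 (32+16) is picked. This approach has a drawback: sampling uniformly at some frame rate would be more reasonable.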
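The two sampling strategies can be compared with a small sketch. The helper names random_clip_indices and uniform_clip_indices are hypothetical and are not taken from input_data.py; the first mimics the random contiguous window described above, the second shows one possible form of the uniform sampling the text suggests as an alternative.

```python
import random


def random_clip_indices(num_frames, clip_len=16, rng=random):
    """Pick clip_len consecutive frame indices starting at a random position."""
    if num_frames < clip_len:
        raise ValueError("video is shorter than one clip")
    start = rng.randint(0, num_frames - clip_len)  # randint is inclusive on both ends
    return list(range(start, start + clip_len))


def uniform_clip_indices(num_frames, clip_len=16):
    """Pick clip_len frame indices spread evenly over the whole video."""
    if num_frames < clip_len:
        raise ValueError("video is shorter than one clip")
    step = num_frames / float(clip_len)
    return [int(i * step) for i in range(clip_len)]


rng = random.Random(0)                           # seeded only to make the sketch repeatable
contiguous = random_clip_indices(165, rng=rng)   # 16 neighbouring frames
spread = uniform_clip_indices(165)               # 16 frames covering all 165
```

The contiguous window keeps the original frame rate but sees only a fraction of the action; the uniform variant covers the whole video at a lower effective frame rate.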

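b. Data partitioning

Done in read_clip_and_label. It comes down to keeping four values consistent: index, the current position in the global list; batch_index, the position within the batch; batch_size; and next_batch_start, where the next batch begins.

for index in video_indices:
    if batch_index >= batch_size:
        next_batch_start = index
        break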
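The bookkeeping can be sketched in isolation as follows. take_batch is a hypothetical stand-in for the loop inside read_clip_and_label; it assumes an unshuffled index list and uses -1 to signal that the list has been exhausted.

```python
def take_batch(video_indices, batch_size, start_pos=0):
    """Collect up to batch_size indices and report where the next batch starts."""
    batch = []
    next_batch_start = -1  # assumed sentinel: no further batch available
    for index in video_indices[start_pos:]:
        if len(batch) >= batch_size:
            # the batch is full; remember the first index that did not fit
            next_batch_start = index
            break
        batch.append(index)
    return batch, next_batch_start


indices = list(range(10))
first, nxt = take_batch(indices, 4)               # full batch
last, end = take_batch(indices, 4, start_pos=8)   # short final batch
```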

c. Data shuffling

Done in read_clip_and_label; video_indices is the list of indices.

video_indices = list(range(len(lines)))  # list() so that shuffle also works on Python 3
random.seed(time.time())
random.shuffle(video_indices)

d. Parallel computation setup

In run_training(), the effective batch size is determined by the number of GPUs and the batch_size passed in. The main places are:

images_placeholder, labels_placeholder = placeholder_inputs(
    FLAGS.batch_size * gpu_num
)  # placeholders sized for all GPUs together
for gpu_index in range(0, gpu_num):
    with tf.device('/gpu:%d' % gpu_index):
        ...
        logit = c3d_model.inference_c3d(
            images_placeholder[gpu_index * FLAGS.batch_size:(gpu_index + 1) * FLAGS.batch_size, :, :, :, :],
            0.5,
            FLAGS.batch_size,
            weights,
            biases
        )  # each tower receives its own slice of the samples
        ...
        loss = tower_loss(
            loss_name_scope,
            logit,
            labels_placeholder[gpu_index * FLAGS.batch_size:(gpu_index + 1) * FLAGS.batch_size]
        )  # each tower receives its own slice of the labels
...
for step in range(FLAGS.max_steps):
    ...
    train_images, train_labels, _, _, _ = input_data.read_clip_and_label(
        filename='list/train.list',
        batch_size=FLAGS.batch_size * gpu_num,
        num_frames_per_clip=c3d_model.NUM_FRAMES_PER_CLIP,
        crop_size=c3d_model.CROP_SIZE,
        shuffle=True
    )  # batch size for the training set
    val_images, val_labels, _, _, _ = input_data.read_clip_and_label(
        filename='list/test.list',
        batch_size=FLAGS.batch_size * gpu_num,
        num_frames_per_clip=c3d_model.NUM_FRAMES_PER_CLIP,
        crop_size=c3d_model.CROP_SIZE,
        shuffle=True
    )  # batch size for the validation set
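The slicing pattern used for the towers can be shown without TensorFlow. tower_slices is a hypothetical helper, and the list below merely stands in for a global batch; the point is that a batch of batch_size * gpu_num samples is split into gpu_num equal, non-overlapping shards.

```python
def tower_slices(batch_size, gpu_num):
    """Return the (start, stop) range of the global batch handled by each GPU."""
    return [(g * batch_size, (g + 1) * batch_size) for g in range(gpu_num)]


global_batch = list(range(20))  # stand-in for 2 GPUs x 10 samples
shards = [global_batch[start:stop] for start, stop in tower_slices(10, 2)]
```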

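e. Data centering

The mean is computed in advance and stored in NumPy's binary format in the file crop_mean.npy.

In read_clip_and_label:

np_mean = np.load('crop_mean.npy').reshape([num_frames_per_clip, crop_size, crop_size, 3])

Normalization and whitening are not strictly required for image data, so neither is applied.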

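f. Scaling and cropping the video data

Each frame is converted from np.array back to an image, then scaled and cropped.

img = Image.fromarray(tmp_data[j].astype(np.uint8))
if img.width > img.height:
    # landscape: scale the height to crop_size (cv2.resize takes (width, height))
    scale = float(crop_size) / float(img.height)
    img = np.array(cv2.resize(np.array(img), (int(img.width * scale + 1), crop_size))).astype(np.float32)
else:
    # portrait or square: scale the width to crop_size
    scale = float(crop_size) / float(img.width)
    img = np.array(cv2.resize(np.array(img), (crop_size, int(img.height * scale + 1)))).astype(np.float32)
# img.shape is (rows, cols, channels), so crop_x is the row offset and crop_y the column offset
crop_x = int((img.shape[0] - crop_size) / 2)
crop_y = int((img.shape[1] - crop_size) / 2)
# center crop, then subtract the per-frame mean
img = img[crop_x:crop_x + crop_size, crop_y:crop_y + crop_size, :] - np_mean[j]
img_datas.append(img)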
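The geometry of this step can be checked without any image library. The sketch below reproduces only the arithmetic of the snippet above; the function name resize_and_crop_geometry is hypothetical, and the 320x240 input with a crop size of 112 is an assumed example (UCF101 videos are 320x240).

```python
def resize_and_crop_geometry(width, height, crop_size):
    """Return (new_width, new_height, row_offset, col_offset) for scale-then-center-crop."""
    if width > height:
        scale = float(crop_size) / float(height)
        new_w, new_h = int(width * scale + 1), crop_size
    else:
        scale = float(crop_size) / float(width)
        new_w, new_h = crop_size, int(height * scale + 1)
    row_offset = int((new_h - crop_size) / 2)  # crop_x in the original snippet
    col_offset = int((new_w - crop_size) / 2)  # crop_y in the original snippet
    return new_w, new_h, row_offset, col_offset


landscape = resize_and_crop_geometry(320, 240, 112)
portrait = resize_and_crop_geometry(240, 320, 112)
```

The shorter side is always scaled to exactly crop_size, so only the longer side is cropped.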

(2). Network construction

a. Network design

with tf.variable_scope('var_name') as var_scope:
    weights = {
        'wc1': _variable_with_weight_decay('wc1', [3, 3, 3, 3, 64], 0.0005),
        'wc2': _variable_with_weight_decay('wc2', [3, 3, 3, 64, 128], 0.0005),
        'wc3a': _variable_with_weight_decay('wc3a', [3, 3, 3, 128, 256], 0.0005),
        'wc3b': _variable_with_weight_decay('wc3b', [3, 3, 3, 256, 256], 0.0005),
        'wc4a': _variable_with_weight_decay('wc4a', [3, 3, 3, 256, 512], 0.0005),
        'wc4b': _variable_with_weight_decay('wc4b', [3, 3, 3, 512, 512], 0.0005),
        'wc5a': _variable_with_weight_decay('wc5a', [3, 3, 3, 512, 512], 0.0005),
        'wc5b': _variable_with_weight_decay('wc5b', [3, 3, 3, 512, 512], 0.0005),
        'wd1': _variable_with_weight_decay('wd1', [8192, 4096], 0.0005),
        'wd2': _variable_with_weight_decay('wd2', [4096, 4096], 0.0005),
        'out': _variable_with_weight_decay('wout', [4096, c3d_model.NUM_CLASSES], 0.0005)
    }
    biases = {
        'bc1': _variable_with_weight_decay('bc1', [64], 0.000),
        'bc2': _variable_with_weight_decay('bc2', [128], 0.000),
        'bc3a': _variable_with_weight_decay('bc3a', [256], 0.000),
        'bc3b': _variable_with_weight_decay('bc3b', [256], 0.000),
        'bc4a': _variable_with_weight_decay('bc4a', [512], 0.000),
        'bc4b': _variable_with_weight_decay('bc4b', [512], 0.000),
        'bc5a': _variable_with_weight_decay('bc5a', [512], 0.000),
        'bc5b': _variable_with_weight_decay('bc5b', [512], 0.000),
        'bd1': _variable_with_weight_decay('bd1', [4096], 0.000),
        'bd2': _variable_with_weight_decay('bd2', [4096], 0.000),
        'out': _variable_with_weight_decay('bout', [c3d_model.NUM_CLASSES], 0.000),
    }
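The 8192 in the shape of wd1 can be traced by following a clip through the pooling layers. The sketch below rests on assumptions that are not shown in this article: a 16x112x112 input clip, SAME padding everywhere, and a first pooling layer that halves only the spatial axes while the other four halve all three. Under those assumptions the flattened pool5 output has exactly 8192 values.

```python
import math


def same_pool(size, stride):
    """Output length of a SAME-padded pooling window along one axis."""
    return int(math.ceil(size / float(stride)))


def c3d_feature_shape(frames=16, side=112):
    """Follow (depth, height, width) through five pooling layers."""
    d, h, w = frames, side, side
    # (temporal stride, spatial stride) for pool1..pool5; assumed values
    pools = [(1, 2), (2, 2), (2, 2), (2, 2), (2, 2)]
    for t_stride, s_stride in pools:
        # the SAME-padded, stride-1 3x3x3 convolutions in between keep (d, h, w) unchanged
        d = same_pool(d, t_stride)
        h = same_pool(h, s_stride)
        w = same_pool(w, s_stride)
    return d, h, w


d, h, w = c3d_feature_shape()
flat = d * h * w * 512  # 512 output channels, matching the last dimension of wc5b
```

The last spatial step goes from 7 to 4 rather than 3 because SAME padding rounds up, which is what makes the product come out to 8192.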

b. Network initialization

Variables are initialized with xavier_initializer, and an L2 penalty is used for regularization.

def _variable_with_weight_decay(name, shape, wd):
    var = _variable_on_cpu(name, shape, tf.contrib.layers.xavier_initializer())
    if wd is not None:
        # L2 penalty: wd * sum(var ** 2) / 2, collected for the total loss
        weight_decay = tf.nn.l2_loss(var) * wd
        tf.add_to_collection('weightdecay_losses', weight_decay)
    return var
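Both ingredients reduce to short formulas, sketched here in plain Python. The helper names are hypothetical; the sketch assumes the uniform variant of Xavier initialization, which draws weights from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)), and the definition of tf.nn.l2_loss as half the sum of squares.

```python
import math


def xavier_uniform_limit(shape):
    """Bound of the uniform Xavier distribution for a weight tensor of this shape."""
    receptive = 1
    for dim in shape[:-2]:
        receptive *= dim  # kernel volume for convolutional weights
    fan_in = receptive * (shape[-2] if len(shape) > 1 else shape[-1])
    fan_out = receptive * shape[-1]
    return math.sqrt(6.0 / (fan_in + fan_out))


def l2_loss(values):
    """Half the sum of squares, the quantity tf.nn.l2_loss computes."""
    return sum(v * v for v in values) / 2.0


limit_wc1 = xavier_uniform_limit([3, 3, 3, 3, 64])  # first conv layer
penalty = 0.0005 * l2_loss([3.0, 4.0])              # weight decay on a toy vector
```

For wc1 the fan-in is 27 * 3 = 81 and the fan-out is 27 * 64 = 1728, so wider layers receive proportionally smaller initial weights.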

c. Model loading

The model used was pre-trained on Sports-1M.

use_pretrained_model = True
model_filename = "./sports1m_finetuning_ucf101.model"

(3). Classification function and loss

The classifier is softmax and the loss is cross-entropy.

def tower_loss(name_scope, logit, labels):
    cross_entropy_mean = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logit)
    )
    tf.summary.scalar(
        name_scope + '_cross_entropy',
        cross_entropy_mean
    )
    # get_collection returns a list with one decay term per variable
    weight_decay_loss = tf.get_collection('weightdecay_losses')
    tf.summary.scalar(name_scope + '_weight_decay_loss', tf.reduce_mean(weight_decay_loss))
    # Calculate the total loss for the current tower.
    # Adding a scalar to that list yields a vector with one entry per decay term.
    total_loss = cross_entropy_mean + weight_decay_loss
    tf.summary.scalar(name_scope + '_total_loss', tf.reduce_mean(total_loss))
    return total_loss
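What the cross-entropy term computes can be written out by hand. This is a minimal sketch with hypothetical function names, not the TensorFlow implementation; it takes raw logits and an integer class label, as sparse_softmax_cross_entropy_with_logits does, and averages over the batch.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """-log(softmax(logits)[label]), computed stably via log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def mean_cross_entropy(batch_logits, labels):
    """Average the per-sample losses, like tf.reduce_mean over the batch."""
    losses = [sparse_softmax_cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


# an untrained 101-class model that outputs identical logits for every class
uniform = mean_cross_entropy([[0.0] * 101, [0.0] * 101], [3, 57])
```

With uniform predictions the loss equals ln(101), roughly 4.6, which is a useful reference point when reading the first few training steps on UCF101.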

(4). Optimizer

Adam is used, with two learning rates, one for coarse tuning and one for fine-tuning.

opt_stable = tf.train.AdamOptimizer(1e-4)
opt_finetuning = tf.train.AdamOptimizer(1e-3)
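To give a feel for what the two rates mean, here is a single textbook Adam update in plain Python. adam_step is a hypothetical helper using the commonly cited defaults (beta1=0.9, beta2=0.999, eps=1e-8); it is a sketch of the update rule, not TensorFlow's internal code, and which variables each optimizer is applied to is not shown in this article.

```python
import math


def adam_step(param, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad         # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)               # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# same gradient, two learning rates
stable, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, t=1, lr=1e-4)
finetune, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, t=1, lr=1e-3)
```

On the first step the normalized gradient has magnitude close to 1, so the parameter moves by roughly the learning rate itself: the 1e-3 optimizer takes steps about ten times larger than the 1e-4 one.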

(5). Training and validation

Every training step records its elapsed time.

for step in range(FLAGS.max_steps):
    start_time = time.time()
    train_images, train_labels, _, _, _ = input_data.read_clip_and_label(
        filename='list/train.list',
        batch_size=FLAGS.batch_size * gpu_num,
        num_frames_per_clip=c3d_model.NUM_FRAMES_PER_CLIP,
        crop_size=c3d_model.CROP_SIZE,
        shuffle=True
    )
    sess.run(train_op, feed_dict={
        images_placeholder: train_images,
        labels_placeholder: train_labels
    })
    duration = time.time() - start_time
    print('Step %d: %.3f sec' % (step, duration))

Every 10 steps the model is saved, the training accuracy is reported, and a validation pass is run and its accuracy reported as well.

# this block sits inside the training loop
if step % 10 == 0 or (step + 1) == FLAGS.max_steps:
    saver.save(sess, os.path.join(model_save_dir, 'c3d_ucf_model'), global_step=step)
    print('Training Data Eval:')
    summary, acc = sess.run(
        [merged, accuracy],
        feed_dict={
            images_placeholder: train_images,
            labels_placeholder: train_labels
        })
    print("accuracy: " + "{:.5f}".format(acc))
    train_writer.add_summary(summary, step)
    print('Validation Data Eval:')
    val_images, val_labels, _, _, _ = input_data.read_clip_and_label(
        filename='list/test.list',
        batch_size=FLAGS.batch_size * gpu_num,
        num_frames_per_clip=c3d_model.NUM_FRAMES_PER_CLIP,
        crop_size=c3d_model.CROP_SIZE,
        shuffle=True
    )
    summary, acc = sess.run(
        [merged, accuracy],
        feed_dict={
            images_placeholder: val_images,
            labels_placeholder: val_labels
        })
    print("accuracy: " + "{:.5f}".format(acc))
    test_writer.add_summary(summary, step)

(6). Testing

Testing is done in predict_c3d_ucf101.py, which computes the final scores.

for step in range(all_steps):
    # Fill a feed dictionary with the actual set of images and labels
    # for this particular step.
    start_time = time.time()
    test_images, test_labels, next_start_pos, _, valid_len = input_data.read_clip_and_label(
        test_list_file,
        FLAGS.batch_size * gpu_num,
        start_pos=next_start_pos
    )
    predict_score = norm_score.eval(
        session=sess,
        feed_dict={images_placeholder: test_images}
    )
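One way to turn such score rows into a single number is top-1 accuracy, sketched below. top1_accuracy is a hypothetical helper, and the sketch assumes that valid_len counts the real samples in a batch whose tail may be padding; the article does not show how the script itself aggregates the scores.

```python
def top1_accuracy(score_rows, labels, valid_len):
    """Fraction of the first valid_len rows whose highest score is the true label."""
    correct = 0
    for row, label in list(zip(score_rows, labels))[:valid_len]:
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        if predicted == label:
            correct += 1
    return correct / float(valid_len)


# toy batch of 4 rows over 3 classes; the 4th row is treated as padding
scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.9, 0.05, 0.05]]
labels = [1, 2, 2, 0]
accuracy = top1_accuracy(scores, labels, valid_len=3)
```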

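3.2.5 Results

(1). t-SNE visualization of the extracted features

The C3D paper uses the t-SNE algorithm to show how tightly the trained model's pre-classification features cluster. The algorithm comes from the paper "Visualizing Data using t-SNE".

Figure 4. Clustering of data shown with t-SNE

Figure 5. UCF101 features as shown in the C3D paper

The point of data visualization is to present high-dimensional data sensibly in a low-dimensional space; since the human eye perceives at most three dimensions, "low-dimensional" here means 2-D or 3-D. There are many dimensionality-reduction methods, as shown in the figure below. The one SNE uses is manifold learning, which recovers the low-dimensional manifold structure from data sampled in high-dimensional space, so that how tightly the data cluster in the high-dimensional space is mapped into the low-dimensional manifold space. For manifold learning and the t-SNE algorithm itself, see the following three articles: 1, 2, 3

Figure 6. Taxonomy of dimensionality-reduction algorithms

The implementation uses the t-SNE in the scikit-learn machine learning library. The input to t-SNE is the output of the trained C3D network used as a feature extractor, namely the layer the article calls fc7 (101-dimensional; in the modified predict code it is the logits tensor). The output is a 2-D distribution in which points are colored by label. Here, during prediction, that output and the labels are saved to csv files, which are then visualized. predict loads the model c3d_ucf101_finetune_whole_iter_20000_TF.model. Note that because the first fully connected layer (fc6) differs between the UCF101 and Sports-1M models, the rearrangement of pool5 must be disabled (the tf.transpose(pool5) line in c3d_model.py).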

tsne.py is implemented as follows:

import csv
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
colors = ["#476A2A", "#7851B8", "#BD3430", "#4A2D4E", "#875525",
          "#A83683", "#4E655E", "#853541", "#3A3120", "#535D8E"]
tsne = TSNE(random_state=42)
data_file = open("data_file_part.csv", 'r')
label_file = open("label_file_part.csv", 'r')
csv_reader_1 = csv.reader(data_file, dialect='excel')
csv_reader_2 = csv.reader(label_file, dialect='excel')
data = []
label = []
for row_1 in csv_reader_1:
    data.append(row_1)
for row_2 in csv_reader_2:
    label.append(row_2)
data_file.close()
label_file.close()
# csv yields strings: convert features to floats and labels to a flat integer array
data = np.array(data, dtype=np.float32)
label = np.array(label, dtype=np.float64).astype(np.int64).ravel()
print(data.shape)
print(label.shape)
data_tsne = tsne.fit_transform(data)
plt.figure(figsize=(10, 10))
plt.xlim(data_tsne[:, 0].min(), data_tsne[:, 0].max() + 1)
plt.ylim(data_tsne[:, 1].min(), data_tsne[:, 1].max() + 1)
for i in range(len(data)):
    # draw each sample as its label number, colored by label modulo 10
    plt.text(data_tsne[i, 0], data_tsne[i, 1], str(int(label[i])),
             color=colors[int(label[i]) % 10], fontdict={'weight': 'bold', 'size': 9})
plt.xlabel("t-sne feature 0")
plt.ylabel("t-sne feature 1")
plt.show()

predict_c3d_ucf101.py is modified as follows:

# requires "import csv" at the top of the script
def run_test():
    # text mode with newline='' is the Python 3 way to open csv files (the original used 'wb' on Python 2)
    data_file = open("data_file.csv", 'w', newline='')
    label_file = open("label_file.csv", 'w', newline='')
    csv_write_1 = csv.writer(data_file, dialect='excel')
    csv_write_2 = csv.writer(label_file, dialect='excel')
    model_name = "c3d_ucf101_finetune_whole_iter_20000_TF.model"
    … …
    predict_score = norm_score.eval(
        session=sess,
        feed_dict={images_placeholder: test_images}
    )
    # features for t-SNE: the network output before softmax
    feature = sess.run(logits, feed_dict={images_placeholder: test_images})
    test_labels_T = test_labels.reshape(test_labels.shape[0], 1)
    for i in range(0, len(feature)):
        csv_write_1.writerow(feature[i])
        csv_write_2.writerow(test_labels_T[i])
    … …

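The result for the first 5 classes (label 0: ApplyEyeMakeup, 1: ApplyLipstick, 2: Archery, 3: BabyCrawling, 4: BalanceBeam) is shown in the figure below. Labels 0 and 1 are not yet well separated (applying eye makeup and applying lipstick are very similar motions), while the other classes are cleanly separated.

Figure 7. t-SNE clustering of the first 5 UCF101 classes

The result for the first 50 classes is shown in the figure below.

Figure 8. t-SNE clustering of the first 50 UCF101 classes

As the figure shows, the features extracted by C3D are on the whole clearly spread apart by class, so a good classifier (the network's own softmax with cross-entropy, or an SVM) can separate them.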

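(2). Feature map visualization

The C3D paper visualizes feature maps for a few frames of some videos. Feature map visualization comes from the paper "Visualizing and Understanding Convolutional Networks".

Figure 9. Feature map visualization in the C3D paper

The goal of feature map visualization is to work out what features each CNN layer has actually learned, that is, what the feature maps look like once tuning has improved accuracy. Concretely, only through feature maps can one understand how a CNN works and so guide further tuning to improve performance. Questions such as how different parts of an image affect classification accuracy, and whether every layer generalizes equally well, are also answered through feature maps. The method is to feed each layer's output feature map through a deconvolution pass (deconvolution, un-activation, unpooling) to reconstruct the original image. For example, take B=f(A), where A is the 2-D image array, f is the first CNN layer and B is the output feature map. B can be computed from A; conversely, keep a single entry B(i,j) of B, set every other entry to 0, and run the deconvolution to obtain A'. Then A' is what activates B(i,j), and it reflects how much each pixel of A contributes to B(i,j). In Figure 10, the left side is the path that produces A'; the right side is a conventional CNN with two modifications:

1). Max pooling has to record where each maximum sits in the matrix; unpooling puts the value back at that position and sets every other position to 0;

2). Because the convolution kernel weights are available, the deconvolution uses them directly after a transformation; no training is needed.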
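The first modification, pooling that remembers where each maximum came from, can be sketched in plain Python for a single 2-D feature map. The function names and the toy 4x4 input are illustrative assumptions rather than code from any of the cited implementations.

```python
def max_pool_with_switches(matrix, k=2):
    """k x k max pooling that also records the position of every maximum."""
    rows, cols = len(matrix), len(matrix[0])
    pooled, switches = [], []
    for r in range(0, rows, k):
        pooled_row, switch_row = [], []
        for c in range(0, cols, k):
            window = [(matrix[i][j], (i, j))
                      for i in range(r, min(r + k, rows))
                      for j in range(c, min(c + k, cols))]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            switch_row.append(position)  # the "switch": where the maximum sat
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches


def unpool(pooled, switches, rows, cols):
    """Put each pooled value back at its recorded position; everything else is 0."""
    restored = [[0] * cols for _ in range(rows)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (i, j) in zip(pooled_row, switch_row):
            restored[i][j] = value
    return restored


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [6, 2, 3, 1]]
pooled, switches = max_pool_with_switches(feature_map)
restored = unpool(pooled, switches, 4, 4)
```

The restored map keeps only the four maxima at their original locations, which is what lets the reconstruction stay spatially aligned with the input image.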