Deep Learning Pitfalls: A Field Log

One-hot encoding a categorical feature

Source: "Two ways to turn labels into one-hot encodings in PyTorch"

Reference: TORCH.SPARSE

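Summary:

Method 1: convert to a tensor first, then encode

One small catch: once a single column has been pulled out and turned into a tensor, how do you merge it back with the rest of the data?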

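import numpy as np
import torch

batch_size, class_num = 4, 3  # example values; the original snippet leaves them undefined

# (1) scatter_ ones into a zero matrix at the label positions
label = np.random.randint(0, class_num, size=(batch_size, 1))
label = torch.from_numpy(label).long()  # shape (batch_size, 1), dtype int64
y_one_hot = torch.zeros(batch_size, class_num).scatter_(1, label, 1)
print(y_one_hot)

# (2) pick rows out of an identity matrix; index_select needs a 1-D index
ones = torch.eye(class_num)
y_one_hot = ones.index_select(0, label.view(-1))
print(y_one_hot)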
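One possible answer to the merging question above is to concatenate the one-hot tensor with the remaining numeric columns along the column axis. The sketch below is only an illustration: the tensors `labels` and `features` and their values are made up, not taken from the original data.

```python
import torch

# Hypothetical mini-batch: one categorical column plus two numeric features.
class_num = 3
labels = torch.tensor([0, 2, 1, 2])            # categorical column, shape (4,)
features = torch.tensor([[9.9, 0.8],
                         [5.0, 1.0],
                         [7.5, 0.9],
                         [3.2, 0.5]])           # remaining columns, shape (4, 2)

# One-hot encode by indexing rows of an identity matrix.
one_hot = torch.eye(class_num)[labels]         # shape (4, 3)

# Concatenate column-wise; both tensors need the same dtype and row count.
merged = torch.cat([one_hot, features], dim=1)  # shape (4, 5)
print(merged)
```

The one-hot columns come first here only by choice; any column order works as long as the downstream slicing matches it.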

For 2-D labels: flatten them into a vector first, one-hot encode, then reshape back to the 2-D layout.
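A minimal sketch of that flatten, encode, reshape sequence, assuming the labels are integer class indices. The example values are invented, and the final cross-check against `torch.nn.functional.one_hot` is an addition for verification, not part of the original recipe.

```python
import torch
import torch.nn.functional as F

class_num = 4
labels_2d = torch.tensor([[0, 3, 1],
                          [2, 2, 0]])                        # shape (2, 3)

# 1) flatten the 2-D labels into a vector
flat = labels_2d.reshape(-1)                                 # shape (6,)
# 2) one-hot encode by selecting rows of an identity matrix
flat_one_hot = torch.eye(class_num).index_select(0, flat)    # shape (6, 4)
# 3) restore the 2-D layout, with the class axis appended last
one_hot = flat_one_hot.reshape(*labels_2d.shape, class_num)  # shape (2, 3, 4)

# Cross-check against the built-in helper, which accepts any label shape.
reference = F.one_hot(labels_2d, num_classes=class_num).float()
print(torch.equal(one_hot, reference))
```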

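Method 2: encode first, then convert to a tensor

Merge with np.hstack.

I also tried data = data.reindex(columns=new_col, fill_value=onehot_encoded), but could not work out what type fill_value actually expects (the pandas documentation describes it as a scalar)… an ndarray does not work, at any rate.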
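As an alternative to reindex, the encoded columns can be attached with a column-wise concat, or with np.hstack once everything is an array. The sketch below uses a toy table with made-up column names (`type`, `price`, `sales`) and swaps in `pd.get_dummies` for the scikit-learn encoder purely to keep the example short.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real table; column names are illustrative only.
data = pd.DataFrame({
    "type": ["cold", "warm", "hot", "cold"],
    "price": [10.0, 12.5, 8.0, 11.0],
    "sales": [3, 5, 2, 4],
})

# One-hot encode the categorical column; dtype=float keeps everything numeric.
onehot = pd.get_dummies(data["type"], prefix="type", dtype=float)

# Column-wise concat aligns on the index, so rows stay matched.
merged = pd.concat([onehot, data.drop(columns="type")], axis=1)
print(merged)

# The same merge on plain arrays, which is what np.hstack does.
merged_array = np.hstack((onehot.to_numpy(), data.drop(columns="type").to_numpy()))
print(merged_array.shape)
```

Keeping the result as a DataFrame preserves column names, which makes later slicing by feature less error-prone than positional indexing into an ndarray.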

# Complete example
from numpy import array
from sklearn import preprocessing
# categories
data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold', 'warm', 'hot']
values = array(data)
print(values)
# integer encode
label_encoder = preprocessing.LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
print(integer_encoded)
# binary (one-hot) encode
# sparse=False was renamed sparse_output in scikit-learn 1.2; .toarray() works with either
onehot_encoder = preprocessing.OneHotEncoder()
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded).toarray()
print(onehot_encoded)
# My own usage
import numpy as np
import pandas as pd
from sklearn import preprocessing
from torch.utils.data import DataLoader
from torchvision import transforms

# sequence, batch_size and Mydataset are defined elsewhere in the project.


def dataPre():
    data = pd.read_csv('18_19_各类销量.csv')
    # columns: type, list price, discount, stock, sales
    data = data[['类型', '标价', '折扣', '库存', '销量']]
    class_data = data['类型']
    values = np.array(class_data)
    # integer encode
    label_encoder = preprocessing.LabelEncoder()
    integer_encoded = label_encoder.fit_transform(values)
    # binary (one-hot) encode
    # sparse=False was renamed sparse_output in scikit-learn 1.2; .toarray() works with either
    onehot_encoder = preprocessing.OneHotEncoder()
    integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
    onehot_encoded = onehot_encoder.fit_transform(integer_encoded).toarray()
    # Failed attempts at merging through pandas, kept for reference:
    # onehot_dataframe = pd.DataFrame(onehot_encoded, index=None)
    # onehot_dataframe.rename(columns=['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
    data = data.drop(columns='类型')
    # new_col = ['类型', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '标价', '折扣', '库存', '销量']
    # data = data.reindex(columns=new_col, fill_value=0)
    # data = data.reindex(columns=new_col, fill_value=onehot_encoded)
    # Convert to ndarray and merge the encoding with the original dataset.
    # The integer codes are merged here; to use the one-hot columns instead,
    # switch to the commented line and widen the feature slices below to match.
    data = np.hstack((integer_encoded, data))
    # data = np.hstack((onehot_encoded, data))
    # data = pd.merge(data, onehot_dataframe, on='9')
    print(data)
    dataFrame = pd.DataFrame(data)
    dataFrame.to_csv('here.csv')

    X = []
    Y = []
    # Earlier DataFrame-based version:
    # for i in range(data.shape[0] - sequence):
    #     X.append(np.array(data.iloc[i:(i + sequence), 0:4], dtype=np.float32))
    #     Y.append(np.array(data.iloc[(i + sequence), 4], dtype=np.float32))
    # Sliding window: `sequence` rows of the first 4 columns predict the next row's sales.
    for i in range(data.shape[0] - sequence):
        X.append(np.array(data[i:(i + sequence), :4], dtype=np.float32))
        Y.append(np.array(data[(i + sequence), 4], dtype=np.float32))

    # 70/30 train/test split, in time order
    total_len = len(X)
    train_x, train_y = X[:int(0.7 * total_len)], Y[:int(0.7 * total_len)]
    test_x, test_y = X[int(0.7 * total_len):], Y[int(0.7 * total_len):]
    train_loader = DataLoader(dataset=Mydataset(train_x, train_y, transform=transforms.ToTensor()),
                              batch_size=batch_size)
    test_loader = DataLoader(dataset=Mydataset(test_x, test_y), batch_size=batch_size)
    return train_loader, test_loader
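To make the windowing in the function above easier to follow, here is a small NumPy-only sketch of the same idea on synthetic data. The array contents and `sequence = 3` are assumptions chosen for illustration; only the slicing pattern mirrors the original.

```python
import numpy as np

# Synthetic table: 6 rows, 5 columns (columns 0-3 are features, column 4 is the target).
data = np.arange(30, dtype=np.float32).reshape(6, 5)
sequence = 3  # window length; an assumed value

X, Y = [], []
for i in range(data.shape[0] - sequence):
    X.append(data[i:i + sequence, :4])   # `sequence` consecutive rows of features
    Y.append(data[i + sequence, 4])      # target taken from the row right after the window

X = np.stack(X)   # shape (num_windows, sequence, 4)
Y = np.array(Y)   # shape (num_windows,)

# Chronological 70/30 split, as in the function above.
split = int(0.7 * len(X))
train_x, test_x = X[:split], X[split:]
print(X.shape, Y.shape, len(train_x), len(test_x))
```

Splitting by position rather than shuffling keeps the test windows strictly later in time than the training windows.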
# Print the path a module is actually imported from (replace `module` with the real package name)
import module
print(module.__file__)

bug

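sklearn: DLL load failed: %1 is not a valid Win32 application

This error appeared out of nowhere after moving the program from Ubuntu to Windows.

I rebuilt the virtual environment and came close to reinstalling conda altogether...

Solution:

Uninstall the scikit-learn package and reinstall it with pip: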

conda uninstall scikit-learn
# Do not reinstall with conda; the error comes back
pip install scikit-learn

If any other library raises the same error, try the same approach.

ImportError: C extension: No module named 'pandas._libs.tslib' not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.

conda uninstall pandas
pip install pandas

After that pandas still failed; the error message suggested the six package was missing, so I installed that as well:

pip install six

Microsoft Visual C++ Redistributable is not installed, this may lead to the DLL load failure.

It can be downloaded at https://aka.ms/vs/16/release/vc_redist.x64.exe

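The message itself gives the fix: download and install vc_redist.x64.exe.

Getting one virtual environment working broke the other one,

so I ended up reinstalling Anaconda.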

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

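The cause is a mismatch between the type of the input data and the type of the network's parameters. Here the input is on the GPU (torch.cuda.FloatTensor) while the weights are still on the CPU (torch.FloatTensor), so the model needs to be moved to the same device as the input.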
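A minimal sketch of the usual way to avoid this mismatch: pick one device and move both the model and every input batch to it. The `nn.Conv2d` layer and the tensor shape are placeholders, not the network from the original project.

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder model; .to(device) moves its weights to the chosen device.
model = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(device)

# The input must live on the same device as the weights.
x = torch.randn(2, 3, 32, 32).to(device)

out = model(x)
print(out.shape, out.device)
```

In a training loop the `.to(device)` call on the inputs has to happen for every batch, since a DataLoader yields CPU tensors by default.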

RuntimeError: CUDA out of memory. Tried to allocate 82.00 MiB (GPU 0; 7.93 GiB total capacity; 6.88 GiB already allocated; 49.31 MiB free; 6.92 GiB reserved in total by PyTorch)

Check GPU memory usage with nvidia-smi.

RunTime Error : cuda out of memory

The input images were too large and exhausted GPU memory; resizing them to 256*256 fixed it.
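One way to do the resizing on tensors is `torch.nn.functional.interpolate`; the original post does not say which method was used, so treat this as an assumed approach. The 512*512 starting size is made up for the example.

```python
import torch
import torch.nn.functional as F

# Hypothetical batch of two RGB images at 512x512.
batch = torch.rand(2, 3, 512, 512)

# Downsample to 256x256 before feeding the network.
resized = F.interpolate(batch, size=(256, 256), mode="bilinear", align_corners=False)
print(resized.shape)

# Halving each side cuts the number of input elements by a factor of four.
print(batch.numel() // resized.numel())
```

Reducing the batch size is another common lever when the image size has to stay fixed.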


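Author: Pabebe

Published: 2020-08-20, 15:34:58

Last updated: 2020-08-29, 12:26:19

Original link: https://pabebezz.github.io/article/43ed33f1/

License: Attribution-NonCommercial-NoDerivatives 4.0 International. Please keep the original link and author when reposting.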