Loading Data


Minibatch Gradient Descent

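A large dataset cannot be trained on all at once because of speed and hardware limits.
Instead, split the data evenly into minibatches and train on one minibatch at a time.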

Minibatch Gradient Descent: Effects

Each update uses only part of the data, so updates are faster.

Because an update does not see all of the data, it may move the parameters in the wrong direction.
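Both effects can be seen in a small sketch. The 1-D data, the weight `w`, and the helper `mse_gradient` below are invented for illustration and are not from the original post. The sketch compares the gradient over all samples with the gradients over two minibatches.

```python
# Toy 1-D data, invented for illustration (not from the original post).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.2, 3.8, 6.3, 7.7]
w = 2.0  # current weight of the model y = w * x

def mse_gradient(indices):
    """Gradient of the mean squared error w.r.t. w over the given samples."""
    return sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in indices) / len(indices)

full_grad = mse_gradient([0, 1, 2, 3])  # uses every sample
batch_a = mse_gradient([0, 2])          # one minibatch of size 2
batch_b = mse_gradient([1, 3])          # the other minibatch

print(full_grad, batch_a, batch_b)
```

The full gradient is about 0.25, while the two minibatch gradients are about -1.1 and 1.6. The first minibatch alone would push `w` the opposite way from the full gradient. Each minibatch gradient is cheaper to compute, and with equal-sized batches their average matches the full gradient.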

PyTorch Dataset

import torch
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self):
        # Five samples with three features each
        self.x_data = [[73, 80, 75],
                       [93, 88, 93],
                       [89, 91, 90],
                       [96, 98, 100],
                       [73, 66, 70]]
        # One target value per sample
        self.y_data = [[152], [185], [180], [196], [142]]

    def __len__(self):
        # Number of samples in the dataset
        return len(self.x_data)

    def __getitem__(self, idx):
        # Return the idx-th (input, target) pair as float tensors
        x = torch.FloatTensor(self.x_data[idx])
        y = torch.FloatTensor(self.y_data[idx])
        return x, y
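The class above relies on two methods, `__len__` and `__getitem__`. As a minimal sketch of how they are called, the stand-in below follows the same protocol in plain Python. The name `ToyDataset` is hypothetical, and it returns lists rather than tensors so that it runs without PyTorch.

```python
class ToyDataset:
    """Hypothetical stand-in with the same two methods, without PyTorch."""

    def __init__(self):
        self.x_data = [[73, 80, 75], [93, 88, 93], [89, 91, 90]]
        self.y_data = [[152], [185], [180]]

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, idx):
        return self.x_data[idx], self.y_data[idx]

dataset = ToyDataset()
num_samples = len(dataset)    # calls ToyDataset.__len__
first_x, first_y = dataset[0]  # calls ToyDataset.__getitem__(0)
print(num_samples, first_x, first_y)  # 3 [73, 80, 75] [152]
```

A loader only needs these two operations: the length tells it which indices exist, and indexing fetches one sample at a time.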

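PyTorch DataLoader

from torch.utils.data import DataLoader

# CustomDataset is the class defined in the previous section
dataset = CustomDataset()

dataloader = DataLoader(
    dataset,
    # Usually set to a power of 2
    batch_size=2,
    # Reshuffle the dataset every epoch so the samples are seen in a different order
    shuffle=True,
)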
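To make the two options concrete, here is a conceptual sketch in plain Python of what `batch_size` and `shuffle` amount to. It is not PyTorch's implementation. The helper `sketch_batches` and the seed are assumptions made for this example.

```python
import math
import random

def sketch_batches(num_samples, batch_size, shuffle, rng):
    """Hypothetical helper: return one list of sample indices per minibatch."""
    indices = list(range(num_samples))
    if shuffle:
        rng.shuffle(indices)  # new order each time this is called
    return [indices[i:i + batch_size] for i in range(0, num_samples, batch_size)]

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
epochs = [sketch_batches(5, 2, True, rng) for _ in range(2)]
for epoch, batches in enumerate(epochs):
    print(epoch, batches)

batches_per_epoch = math.ceil(5 / 2)  # 3 batches: sizes 2, 2 and 1
```

With five samples and a batch size of 2, every epoch has three minibatches and the last one holds a single sample. This matches `DataLoader`'s default of keeping the final, smaller batch. Every sample still appears exactly once per epoch, only the order changes.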

Full code with Dataset and DataLoader

import torch
import torch.optim as optim

# Assumes `dataloader` was created as in the previous section

# Initialize the model parameters
W = torch.zeros((3, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
# Set up the optimizer
optimizer = optim.SGD([W, b], lr=1e-5)

nb_epochs = 20
for epoch in range(nb_epochs + 1):
    for batch_idx, samples in enumerate(dataloader):
        x_train, y_train = samples

        # Compute H(x)
        hypothesis = x_train.matmul(W) + b  # or .mm or @

        # Compute the cost (mean squared error)
        cost = torch.mean((hypothesis - y_train) ** 2)

        # Improve H(x) using the cost
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()

        # Log the cost for every minibatch
        print('Epoch {:4d}/{} Batch {}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, batch_idx + 1, len(dataloader), cost.item()
        ))
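As a quick check on how much work the loop above does, the arithmetic below counts the parameter updates. It assumes the five-sample dataset and `batch_size = 2` from the earlier sections. The variable names are chosen for this sketch.

```python
import math

num_samples = 5  # rows in the toy dataset
batch_size = 2   # as passed to DataLoader
nb_epochs = 20

epochs_run = len(range(nb_epochs + 1))                   # 21, because of the "+ 1"
batches_per_epoch = math.ceil(num_samples / batch_size)  # 3, last batch has one sample
total_updates = epochs_run * batches_per_epoch           # calls to optimizer.step()
print(epochs_run, batches_per_epoch, total_updates)  # 21 3 63
```

So the parameters are updated 63 times in total, compared with 21 times if each epoch used the whole dataset in one step.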

Reference

More information on this topic (Loading Data) can be found here: https://velog.io/@hyoju2259/Loading-Data

This text may be freely shared or copied, provided that the URL of this document is kept as a reference URL.

Collection and Share based on the CC Protocol
