Perceptron
Perceptron Model
Suppose our model has $N$ features; then the size of the input vector is $N$. The perceptron is a **binary classification** model, meaning that it can distinguish between two classes of input data. We will assume that for each input vector $\mathbf{x}$ the output of our perceptron is either $+1$ or $-1$, depending on the class. The output is computed using the following formula:
$$y(\mathbf{x}) = f(\mathbf{w}^{\mathrm{T}}\mathbf{x})$$
where $f$ is a step activation function, which returns $+1$ when its argument is non-negative and $-1$ otherwise.
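For reference, here is a minimal sketch of such a step function in Python (the name `step` is ours; the thresholding at zero matches the classification rule used in the training code below):

```python
def step(z):
    # Return +1 for non-negative input and -1 otherwise
    return 1 if z >= 0 else -1
```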
Training the Perceptron
To train the perceptron, we need to find a weight vector $\mathbf{w}$ that classifies most of the values correctly, i.e., results in the smallest error $E$. This error $E$ is defined by the perceptron criterion in the following manner:
$$E(\mathbf{w}) = -\sum_i \mathbf{w}^{\mathrm{T}}\mathbf{x}_i t_i$$
where:
- the sum is taken over those training data points $i$ that result in wrong classification
- $\mathbf{x}_i$ is the input data, and $t_i$ is either $-1$ or $+1$ for negative and positive examples respectively.
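As an illustration, the criterion can be computed as follows. This is a sketch under our own assumptions: the inputs are stacked into an $(n, d)$ array `X` with a corresponding label array `t` of values in $\{-1, +1\}$, and the function name is hypothetical:

```python
import numpy as np

def perceptron_criterion(weights, X, t):
    # E(w) = -sum over misclassified points of (w^T x_i) * t_i
    z = X @ weights                  # w^T x_i for every data point
    misclassified = z * t < 0        # prediction sign disagrees with the label
    return -np.sum(z[misclassified] * t[misclassified])
```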
This criterion is considered as a function of the weights $\mathbf{w}$, and we need to minimize it. Often, a method called gradient descent is used, in which we start with some initial weights $\mathbf{w}^{(0)}$, and then at each step update the weights according to the formula:
$$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta\nabla E(\mathbf{w})$$
Here $\eta$ is the so-called learning rate, and $\nabla E(\mathbf{w})$ denotes the gradient of $E$. After we calculate the gradient, we end up with
$$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} + \sum_i \eta\,\mathbf{x}_i t_i$$
In this formula, if $\eta$ is constant, the common factor can be pulled out in front, giving $\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} + \eta\sum_i \mathbf{x}_i t_i$. In practice, however, an optimizer is often used to adjust the learning rate dynamically, so $\eta$ is not necessarily constant.
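To make the batch form of this update concrete, here is a sketch of a single gradient-descent step over all misclassified points (array names as in the criterion sketch above; this is our own illustration, whereas the training code below updates on one example at a time):

```python
def gradient_step(weights, X, t, eta=1.0):
    # One batch update: w <- w + eta * sum over misclassified i of x_i * t_i
    z = X @ weights
    mis = z * t < 0                  # mask of misclassified points
    return weights + eta * (X[mis].T @ t[mis])
```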
The algorithm in Python looks like this:
```python
import random
import numpy as np

def train(positive_examples, negative_examples, num_iterations=100, eta=1):
    # positive_examples, negative_examples: (n, d) NumPy arrays of input vectors
    num_dims = positive_examples.shape[1]
    weights = np.zeros(num_dims)  # initialize weights (to zeros)

    for i in range(num_iterations):
        pos = random.choice(positive_examples)  # sample one example of each class
        neg = random.choice(negative_examples)

        z = np.dot(pos, weights)  # compute perceptron output
        if z < 0:  # positive example classified as negative
            weights = weights + eta * pos  # w <- w + eta * x_i * t_i, with t_i = +1

        z = np.dot(neg, weights)
        if z >= 0:  # negative example classified as positive
            weights = weights - eta * neg  # w <- w + eta * x_i * t_i, with t_i = -1

    return weights
```
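A hypothetical usage example, with made-up data purely for illustration; the first component of each vector acts as a constant bias feature:

```python
import numpy as np

positive_examples = np.array([[1.0, 2.0, 3.0], [1.0, 3.0, 4.0], [1.0, 2.5, 2.0]])
negative_examples = np.array([[1.0, -2.0, -1.0], [1.0, -3.0, -2.0], [1.0, -1.5, -2.5]])

w = train(positive_examples, negative_examples)
print(np.dot([1.0, 2.0, 2.5], w) >= 0)  # expected: True (classified as positive)
```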