## The idea behind gradient descent

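For linear regression, gradient descent minimizes the cost function

$$J(\theta_0,\theta_1,...,\theta_n)=\frac{1}{2m}\sum_{i=1}^{m}(h(x^i)-y^i)^2$$

Here $\theta_t$ is the coefficient of the feature $x_t$, $m$ is the total number of samples, and $h(x^i)$ is the prediction for the $i$-th sample, defined as:

$$h(x^i)=\theta_0+\theta_1x^i_1+...+\theta_nx^i_n$$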
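
The hypothesis can be evaluated for every sample at once with a single matrix product. The following is a minimal sketch, assuming the features are stored one sample per row; the data values and the name `X_raw` are made up for illustration.

```matlab
% Hypothetical data: 3 samples with n = 2 features each
X_raw = [1 2; 3 4; 5 6];
theta = [0.5; 1; 2];          % [theta_0; theta_1; theta_2]

m = size(X_raw, 1);
X = [ones(m, 1), X_raw];      % prepend x_0 = 1 so theta_0 acts as the intercept

% Row i of X times theta gives theta_0 + theta_1*x_1^i + theta_2*x_2^i
h = X * theta;
disp(h');                     % 5.5  11.5  17.5
```

Prepending the column of ones is what lets the intercept $\theta_0$ be handled by the same product as the other coefficients.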

Each iteration updates all parameters simultaneously, where $\alpha$ is the learning rate:

$$
\left\{
\begin{aligned}
\theta_0 &:= \theta_0-\alpha\frac{1}{m}\sum_{i=1}^{m}(h(x^i)-y^i)\\
\theta_j &:= \theta_j-\alpha\frac{1}{m}\sum_{i=1}^{m}(h(x^i)-y^i)x^i_j,\quad j=1,\dots,n
\end{aligned}
\right.
$$
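
As a check on the step size, the update term can be derived by differentiating $J$ with respect to a single parameter $\theta_j$. The convention $x^i_0=1$ is an assumption introduced here so that $\theta_0$ follows the same formula as the other parameters.

$$
\frac{\partial J}{\partial\theta_j}
=\frac{1}{2m}\sum_{i=1}^{m}2\,(h(x^i)-y^i)\,\frac{\partial h(x^i)}{\partial\theta_j}
=\frac{1}{m}\sum_{i=1}^{m}(h(x^i)-y^i)\,x^i_j
$$

The factor 2 from the square cancels the $\frac{1}{2}$ in the cost, so each step is scaled by $\frac{\alpha}{m}$. Stacking the samples as rows of a matrix $X$ (with a leading column of ones) and the targets as a vector $y$ gives the vectorized form that the MATLAB code uses:

$$
\theta := \theta-\frac{\alpha}{m}X^{T}(X\theta-y)
$$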

## MATLAB code

### Gradient descent

```matlab
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Run batch gradient descent for linear regression
%   [theta, J_history] = GRADIENTDESCENT(X, y, theta, alpha, num_iters)
%   updates theta by taking num_iters gradient steps with learning rate alpha

m = length(y);                    % number of training examples
J_history = zeros(num_iters, 1);  % cost after each iteration

for iter = 1:num_iters
    % Predictions for all examples with the current parameters
    h = X * theta;

    % Simultaneous update of every parameter: theta - (alpha/m) * X' * (h - y)
    theta = theta - (alpha / m) * (X' * (h - y));

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);
end
end
```
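
A short usage sketch follows. It assumes `gradientDescent` and `computeCost` from this section are saved as `gradientDescent.m` and `computeCost.m` on the path. The data, the learning rate 0.05, and the iteration count 5000 are hypothetical choices for illustration, and the least-squares solution `X \ y` is used only as a reference to compare against.

```matlab
% Hypothetical data generated from y = 1 + 2*x, so the ideal theta is [1; 2]
x = (1:5)';
y = 1 + 2 * x;
X = [ones(length(y), 1), x];      % leading column of ones for the intercept

theta_init = zeros(2, 1);
alpha = 0.05;                     % assumed learning rate, small enough for this data
num_iters = 5000;                 % assumed iteration count

[theta, J_history] = gradientDescent(X, y, theta_init, alpha, num_iters);

% Reference solution from MATLAB's least-squares solver
theta_ref = X \ y;
disp([theta, theta_ref]);         % the two columns should agree closely

% The cost should fall as the iterations proceed
fprintf('first cost: %g, last cost: %g\n', J_history(1), J_history(end));
```

If the cost in `J_history` grows instead of shrinking, the learning rate is likely too large for the scale of the features.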


### Computing the cost function

```matlab
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

m = length(y);                % number of training examples

h = X * theta;                % predictions, one per training example
err = h - y;                  % residuals
J = (err' * err) / (2 * m);   % sum of squared residuals scaled by 1/(2m)
end
```
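
To see that the vectorized expression matches the summation in the cost formula, the sketch below computes the cost both ways on a tiny hypothetical data set. The variable names `J_loop` and `J_vec` are introduced here for illustration only.

```matlab
% Hypothetical data: 3 samples, first column of X is the intercept term
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
theta = [0; 0];
m = length(y);

% Explicit sum over samples, following the formula for J term by term
J_loop = 0;
for i = 1:m
    J_loop = J_loop + (X(i, :) * theta - y(i))^2;
end
J_loop = J_loop / (2 * m);

% Vectorized form: inner product of the residual vector with itself
err = X * theta - y;
J_vec = (err' * err) / (2 * m);

fprintf('%.4f %.4f\n', J_loop, J_vec);   % both are (1 + 4 + 9) / 6 = 2.3333
```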


### Feature normalization

```matlab
function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.
%
%   X is a matrix where each column is a feature and each row is an
%   example, so the normalization is performed separately per column.

mu = mean(X);      % row vector holding the mean of each feature
sigma = std(X);    % row vector holding the standard deviation of each feature

X_norm = X;
m = size(X, 1);    % number of examples
for i = 1:m
    % Subtract the feature means, then divide by the feature standard deviations
    X_norm(i, :) = (X(i, :) - mu) ./ sigma;
end
end
```
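
The row-by-row loop can also be written without a loop. The sketch below is one possible vectorized variant using `bsxfun`, shown on hypothetical data, and it also normalizes a new sample with the stored `mu` and `sigma`. The name `x_new` and all data values are assumptions for illustration.

```matlab
% Hypothetical data: 4 examples, 2 features on very different scales
X = [2104 3; 1600 3; 2400 4; 1416 2];

mu = mean(X);                     % per-feature means, here [1880 3]
sigma = std(X);                   % per-feature standard deviations

% Subtract mu from every row, then divide every row by sigma
X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);

disp(mean(X_norm));               % approximately 0 for each feature
disp(std(X_norm));                % approximately 1 for each feature

% A new sample should be scaled with the same mu and sigma as the training data
x_new = [1650 3];
x_new_norm = (x_new - mu) ./ sigma;
```

Note that a feature with the same value in every example has a standard deviation of 0, so this division would fail for that column and it would need separate handling.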