Q: How to bin a dataset efficiently?

The following code is an excellent example of Matlab's elegance. Not a single loop. ;-)
First it uses the fast built-in function histc to determine which bin each datapoint falls into.
Then it uses some sparse matrix magic to average the data inside the bins (conversion of a sparse matrix to a full one sums the repeated data!). E-mail me if you need further details (ashcherbina@ucsd.edu).


function [b,n,s]=bindata(x,y,gx)
% [b,n,s]=bindata(x,y,gx)
% Bins y(x) onto b(gx), gx defining centers of the bins. NaNs ignored.
% Optional return parameters are:
% n: number of points in each bin   
% s: standard deviation of data in each bin 
% A.S.

[yr,yc]=size(y);

x=x(:);y=y(:);
idx=find(~isnan(y));
if (isempty(idx)) b=nan*gx;n=nan*gx;s=nan*gx;return;end

x=x(idx);
y=y(idx);

xx = gx(:)';
binwidth = diff(xx) ;
xx = [xx(1)-binwidth(1)/2 xx(1:end-1)+binwidth/2 xx(end)+binwidth(end)/2];

% Shift bins so the interval is "( ]" instead of "[ )".
bins = xx + max(eps,eps*abs(xx));

[nn,bin] = histc(x,bins,1);
nn=nn(1:end-1);
nn(nn==0)=NaN;

idx=find(bin>0);
sum=full(sparse(bin(idx),idx*0+1,y(idx)));
sum=[sum;zeros(length(gx)-length(sum),1)*nan];% add zeroes to the end
b=sum./nn;

if nargout>1, n=nn;end
if nargout>2
	sum=full(sparse(bin(idx),idx*0+1,y(idx).^2));
	sum=[sum;zeros(length(gx)-length(sum),1)*nan];	% add zeroes to the end
   s=sqrt(sum./(nn-1) - b.^2.*nn./(nn-1) );
end