## Spectral Analysis of wide variety of data in MATLAB

Signals can be any time-varying or space-varying quantity. Examples: speech, temperature readings, seismic data, stock price fluctuations.

A signal has one or more frequency components in it and can be viewed from two different standpoints: time-domain and frequency domain. In general, signals are recorded in time-domain but analyzing signals in frequency domain makes the task easier. For example, differential and convolution operations in time domain become simple algebraic operation in the frequency domain.

## Fourier Transform of a signal:

We can go between time-domain and frequency domain by using a tool called Fourier transform.

Now get comfortable with Fourier transform, let’s take an example in MATLAB:

clear; close all; clc

%%Creating dataset
fs=100;    %sampling frequency (samples/sec)
t=0:1/fs:1.5-1/fs;%time
f1=10;  %frequency1
f2=20;  %frequency2
f3=30;  %frequency3

x=1*sin(2*pi*f1*t+0.3)+2*sin(2*pi*f2*t+0.2)+3*sin(2*pi*f3*t+0.4);

We represent the signal by the variable $x$. It is the summation of three sinusoidal signals with different amplitude, frequency and phase shift. We plot the signal first.

%%Visualizing data
figure(1)
plot(x)
legend('Two-tone signal')
saveas(gcf,'signal_plot.jpg')

We then do the Fourier transform of the signal to obtain its magnitude and phase spectrum.

%%Fourier transform of the data
X=fft(x);
X_mag=abs(fftshift(X));   %magnitude
X_phase=angle(fftshift(X)); %phase

%%Frequency bins
N=length(x);
Fbins=(-N/2:N/2-1)/N*fs;

%%Magnitude response
figure(2);
plot(Fbins,X_mag)
xlabel('Frequency(Hz)')
ylabel('Magnitude')
title('FFT - Magnitude response')
grid on
saveas(gcf,'fft_mag_response.jpg')

%%Phase response
figure(3);
plot(Fbins,X_phase/pi)
xlabel 'Frequency (Hz)'
ylabel 'Phase / \pi'
title('FFT - Phase response')
grid on
saveas(gcf,'fft_phase_response.jpg')

## Power spectrum estimation using the Welch’s method:

Now we compute the power spectrum of the using the Welch’s method. We can use the function “pwelch” in Matlab to obtain the desired result.

pwelch(x,[],[],[],fs) %one-sided power spectral density
saveas(gcf,'power_spectral_plot.jpg')

Pwelch is a spectrum estimator. It computes an averaged squared magnitude of the Fourier transform of a signal. In short, it computes a set of smaller FFTs (using sliding windows), computes the magnitude square of each and averages them.

By default, $latex x$ is divided into the longest possible segments to obtain as close to but not exceed 8 segments with 50% overlap. Each segment is windowed with a Hamming window. The modified periodograms are averaged to obtain the PSD estimate. If you cannot divide the length of $latexx$  exactly into an integer number of segments with 50% overlap, $latex x$ is truncated accordingly.

Note the peak at 10, 20 and 30 Hz. Also, note the display default for Pwelch is in dB (logarithmic).

Let us inspect another data using the “pwelch” function of the Matlab.

clear; close all; clc

load officetemp1; %16 weeks 2 samples/hr

%%Visualizing data
figure(1);
plot(t,tempC)
ylabel(sprintf('Temperature (%c C)', char(176)))
xlabel('Time (weeks)')
grid on
saveas(gcf,'tempReadings.jpg')

tempnorm=tempC-mean(tempC); %temperature residual

fs=2*24*7; %2 samples per hour

figure(2);
pwelch(tempnorm,[],[],[],fs);
axis([0 16 -30 0])
xlabel('Occurrence/week')
saveas(gcf,'power_spectral_tempReadings.jpg')

## Resolving two close frequency components using “pwelch”:

Let’s first plot the data (having the frequency components at 500 and 550 Hz) using the default parameters of “pwelch” function:

clear; close all; clc

%%

pwelch(s,[],[],[],Fs);
title('pwelch with default input- f1: 500Hz, f2: 550Hz')
set(gca,'YLim',[-120,0])
set(gca,'XLim',[0,5])

saveas(gcf,'pwelchDefaultPlot.jpg')

Here, we can see that the frequency component at 500Hz can be resolved but the frequency component at 550 Hz is barely visible.

One can obtain better frequency resolution by increasing the window size:

figure;
filename = 'pwelchWindowAnalysis.gif';
for len=500:300:N
pwelch(s,len,[],len,Fs);
title(sprintf('Hamming Window size: %d',len))
set(gca,'YLim',[-120,0])
set(gca,'XLim',[0,1])
drawnow;
frame = getframe(1);
im = frame2im(frame);
[imind,cm] = rgb2ind(im,256);
if len == 500;
imwrite(imind,cm,filename,'gif', 'Loopcount',inf);
else
imwrite(imind,cm,filename,'gif','WriteMode','append');
end
end

Here, we can see that by increasing the window width, we can resolve the two components. By increasing the window width, we lose the temporal resolution but at the same time, we gain the spectral resolution.

## Resolving the frequency component using the Kaiser window:

“pwelch” function uses Hamming window by default.

L = 64;
wvtool(hamming(L))

Because of the inherent property of the Hamming window (fixed side lobe), sometimes the signal can get masked.

clear; close all; clc

figure;
pwelch(s,[],[],[],Fs);
title('pwelch with default input- f1: 500Hz, f2: 5kHz')
set(gca,'YLim',[-90,0])
set(gca,'XLim',[0,10])
saveas(gcf,'pwelchComplexDefaultPlot.jpg')

In the above figure, we can see that the frequency component of the signal at 5kHz is barely distinguishable.

To resolve this component of frequency, we use Kaiser window instead of the default Hamming window.

% %% Kaiser window
figure;
len=2050;
h1=kaiser(len,4.53); %side lobe attenuation of 50dB
pwelch(s,h1,[],len,Fs);
title('Kaiser window with 50dB side lobe attenuation')
saveas(gcf,'pwelchComplexKaise50dBsidelobe.jpg')

% %% Kaiser window
figure;
len=2050;
h2=kaiser(len,12.26); %side lobe attenuation of 50dB
pwelch(s,h2,[],len,Fs);
title('Kaiser window with 120dB side lobe attenuation')
saveas(gcf,'pwelchComplexKaise120dBsidelobe.jpg')

To obtain a Kaiser window that designs an FIR filter with sidelobe attenuation of α dB, we use the following β :

kaiser(len,beta)

## Amplitude of one frequency component is much lower than the other:

We can have some signal where the amplitude of one frequency component is much lower than the others and is inundated in noise.

To deal with such signals, we need to get rid of the noise using the averaging of the window.

clear; close all; clc

% %% Kaiser window
figure;
len=2050;
h1=kaiser(len,4.53); %side lobe attenuation of 50dB
pwelch(s,h1,[],len,Fs);
set(gca,'XLim',[8,18]);
set(gca,'YLim',[-60,-20]);
title('Kaiser window with 50dB side lobe attenuation')
saveas(gcf,'pwelchAvgComplexKaise50dBsidelobe.jpg')

In the above signal plot, the second frequency component at 14kHz is undetectable. We can get rid of the noise using the averaging approach.

figure;
filename = 'pwelchAveraging.gif';
for len=2050:-100:550
h2=kaiser(len,4.53); %side lobe attenuation of 50dB
pwelch(s,h2,[],len,Fs);
drawnow;
frame = getframe(1);
im = frame2im(frame);
[imind,cm] = rgb2ind(im,256);
if len == 2050;
imwrite(imind,cm,filename,'gif', 'Loopcount',inf);
else
imwrite(imind,cm,filename,'gif','WriteMode','append');
end
end

Here, we take the smaller window in steps to show the effect of averaging. Smaller window in frequency domain is equivalent to the larger window in time domain.

## Dealing with data having missing samples:

clear; close all; clc
%%Signals having missing samples
plot(1:length(wgt),wgt,'LineWidth',1.2)
ylabel('Weight in kg'); xlabel('Day of the year')
grid on; axis tight; title('Weight readings of a person')
saveas(gcf,'weightReading.jpg')

In case, we have missing samples in the data, i.e., the data is not regularly recorded, then we cannot apply the “pwelch” function of matlab to retrieve its frequency components. But thankfully, we have the function “plomb” which can be applied in such cases.

figure;
[p,f]=plomb(wgt,7,'normalized');
plot(f,p)
xlabel('Cycles per week');ylabel('Magnitude')
grid on; axis tight
saveas(gcf,'plombspectrum.jpg')

## Analyzing time and frequency domain simultaneously:

Sometimes, we need the time and frequency informations simultaneously. For a long series of data, we need to know which frequency component is recorded first and at what time. This can be done by making the spectrogram.

clear; close all; clc

%%
figure;
pwelch(x,[],[],[],Fs)
saveas(gcf,'powerspectrum_dtmf.jpg')

% %%
figure;
spectrogram(x,[],[],[],Fs,'yaxis'); colorbar; %default window is hamming window
saveas(gcf,'spectrogramDefault_dtmf.jpg')

Since, the time resolution is higher for smaller window and frequency resolution is lower at small window length. So there is the trade off between the time and frequency domain window length. We need to figure out the optimal point for better resolution in both time and frequency domain.

segLen = [120, 240,480,600,800,1000,1200,1600];

figure;
filename='spectrogramAnalysis.gif';
for itr = 1:numel(segLen)
spectrogram(x,segLen(itr),[],segLen(itr),Fs,'yaxis');
set(gca,'YLim',[0,1.7]);
title(sprintf('window length: %d',segLen(itr)))
colorbar;
drawnow;
frame = getframe(1);
im = frame2im(frame);
[imind,cm] = rgb2ind(im,256);
if itr == 1;
imwrite(imind,cm,filename,'gif', 'Loopcount',inf);
else
imwrite(imind,cm,filename,'gif','WriteMode','append');
end
end

## Introduction to Genetics Algorithm (GA) (Part 2)

To find a basic introduction of GA, the first part can be found here.

III. Examples using Genetics Algorithm

In these examples, we will use Matlab and its function ga to apply GA for the optimization problem. For the manual of using this function, you can find it at https://www.mathworks.com/help/gads/ga.html or type in Matlab:

help ga

III.1. Optimization of a polynomial function

Problem: find the integer x  value that minimizes function:

y = 0.2x2 + 50/x

First, we define the function:

function y = simple_fitness(x)
y = 0.2*x.^2+(50./x);

And use Matlab to finish the job:

clear; close all; clc

x=linspace(0,40,100);
y=simple_fitness(x);
plot(x,y);

rng(1,'twister') % for reproducibility
lb=[1];
[x,fval] = ga(@simple_fitness,1,[],[],[],[],lb

xlabel('Size of cup');
ylabel('Objective Function Value');
title(sprintf('Minimum of the function is %.2f at x = %.2f',min(y), x))
grid on

And here is the result:

III.2. Find the best location of the earthquakes

Problem: we have a set of 5 seismic stations with coordinates: [-2 3 0; 1 3 0; -2 -1 0; 0 -3 0; 2 -2 0]

The travel time of seismic ray from EQ to stations is calculated from:

To simplify the problem, we choose v = 6 km

clear; close all; clc
stnx=[-2 1 -2 0 2];
stny=[3 3 -1 -3 -2];
stnz=[0 0 0 0 0];
EQ=[2 2 -2];

to=sqrt((EQ(1)-stnx).^2+(EQ(2)-stny).^2+(EQ(3)-stnz).^2)/6
to=to+0.05*randn(size(to))
rms = @(mp) sum(sqrt((1./(length(to)-4)).*((sqrt((mp(1)-stnx).^2+(mp(2)-stny).^2+(mp(3)-stnz).^2)/6)-to).^2));
% rms([2 2 -2])
ub=[10 10 0]; %upper bound
lb=[0 0 -10]; %lower bound
options = gaoptimset('Display','iter','PlotFcn',@gaplotbestf,'Generations',150)
% options = gaoptimset('Display','iter','PlotFcn',@gaplotbestf,'TolCon',1e-3);

mp = ga(rms,3,[],[],[],[],lb,ub,[],options)

figure; hold on; grid on;
scatter3(stnx, stny, stnz, 'b', 'filled', '^');
scatter3(mp(1), mp(2), mp(3), 'r', '*');
scatter3(EQ(1), EQ(2), EQ(3), 'g', '*');
view(3); xlabel('X axis'); ylabel('Y axis'); zlabel('Z axis');
legend('Stations','EQ location from GA','Real EQ location');



Utpal Kumar

IESAS

## Introduction to Genetics Algorithm (GA) (Part 1)

I. Introduction

In daily life as well as in doing research, we might come to problems that require a lowest/highest value of variables, e.g.:  find the shortest way from home to work, buying household items with a fixed amount of money, etc. These problems could be called “optimization” and today we will introduce an algorithm to solve these kinds of problem: the Genetic Algorithm.

Genetic Algorithm has been developed by Prof. John Holland in 1975, which search algorithm that mimics the process of evolution. This method is based on the “Survival of the Fittest” concept (Darwinian Theory), i.e.: successive generations are becoming better and better.

II. Algorithm

1. Initialization
2. Fitness Calculation
3. Selection
4. Crossing Over
5. Mutation
6. Repetition of the steps from 2-5 for the new population generated.

The pdf version of this introduction could be found here: Genetic Algorithm

The 2nd part can be found here.

Utpal Kumar

IESAS

## Best-fit quadratic surface from given points in 3D using Matlab

In Earth Science research, sometimes we need to construct 3D surfaces from given points, for example: creating the fault surface, locating a subducting slab from earthquake hypocenters, etc.  in a region of interest in X-Y plane.

In this example, we will show how to create a best-fit quadratic surface from given points in 3D using Matlab. The code is written in the following steps:

1. Input the data 3D points: x, y, z
2. Calculate the best fitting curve, store the parameters in C matrix.
3. Define a region of interest xx, yy and using C to get zz.
4. Plot the xx, yy, zz surface with x, y, z data.
5. Make comparison (probabilistic density function of misfit between z and zz).

We use getPolygonGrid.m from mathworks.com.

Main program:

% We start with some some 3d points
data = mvnrnd([0 0 0], [1 -0.5 0.8; -0.5 1.1 0; 0.8 0 1], 50);

x = data(:,1); y = data(:,2); z = data(:,3);

% Make a best-fit quadratic curve from the given points
C = x2fx(data(:,1:2), 'quadratic') \ data(:,3);

% Define an area of interest in X-Y plane by polyon nodes
xv = [-3 -3 0 3 3 0];
yv = [ 0 3 3 0 -3 -3];

% Create a area of interest in x, y plane
inpoints = getPolygonGrid(xv, yv, 1/0.1);
xx = inpoints(:,1);
yy = inpoints(:,2);

% Create the corresponding zz value from the above grid
zz = [ones(numel(xx),1) xx(:) yy(:) xx(:).*yy(:) xx(:).^2 yy(:).^2] * C;
zz = reshape(zz, size(xx));

% plot points and surface
figure('Renderer','opengl')
hold on;

% Plot the points
line(data(:,1), data(:,2), data(:,3), 'LineStyle','none', 'Marker','.', 'MarkerSize',25, 'Color','r')

% Make a triangular surface plot from xx, yy, zz
tri = delaunay(xx,yy); %x,y,z column vectors
trisurf(tri,xx,yy,zz, 'FaceColor','interp', 'EdgeColor','b', 'FaceAlpha',0.2);

% Change view propeties
title('Fitting surface');
grid on; axis tight equal;
view(9,9);
xlabel x; ylabel y; zlabel z;
colormap(cool(64))

% Calculate the misfit between surface and each points:
zfit = [ones(numel(x),1) x y x.*y x.^2 y.^2] * C;
h = -2: 0.01: 2;
misfit = z - zfit;
mu = mean(misfit); sigma = std(misfit);
f = exp(-(h-mu).^2./(2*sigma^2))./(sigma*sqrt(2*pi));
figure
title('Probability function of misfit');
hold on
plot(h,f,'LineWidth',1.5)



## Locating Earthquake using Geiger’s Method

Earthquake location problem is old, however, it is still quite relevant. The problem can be stated as to determine the hypocenter (x0,y0,z0) and origin time (t0) of the rupture of fault on the basis of arrival time data of P and S waves. Here, we have considered hypocenter in the cartesian coordinate system. It can be readily converted to geographical coordinates using appropriate conversion factor. The origin time is the start time of the fault rupture.

Here, our data are the P and S arrival times at the stations located at the surface (xi,yi,0). We have also assumed a point source and a constant velocity medium for the simplicity sake.

The arrival time of the P or S wave at the station is equal to the sum of the origin time of the earthquake and the travel time of the wave to the station.

The arrival time, ti=Ti + t0, where Ti is the travel time of the particular phase and t0 is the origin time of the earthquake.

Ti = sqrt((xi-x0)^2 + (yi-y0)^2 + z0^2)/v; where v is the P or S wave velocity.

Now, we have to determine the hypocenter location and the origin time of the Earthquake. This problem is invariably ill-posed as the data are  contaminated by noise. So, we can pose this problem as an over-determined case. One way to solve this inconsistent problem for the hypocenter and origin time is to minimize the least square error of the observed and predicted arrival times at the stations.

Before that, we need to formulate the problem. As evident from the above equation, the expression for arrival time and eventually the least square error function is nonlinear. This nonlinear inverse problem can be solved using the Newton’s method. We will utilize the derivative of the error function in the vicinity of the trial solution (initializing the problem with the initial wise guess) to devise a better solution. The next generation of the trial solution is obtained by expanding the error function in a Taylor series about the trial solution. This process continues until the error function becomes considerably small. Then we accept that  trial solution as the estimated model parameters.

d = G m, where d is the matrix containing the arrival time data, m is the matrix containing the model parameters and G is the data kernel which maps the data to the model.

There are some instances where the matrix can become underdetermined and the least square would fail such as when the data contains only the P wave arrival times. In such cases the solution can become nonunique. We solve this problem as a damped least square case. Though it can also be using the singular value decomposition which allows the easy identification of the underdetermined case and then partitioning the overdetermined and underdetermined in upper and lower part of the kernel matrix and dealing with both part separately.

m = [G’G + epsilon]G’d

The parameter epsilon is chosen by trial and error to yield a solution that has reasonably small prediction error.

In this problem, we deduce the gradient of the travel time by examining the geometry of the ray as it leaves the source. If the earthquake is moved a small distance s  parallel to the ray in the direction of the receiver, then the travel time is simply decreased by s/v, where v is the velocity of the phase. If the earthquake is moved a small distance perpendicular to the ray, then the change in travel time is negligible since the new path will have nearly same length as the old one.

T = r/v

If the earthquake is moved parallel to the ray path,

T” = (r-s)/v = r/v – s/v

delta T = T” – T = r/v – s/v – r/v  = – s/v

If the earthquake has moved perpendicular to the raypath,

delta T = 0

So, delta T = – s/v, where is a unit vector tangent to the ray at the source and points toward the receiver.

This is the Geiger’s Method, which is used to formulate the data kernel matrix.

Let us take a hypothetical earthquake location problem and solve it using this method.

MATLAB Codes:

clear; close all;
global G epsilon;
epsilon=1.0e-6;

% Velocity parameters
vpvs = 1.78;
vp=6.5;
vs=vp/vpvs;

% defining the region of the problem which contains both the source and the receiver
%We take a 100x100 units^2 area and the depth of 100 units. The surface is 0 units depth.
xmin=-100;
xmax=100;
ymin=-100;
ymax=100;
zmin=-100;
zmax=0;

% stations: x, y, z coordinates (xi, yi, 0)
sxm = [-9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]'*ones(1,9)*10;
sym = 10*ones(9,1)*[-9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
sx = sxm(:);
sy = sym(:);
Ns = length(sx); % num of stations
sz = zmax*ones(Ns,1); %zeros

% true earthquakes locations
Ne = 20; % number of earthquakes
M = 4*Ne; % 4 model parameters, x, y, z, and t0, per earthquake
extrue = random('uniform',xmin,xmax,Ne,1); %x0
eytrue = random('uniform',ymin,ymax,Ne,1); %y0
eztrue = random('uniform',zmin,zmax,Ne,1); %z0
t0true = random('uniform',0,0.2,Ne,1); %t0
mtrue = [extrue', eytrue', eztrue', t0true']'; %column matrix containing all the true model parameter values

Nr = Ne*Ns; % number of rays, that is, earthquake-stations pairs
N = 2*Ne*Ns; % total data; Ne*Ns for P and S each

% Generating the true data based on the true earthquake and station locations
dtrue=zeros(N,1); %allocating space for all data
for i = [1:Ns] % loop over stations
for j = [1:Ne] % loop over earthquakes
dx = mtrue(j)-sx(i); % x component of the displacement obtained by the difference between station and source location x component
dy = mtrue(Ne+j)-sy(i); % y component of the source- receiver displacement
dz = mtrue(2*Ne+j)-sz(i); % z component of displacement between source and receiver
r = sqrt( dx^2 + dy^2 + dz^2 ); % source-receiver distance
k=(i-1)*Ne+j;
dtrue(k)=r/vp+mtrue(3*Ne+j); %P arrival time for each station-source pair obtained by summing the travel time with the origin time of the earthquake
dtrue(Nr+k)=r/vs+mtrue(3*Ne+j); % S arrival time
end
end

%  Generating observed data by adding gaussian noise with standard deviation 0.2
sd = 0.2;
dobs=dtrue+random('normal',0,sd,N,1); % observed data

%% Determining the predicted arrival time using the Geiger's Method
% inital guess of earthquake locations
mest = [random('uniform',xmin,xmax,1,Ne), random('uniform',ymin,ymax,1,Ne), ...
random('uniform',zmin+2,zmax-2,1,Ne), random('uniform',-0.1,0.1,1,Ne) ]';
% Here, we take a random initial guess

%Formulating the data kernel matrix and estimating the predicted models
for iter=[1:10] %for 10 iterations (termination criteria)

% formulating data kernel
G = spalloc(N,M,4*N); %N- total num of data,2*Ne*Ns; M is total num of model
%parameters, 4*Ne
dpre = zeros(N,1); %allocating space for predicted data matrix
for i = 1:Ns % loop over stations
for j = 1:Ne % loop over earthquakes
dx = mest(j)-sx(i); % x- component of displacement obtained using the initial guess
dy = mest(Ne+j)-sy(i); % y- component of displacement
dz = mest(2*Ne+j)-sz(i); % z- component of displacement
r = sqrt( dx^2 + dy^2 + dz^2 ); % source-receiver distance for each iteration
k=(i-1)*Ne+j; %index for each ray
dpre(k)=r/vp+mest(3*Ne+j); %predicted P wave arrival time
dpre(Nr+k)=r/vs+mest(3*Ne+j); %predicted S wave arrival time
%First half of data kernel matrix correspoding to P wave
G(k,j) = dx/(r*vp); % first column of data kernel matrix
G(k,Ne+j) = dy/(r*vp); % second column of data kernel matrix
G(k,2*Ne+j) = dz/(r*vp); % third column of data kernel matrix
G(k,3*Ne+j) = 1; % fourth column of data kernel matrix
% Second half of the data kernel matrix corresponding to S wave
G(Nr+k,j) = dx/(r*vs);
G(Nr+k,Ne+j) = dy/(r*vs);
G(Nr+k,2*Ne+j) = dz/(r*vs);
G(Nr+k,3*Ne+j) = 1;
end
end

% solve with dampled least squares
dd = dobs-dpre;
dm=bicg(@dlsfun,G'*dd,1e-5,3*M); solving using the biconjugate method
%solving the damped least square equation G'dd = [ G'G + epsilon* I] dm
% We use biconjugate method to reduce the computational cost (see for the dlsfun at the bottom)
mest = mest+dm; %updated model parameter

end

% Generating the final predicted data
dpre=zeros(N,1);
for i = 1:Ns % loop over stations
for j = 1:Ne % loop over earthquakes
dx = mest(j)-sx(i);
dy = mest(Ne+j)-sy(i);
dz = mest(2*Ne+j)-sz(i);
r = sqrt( dx^2 + dy^2 + dz^2 );
k=(i-1)*Ne+j;
dpre(k)=r/vp+mest(3*Ne+j); % S-wave arrival time
dpre(Nr+k)=r/vs+mest(3*Ne+j); % P- wave arriavl time
end
end

% Calculating the data and model misfit
expre = mest(1:Ne); % x0
eypre = mest(Ne+1:2*Ne); %y0
ezpre = mest(2*Ne+1:3*Ne); %z0
t0pre = mest(3*Ne+1:4*Ne); %t0
dd = dobs-dpre; %residual of observed and predicted arrival time
E = dd'*dd; %error
fprintf('RMS traveltime error: %f\n', sqrt(E/N) );
Emx = (extrue-expre)'*(extrue-expre); %misfit for x0
Emy = (eytrue-eypre)'*(eytrue-eypre); %misfit for y0
Emz = (eztrue-ezpre)'*(eztrue-ezpre); %misfit for z0
Emt = (t0true-t0pre)'*(t0true-t0pre); %misfit for t0
fprintf('RMS model misfit: x %f y %f z %f t0 %f\n', sqrt(Emx/Ne), sqrt(Emy/Ne), sqrt(Emz/Ne), sqrt(Emt/Ne) );


dlsfun.m:

function y = dlsfun(v,transp_flag)
global G epsilon;
temp = G*v;
y = epsilon * v + G'*temp;
return

## Some utilities to deal with sac data format

In seismology, we usually have to deal with the sac data format (binary data format). This data format can be dealt easily and efficiently with the SAC software provided by IRIS. But if we want to use this data in other software for instance MATLAB, then we may need to convert the data in alphanumeric format or to install additional functions.

Here, we show how can we deal with the sac data file format.

1. If you have sac installed on your system, then you can make use of the sac’s “convert” command to convert the data from alphanumeric format to sac format, back and forth.
2. There are several other functions and libraries available online to deal with the sac data file format directly in MATLAB. For example, Mike Thorne’s Software, Frederik J. Simons Repository
3. Here we give some useful utilities:

prem is a simple program to return the PREM velocities and density for an input depth. Using this you don’t need to check manually what is the PREM velocity at a particular depths.

This can be compiled using the following steps:

g95 -c getprem_mod.f90

g95 prem.f90 -o prem ./getprem_mod.o

mkdir -p ~/bin

mv prem ~/bin

sachead returns the value of specified sac header. So, you don’t need to read the data with sac to get its header information. If you have sac already installed in your computer then you don’t need this utility as there is a utility “saclst” which does the same job.

To compile this utility,

g95 -c mod_sac_io.f90

mkdir -p ~/bin

sac2xy converts the sac file from binary to alphanumeric format

usage: >> sac2xy sacfile outputfile

For compilation,

g95 -c mod_sac_io.f90

g95 sac2xy.f90 -o sac2xy  mod_sac_io.o

mkdir -p ~/bin

mv sac2xy ~/bin

open “~/.bashrc” file using your favourite text editor

add the following line to it

export PATH=$PATH:~/bin close the .bashrc file and restart your terminal window or run “. ~/.bashrc” command Or directly from the terminal, type echo “export PATH=$PATH:~/bin” >> ~/.bashrc

Mac users can add the following lines to the file ~/.bash_profile

C-shell users can edit the ~/.cshrc file.

echo “setenv PATH \$PATH\:/~/bin” >> ~/.cshrc

We list some MATLAB functions which can be used directly to import sac data in MATLAB:

2. rmean : To remove mean from the read sac data
3. rtrend : To remove linear trend from the data
4. cos_taper : Applies a 10% cosine taper
5. bp_bu_co : Bandpass filter using butterworth filter implementation

## Statistical Analysis in MATLAB

Doing statistical analysis in MATLAB is lot easier than other computer languages. Here, we can make use of the intrinsic “plot” to visualize the statistics of the data.

Below are some programs for commonly used statistical analysis:

1. Calculating mean, standard deviation, median etc of the data and visualizing the data using the histograms. Here, we do the example for a random normal and non-normal data. : Statistics 1
2. Histogram Plotting
3. Hypothesis Testing
4. Monte-Carlo Simulations and calculating the p-value
5. Probability distribution plots of the data
6. t-test statistics

Data file for test run of some of the above programs: temp_month.mat

## Reading Complex Text Data (MATLAB)

Text data which is in one format all through the file is easier to read. But when the text data has numeric entries, string entries (different data formats), then it becomes little complicated to read and write. Sometimes, the data is also not arranged in regular rows and columns form which makes them even more tougher to read.

clear; close all; clc;

filename='stn_data.txt'; %giving the file name to be opened

fileID = fopen(filename); %defining the identifier for file

format='%s %f %f'; %format of the text file

data=textscan(fileID,format);

fclose(fileID); %close the file

data2=data{3};  %reading the third column

#### Writing Data:

clear; close all; clc;

%% Writing Data into the text file

%defining a number of strings

stnName={'ABC','DEF','GHI','JKL','MNO','PQR','STU','VWX','YZA'};

%defining variable data1 with values between 0-10

data1=10*rand(1,length(stnName));

%defining variable data1 with values between 0-100

data2=100*rand(1,length(stnName));

fileid=fopen('stn_data.txt','w'); %opening a file in writing mode

for i=1:length(stnName)

fprintf(fileid,'%s %.3f %.3f\n',stnName{i},data1(i),data2(i)); %writing data into the file iteratively

end

fclose(fileid); % closing the file

Writing data

## Using a Function (MATLAB)

If we need to run the same algorithms or formula for computations again and again, then instead of writing the same stuffs in our script file many times, we can simply write a function. This makes our script look simpler and faster to run.

Writing a function in MATLAB

Examples:

Calculating mean and standard deviation

Calculating distance from origin

Calculates distance (in km) between two coordinate points on the Earth’s surface