Plot geochemical data using python: An example of analyzing Adakite rock data from GEOROC

We download the precompiled data of adakite from GEOROC database.
For a simple impression of adakite, the wikipedia page gives some clue: Adakites are volcanic rocks of intermediate to felsic composition that have geochemical characteristics of magma thought to have formed by partial melting of altered basalt that is subducted below volcanic arcs.

In this example, we demonstrate how to use python to simplify the data, discard the null data, classify and plot the geochemical properties.

First, let’s look at the data, it is quite large and differs in the available data: some elements are there, some are not.

Screen Shot 2018-09-24 at 3.50.03 PM

Now we can start by importing some useful packages:

import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import warnings
warnings.filterwarnings("ignore")
plt.rcParams['figure.figsize'] = (20, 10)
plt.style.use('default')

We use pandas to import the data, the encoding is used to solve some font conflicts.

df = pd.read_csv("ADAKITE.csv", encoding="iso-8859-1")

We use dropna() method to discard columns with less than 50% data available.

df = df.dropna(axis=1, thresh=int(0.5*len(df)))

To have a look at the sample occurrence, let’s plot the data. We select the lon, lat column, drop null value and change all data to numeric format.

plt.figure(figsize=(10,10))
m = Basemap(lon_0=180,projection='hammer')
lon = df["LONGITUDE MIN"].dropna()
lat = df["LATITUDE MIN"].dropna()
lon = pd.to_numeric(lon, errors='ignore');
lat = pd.to_numeric(lat, errors='ignore');
lon_ = [];lat_ = []
for x, y in zip(lon,lat):
    try:
        xx, yy = m(float(x),float(y))
        lon_.append(xx);lat_.append(yy)
    except:
        pass
m.scatter(lon_, lat_, marker = "o" ,s=15, c="r" , edgecolors = "k", alpha = 1)
m.drawcoastlines()
plt.title('Adakite rocks sample')
plt.show()

download

We make a function called plot_harker() to plot Harker’s diagram:

def plot_harker(x,xlabel,y,ylabel,title=None,xlim=[40,80],ylim=None,color = "b",label=None):
    plt.scatter(x=x,y=y,marker="o", c=color,s=8,label = label)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.xlim(xlim)
    try:
        plt.ylim(ylim)
    except:
        pass
    if title != None:
        plt.title(title)

… and then plot some elements using that function:

plt.figure(figsize=(12,12))
plt.subplot(321)
plot_harker(x=df["SIO2(WT%)"],xlabel=r'$SiO_2$ (wt%)',
            y=df["AL2O3(WT%)"],ylabel=(r'$Al_2O_3$ (wt%)'),title=r'$SiO_2$ vs $Al_2O_3$')
plt.subplot(322)
plot_harker(x=df["SIO2(WT%)"],xlabel=r'$SiO_2$ (wt%)',
            y=df["MGO(WT%)"],ylabel=(r'$MgO$ (wt%)'),title=r'$SiO_2$ vs $MgO$')
plt.subplot(323)
plot_harker(x=df["SIO2(WT%)"],xlabel=r'$SiO_2$ (wt%)',
            y=df["FEOT(WT%)"],ylabel=(r'$FeOt$ (wt%)'),title=r'$SiO_2$ vs $FeOt$')
plt.subplot(324)
plot_harker(x=df["SIO2(WT%)"],xlabel=r'$SiO_2$ (wt%)',
            y=df["TIO2(WT%)"],ylabel=(r'$TiO_2$ (wt%)'),title=r'$SiO_2$ vs $TiO_2$')
plt.subplot(325)
plot_harker(x=df["SIO2(WT%)"],xlabel=r'$SiO_2$ (wt%)',
            y=df["NA2O(WT%)"],ylabel=(r'$Na_2O$ (wt%)'),title=r'$SiO_2$ vs $Na_2O$')
plt.subplot(326)
plot_harker(x=df["SIO2(WT%)"],xlabel=r'$SiO_2$ (wt%)',
            y=df["K2O(WT%)"],ylabel=(r'$MgO$ (wt%)'),title=r'$SiO_2$ vs $K_2O$')
plt.suptitle(r'Harker diagram of Adakite vs $SiO_2$',y=0.92,fontsize=15)
plt.subplots_adjust(hspace=0.3)
plt.show()

download (1).png

We can try to see the tectonic settings of the rock:

plt.figure(figsize=(8,8))
tec = df['TECTONIC SETTING'].dropna()
tec = tec.replace('ARCHEAN CRATON (INCLUDING GREENSTONE BELTS)','ARCHEAN CRATON')
tec_counts = tec.value_counts()
tec_counts.plot(kind="bar",fontsize=10)
plt.title('Tectonic settings of Adakite')
plt.ylim([0,500])
plt.show()

download (2)

The following code demonstrates how to create new columns and divide the data. We divide the data in High Silica Adakite (SiO2 > 60%) and Low Silica Adakite (SiO2 < 60%)

df['SR/Y'] = (df["SR(PPM)"]/df["Y(PPM)"])
df['CAO+NA2O'] = df['CAO(WT%)'] + df['NA2O(WT%)']
df['CR/NI'] = df['CR(PPM)'] + df['NI(PPM)']
df_hsa = df[df["SIO2(WT%)"] > 60]
df_lsa = df[df["SIO2(WT%)"] < 60]

download (3).png

Bonus: Let’s see the publish about adakite in the GEOROC database by year:

cite = [df.CITATIONS[x] for x in range(0,len(df)) if len(df.CITATIONS[x]) > 20 and df.CITATIONS[x].count('[') < 3]
year = []
for i in range(0,len(cite)):
    year.append(int(cite[i].split('[')[2].split(']')[0]))

download (4)

The python code below is the full program of this example:

Nguyen Cong Nghia
IESAS

Using Git and Github to store our programs (tutorial)

Git & GitHub tutorial

Git is a version control system for tracking changes in computer files. It was initially created by Linus Torvalds (creator of Linux system) in 2005.
We can use it to store any kinds of programs. It is distributed version control system. It means that many developers can work on the same project without being on the same network. It tracks each of the changes made to the files in the project. The user can revert to any file at any time of it had been committed to the repository. We can even look the snapshots of the code at any particular time in history. We can upload (or push) the files to the remote repository.

Initializing local git repository

git init

Configuring the Username and Email

git config --global user.name 'utpalkumar'
git config --global user.email 'utpalkumar50@gmail.com'

We can have a look at the configuration using the following command

git config --list

If you need any help, you can just type:

git help

Now, to add the files to the index and working tree (our final aim)

we use git add command
git add filename.py

To add all the files in the current directory

git add .
git add *.html

We can get the information about the tracked and untracked files. Tracked files are those which have been added to the working directory.

git status

If we make any changes to the files, we can inspect the changes using the diff command

git diff

If we want to remove the file from the index (untrack the file), then we can simply type:

git rm --cached filename.py

To remove the files from the index and the working tree:

git rm filename.txt

If we want to rename the file then we can do it using the git command too. We don’t need to untrack the file and then rename the original file and add it again

git mv filname.txt newfilename.txt

Now, we can commit the files to add to our repository on the GitHub.

There are two ways of doing that:
1. First way opens the vi or default directory on the local computer, and the user is prompted to enter the message. It is safe to enter meaningful messages because it is useful to track the changes made to the file.
git commit
2. The user can also enter the message using the -m flag
git commit -m 'made some changes'

To make changes to the committed files, the command is –amend

git commit --amend

If we don’t want to include some files in the current directory into the index or working tree, we can add the name of those files in the .gitignore file.

touch .gitignore

We can obtain the log of the git actions

git log --pretty=oneline
We can make this better formatted
git log --pretty=format:"%h : %an : %ar : %s"
For all commits within a week
git log --since=1.weeks
For all commits since some given date
git log --since="2014-01-12"
All the commits of a given author
git log --author="utpalkumar"
All commits before a given date
git log --before="2014-04-30"

If we are a group of developers. We don’t wish to add any changes to the repository without finishing a particular sub-project. We can avoid that by working in a branch

To create a branch other the main branch (master)
git branch mybranch
To switch to the new branch
git checkout mybranch

We can make any change in this branch and merge these changes in the master branch, we need to first switch to the master

git checkout master

to merge the changes to the original file

git merge mybranch -m "added mybranch"

For adding files to the remote git repository, we need to make account on the Github website and then we need to start a new repository, name it, give some required details then, we can add files from our local repository to the remote repository using the following commands:

git remote add origin https://repositoryaddress.git
Replace the above fake URL with the URL you get from the repository you create on the Github website.
git push

If we want to add the same files to another repository, we need to remove the added remove origin using the command

git remote rm origin # to remove the remote origin

We can download the git directory by simply using the git clone command

git clone https://repositoryaddress.git

Introduction to Genetics Algorithm (GA) (Part 1)

I. Introduction

In daily life as well as in doing research, we might come to problems that require a lowest/highest value of variables, e.g.:  find the shortest way from home to work, buying household items with a fixed amount of money, etc. These problems could be called “optimization” and today we will introduce an algorithm to solve these kinds of problem: the Genetic Algorithm.

Genetic Algorithm has been developed by Prof. John Holland in 1975, which search algorithm that mimics the process of evolution. This method is based on the “Survival of the Fittest” concept (Darwinian Theory), i.e.: successive generations are becoming better and better.

II. Algorithm

  1. Initialization
  2. Fitness Calculation
  3. Selection
  4. Crossing Over
  5. Mutation
  6. Repetition of the steps from 2-5 for the new population generated.

GA_algorithm.jpg

The pdf version of this introduction could be found here: Genetic Algorithm

The 2nd part can be found here.

Utpal Kumar

IESAS

Introduction to Python Part II

I. Type of objects in Python:

In Python, every object has its own class – or type of data. The in-depth tutorial can be found on the web, for example, https://jeffknupp.com/blog/2014/06/18/improve-your-python-python-classes-and-object-oriented-programming/. In this tutorial, I will introduce some basic type in Python.

To check type of a variable, data, you can use function type(variable)

+ Numbers: most frequently use is float and int type. The float type uses decimal while the int rounds number. Below is the example of using these type of number. To convert to int and float type, we use int(variable) and float(variable). A number can take numeric operations like +(add), – (subtract), *(multiply), /(divide), % (modulo), ** (exponential)

screen-shot-2016-12-05-at-1-04-06-pm

+ Strings: any character information, that in between ‘ ‘ or ” “.  Strings can be joined together by using + operation.

screen-shot-2016-12-05-at-1-08-31-pm

+ List: Python often uses compound data types, used to group together with other values. The most versatile is the list, which can be written as a list of comma-separated values (items) between square brackets. Lists might contain items of different types: numbers, string or list itself (a list of lists). This type is a basic type to analyze data in Python because it makes you able to access data with ordering.

II. Function and method

In Python,methods are associated with object instances or classes; functions aren’t. When Python dispatches (calls) a method, then it binds the first parameter of that call to the appropriate object reference. A function often has the form of function(argument) while an argument can be any kind of data type (number, string, list). A method must be associated with a type of object and often have the form of object.method(argument) with an object is the suitable type to do a method.

Let’s do some practice and take the list as an example. Screen Shot 2016-12-05 at 1.29.13 PM.png

The functions used in here are len (return the length of an object – how many objects in a list) and print (display an object on screen). The methods used in here are append(add an object to a list) and reverse (reverse the order of objects in a list).