RAMAN SPECTROMETRY
Data classification algorithms on OpenRAMAN spectrometer to classify water solutions
for SmartWater monitoring
2022-2023
ENGINEERING SCIENCES
Acknowledgements
I extend my heartfelt gratitude to my esteemed promoter, prof. dr. ir. Abdellah Touhafi,
whose unwavering guidance and insightful mentorship have been instrumental in shaping the
course of this research journey. Your wisdom and encouragement have been invaluable.
I express my sincere appreciation to the members of my defense committee for their invaluable
insights, constructive feedback, and rigorous examination. Your expertise has enriched this
work beyond measure.
To the founders of OpenRAMAN, whose innovative creation paved the way for my research, I
extend my profound gratitude. Your contributions to the scientific community have been pivotal
in the advancement of my work. Also, to the wider scientific community for their assistance,
advice, and collaborative spirit, I am deeply thankful. The collective knowledge and support
have been invaluable.
To my classmates and colleagues who provided a supportive and collaborative environment,
thank you. Our shared experiences and discussions have been enlightening and inspiring. A
special note of thanks to my office mates at the university for the camaraderie, the exchange
of ideas, and the moments of respite amidst challenging times.
To my dear friends, your encouragement and understanding have been a constant source of
motivation. Your belief in me has been a guiding light.
Finally, to my family, my pillars of strength, and unwavering support, I am deeply grateful.
Your love, encouragement, and sacrifices have been the cornerstone of my journey.
Each of you has contributed to the realization of this achievement in your own unique way, and
for that, I am profoundly thankful.
Kwinten Depestel
Etterbeek, August 21, 2023
Abstract
Contents
Acknowledgements
Abstract
Contents
Nomenclature
1 Introduction
1.1 Context
1.2 Objectives
1.3 Research question
2 Spectroscopy
2.1 The science behind Raman Spectroscopy
2.1.1 Elastic light scattering
2.1.2 Inelastic light scattering
2.1.3 Raman scattering
2.2 Raman spectrometers through time
2.3 The fundamentals of a Raman Spectroscope
2.3.1 Sample
2.3.2 Longpass and Rayleigh filters
2.3.3 Slit
2.3.4 Grating
2.3.5 CCD sensor
2.4 Resulting data from Raman spectroscopy
2.5 The analysis of the Raman effect
3 OpenRAMAN spectrometer
3.1 Build kit
3.2 Parts
3.2.1 Diode-Pumped Solid-State laser
3.2.2 Longpass Dichroic Mirrors
3.2.3 Sample cuvet
3.2.4 Slit
3.2.5 Hard-Coated Longpass Filters
3.2.6 Reflective Diffraction Gratings
3.2.7 Monochrome Camera
3.2.8 Baseplate and mounting holders
3.2.9 Connection and software
3.3 Build and alignment
3.4 Calibration
3.5 Resulting data
4 Data Analytics
4.1 Saving measurements
4.2 Machine learning algorithms
4.2.1 Shortcomings of Raman spectra
4.2.2 K-Nearest Neighbors
4.2.3 Naive Bayes
4.2.4 Support Vector Classifier
4.3 Results
4.3.1 By Reference
4.3.2 By Artificial Intelligence
4.3.3 Cyanobacteria
5 Conclusions
6 Future work
Bibliography
Nomenclature
Acronyms
θ Angle rad
λ Wavelength m
ω Angular velocity rad · s−1
Chapter 1
Introduction
The history of spectroscopy can be traced back to Isaac Newton’s optics experiments around
1670. In these experiments, sunlight passed through a small hole and then through a prism.
The result was a rainbow of colors, which led to the discovery that everyday sunlight, which
we perceive as white, is in fact a combination of all colors. Newton used the term spectrum to
describe the rainbow of colors that combine to form white light.
From this point on, Newton’s experiment was studied further and improved upon. About
130 years later, the setup included a lens to focus the sun’s spectrum on a screen, which led
to the discovery that not all colors of the spectrum were present: black bands (missing colors)
appeared in the observed spectrum. More accurate spectrometer setups followed, and spectroscopy
became a more precise and quantitative scientific technique in medicine, physics, chemistry,
and astronomy. The black bands, later called absorption lines, were examined, and about 600 of
these lines were found in the solar spectrum. This discovery allows us to determine the
composition of the sun without physically going there, as each absorption line or combination
of absorption lines relates to a specific chemical element.
A spectrometer is an instrument that uses either natural or artificial light to measure a
property of light as a function of its position in the electromagnetic (EM) spectrum, typically
expressed as wavelength, frequency, or energy. The property measured is usually the intensity
of the light, but this depends on the application.
1.1 Context
Spectroscopy has established itself as an essential cornerstone of science, with several dozen
different types of spectroscopy, each distinguished by specific applications or implementations.
To analyze water samples, a very specific type of spectroscopy based on the Raman effect was
chosen. As most spectrometers are very expensive, an open-source project called OpenRAMAN is
used for the entire research project. It allows for a low-cost solution to bring Raman
spectroscopy to consumers without sacrificing performance or precision. It comes in the form of
a building kit with all the necessary elements pre-selected. The entire kit is designed by
professional optical engineers and comes at a fraction of the cost of similar industrial
solutions. Both the plans for the kit and the software are open-source and readily available.
1.2 Objectives
The OpenRAMAN kit, being an open-source Raman spectrometer, offers a versatile platform
to analyze water samples at a fraction of the price of commercial solutions. The primary
application of Raman spectroscopy is to identify and characterize chemical compounds. With
the OpenRAMAN kit, we can analyze the Raman spectra of different samples and even perform
environmental monitoring by identifying unknown substances in water samples. The goal is to
build and test a spectrometer that is as accurate as possible. The data analysis will include
some machine learning (ML) algorithms.
1.3 Research question
Does the OpenRAMAN kit provide an accurate enough result that the price-to-performance
ratio can be justified? Is it possible to create a spectrometer that performs in line with the
theoretical quantum efficiency that is mentioned by the founders? Can we differentiate between
different chemicals using generic or even advanced machine learning algorithms? Can we
build our own dataset from analyzed samples to generate a chemical library to train our own
classification algorithm?
Chapter 2
Spectroscopy
– Emission Spectroscopy: Analyzes the light emitted by a sample after it has been excited
by an external energy source, such as heat or electricity. It is used to study the energy
levels and electronic transitions of atoms or molecules.
– Fluorescence Spectroscopy: Focuses on the emission of light by a sample that has absorbed
light of a shorter wavelength. It provides information about the structure and environment
of molecules.
– Mass Spectrometry: Determines the mass-to-charge ratio of ions produced from a sample.
It is used to identify and quantify the composition of molecules.
From this point onward, any mention of spectroscopy will exclusively refer to Raman spec-
troscopy unless stated otherwise.
The central theory behind the word spectrum is that light is made of different wavelengths (see
chapter 1) and that each wavelength corresponds to a different frequency. Every element in the
periodic table has a unique light spectrum that refers to the pattern of light that is emitted or
absorbed by atoms or ions of that element. Different spectroscopic procedures use the idea that
each atomic element has its unique spectral signature. In general, it is based on the principles
of quantum mechanics and the behavior of atoms and molecules. The interaction of electro-
magnetic radiation with a certain sample results in absorption, transmission, or scattering of
certain wavelengths of the radiation.
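The link between wavelength, frequency, and photon energy can be made concrete with a short
calculation. The sketch below uses the standard relations ν = c/λ and E = hν; the 532 nm
example value is the laser wavelength referred to later in this chapter, and the function
names are hypothetical, chosen only for illustration.

```python
C = 299_792_458.0      # speed of light in vacuum, m/s
H = 6.62607015e-34     # Planck constant, J*s
EV = 1.602176634e-19   # joules per electronvolt


def frequency_hz(wavelength_m: float) -> float:
    """Frequency of light with the given vacuum wavelength (nu = c / lambda)."""
    return C / wavelength_m


def photon_energy_ev(wavelength_m: float) -> float:
    """Energy of one photon in electronvolts (E = h * c / lambda)."""
    return H * C / wavelength_m / EV


green = 532e-9  # 532 nm expressed in metres
print(f"frequency: {frequency_hz(green):.3e} Hz")
print(f"photon energy: {photon_energy_ev(green):.2f} eV")
```

A shorter wavelength gives a higher frequency and a higher photon energy, which is why the
three quantities can be used interchangeably as the horizontal axis of a spectrum.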
1 Electromagnetic
The basic requirements for a spectroscopic technique are a light source, a sample, and a
detector. The light source emits a range of wavelengths that can be either narrow or broad,
depending on the application. The sample is the material under investigation and interacts
with the light from the source. A detector is needed to capture the intensity of the light
after it has interacted with the sample.
Different spectroscopy techniques use different regions of the EM spectrum; the most
commonly used regions are the visible (VIS), ultraviolet (UV), and infrared (IR). Each
region of the EM spectrum provides unique information about the sample. Interpreting
spectroscopic data involves comparing experimental results with known spectra or theoretical
models. These reference spectra can nowadays be found in globally maintained public databases,
which are expanded when a new feature is established. An example of such a database is
Kramida et al. (1995), which contains precise measurements of atomic energy levels.
For example, when the scattered light is directed back toward the source (back-scattering),
the scattering angle (θ) in eq. (2.1) is π (180°). In this case, the scattering intensity is at
its maximum. Forward scattering (small scattering angles) is similarly stronger than
scattering in transverse directions.
The directional dependence (see fig. 2.3) of Rayleigh scattering arises from the nature of the
oscillation of the scattering particles, which are typically dipoles. These dipoles oscillate (see
fig. 2.1a) in directions perpendicular to the propagation direction of the incident light. Conse-
quently, they cannot radiate along the direction of their oscillation, leading to stronger scat-
tering in the forward and backward directions. This dipole radiation effect is the core physics
behind the scattering as it results from the electric polarizability of the particles.
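The angular dependence described above can be illustrated numerically. The sketch below
assumes that eq. (2.1) has the standard Rayleigh form for unpolarized incident light, in which
the scattered intensity is proportional to 1 + cos²θ. The function name
`rayleigh_angular_factor` is hypothetical, chosen only for this illustration.

```python
import math


def rayleigh_angular_factor(theta_rad: float) -> float:
    """Relative Rayleigh intensity versus scattering angle for unpolarized light.

    Assumes the standard (1 + cos^2 theta) dependence, normalized so that
    forward scattering (theta = 0) equals 1.
    """
    return (1.0 + math.cos(theta_rad) ** 2) / 2.0


for label, theta in [("forward", 0.0), ("transverse", math.pi / 2), ("backward", math.pi)]:
    print(f"{label:>10}: {rayleigh_angular_factor(theta):.2f}")
```

Under this assumption the forward and backward directions receive twice the intensity of the
transverse direction, in line with the description above.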
Rayleigh scattering is responsible for various atmospheric phenomena, such as the blue color of
the sky during the day. It occurs when sunlight interacts with molecules and small particles in
Earth’s upper atmosphere, causing higher-frequency visible light (the blue and violet parts of
the visible spectrum) to scatter more (see fig. 2.2) than the lower-frequency parts of the
visible spectrum. This preferential scattering of blue and violet light contributes to the
bluish-white appearance of the sky when viewed from the Earth’s surface. The same principle
applies to sunrises and sunsets: the sun’s light has to traverse a longer path through the
atmosphere, so most or all of the blue and violet wavelengths are scattered out of the direct
sunlight, which leaves a prominent yellow and orange appearance.
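To give a sense of scale for this wavelength preference, the sketch below assumes the standard
Rayleigh result that the scattered intensity is proportional to 1/λ⁴. The wavelengths of 450 nm
(blue) and 650 nm (red) are illustrative choices, not values taken from this work, and the
function name is hypothetical.

```python
def rayleigh_ratio(lambda_a_nm: float, lambda_b_nm: float) -> float:
    """How much more strongly wavelength a is scattered than wavelength b (1/lambda^4 law)."""
    return (lambda_b_nm / lambda_a_nm) ** 4


blue_nm, red_nm = 450.0, 650.0
ratio = rayleigh_ratio(blue_nm, red_nm)
print(f"Blue light is scattered about {ratio:.1f}x more strongly than red light")
```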
In addition to Rayleigh scattering, there are other types of elastic scattering, each with its own
characteristics and applications. Some of these include:
– Mie Scattering: Mie scattering occurs when the size of the scattering particles is
comparable to the wavelength of the incident light. Unlike Rayleigh scattering, which applies
to particles much smaller than the wavelength, Mie scattering applies to larger particles. It
is commonly observed when light interacts with aerosols, fog, or other media containing
particles that are on the order of the incident light’s wavelength. Mie scattering can cause
light to be scattered in various directions, leading to phenomena like the white glare around
the sun or moon in hazy or misty air.
– Tyndall Scattering: Tyndall scattering is a form of elastic scattering that occurs when light
interacts with colloidal particles or larger suspended particles in a medium. It is named
after the 19th-century physicist John Tyndall, who extensively studied this phenomenon.
Tyndall scattering is responsible for the visible beam of light passing through a colloidal
solution, fog, or mist. The suspended particles in the medium scatter the light, making
the beam visible.
commonly observed in plasma physics and astrophysics, providing information about the
properties of the scattering particles, such as their density, temperature, and velocity.
Figure 2.3: Elastic scattering: Rayleigh and Mie scattering intensities and their scattering direction
Figure 2.4: Raman scattering. (a) Energy diagram showing the signal after irradiation.
(b) Scattering spectrum showing the photon energy difference.
Brillouin scattering, on the other hand, is a type of inelastic scattering that primarily occurs
in materials due to acoustic phonons, which are lattice vibrations or sound waves. When light
interacts with a material, the acoustic phonons can exchange energy with the photons, causing a
shift in frequency or wavelength of the scattered light. Brillouin scattering is commonly used in
the study of mechanical properties of materials, such as the determination of elastic constants,
acoustic modes, and the measurement of sound velocities.
Both Raman and Brillouin scattering involve energy exchange between light and matter, leading
to a change in the frequency, energy, and wavelength of the scattered light. By analyzing the
resulting scattered light spectrum, scientists can obtain valuable information about the atomic
and molecular structure, dynamics, and physical properties of materials.
Inelastic light scattering techniques have numerous applications across various scientific fields.
In materials science, they are used for material characterization, including the identification of
chemical compounds, the study of crystal structures, and the investigation of phase transitions.
In biology and biomedical research, inelastic scattering techniques like Raman spectroscopy
are employed for non-destructive analysis of tissues (Jinadasa et al., 2021), identification of
biomolecules, and the study of cellular processes.
Furthermore, inelastic light scattering plays a significant role in areas such as spectroscopy,
photonics, environmental monitoring, and semiconductor research. It enables the study of
fundamental physics phenomena, including the behavior of light-matter interactions, phonon
dynamics, and the understanding of energy transfer processes in nanoscale systems.
Overall, inelastic light scattering techniques provide a powerful means to explore the fundamen-
tal properties of matter, offering valuable insights into the composition, structure, and behavior
of materials at microscopic and atomic scales.
Figure 2.5: All scattering types when a sample is illuminated with incident light
Table 2.1: Vibrational bonds and corresponding Raman shifts for a 532 nm incident light
In the present day, Raman spectroscopy has evolved into a widely used analytical technique with
applications spanning various fields such as pharmaceuticals, forensic science, environmental
monitoring, and nanotechnology. Over time, it has undergone significant advancements and
refinements to enhance its capabilities and broaden its scope.
The effect used in Raman spectroscopy was first predicted theoretically in 1923 by A. Smekal,
a theoretical physicist who described the inelastic scattering of light. A year earlier,
the Indian physicist C.V. Raman had been working on a publication on the molecular diffraction
of light (Raman, 1922). This led to a series of studies in which Raman and his co-workers
ultimately observed the inelastic scattering of light by molecules, where a small fraction of
photons undergo a frequency shift due to molecular vibrations. This is now commonly known as
the Raman effect, named after the physicist who received the 1930 Nobel Prize in Physics for
its discovery and became the first Asian to receive a Nobel Prize in the sciences.
(a) The first Raman spectrometer used by Prof. C.V. Raman (1922)
(b) The first commercial Raman laser microscope, under the name MOLE™ (1976)
One of the experiments that predate the discovery of the Raman effect showed that there is
a possibility of other light rays forming in addition to the incident ray of sunlight when it is
filtered through violet glass. Later the discovery of the Compton effect led Prof. C.V. Raman
to believe that there must be an optical equivalent to the Compton effect. He and his associate
referred to the Compton effect as unmodified scattering and their newly discovered phenomenon
as modified scattering.
Once discovered, Raman spectroscopy was primarily studied and developed as a laboratory
technique for investigating molecular vibrational modes and providing insights into molecular
structure. Early instruments were relatively crude and had limited sensitivity. Further techno-
logical advancements in optics and instrumentation led to improved Raman spectrometers. The
introduction of laser technology in the 1960s further enhanced the sensitivity and practicality
of Raman spectroscopy.
The invention of the ruby laser in 1960 by T. H. Maiman provided a coherent, monochromatic
light source for Raman spectroscopy. This enabled more efficient excitation of Raman scattering
and improved signal detection.
From the 1970s, Raman spectroscopy started finding applications in various scientific fields, in-
cluding chemistry, materials science, and biology. Researchers explored its potential in studying
crystalline materials, characterizing polymers, analyzing biological samples, and investigating
chemical reactions. Later, the development of fiber optic probes and miniaturized Raman sys-
tems expanded the versatility and accessibility of Raman spectroscopy. These advancements
allowed for in situ and non-destructive analysis of samples in different environments and remote
locations.
In the last decade, Raman spectroscopy continued to evolve with the integration of comple-
mentary techniques and technologies. For example, Raman microscopy combined Raman spec-
troscopy with microscopy, enabling spatially resolved analysis of samples at the microscopic
level. Additionally, the emergence of Raman imaging techniques facilitated the mapping and
visualization of chemical composition and distribution within samples.
Raman spectroscopy has become a widely used analytical technique in various fields, and it
continues to benefit from advances in laser technology, detectors, data analysis algorithms,
and instrument miniaturization. Future developments aim to enhance sensitivity, speed, and
spatial resolution as well as open up new possibilities in scientific fields.
Raman spectroscopy is based on the Raman scattering phenomenon, where a small fraction of
incident photons undergoes inelastic scattering, resulting in a shift in energy and frequency.
The key components of Raman spectroscopy are:
– A monochromatic light source is typically used, such as a laser, to irradiate the sample.
The choice of the laser wavelength depends on the nature of the sample and the desired
molecular information. The incident photons should have sufficient energy to induce the
Raman scattering process.
– The Raman effect occurs when a photon interacts with a molecule or material and trans-
fers energy to or from its vibrational or rotational states. This interaction leads to a
change in the energy and wavelength of the scattered light. The effect is a result of the
interaction between the incident light and the polarizability of the molecule, which causes
the scattering to occur at different energy levels.
– The scattered light from the sample is collected and analyzed to obtain the Raman spectrum.
It consists of both Rayleigh scattered light, which has the same energy as the incident
light, and Raman scattered light, which is shifted in energy due to molecular interactions.
A spectrometer with a high-resolution detector is used to separate and measure
the intensities of the different energy-shifted Raman scattering components.
The resulting spectrum is a graph that represents the intensity of the Raman scattered light
as a function of the energy shift from the incident light. It provides information about the
vibrational and rotational modes of the molecules in the sample. The Raman spectrum consists
of peaks corresponding to different transitions, which are specific to the molecular composition
and structure of the sample.
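As a small worked example of the energy shift on the horizontal axis, the sketch below converts
a scattered wavelength into a Raman shift in wavenumbers using the standard relation
Δν̃ = 1/λ₀ − 1/λ₁. The 532 nm excitation matches the incident wavelength of Table 2.1; the
550 nm scattered wavelength, the 1000 cm⁻¹ shift, and the function names are illustrative
assumptions.

```python
def raman_shift_cm1(excitation_nm: float, scattered_nm: float) -> float:
    """Raman shift in cm^-1; positive for Stokes (red-shifted) light."""
    # 1 nm = 1e-7 cm, so 1/lambda in cm^-1 equals 1e7 / lambda[nm]
    return 1e7 / excitation_nm - 1e7 / scattered_nm


def scattered_wavelength_nm(excitation_nm: float, shift_cm1: float) -> float:
    """Inverse conversion: wavelength at which a given Raman shift appears."""
    return 1e7 / (1e7 / excitation_nm - shift_cm1)


print(f"shift of 550 nm light: {raman_shift_cm1(532.0, 550.0):.0f} cm^-1")
print(f"1000 cm^-1 appears at: {scattered_wavelength_nm(532.0, 1000.0):.1f} nm")
```

Because the shift is measured relative to the excitation line, the same molecular vibration
gives the same Raman shift regardless of the laser wavelength, even though the absolute
wavelength of the scattered light changes.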
It is particularly sensitive to molecular vibrational modes. When a molecule absorbs energy
from the incident photons, it can undergo vibrational transitions, resulting in characteristic
shifts in the Raman spectrum. These shifts correspond to the vibrational frequencies and
modes of the molecular bonds, providing information about the chemical composition, molecular
structure, and bonding interactions within the sample.
Sample preparation for Raman spectroscopy depends on the nature of the sample and the
desired analysis. Generally, the sample should be optically transparent or have a thin layer for
effective light interaction. Solids can be analyzed directly, while liquids and gases may require
containment in suitable sample cells. However, the Raman spectra can be obtained from bulk
solids, liquids, tablets, polymers, paper, etc. with little or no sample preparation. The analysis
can also be carried out through many containers such as glass bottles, reaction vessels, plastic
containers, etc.
2.3.1 Sample
Raman spectroscopy is a powerful analytical technique used to detect vibrational, rotational,
and other states in a molecular system. It can be applied to samples in any of the major states
of matter, including gases, liquids, and solids. However, it is most commonly used with liquid
or solid samples. Analyzing samples in a gaseous state can be more challenging due to the low
concentration of molecules.
Water is a weak Raman scatterer with few distinctive Raman features, which makes Raman
spectroscopy a useful tool for identifying compounds dissolved in a given water sample: the
water itself contributes little to the measured spectrum.
Aside from this, little to no sample preparation is required, making it a convenient technique
for sample analysis. A sample can typically be analyzed without any major alterations. Only
when a higher-power laser is used, or the laser power is focused on a small point, is there a
risk of burning the sample. Nevertheless, the technique is generally non-destructive,
preserving the integrity of the sample.
The analysis of a sample can be carried out in many different types of containers, such as
glass bottles, Pyrex, reaction vessels, and plastic containers. For better results, a fully
automated baseline-removal method for Raman spectra (Schulze et al., 2011) can be applied. This
makes the result more precise, as the scattered spectrum of the container and the fluorescent
background underneath a faint signal are removed from the resulting spectrum. This is visible
in fig. 2.9, where the left part of the initial spectrum is compensated so that only the
spectral peaks, and not the background, contribute to the intensity.
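To illustrate what a baseline removal does, the sketch below uses a simple iterative polynomial
fit. This is a generic stand-in chosen for brevity, not the method of Schulze et al. (2011);
the function name, the polynomial degree, the iteration count, and the synthetic spectrum are
all assumptions made for this example.

```python
import numpy as np


def remove_baseline(shift, intensity, degree=3, n_iter=50):
    """Estimate and subtract a smooth background with an iterative polynomial fit.

    A simple illustrative stand-in, not the Schulze et al. (2011) algorithm.
    Returns (corrected_intensity, estimated_baseline).
    """
    shift = np.asarray(shift, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    x = (shift - shift.mean()) / shift.std()  # rescale the axis for a well-conditioned fit
    work = intensity.copy()
    baseline = np.zeros_like(work)
    for _ in range(n_iter):
        baseline = np.polyval(np.polyfit(x, work, degree), x)
        # Clip the working copy to the fit so that peaks stop pulling the baseline upward
        work = np.minimum(work, baseline)
    return intensity - baseline, baseline


# Synthetic spectrum (made-up numbers): sloping background plus one Gaussian peak
shift = np.linspace(200.0, 3000.0, 1401)
background = 100.0 + 0.05 * shift
peak = 200.0 * np.exp(-0.5 * ((shift - 1000.0) / 8.0) ** 2)
corrected, baseline = remove_baseline(shift, background + peak)
print(f"Strongest corrected feature at {shift[np.argmax(corrected)]:.0f} cm^-1")
```

The polynomial degree controls how flexible the estimated background is: too low and a curved
fluorescence background is not followed, too high and broad Raman bands start to be absorbed
into the baseline.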
Raman spectroscopy is a versatile tool applicable to various fields such as chemistry, materials
science, pharmaceuticals, biology, and geology. It provides valuable insights into molecular
structures, compound identification, crystallographic orientations, chemical reaction monitor-
ing, and the analysis of biological samples. This technique offers the advantage of providing
molecular information without the need for extensive sample handling, cleaning, or chemical
treatments that could introduce artifacts.
It is important to note that Raman spectroscopy has some limitations. It is not particularly
suitable for detecting low concentrations of compounds due to the relatively weak Raman
scattering signal. Additionally, fluorescence from materials can interfere with Raman
measurements, obscuring the Raman spectra. However, techniques such as resonance Raman
spectroscopy or surface-enhanced Raman spectroscopy (SERS) can mitigate these limitations and
enhance the sensitivity and specificity of Raman analysis.
In summary, Raman spectroscopy is a valuable tool in the realm of spectroscopic analysis, pro-
viding molecular insights across different states of matter. Its non-destructive nature, minimal
sample preparation requirements, and versatile applications make it a widely used technique
in various scientific disciplines. With ongoing advancements, Raman spectroscopy is poised to
contribute even more to our understanding of molecular systems and facilitate new discoveries
in the future.
These filters have a transmission band and a reflection band that are separated by a cut-on
wavelength: they are highly reflective below the cut-on wavelength and highly transmissive
above it (see fig. 3.3). This arrangement helps to minimize the detection of the intense laser
light, which could otherwise overwhelm the weaker Raman signal. By using a longpass dichroic
filter, the Raman signal can be efficiently directed towards the detector while the laser light
is reflected away, improving the signal-to-noise ratio and enhancing the sensitivity of Raman
spectroscopy.
It’s important to note that the specific optical components and setup used in Raman spec-
troscopy can vary depending on the instrument design, laser wavelength, and desired experi-
mental parameters. Different filters, including longpass dichroic filters, may be used in combi-
nation with other optical elements to optimize the Raman signal detection and reject unwanted
light sources in the system.
A Rayleigh filter on the other hand is specifically designed to block or suppress the Rayleigh
scattered light, which is the elastic scattering of light by particles or molecules. It is used in
applications where the goal is to separate the Rayleigh scattered light from other components,
such as in Rayleigh scattering measurements or experiments where only a specific scattered
light component is of interest.
Both filtering mechanisms are similar in that they separate light by wavelength. A Rayleigh
filter is specifically designed for blocking or suppressing Rayleigh scattered light, while a
longpass dichroic filter separates wavelengths in a broader sense and can be used for various
applications, including Raman spectroscopy. In this kind of application, the longpass dichroic
filter is used to suppress the wavelengths of the source light as well as the Rayleigh
backscattered light that accompanies the Raman spectrum.
2.3.3 Slit
The slit in a Raman spectrometer plays a crucial role in controlling the amount of light that
reaches the detector and determining the spectral resolution of the instrument. The slit acts as
an aperture through which the light from the sample is collected. By adjusting the width of the
slit, the amount of light that enters the spectrometer can be controlled. A narrower slit allows
for better spatial filtering, reducing the amount of stray light and improving the signal-to-noise
ratio.
Different excitation wavelengths are also better suited to different slit sizes. For example, a
532 nm laser gives much more Raman signal than a 785 nm laser, because Raman scattering becomes
stronger at shorter excitation wavelengths. A smaller slit can therefore be used with the
visible laser without the signal intensity becoming too low, compared to analyzing the same
sample with 785 nm excitation.
As an example, the spectrum of tryptophan was acquired (Ltd., 2023) using two different
slit sizes. The 100 µm slit clearly provides higher Raman intensity, because a larger slit lets
more light through. However, reducing the slit size to 20 µm reveals more detail in the Raman
spectrum and thereby increases the spectral resolution. Peaks become better defined: for
example, the shoulders at approximately 1350 cm−1 and 1580 cm−1 in fig. 2.11 are resolved as
individual peaks with the smaller slit. Furthermore, the background fluorescence is reduced
when the slit is closed down.
The slit can help suppress background signals and unwanted scattering from the sample or
surrounding environment. By carefully positioning the slit in the optical path, it is possible to
minimize contributions from scattered laser light, fluorescence, or other sources of background
noise. The slit width is also an essential parameter for instrument calibration and for
ensuring accurate spectral measurements: the known width of the slit can be used to calibrate
the instrument’s response and accurately determine the Raman shift values.
2.3.4 Grating
In a Raman spectroscope, the grating assumes a critical role as it disperses the incoming light
into its constituent wavelengths, facilitating the separation and analysis of Raman scattered
light. Comprised of closely spaced parallel lines or grooves, the grating exploits the principle of
diffraction to bend light at different angles based on its wavelength. This wavelength-dependent
diffraction leads to the spatial separation of the various wavelengths present in the incoming
light. This results in the dispersion of the Raman scattered light according to its wavelength.
The dispersion achieved by the grating not only enables the separation of different wavelengths
but also plays a fundamental role in determining the spectral resolution of the Raman spectro-
scope. The groove density of the grating determines the degree of wavelength differentiation,
measured in the number of grooves per millimeter (g/mm). Typical gratings start at 300 g/mm
through to 1800 g/mm and highly specialized gratings are available with a groove density of
over 2400 g/mm.
The theoretical wavelength limit for a grating with groove density n is λ = 2/n. As a result,
the highly specialized ruled diffraction gratings (2400 g/mm and above) are limited to the
green end of the spectrum and shorter wavelengths. This makes those high-density gratings
suited, and only suited, to UV 3 excitation.
3 Ultra violet
A grating is necessary to allow for the identification and distinction of closely spaced Raman
peaks. A higher groove density provides finer resolution, facilitating the detection and
characterization of subtle spectral features, while a lower groove density gives a coarser
resolution but covers a broader spectral range.
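The theoretical limit λ = 2/n mentioned above can be evaluated for the groove densities named
in the text. The following is a minimal sketch; the function name is chosen here for
illustration only. It gives the theoretical upper bound, and the practically usable range of
a real grating is typically narrower because the diffraction efficiency drops before this
bound is reached.

```python
def max_wavelength_nm(grooves_per_mm: float) -> float:
    """Theoretical upper wavelength limit in nm, from lambda = 2 / n."""
    if grooves_per_mm <= 0:
        raise ValueError("groove density must be positive")
    # 2 / n is in millimetres when n is in grooves per millimetre; 1 mm = 1e6 nm.
    return 2.0 / grooves_per_mm * 1e6


# Groove densities mentioned in the text.
for density in (300, 1800, 2400):
    print(f"{density} g/mm -> {max_wavelength_nm(density):.0f} nm")
```

Doubling the groove density halves the theoretical limit, which is why the densest gratings
are reserved for short excitation wavelengths.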
Once the grating disperses the Raman scattered light, a detector placed at a specific position
or angle captures the dispersed light for further analysis. By selecting an appropriate position
or angle for the detector, specific wavelength ranges can be targeted for measurement, allowing
for flexibility in capturing desired spectral information. This flexibility in detector positioning
is made possible by the wavelength dispersion achieved by the grating.
In summary, the grating in a Raman spectroscope assumes a pivotal role in the dispersion, sepa-
ration, and detection of Raman scattered light. By dispersing the incoming light and separating
it into its constituent wavelengths, the grating allows for the identification and characterization
of Raman peaks. Its groove density determines the spectral resolution, facilitating the detection
of subtle spectral features. The grating also enables flexible detection by positioning a detector
at a specific angle. Furthermore, the grating’s properties are utilized in the calibration process
to ensure accurate wavelength measurements. Altogether, the grating is an indispensable com-
ponent in Raman spectroscopy, contributing to the precise and reliable analysis of molecular
vibrational and rotational information.
– Sensitivity
– Low noise
– Two-dimensional imaging
A CCD sensor is capable of detecting and amplifying the faint signals that are necessary to
detect even small Raman shifts, allowing for the analysis of samples with low concentrations.
It has a high quantum efficiency that ensures that a significant portion of the Raman scattered
light is detected, improving the overall sensitivity of the Raman spectrometer.
This sensor has low inherent noise levels, which is important for capturing and detecting weak
Raman signals. The low noise characteristics of CCD sensors contribute to the high signal-to-
noise ratio necessary for accurate Raman spectral measurements.
Two-dimensional imaging offers simultaneous detection of multiple points or regions of interest
within the Raman spectrum. This enables the acquisition of Raman images or mapping of
samples, providing spatial information about the distribution of Raman-active molecules.
These sensors are widely used and readily available in various sizes and formats, making them
compatible with different Raman spectrometer configurations. They can be easily integrated
and offer flexibility in design and optimization for specific experimental requirements.
The electrical signal obtained from the detector in Raman spectroscopy is known as a Raman
spectrum, which is a graphical representation of the intensity of Raman scattered light plotted
against the Raman shift or wavelength. These electrical signals directly correspond to the
captured Raman scattered light intensity at specific pixels on the sensor.
In Raman spectroscopy, a ruled diffraction grating is employed to disperse the different wave-
lengths of light and direct them onto the sensor, enabling the measurement of their respective
intensities. This allows the position and intensity of the pixels on the sensor to be converted
into the corresponding intensities of specific wavelengths resulting from the interaction between
the light source and the sample. The separation of wavelengths is achieved through the disper-
sion properties of the grating, allowing the distinct Raman shifts or wavelengths to be detected
and analyzed. Valuable information about the vibrational and rotational modes of molecules
within the sample can be obtained.
In a Raman spectrum, the x-axis represents the Raman shift, which is the difference in energy
between the incident and scattered light. It is typically measured in units of wavenumber
4 Charge-Coupled Device
(cm−1) or wavelength (nm). The y-axis represents the intensity of the Raman scattered light at
each Raman shift. The intensity indicates the strength of the Raman scattering process and is
related to the concentration and characteristics of the molecules in the sample.
The Raman spectrum contains a series of peaks or bands, each corresponding to specific molec-
ular vibrations or rotations. These peaks are characterized by their position (Raman shift),
shape, and intensity. The positions of the peaks provide valuable information about the molecu-
lar structure and chemical composition of the sample. The shape and intensity of the peaks can
reveal details about molecular interactions, crystal structures, and other physical properties.
By analyzing the spectrum, researchers can identify and characterize the molecular species
present in a sample, determine the presence of impurities or contaminants, study molecular
dynamics and structural changes, and gain insights into chemical bonding and molecular
behavior. The Raman spectrum serves as a fingerprint of the sample’s molecular composition.
A Raman spectrum is typically analyzed to extract valuable information about the molecular
composition, structure, and properties of the sample. The analysis involves several key steps:
– The first step is to identify and assign the peaks in the Raman spectrum. This is often
done by comparing the observed peaks with reference spectra or databases of known
molecular vibrations. The positions and intensities of the peaks provide clues about the
functional groups and chemical bonds present in the sample.
– The relative intensities of the peaks in the Raman spectrum can provide information
about the concentration of different molecular species in the sample. By comparing peak
intensities, it is possible to determine the relative abundance of different components or
monitor changes in concentration over time.
– The shape of the Raman peaks can yield insights into molecular interactions, crystal
structures, and other physical properties. Factors such as peak width, symmetry, and
splitting can provide information about molecular dynamics, crystallographic symmetry,
and the presence of multiple molecular configurations or conformations.
– The shifts in the Raman peaks relative to a reference position (usually a known stan-
dard) can indicate changes in molecular structure, stress, or environmental factors. By
monitoring peak shifts, it is possible to detect changes in chemical bonding, molecular
interactions, or the presence of external influences.
– In some cases, Raman spectra can be quantitatively analyzed to determine the concen-
tration or composition of specific components within a sample. This may involve calibra-
tion curves or multivariate analysis techniques to correlate Raman signal intensities with
known concentrations or properties.
– The Raman spectrum and analyzed results are often visualized in various ways, such as
plots, graphs, or spectral overlays. Visualization aids in the interpretation and commu-
nication of the data, allowing researchers to observe trends, patterns, and correlations.
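The first two steps of the list above (finding peaks and comparing them with reference
positions and with each other) can be sketched as follows. This is only an illustration
under stated assumptions: the spectrum values, the reference bands "band A" and "band B",
the height threshold, and the tolerance are all made up, and a simple local-maximum search
stands in for the more robust peak detection a real analysis would use.

```python
def find_peaks(shifts, intensities, min_height):
    """Return (shift, intensity) pairs for local maxima at or above min_height."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        if y >= min_height and y > intensities[i - 1] and y >= intensities[i + 1]:
            peaks.append((shifts[i], y))
    return peaks


def match_peaks(peaks, reference, tolerance):
    """Assign each peak to the closest reference band within tolerance (cm^-1)."""
    assignments = []
    for shift, _ in peaks:
        name, ref_shift = min(reference.items(), key=lambda kv: abs(kv[1] - shift))
        if abs(ref_shift - shift) <= tolerance:
            assignments.append((shift, name))
    return assignments


# Made-up spectrum: Raman shift in cm^-1 and intensity in arbitrary units.
shifts = [900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400]
intensities = [5, 6, 40, 7, 5, 6, 5, 8, 12, 30, 9]
reference = {"band A": 1003.0, "band B": 1347.0}  # hypothetical reference positions

peaks = find_peaks(shifts, intensities, min_height=20)
assignments = match_peaks(peaks, reference, tolerance=10.0)
relative_intensity = peaks[1][1] / peaks[0][1]  # second peak relative to the first
```

The ratio between peak intensities is what the second step uses as an indication of the
relative abundance of components.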
The analysis of Raman spectra often involves a combination of manual interpretation, spectral
databases5, and advanced analytical techniques. It’s important to note that the specific analysis
techniques and methods employed may vary depending on the research objectives, the nature
of the sample, and the complexity of the Raman spectrum.
5 Some available databases for spectral information can be found through Lafuente et al., 2015, Linstrom
OpenRAMAN spectrometer
Commercial spectrometers of the type discussed here range in price from a few thousand dollars
to several hundred thousand dollars. They come in a range of form factors, from portable
handheld devices to tabletop or even human-sized machines. All of them provide analysis based
on the same principles. More advanced features, higher precision, and better sensitivity lead
to more accurate and reliable results, which accounts for the increase in price per unit. As
these prices are beyond the budget of most educational institutes, this type of technology is
generally not considered for educational research with in-house machinery.
It has been almost a decade since a duo of optical engineering PhD students started to build
their own DIY1 Raman spectrometer. This build contained only readily available components.
Although the resolution was poor, the essence of Raman spectroscopy was visible and
measurable. Over a span of a few years, they continued to improve and optimize the DIY build.
Since October 2019 the project has been available as an open-source project on their website.
They provide build plans to recreate the DIY build yourself, as well as a lot of information
on their process and progression. In addition, they opened dedicated pages and fora for the
growing community in open-source Raman spectrometry. The OpenRAMAN initiative makes an
affordable and accessible Raman spectrometer possible.
At this point in time, there are three different build plans available. The breadboard version
is a take on the legacy version that started it all. This version is almost certainly in need
of upgrades for development with the spectrometer. It is a bare minimum with off-the-shelf
components that are pieced together like LEGO™. When sampling with common solvents it is
said to work great and provide semi-accurate results.
The starter edition is a fine-tuned entry-level spectrometer. This can be built for a little
over €2000 with some additional 3D manufactured components. It is a combination of the
breadboard version with some necessary alterations to improve performance. They also provide
plans for a performance edition, which is a more powerful and higher-performance solution in
comparison to the other two models. The base price for the performance edition is over €3000.
These last two editions can be built upon further and altered to better fit a specific
project.
The choice to go for the starter edition was made in advance. Personal research about the
system afterwards indicated that it should theoretically be accurate enough for the intended
type of measurements. From the research on the kit, it became clear that the resolution of the
Raman spectrometer is determined by the alignment, the resistance to outside interference,
and the right settings for an individual sample. The resolution of the machine can therefore
only be determined once it is built and tested.
1 Do It Yourself
Chapter 3. OpenRAMAN spectrometer 20
The starter kit provides both PDF and CAD2 models of the parts that need to be machined.
You can choose to machine them yourself or search online for companies that do it for you.
As all machined parts can be produced using the in-house tools from Fablab Brussels on the
VUB3 campus, this choice was easily made. The parts consist of several holders that are
used during the build process as well as a precisely machined baseplate on which to mount
the spectrometer. Next to the machined parts, the optical components need to be purchased as
well. Changing some components is possible but is not recommended for a first-time user.
Making alterations to the setup requires sufficient knowledge of optics and chemistry, and
compatibility cannot be guaranteed.
The instructions are detailed enough to follow along if all the parts are purchased and man-
ufactured as provided. Some insights will be gained from the build process as well as the
continued use of the spectrometer. This will be needed as the spectrometer will not work on
first use. Some tinkering and fidgeting with both the hardware and the software is inevitable.
The community and the creators are a big help for the more challenging problems.
3.2 Parts
All the parts in the BOM4 are carefully selected and tested by a team of optics engineers in
order to create the best and most accurate Raman spectrometer with off-the-shelf parts. The
current parts list is the result of years of work by the engineering team that developed the
kit from scratch. Their research and findings from over the years are mostly made available on
their website. To this day the team is still posting updates both on parts and on design choices
in order to improve the resolution of the Raman spectrometer. The community also provides its
own findings and alterations to the system. All of this makes for a growing community that is
centered around the same type of spectroscopy with a similar design, as they all start from
the same build schematics.
The provided documentation about the build, both digital and in video format, is well written
and comprehensive. It provides knowledge about the system alongside the setup process.
Following these instructions will guide you through the correct order in which all the parts need
to be assembled. It also provides clear instructions on how to focus each of the optical
lenses by making use of the sensor and a bright generic light source like a lamp.
Figure 3.2: OpenRAMAN spectrometer kit (annotations correspond to subsection numbering below)
light of the DPSSL in the center of the cuvet. The focused DPSSL source interacts with the
sample through the processes of elastic and inelastic scattering of light.
Two main processes occur simultaneously once the light interacts with the sample. These are
explained in section 2.1.1 and section 2.1.2. Both elastic and inelastic scattering send
light in all directions once the sample is hit with the 532 nm laser. Only a fraction of the light
is back-scattered in the preferred direction, going through both of the lenses integrated
into the lens holder and on towards the sensor.
Figure 3.4: Sample holder section with Cuvet (1), Achromatic Doublet (2), Plano-Concave Cylindrical
Lens (3)
Having a monochromatic laser beam as a light source in Raman spectroscopy is very important.
The wavelength of the DPSSL is determined by the desired excitation wavelength and the
Raman-shifted spectral range that is to be observed. For OpenRAMAN, the founders determined
that a wavelength of 532 nm gives good results for testing generic samples and chemicals
in water.
Because inelastic scattering in the back-scattering direction occurs with such a small
probability (see section 2.1.3), the collimated light from the source needs to be focussed
on a single point on the sample. There are 2 focusing lenses built into the sample
holder. These are positioned at their optimal focussing distance and help give better results,
as this increases the occurrence of the inelastic scattering effect.
3.2.4 Slit
The slit in a Raman spectrometer serves the crucial purpose of controlling the amount and
direction of incoming light. By restricting the width of the light beam before it enters the
spectrometer, the slit helps to enhance the spectral resolution and signal quality of the Raman
spectrum. It prevents excess light from entering, which can lead to distortion and decreased
accuracy, while also minimizing background noise and interference. Essentially, the slit acts
as a precision filter, ensuring that only the relevant light is collected and analyzed, thereby
improving the overall performance and reliability of the Raman spectrometer. It is placed in a
cage housing in combination with two optical lenses. It is crucial that the focal points for these
lenses are precisely matched to the slit itself for optimal resolution.
From the point where the back-scattered light hits the slit, all of the following parts are
covered underneath an additional plastic cover. This is to eliminate unwanted interference
so that only the relevant light is measured on the camera sensor.
The primary function of the grating is to disperse the incoming light into its different component
wavelengths. When light hits the grating, it undergoes a process called diffraction. This causes
the light to be spread out into its various spectral components, similar to how a prism disperses
light into a spectrum of colors. The dispersed light exits the grating at different angles (θ0 in
fig. 2.12), depending on its wavelength. This spatial separation of wavelengths is crucial for
creating a spectrum. Each angle corresponds to a specific wavelength, and the resulting pattern
of dispersed light forms the basis of the Raman spectrum. The design and properties of the
grating determine the spectral resolution of the spectrometer. Spectral resolution refers to the
ability of the spectrometer to distinguish between closely spaced wavelengths. A high-quality
grating allows for finer separation of wavelengths, leading to higher spectral resolution and
more accurate measurement of Raman shifts.
The dispersed light from the grating is focused onto the sensor. This records the intensities of
the different wavelengths of light, allowing for the creation of a Raman spectrum.
as it scans the image frame by frame. Also, there is no combing effect present in case of
progressive scanning. Therefore, the video produced by progressive scanning is of high quality.
The Sony sensor is fitted with a global shutter function and a 3.45 µm pixel, which is the
smallest class in the industry. It offers higher sensitivity and lower noise than the existing
5.86 µm pixel products, and achieves high picture quality, high resolution, and high-speed
imaging without focal plane distortion.
The camera is directly connected to a computer where dedicated software visualizes and stores
the results.
Once there is a basic understanding of the parts that go into a spectrometer, the build itself is
logical. You place all the parts yourself by following the provided guide or by following along
with the build video that can be found online. To start off, the baseplate has to be
machined. The kit had already been tested out with a baseplate printed in plastic, which was
divided into parts as it was bigger than the print surface of generic 3D printers. The parts
were linked together with nuts and bolts. This baseplate soon became unusable as it was not
sturdy enough and the alignment of all the parts became inaccurate beyond fixing. A new
baseplate machined from a piece of wood seemed the better route. The CNC-machined part gave
the entire structure more stability alongside a fresh build surface to start from.
The order of the build is crucial as you build and align certain parts as you go. The compactness
of the entire kit depends on this specific order, because some parts become impossible to reach
in the final product. All in all the build is easy to follow, does not require a lot of tools, and
is well documented. During assembly, some parts are aligned while you mount them to the
baseplate. For example, finding the focal points for the lenses around the slit is done while the
cage structure (fig. 3.8a) is being assembled. Step by step you add a new lens and match its
focal point to the position of the slit. Here you use the FLIR camera, its software, and a generic
light source to focus it as best you can before screwing both pieces tight to the cage assembly.
The same goes for the secondary lens in this cage setup. Once everything for this cage setup
is aligned, it is mounted to the baseplate. The accuracy of the assembly and alignment
translates directly into the accuracy of the spectrometer itself, so this deserves close attention.
(a) Cage assembly 50 µm slit and focussing lenses (b) Cross section of cage assembly
Figure 3.8: OpenRAMAN cage assembly for slit
The path from the sample to the diffraction grating needs to be a straight path. To help achieve
this, the long 6 mm rods are needed. This allows the slit cage to be in-line with the cuvet holder
mount. This is important as the collimated portions of the light rays stay centered when all
of the spectrometer parts are perpendicular and aligned to each other. These collimated rays
then hit the lenses throughout the spectrometer in their optimal position. The light rays do
not deviate more than what they are supposed to (see fig. 3.5) and all of the optical parts in
place to counteract this deviation keep the rays as centered as possible.
The final adjustments on the hardware are made by turning the dials on the kinematic mirror
mount. This mount allows for both horizontal and vertical adjustment of the inserted
lens/mirror, which makes very precise adjustments possible. The Thorlabs design is unique and
helpful in the OpenRAMAN kit.
3.4 Calibration
The digital guide explains the basics of both hardware and software calibration. The hardware
calibration needs to be done while assembling the kit, as some calibration and alignment steps
need some additional structures built before mounting the final assembly to the baseplate. You
are required to have a computer with the dedicated Spinview software and SDK8. This piece of
software is provided by FLIR, the manufacturer of the proposed monochrome CMOS sensor.
It shows the sensor data from the Blackfly camera in real time. Using most of the equipment
in the kit, the alignment can be done on your own with the help of the camera visuals.
As shown in fig. A.1, both the orientation of the slit and the focal point of the lenses in front
of and behind the slit can be set with a simple construction that aligns the sensor, the lenses,
and the slit to their optimal position. Afterward, you can tighten it all to the cage mount
system so that this only has to be set up once.
The software calibration is for the Spectrum Analyzer software that was made for the kit. In
order to run this software the Spinview API9 needs to be present on the machine. Apart from
8 Software development kit
9 Application programming interface
a well-aligned spectrometer kit, you will also need a calibration light source. These light
sources are generally very expensive but offer reliability and a well-documented emission
spectrum. A mercury-argon light source was purchased from the manufacturer Ocean
Insight as a dedicated spectrometer wavelength calibration source. The device emits multiple
gas-discharge emission lines in the range of 253 to 923 nm. A list of the strong emission
lines can be found in table A.1 and table A.2.
As a starting point, a small neon gas lamp was used. Here the emitted light from the lamp
was inserted into a fiber cable and redirected to the center path of the spectrometer. The fiber
termination is only 600 µm in diameter so a small rig was made to fine-adjust it to its correct
positioning. The slit is 50 µm wide so a way to align them both is very necessary.
(a) Emission lines with neon light (b) Emission lines with Mercury-Argon light
Figure 3.11: Different light sources for calibration of Spectrum Analyzer Suite
This calibration setup is later used with the official calibration light source. For preliminary
testing, the neon lamp worked fine, although no manufacturer could provide or confirm the
spectrum of the emitted light. With the neon lamp, the software calibration is therefore based
on generalized information about the emission spectrum of a neon gas lamp, whereas with the
calibration light source the data can be compared to the table of emission lines that is
provided by the manufacturer. The best results were found when the calibration was done using
the neon light source and then checked with the Mercury-Argon calibration light source, as it
has a dedicated specification sheet to compare the findings against.
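The idea behind comparing measured peaks with known emission lines can be illustrated with a
least-squares fit that maps sensor pixels to wavelengths. This is a minimal sketch and not the
procedure used inside the Spectrum Analyzer software, which is not described here. The
wavelengths are approximate values of commonly tabulated mercury lines; the pixel positions
are invented for the example, and a straight-line model is assumed for simplicity, whereas a
real calibration may need a higher-order polynomial.

```python
def fit_linear(pixels, wavelengths_nm):
    """Least-squares fit of wavelength = slope * pixel + intercept."""
    n = len(pixels)
    mean_x = sum(pixels) / n
    mean_y = sum(wavelengths_nm) / n
    sxx = sum((x - mean_x) ** 2 for x in pixels)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pixels, wavelengths_nm))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Approximate mercury emission lines (nm) and hypothetical peak positions (pixels).
known_lines_nm = [546.07, 576.96, 579.07]
peak_pixels = [321.4, 939.2, 981.4]

slope, intercept = fit_linear(peak_pixels, known_lines_nm)


def pixel_to_wavelength(pixel):
    """Convert a pixel index to a wavelength in nm using the fitted line."""
    return slope * pixel + intercept
```

With more emission lines spread over the whole sensor, the residuals of such a fit give a
direct indication of how well the calibration holds across the spectral range.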
The software calibration tool is robust and semi-automated. A common problem with cali-
brating a spectrum is dealing with missing peaks. Sometimes peaks are just too dim to be
measured on the sensor or get clipped by the spectral range of the spectrometer. One of the
other problems while dealing with light is compensating for environmental light pollution. This
can be caused by overhead artificial lighting or natural sunlight either direct or reflected on
various materials around the test area. This is why in the Spectrum Analyzer you can set a
blank reading so that the software can compensate for the environment.
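Compensating for the environment with a blank reading comes down to subtracting the blank
spectrum from the sample spectrum pixel by pixel. The sketch below shows this under the
assumption that both readings have the same length and exposure settings; clipping negative
values to zero is a choice made for this example, and how the Spectrum Analyzer handles this
internally is not stated here.

```python
def subtract_blank(sample, blank):
    """Subtract a blank reading from a sample reading, pixel by pixel."""
    if len(sample) != len(blank):
        raise ValueError("sample and blank must have the same number of pixels")
    # Negative differences are clipped to zero (an assumption of this sketch).
    return [max(s - b, 0.0) for s, b in zip(sample, blank)]


# Made-up intensities for four pixels.
sample_reading = [12.0, 50.0, 11.0, 9.0]
blank_reading = [10.0, 10.0, 10.0, 10.0]
corrected = subtract_blank(sample_reading, blank_reading)
```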
Some precautions are taken to prevent ambient lighting from having a significant effect on the
resulting data. For example, OpenRAMAN provides a model for a plastic cover that can be
3D printed to shield the light rays from the point where they pass through the slit, via the
reflection on the diffraction grating, until they are picked up by the sensor. In addition, the
spectrometer was shielded as much as possible from the environment: all the lights
in the room were dimmed, the blinds were shut, the computer brightness was turned down as
low as it could go, and a box was placed over the top of the spectrometer. All of this is to block
any ambient interference during both the calibration and the sample measurements themselves.
After calibration, each pixel that is illuminated by the back-scattered Raman spectrum is
converted to a specific wavelength or wavenumber value, and the intensity measured on the
sensor is presented against it as a clear graph. The accuracy of this conversion depends
directly on the accuracy of the calibration. If the calibration is off, the conversion cannot
happen properly.
\[ v\,[\mathrm{cm}^{-1}] = 10^{7} \left( \frac{1}{\lambda_e} - \frac{1}{\lambda} \right) \qquad (3.1) \]
If we measure λ experimentally, the only way to get a good estimation of v is to have a precise
and unique value of λe . For the current setup, this value is set at 532 nm by the choice of
DPSSL.
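Equation 3.1 can be written out as a short conversion between measured wavelength and Raman
shift. This is a minimal sketch with function names chosen for illustration; both wavelengths
are taken in nm, which is what the factor 10^7 in the equation implies, and the 600 nm value
is only an example input.

```python
def raman_shift_cm1(wavelength_nm, excitation_nm=532.0):
    """Raman shift in cm^-1 from equation 3.1, with both wavelengths in nm."""
    return 1e7 * (1.0 / excitation_nm - 1.0 / wavelength_nm)


def wavelength_nm_from_shift(shift_cm1, excitation_nm=532.0):
    """Inverse of equation 3.1: scattered wavelength in nm for a given shift."""
    return 1.0 / (1.0 / excitation_nm - shift_cm1 / 1e7)


shift_at_600 = raman_shift_cm1(600.0)   # light measured at 600 nm
shift_at_laser = raman_shift_cm1(532.0)  # light at the excitation wavelength
```

Because the excitation wavelength enters as 1/λe, a small error in the assumed laser
wavelength moves every peak in the spectrum by the same amount in wavenumbers.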
Data Analytics
Measuring and storing data in the Spectrum Analyzer suite is intuitive and well explained in
the documentation about the software. Getting good results depends mostly on the build and
calibration of the kit, but also on the settings with which you try to measure a
Raman spectrum. Because the acquisition of data depends on all of these things, it takes a
while to understand how to gather good results. After measuring a spectrum with the kit, the
data can be stored in a single CSV1 file that looks like fig. 4.1. The entire pixel width of the
sensor (1920 pixels for the current model) is converted to the set x-axis labels and stored
alongside the measured intensity in the file.
Figure 4.1: Saved spectrum in Spectrum Analyzer Suite - header and layout of file
Each saved file is only a single measurement. In general circumstances, this will suffice as the
Raman spectrum holds all the information about the sample that is measured at that time. The
unique sample fingerprint can then be interpreted, as it is peak-based: the findings are compared
with known Raman databases or literature to deduce the possible chemical components. For
single samples and measurements, doing this by hand would be sufficient. But for a lot
of different samples, or if the sample needs to be measured over time, this would become a
serious task. Not to mention that the abundance of single measurements also means a lot of
files, as there is currently no continuous measuring option available in the software that
allows for a fixed number of measurements to be taken and combined into a single file.
The data analysis requires a lot of measurements from different samples. To continue to use the
spectrometer, some additional programs were written in Python for this research project
in order to make the data collection easier for the user. One of these was a data gathering
application, in which the number and interval of measurements could be set to automatically
save sample measurements. This basic autoclicker automatically saved measurements, but it
still created a new file for every measurement. To get around this and format all the data into
a single file that could then be analyzed, a secondary program was written. Here, all the CSV
files were read and combined into one file that could be appended to after every batch of
measurements. With machine learning in mind, the option of labeling all these measurements
during this process was also added as a feature to the program. After the measurements are
taken from the spectrometer and the two programs have been run, all the data is combined,
labeled, and ready for analysis.
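The combining and labeling step described above can be sketched as follows. This is not the
program written for this project, only an illustration of the idea. It assumes that each
saved file has one header row followed by rows of "x-axis value,intensity"; the actual layout
is the one shown in fig. 4.1. The file names, the column names, and the label "tap water" are
made up for the demonstration.

```python
import csv
import tempfile
from pathlib import Path


def read_intensities(path):
    """Read one saved spectrum (assumed: header row, then 'x,intensity' rows)."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
    return [float(row[1]) for row in rows[1:] if len(row) >= 2]


def append_labeled(spectrum_paths, label, combined_path):
    """Append one row per spectrum: the label followed by its intensities."""
    with open(combined_path, "a", newline="") as handle:
        writer = csv.writer(handle)
        for path in spectrum_paths:
            writer.writerow([label] + read_intensities(path))


# Demo with two tiny made-up spectra written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    folder = Path(folder)
    for name, values in (("m1.csv", [10.0, 55.0, 12.0]), ("m2.csv", [11.0, 53.0, 13.0])):
        with open(folder / name, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["wavenumber", "intensity"])
            writer.writerows(zip([100.0, 200.0, 300.0], values))
    combined = folder / "combined.csv"
    append_labeled(sorted(folder.glob("m*.csv")), "tap water", combined)
    with open(combined, newline="") as handle:
        combined_rows = list(csv.reader(handle))
```

Opening the combined file in append mode is what allows a new batch of measurements to be
added later without rewriting the earlier rows.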
1 Comma-separated values
Chapter 4. Data Analytics 31
Not many publications can be found on the combination of Raman spectra and machine learning,
as it is still in the development stage. The algorithms that have been applied by others are
categorized under supervised, unsupervised, and hybrid learning methods.
In supervised learning methods, we attempt to represent the connections and dependencies
between the input features and the target prediction output. Predicting the output values for
new data using the associations discovered from earlier data sets is the aim. Consequently,
task-driven supervised algorithms are used. Regression and classification tasks are carried out
using supervised learning. Convolutional neural networks are a popular illustration of supervised
deep learning techniques.
Unsupervised learning uses training datasets without supervising the models. Instead, models
themselves decipher the unlabeled data to reveal hidden patterns and insights. It is comparable
to the learning process that occurs in the human brain while learning something new. It is
known as a data-driven strategy and enables users to carry out more complicated processing
tasks as compared to supervised learning. An unsupervised machine learning platform may do
tasks including dimension reduction, grouping, and association.
[Jinadasa et al., 2021] The ability to apply deep learning algorithms for unsupervised
learning tasks is an important benefit because in big data sets, unlabeled data are
more abundant than the labeled data. Autoencoder, sum product network, recur-
rent neural network, and Boltzmann machine can be considered unsupervised deep
learning algorithms. Supervised learning algorithms seek to answer the questions
like “Based on the Raman fingerprint of this new sample I have just collected, which
class in my database does it (most likely) belong to?” Meanwhile, unsupervised
learning algorithms seek to answer the questions like “How similar to one another
are these samples based on their Raman fingerprints?”
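The two questions in the quotation can be made concrete with a toy example. The sketch below
is not one of the deep learning methods discussed by Jinadasa et al.; it swaps in a much
simpler nearest-centroid rule with cosine similarity, purely to show the difference between
the supervised and the unsupervised question. The four-point "spectra" and the labels
"solution A" and "solution B" are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two spectra given as equal-length lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(spectra):
    """Point-by-point mean of a list of spectra."""
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def classify(spectrum, labeled_spectra):
    """Supervised question: which labeled class is the new spectrum closest to?"""
    centroids = {label: centroid(group) for label, group in labeled_spectra.items()}
    return max(centroids, key=lambda label: cosine_similarity(spectrum, centroids[label]))


# Invented training data: two classes with two spectra each.
labeled = {
    "solution A": [[1, 9, 2, 1], [2, 8, 1, 1]],
    "solution B": [[1, 1, 2, 9], [1, 2, 1, 8]],
}
predicted = classify([1, 7, 2, 2], labeled)

# Unsupervised question: how similar are samples to one another, without labels?
similarity_alike = cosine_similarity([1, 9, 2, 1], [2, 8, 1, 1])
similarity_unlike = cosine_similarity([1, 9, 2, 1], [1, 1, 2, 9])
```

The supervised call needs the labels to build the class centroids, while the two similarity
values are computed from the spectra alone.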
Figure 4.2: Unsupervised and supervised algorithms commonly used in deep learning applications for
Raman spectra (from: Jinadasa et al., 2021)
spectral analysis and integrate the contributions of deep learning to overcome these obstacles.
Presented below are some difficulties that researchers have encountered during the analysis of
Raman fingerprints.
An obstacle faced during the analysis of Raman fingerprints arises when dealing with weakly
Raman-active samples, which require high spectral resolution, minimal spectral background,
and high sensitivity for accurate examination. The proportional strengths of the Raman bands
in the analytes shift in correspondence with the solvents and are tied to the shift in absorption
peaks. The presence of peaks stemming from the matrix is a common occurrence in numerous
biomolecular Raman applications. To illustrate, paraffin-fixed tissue might manifest a similar
peak to a C − H stretch. Thus, distinguishing the authentic spectra from the matrix constitutes
an equally vital precursor to the analysis process.
One of the greatest challenges in Raman spectroscopy is that it is influenced by the turbidity,
color, and fluorescence of the sample. The presence of a strong fluorescence background has so far
constrained its use in many otherwise promising applications, for example in the agricultural,
food, and oil industries, security control, and criminal investigations. Biological samples are
the most challenging because of their complex matrices and the associated fluorescence.
fluorescence. This type of fluorescence intensity typically surpasses the Raman scattering signal
by several orders of magnitude, particularly when dealing with biological samples. This con-
trast arises from the inherently lower probability of Raman scattering compared to fluorescence.
As a result, Raman spectroscopy encounters hurdles, especially in scenarios involving intricate
biological systems, making the differentiation between Raman signals and fluorescence hard.
A strong fluorescence background comes with problems. It becomes the dominant element in
the photon shot noise and thus detracts from the SNR2 . Errors in the mathematical estimation
and removal (background subtraction) of the fluorescence increase with increasing fluorescence
levels. This results in increasing errors in both material identification and concentration mea-
surement applications.
Computational methods can have a substantial impact on deciphering chemical Raman spectra
in the presence of fluorescence signals. Examples of such techniques include polynomial
fitting, wavelet transformation, and derivative analysis. Wei et al. (2015) note that
the optimal polynomial order for fitting varies and that the effectiveness depends on
the user’s familiarity with the process. The application of derivatives to a recorded Raman
spectrum effectively eliminates background components, enhancing the clarity of the Raman
signal. However, this approach often amplifies high-frequency noise and might distort the
spectrum due to the inherent nature of the derivative process.
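The two background-removal ideas described above can be sketched on synthetic data. The snippet below is a minimal illustration, not the procedure used in this work: the test spectrum, the function names, the polynomial order, and the iteration count are all assumptions, and the polynomial fit uses an iterative clipping variant so that the peaks do not pull the baseline upward.

```python
import numpy as np

def synthetic_spectrum(n_points=600, seed=0):
    """Hypothetical test signal: two narrow Raman-like peaks on a broad background."""
    rng = np.random.default_rng(seed)
    shift = np.linspace(500.0, 3500.0, n_points)               # Raman shift axis (cm^-1)
    x = (shift - shift.mean()) / (shift.max() - shift.min())   # scaled axis for a stable fit
    peaks = (1.0 * np.exp(-0.5 * ((shift - 1000.0) / 8.0) ** 2)
             + 0.6 * np.exp(-0.5 * ((shift - 1600.0) / 8.0) ** 2))
    background = 5.0 + 3.0 * x - 4.0 * x ** 2                  # smooth fluorescence-like term
    noise = 0.01 * rng.standard_normal(n_points)
    return shift, x, peaks + background + noise

def polynomial_baseline(x, y, order=3, n_iter=50):
    """Fit a polynomial repeatedly, clipping the data to the fit after each pass."""
    work = y.copy()
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(x, work, order), x)
        work = np.minimum(work, fit)   # points above the fit (peaks) are pulled down to it
    return fit

shift, x, spectrum = synthetic_spectrum()

# Option 1: estimate the background with a polynomial and subtract it.
baseline = polynomial_baseline(x, spectrum)
corrected = spectrum - baseline

# Option 2: first derivative with respect to the Raman shift.
# The slowly varying background almost vanishes, but noise is amplified.
derivative = np.gradient(spectrum, shift)
```

Changing `order` in this sketch shows the sensitivity mentioned above: too low an order leaves residual background, while too high an order starts to follow the peaks themselves.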
from the training set based on the chosen distance metric. The most prevalent class among
these k-nearest neighbors is assigned to the new spectrum.
The choice of the hyperparameter "k" significantly impacts KNN’s performance. A smaller
k value might lead to noisy classifications, whereas a larger k might lead to over-smoothed
decisions. The optimal k value often depends on the dataset’s characteristics and requires
experimentation or cross-validation.
Its simplicity is both a strength and a limitation. It is easy to implement and can capture
complex decision boundaries. However, it can be computationally expensive for large datasets,
as it requires calculating distances to all training samples. Furthermore, KNN might struggle
when data is imbalanced, and it doesn’t handle irrelevant features well.
In Raman spectroscopy, KNN can be applied to classify unknown samples into different chemical
or biological categories based on their Raman spectra. Its flexibility and ease of implementation
make it a valuable tool, particularly for exploratory analysis and initial hypothesis testing. Like
other machine learning methods, KNN’s success depends on proper data preprocessing, feature
selection, and tuning of hyperparameters for the specific application at hand.
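The voting rule and the choice of k can be illustrated with a short sketch. It is written with NumPy only and uses hypothetical synthetic "spectra"; the function names, the Euclidean distance metric, and the candidate k values are assumptions made for illustration, not the configuration used in this work.

```python
import numpy as np

def knn_predict(train_X, train_y, query_X, k):
    """Classify each query spectrum by majority vote among its k nearest training spectra."""
    # Euclidean distance between every query and every training spectrum
    dists = np.linalg.norm(query_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]   # indices of the k closest neighbours
    votes = train_y[nearest]                     # class labels of those neighbours
    return np.array([np.bincount(row).argmax() for row in votes])

def leave_one_out_accuracy(X, y, k):
    """Estimate the accuracy for a given k by holding out one spectrum at a time."""
    hits = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        pred = knn_predict(X[mask], y[mask], X[i:i + 1], k)
        hits += int(pred[0] == y[i])
    return hits / len(X)

# Hypothetical data: two classes that differ in the position of a single peak.
rng = np.random.default_rng(0)
axis = np.arange(50)

def make_class(center, n):
    peak = np.exp(-0.5 * ((axis - center) / 2.0) ** 2)
    return peak + 0.2 * rng.standard_normal((n, axis.size))

X = np.vstack([make_class(15, 30), make_class(30, 30)])
y = np.array([0] * 30 + [1] * 30)

# Compare several odd k values and keep the one with the best held-out accuracy.
scores = {k: leave_one_out_accuracy(X, y, k) for k in (1, 3, 5, 7, 9)}
best_k = max(scores, key=scores.get)
```

Every prediction computes the distance to all training spectra, which is the computational cost referred to above.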
\[
P(\text{class} \mid \text{features}) = \frac{P(\text{features} \mid \text{class}) \, P(\text{class})}{P(\text{features})} \tag{4.1}
\]
Due to its efficiency and speed, Naive Bayes is particularly suitable for situations where the
dataset is voluminous, making it well-suited for the complex data generated by Raman spec-
troscopy. However, the "naive" assumption of feature independence might not hold true in all
cases, which can impact the algorithm’s accuracy. Despite this limitation, Naive Bayes remains
a valuable and interpretable tool for rapid classification tasks, aiding in identifying the chemical
composition of samples based on their distinctive spectral features.
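To make eq. (4.1) concrete, the sketch below evaluates it for a Gaussian Naive Bayes model on hypothetical two-feature data. The Gaussian likelihood, the function names, and the data are assumptions for illustration; the "naive" step is the sum over per-feature log-likelihoods, which treats the features as independent.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature Gaussian parameters for each class."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])              # P(class)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances

def posterior(x, classes, priors, means, variances):
    """Return P(class | features) for one sample x, following eq. (4.1)."""
    # log P(features | class): independent features, so the log-likelihoods add up
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * variances)
                            + (x - means) ** 2 / variances, axis=1)
    log_joint = log_lik + np.log(priors)   # numerator of eq. (4.1) in log form
    log_joint -= log_joint.max()           # stabilise before exponentiating
    joint = np.exp(log_joint)
    return joint / joint.sum()             # dividing by P(features) normalises to 1

# Hypothetical training data: two classes with different feature means.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([1.0, 0.0], 0.2, size=(40, 2)),
               rng.normal([0.0, 1.0], 0.2, size=(40, 2))])
y = np.array([0] * 40 + [1] * 40)

model = fit_gaussian_nb(X, y)
probs = posterior(np.array([0.9, 0.1]), *model)   # sample close to the class-0 mean
```

Neighbouring channels of a Raman spectrum usually rise and fall together, so the independence assumption in `log_lik` is exactly where this model may lose accuracy on spectral data.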
In the context of Raman spectroscopy, SVC aids in classifying samples based on their unique
Raman scattering patterns. It accomplishes this by identifying a hyperplane that optimally
separates data points belonging to different classes in the high-dimensional feature space. The
algorithm not only seeks an optimal division but also maximizes the margin between the classes,
enhancing its ability to generalize to new, unseen spectra.
What distinguishes SVC is its adaptability to non-linearly separable data, which is a common
scenario in Raman spectroscopy. By employing the kernel trick, SVC transforms the data into
a higher-dimensional space, where classes could potentially be linearly separable. Kernels such
as polynomial, RBF5 , and sigmoid are pivotal in shaping the transformation strategy.
Training an SVC involves identifying key parameters that define the hyperplane and the mar-
gin, including support vectors. These vectors serve as crucial data points for defining the
separating boundary. Once trained, the SVC can proficiently categorize new Raman spectra by
placing them within the appropriate class, relying on their position relative to the established
hyperplane.
SVC is skilled in handling intricate decision boundaries making it another favored choice for
Raman spectroscopy tasks like sample classification. However, for optimal results, careful
4 Support Vector Classifier
5 Radial Basis Function
Chapter 4. Data Analytics 35
selection of hyperparameters and mitigation of overfitting are essential steps. Through its
fusion of advanced mathematical principles and spectral analysis, SVC is a formidable tool for
discerning between different classes of materials or compounds.
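A minimal sketch of such a classifier is given below, assuming scikit-learn is available. The synthetic spectra, the class layout, and the hyperparameter values (`C=1.0`, `gamma="scale"`) are placeholders for illustration and not the settings used in this work.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical spectra: three classes that differ in their peak positions.
rng = np.random.default_rng(0)
axis = np.arange(100)

def make_class(centers, n):
    profile = sum(np.exp(-0.5 * ((axis - c) / 3.0) ** 2) for c in centers)
    return profile + 0.2 * rng.standard_normal((n, axis.size))

X = np.vstack([make_class([20, 60], 60), make_class([35, 75], 60), make_class([50], 60)])
y = np.repeat([0, 1, 2], 60)

# Hold out a stratified test set so that unseen spectra measure generalisation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale each spectral channel, then fit an SVC with an RBF kernel.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)           # fraction of correctly classified test spectra
n_support = model.named_steps["svc"].n_support_  # number of support vectors per class
```

Swapping `kernel="rbf"` for `"poly"` or `"sigmoid"` selects the other kernels named above, and comparing the training and test accuracy gives a first indication of overfitting.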
4.3 Results
After numerous attempts to calibrate the OpenRAMAN kit to its most optimized and accurate
configuration, it was time to generate samples. Without the extra tools put in place for
this research project, this would have been a repetitive and tedious task. It was nevertheless
necessary in order to build up enough samples for some ML algorithms to be tested. The extra
time spent on streamlining the gathering process has paid off. The actual spectral resolution
of the spectrometer becomes crucial in this stage of the research.
4.3.1 By Reference
OpenRAMAN provides some reference spectra (see fig. A.3 to fig. A.6) on the website for a
handful of chemicals. One of the first steps was to see if these results could be replicated
using our very own Raman spectrometer. This reference data made it possible to check whether
the machine was performing as it should. It should be noted that these reference spectra were
taken on the performance edition. Although both editions measure the same range (500 to
3500 cm−1), their spectral resolution, measured as FWHM6, differs by more than half between the
improved performance edition and the system tested here. A decrease in spectral resolution
should be taken into account while analyzing this result.
Figure 4.5: Visualization of difference between starter and performance edition by OpenRAMAN
A few of these reference chemicals were tested as singular measurements against their reference
data. While doing these measurements the final adjustments were made to the spectrometer
to further improve the overall resolution. The results have similar spectral features and sug-
gest that the spectrometer is well calibrated and offers a usable result while analyzing these
chemicals.
The similarity depended strongly on the measurement in question. Not all spectral features
are prominently visible in every measurement. Some of these features appear or disappear
when the exposure time of the measurement is increased. It is good practice to re-evaluate all
the configuration values for each new sample, known or unknown. This means adjusting the
exposure time, gain factor, or even setting a new blank sample between measurements. Almost
all of this is manual adjustment in the software itself. The more you get to know the system
and analyze results, the better you understand how samples depend on all of these
adjustments.
6 Full width at half maximum
– Precision measures the accuracy of the positive predictions made by the model. It is
the ratio (eq. (4.2)) of the correctly predicted positive observations to the total predicted
positives. In other words, precision tells you how many of the items that the model
predicted as positive are actually positive. A high precision indicates that the model’s
positive predictions are trustworthy.
– Recall measures the model’s ability to identify all relevant instances of a class in the
dataset. It is the ratio (eq. (4.3)) of the correctly predicted positive observations to the
actual positives. In simpler terms, recall tells you how many of the actual positive items
the model has successfully predicted. A high recall indicates that the model is good at
capturing all instances of the positive class.
– F1-score is the harmonic mean of precision and recall. It provides a balanced measure
that considers both false positives and false negatives. f1-score (eq. (4.4)) is especially
useful when you have an imbalanced dataset, where one class might have significantly
more samples than the other. It ranges from 0 to 1, where 1 is the best possible f1-score.
To summarize, precision decreases as the number of false positives grows, while recall
decreases as the number of false negatives grows. Finally, a balanced combination of both values
is represented by the f1-score.
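\[
\text{Precision} = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalsePositives}} \tag{4.2}
\]
\[
\text{Recall} = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalseNegatives}} \tag{4.3}
\]
\[
\text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4.4}
\]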
There are a number of metrics to look at when analyzing the performance of a classification
algorithm. The models are all trained from a dataset that is almost 3000 entries long. Each
entry is a specific measurement for a specific chemical done on the spectrometer. Afterward,
the performance is represented in both the confusion matrix and a classification report for the
KNN algorithm. The matrix (eq. (4.5)) gives us insight into how the test data has performed
on the algorithm. The main diagonal represents the data where the suggested label correctly
corresponds to the given label. The other values represent the wrongly classified data and what
it was classified as by the algorithm. The report gives us insight into the performance metrics
mentioned in section 4.3.2. Here it can be noted that most of the labels have a precision value
higher than 85%.
The same data was then subjected to classification algorithms other than KNN. The final
accuracy score measured for SVC was 87.8%, while the Naive Bayes test resulted in an accuracy
of only 41.2%, with 333 mislabeled points out of a total of 567 points. This suggests
that Naive Bayes is not as accurate as KNN or SVC on this dataset. As this is just a quick
metric using some generic classification algorithms, it is not a definitive result. Other
algorithms, for example CNN7, could be used, or the data could be optimized even further. Even
the already tested AI8 models have room for improvement.
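\[
\begin{pmatrix}
84 & 18 & 0 & 3 & 0 & 0 \\
6 & 76 & 0 & 5 & 0 & 0 \\
2 & 1 & 67 & 0 & 0 & 0 \\
11 & 15 & 1 & 88 & 0 & 2 \\
5 & 4 & 1 & 4 & 51 & 6 \\
4 & 5 & 0 & 1 & 7 & 100
\end{pmatrix}
\tag{4.5}
\]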
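The metrics of eq. (4.2) to eq. (4.4) can be recomputed directly from the confusion matrix of eq. (4.5). The sketch below rests on two assumptions: the fourth row is read as 11 15 1 88 0 2, and rows are taken as the given labels with columns as the predicted labels. The variable names are chosen for illustration.

```python
import numpy as np

# Confusion matrix of eq. (4.5); the fourth row is read as 11 15 1 88 0 2 (assumption).
# Assumed orientation: rows = given (true) labels, columns = predicted labels.
cm = np.array([
    [84, 18, 0, 3, 0, 0],
    [6, 76, 0, 5, 0, 0],
    [2, 1, 67, 0, 0, 0],
    [11, 15, 1, 88, 0, 2],
    [5, 4, 1, 4, 51, 6],
    [4, 5, 0, 1, 7, 100],
])

true_pos = np.diag(cm)                  # correctly classified samples per class
precision = true_pos / cm.sum(axis=0)   # eq. (4.2): column totals are TP + FP
recall = true_pos / cm.sum(axis=1)      # eq. (4.3): row totals are TP + FN
f1 = 2 * precision * recall / (precision + recall)   # eq. (4.4)
accuracy = true_pos.sum() / cm.sum()    # share of all samples on the main diagonal

for label, (p, r, f) in enumerate(zip(precision, recall, f1)):
    print(f"class {label}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print(f"accuracy={accuracy:.3f}")
```

Under these assumptions the entries sum to 567, which matches the number of test points quoted for the Naive Bayes run, and the main diagonal holds 466 of them, an overall accuracy of about 82%. If the matrix is oriented the other way around, the precision and recall columns simply swap.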
4.3.3 Cyanobacteria
In addition to the chemical substances and their corresponding reference data or the AI models,
a practical assessment involving Cyanobacteria was conducted. This real-world test exemplifies
the spectrometer’s operational capabilities and analysis potential. The objective was to discern
between samples containing the bacteria and those devoid of it, evaluating the spectrometer’s
effectiveness in distinguishing between the two conditions.
Cyanobacteria or blue-green algae are photosynthetic microorganisms that play a vital role
in ecosystems by producing oxygen and serving as a primary food source. However, under
certain conditions, they can form harmful algal blooms, releasing toxins that pose a threat to
aquatic life and even human health. Cyanobacteria possess distinctive molecular compositions,
including pigments like chlorophyll and phycocyanin, which can be detected through their
characteristic Raman spectra. By understanding the unique spectral signatures of cyanobacteria
and their pigments, researchers and environmental managers can detect these organisms early,
enabling proactive interventions to mitigate the potential ecological and public health impacts
of cyanobacterial blooms.
The bacteria strains should theoretically show three characteristic bands in the region between
1600 and 1000 cm−1. These bands are specifically located at 1500, 1160 and 1000 cm−1,
assigned respectively to (C=C) and (C−C) stretching and (C−CH3) deformation. These bands
7 Convolutional Neural Networks
8 Artificial Intelligence
are slightly different from those of pure beta-carotene. These three bands are attributed to vi-
brational modes of the polyenic chain, which could also incorporate substituents at the terminal
positions, like methyl groups and aliphatic or aromatic rings.
[Plot: Intensity (a.u.) versus Wavelength (nm), 540 to 740 nm; traces: Before, After]
Figure 4.6: Raman spectra of water sample before and after addition of Cyanobacteria
Like other biological organisms, cyanobacteria have various molecules such as pigments, pro-
teins, and nucleic acids that can interact with light. When the laser light interacts with
cyanobacterial cells, it can induce both Raman scattering and fluorescence emissions. This
kind of fluorescence can be seen in fig. 4.6 as the wide and intense feature in the center.
The three characteristic bands could not be measured in the wavenumber graph because the
interference from the fluorescence was too high. Yet, this could be seen as a sign that the
bacteria are present in the sample.
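Because fig. 4.6 is plotted against wavelength while the expected bands are given as Raman shifts, a conversion between the two axes helps to locate them. The sketch below assumes a 532 nm excitation laser, which is not stated in this section; the function names are chosen for illustration.

```python
def wavelength_to_raman_shift(wavelength_nm, excitation_nm=532.0):
    """Raman shift in cm^-1 of light scattered at wavelength_nm (Stokes side positive)."""
    # 1e7 converts inverse nanometres to inverse centimetres
    return 1e7 * (1.0 / excitation_nm - 1.0 / wavelength_nm)

def raman_shift_to_wavelength(shift_cm1, excitation_nm=532.0):
    """Inverse conversion: wavelength in nm at which a given Raman shift appears."""
    return 1.0 / (1.0 / excitation_nm - shift_cm1 * 1e-7)

# Where the three expected bands would sit on the wavelength axis
band_wavelengths = {shift: raman_shift_to_wavelength(shift) for shift in (1000, 1160, 1500)}
for shift, wavelength in band_wavelengths.items():
    print(f"{shift} cm^-1 -> {wavelength:.1f} nm")
```

With the assumed 532 nm excitation, the three bands would fall between roughly 562 and 578 nm, inside the wavelength window plotted in fig. 4.6.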
Chapter 5
Conclusions
In the realm of scientific investigation, spectroscopy stands as a fundamental pillar that has rev-
olutionized our understanding of matter and the universe. From the classical methodologies of
Rayleigh and Raman scattering to the intricate machinery of modern spectrometers, this com-
prehensive journey through the world of spectroscopy unveils a realm where light interactions
reveal the hidden secrets of molecules and materials.
Spectroscopy, in its various forms, has become an invaluable tool across multiple scientific
disciplines. It offers a unique window into the molecular world, enabling us to decipher the
intricate dance of atoms and the vibrational modes that define their identities. Through elas-
tic and inelastic light scattering, scientists can gather information about molecular structure,
interactions, and even elusive characteristics like quantum efficiency.
Raman spectroscopy, with its unique ability to pinpoint molecular vibrations, is an exemplary
technique that has found applications in numerous fields. Its power to identify substances
without destructive processes has enabled rapid and precise analyses. The journey through the
historical development of Raman spectroscopy leads us to the contemporary era where open-
source initiatives like OpenRAMAN have democratized this technology, enhancing accessibility
and promoting collaborative research.
However, every advancement brings its challenges. In Raman spectroscopy, the persistent fluo-
rescence background looms as a formidable obstacle. This challenge necessitates sophisticated
computational methods to separate the true Raman signal from the overwhelming fluorescence
noise. Here, deep learning algorithms emerge as powerful allies, enabling the extraction of
meaningful information from complex spectra.
This research delves into the core components of Raman spectrometers, illustrating the intri-
cacies of optical systems, detectors, and filters. The significance of the slit, grating, and sensor
is unveiled, each playing a crucial role in ensuring optimal spectral acquisition. The heart of
the spectrometer lies in the data it generates. The Raman spectrum, a graphical representa-
tion of intensity against Raman shift or wavelength, serves as a fingerprint of the molecular
interactions. The data offers insights into molecular composition, vibrational modes, and more.
Machine learning algorithms, particularly Support Vector Classifier and K-Nearest Neighbor,
emerge as indispensable tools. The utilization of SVC empowers researchers to efficiently classify
samples into discrete categories, leveraging their unique spectral patterns. KNN complements
this approach by navigating high-dimensional data, enhancing the accuracy of classification and
identification processes. These machine learning algorithms, SVC and KNN, stand as beacons
of progress, steering Raman spectroscopy toward comprehensive and precise analyses, further
unraveling the intricate molecular landscapes.
Chapter 6
Future work
In my opinion, this research should only be seen as foundational work around the OpenRAMAN
spectrometer and all of its capabilities. At its core is an extensive tool for analyzing molecular
composition. Because it is a kit and not a professional machine, it can quickly be adapted to
a specific project or approach. The possibilities reach far and wide, as displayed in the growing
community, where some interesting spins have been shared already and are encouraged by the
founders. They are still improving the system to this day and are open to collaboration in any
shape or form.
In the near future, the spectrometer will help a doctoral candidate at the VUB accumulate
Raman spectra. While finishing this research some collaboration was already put in place so
both parties could benefit from the same results, as well as share knowledge about numerous
topics related to ML, spectroscopy, and general research. This collaboration should be kept
in place as it helps both parties tremendously.
There could even be a new dissertation or equivalent project built upon the foundational
research done here. In the years to come, students at the VUB could take the opportunity
to improve the current system and enhance the OpenRAMAN spectrometer itself, or
try to expand the capabilities of the machine by dedicating some time to striving towards the
best artificial intelligence data analyzer for Raman spectroscopy. The SmartWater Monitoring
project in Brussels could also benefit from the advancements that were made in this research
project.
Appendix A
Appendix A. Drawings, tables and plots 42
[Plots: Intensity (a.u.) versus Wavenumber (cm−1), axis running from 3,500 to 500 cm−1]
Figure A.5: Reference spectrum for ethyl acetate from OpenRAMAN website
(Dec. 2021). url: https://www.jasco-global.com/principle/1-what-is-raman-spectroscopy/.
Ball, D. W. (2006). ‘Basics - Spectrometer, Spectroscope, and Spectrograph’. In: Field Guide
to spectroscopy. SPIE Press, pp. 1–2.
Campbell and White (1999). IR Spectroscopy and Raman Scattering. url: https://www.eng.
uc.edu/~beaucag/Classes/Analysis/Chapter5.pdf.
Cinta Pinzaru, S., Csilla, M. M., Ioana, B., Glamuzina, B., and Barchewitz, D. (June 2016).
‘Cyanobacteria Detection and Raman Spectroscopy Characterization with a Highly Sensi-
tive, High Resolution Fiber Optic Portable Raman System’. In: Studia UBB Physica 61,
p. 99.
de Oliveira, V. E., Neves Miranda, M. A., Soares, M. C. S., Edwards, H. G., and de Oliveira,
L. F. C. (2015). ‘Study of carotenoids in cyanobacteria by Raman spectroscopy’. In: Spec-
trochimica Acta Part A: Molecular and Biomolecular Spectroscopy 150, pp. 373–380. issn:
1386-1425. doi: https://doi.org/10.1016/j.saa.2015.05.044. url: https://www.
sciencedirect.com/science/article/pii/S1386142515006332.
Edwards, H., Garcia-Pichel, F., Newton, E., and Wynn-Williams, D. (2000). ‘Vibrational Ra-
man spectroscopic study of scytonemin, the UV-protective cyanobacterial pigment’. In: Spec-
trochimica Acta Part A: Molecular and Biomolecular Spectroscopy 56.1, pp. 193–200. issn:
1386-1425. doi: https://doi.org/10.1016/S1386-1425(99)00218-8. url:
https://www.sciencedirect.com/science/article/pii/S1386142599002188.
Edwards, H. G., Moody, C. D., Jorge Villar, S. E., and Wynn-Williams, D. D. (2005). ‘Ra-
man spectroscopic detection of key biomarkers of cyanobacteria and lichen symbiosis in
extreme Antarctic habitats: Evaluation for Mars Lander missions’. In: Icarus 174.2. Mars
Polar Science III, pp. 560–571. issn: 0019-1035. doi: https://doi.org/10.1016/j.icarus.2004.07.029.
url: https://www.sciencedirect.com/science/article/pii/S0019103504003574.
Environmental awareness (n.d.). url: https://pachamama.org/environmental-awareness#:
~:text=Environmental%5C%20awareness%5C%20is%5C%20having%5C%20an, the%5C%
20importance%5C%20of%5C%20its%5C%20protection..
Jinadasa, M. W. N., Kahawalage, A. C., Halstensen, M., Skeie, N.-O., and Jens, K.-J. (2021).
‘Deep Learning Approach for Raman Spectroscopy’. In: Recent Developments in Atomic
Force Microscopy and Raman Spectroscopy for Materials Characterization. Ed. by C. S.
Pathak and S. Kumar. Rijeka: IntechOpen. Chap. 5. doi: 10.5772/intechopen.99770.
url: https://doi.org/10.5772/intechopen.99770.
John, N. and George, S. (2017). ‘Chapter 5 – Raman Spectroscopy’. In: url: https://api.
semanticscholar.org/CorpusID:103881197.
Kramida, A., Ralchenko, Y., and Reader, J. (Mar. 1995). ‘Atomic Spectra Database’. In: url:
https://www.nist.gov/pml/atomic-spectra-database.
Lafuente, B., Stone, N., Yang, H., and Downs, R. T. (2015). ‘Highlights in Mineralogical Crys-
tallography’. In: url: https://rruff.info/.
Linstrom, P. J. and Mallard, W. G. (n.d.). NIST Standard Reference Database Number 69. url:
https://doi.org/10.18434/T4D303.
Bibliography 46