net/publication/267630142
11 authors, including:
All content following this page was uploaded by Kostas Anagnostou on 02 April 2017.
Abstract—The present work describes the telemetric wireless communication system of a measuring station installed at a hot-water spring. The telemetric unit of the station continuously collects data on various physicochemical factors. The integrated system continuously measures, processes and transmits, via a radio transmitter, data on radon, ORP (redox potential), conductivity, water temperature and pH. An additional unit receives the data and stores them in a form convenient for analysis. The second part of the work presents a statistical study of the transmitted data with a novel general linear model analysis that selects the best-fitting polynomial model for future prediction.

Keywords—Telemetric communication system, Path Loss models, Measuring station, Hydrogeology Engineering, Modeling/Statistics

I. INTRODUCTION
This work describes the first outputs of a research project concerning a measuring station that continuously collects data on the physicochemical characteristics of thermal groundwater.

This research project is part of a major research program that has so far received two national and EU research grants. The present work reports a) the novel telemetric module of the measuring station and b) a novel statistical analysis of the transmitted data, based on a new technique for fitting the model that is best for prediction. The physical layer concerns the simultaneous measurement of natural parameters and indicators of the thermal-metallic sources of the Thermopylae hot springs. The measured parameters include the concentration of radon in water; the water temperature, to study flow-rate and depth variations; pH, to study acidity variations; the redox potential, to study variations in biological load; and the electrical conductivity, to study salinity variations.

The main objective of the whole project is to develop an innovative apparatus, i.e. a collection of equipment, hardware and software modules capable of performing environmental measurements and of collecting, maintaining and transmitting data. The whole dataset of measurements will be made available to local authorities, decision and policy makers and the scientific community in general. A secondary objective is to perform a modern statistical analysis of the data derived from the measurements in order to extract statistically significant inferences.

[Fig. 1 (block diagram): Sensors, Data logger, System manager, Communications, Power manager, 220 V power grid, Inverters, PV panels, DC power accumulators, Control-data.]
Fig. 1. The measuring station with its telemetric system.

A fairly general platform has been built that is responsible for the collection, pretreatment, broadband transmission and storage of the data. This platform includes an independent energy source based on photovoltaic elements and a power-management electronic module; see Fig. 1.

II. TELEMETRIC SYSTEM AND OUTDOOR PATH LOSS MODELS
The data-collection measuring station transmits the digital data through a wireless radio network (using a radio modem). The main receiver unit for data processing has been placed on the campus of the Technological Educational Institute (ATEI) of Central Greece, Lamia, Greece. A functional and schematic representation of the Thermopylae data-collection station is shown in Fig. 1.

In order to find the best location for the Thermopylae measuring station for optimal operation, we have also considered the following tasks: data sampling frequency, distance between the two stations (transmitting and receiving),
topologies [1]-[9], and it has been deemed that the free space model is suitable for LOS (Line-of-Sight) open areas. The Log-Distance model, by contrast, can be employed in open and urban scenarios as well as in suburban topologies. This is owing to its zero-mean Gaussian random variable, which allows for more reliable prediction by taking the site-specific conditions into account in its mathematical formula.

[Plot: Free Space vs. Log-distance path loss model for open/rural areas; vertical axis: path loss (dB).]

Path loss modeling is necessary in order to provide reliable estimates of the average signal strength at distances of up to 40 km, such as the distance separating the base station and the receiver in this case study.

III. ANALYSIS
First, it is necessary to briefly recall three distinct intervals that appear in the statistical analysis of data. For the problem of fitting a parameter to a model, the accuracy or precision may be described by a confidence interval, a prediction interval or a tolerance interval, and these are quite distinct.

Confidence intervals indicate how well the best-fit parameter determined by regression has been estimated. If one analyzes many samples from a Gaussian distribution, about 95% of the resulting 95% confidence intervals are expected to include the true value of the population best-fit parameter. The important feature is that confidence intervals provide information on the likely location of the true population parameter.

Prediction intervals are quite different, since they indicate where the next sampled data point is expected to be found. Considering many samples (from a Gaussian distribution), the next value is expected to lie within the 95% prediction interval in 95% of the samples. The crucial point is that the prediction interval refers to the distribution of values, not to the uncertainty in determining the population parameter.

However, the richest interval is the tolerance interval. It is defined by two different percentages: the first determines how sure one requires to be, and the second expresses what fraction of the values the interval will contain. If the first value ("how sure") is equal to 50%, then the tolerance interval is the same as the prediction interval. If it is set to a higher value, the tolerance interval is wider.

The most popular statistical criteria for model selection in applications [13] are designed with one target: to find the model that "best" fits the data under some criterion. However, these criteria are, in principle, functions of the residual sum of squares [14]. Nevertheless, such best-fitting studies are also applied for prediction, although these "distance" criteria were not designed for this purpose.

In the present data analysis, a new method for selecting the best model for prediction is followed [15]-[18]. It should be mentioned that the cross-validation method [19] should not be confused with the technique in use. The framework of cross-validation is completely different, since that algorithm tests how well a polynomial performs for prediction using a portion of the data. The algorithm used here, on the other hand, utilizes all the data and selects the best polynomial for prediction.

The well-known general linear model (GLM) can be expressed as

$$ Y = X\theta + \sigma\varepsilon \tag{4} $$

where:
- $Y \in \Re^{n\times 1}$ is the observed vector of responses;
- $X \in \Re^{n\times(p+1)}$ is a matrix of known constants based on the $p$ input variables;
- $\theta \in \Re^{(p+1)\times 1}$ is a vector of unknown parameters to be determined by least squares;
- $\varepsilon \in \Re^{n\times 1}$ is an unobserved random vector of errors;
- $\sigma^2 > 0$ is unknown;
- $\Re^{k\times l}$ denotes the set of $k\times l$ matrices.

The error vector $\varepsilon$ satisfies

$$ E(\varepsilon) = \mathbf{0}, \qquad E(\varepsilon\varepsilon') = I_n \tag{5} $$

with $\mathbf{0} \in \Re^{n\times 1}$ a vector of zeros and $I_n = \mathrm{diag}(1, 1, \ldots, 1)$ the $n\times n$ unit matrix. When statistical inference is performed, it is commonly assumed that the errors follow a normal distribution:

$$ \varepsilon \sim N(\mathbf{0}, I_n) \tag{6} $$

The algorithm followed in the present work is based on the method described in [15]-[18].
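The following is a minimal, stdlib-only Python sketch of the selection rule formalized in equations (7)-(10), applied to a single response series. It illustrates the procedure of [15]-[18] under stated assumptions and is not the authors' implementation. The assumptions are:
- The helper names (solve, t_ppf, design_row, best_predictive_degree) are hypothetical.
- β = 0.05 is an assumed value.
- $t_0$ is located on a uniform grid over [-1, 1] rather than analytically.
- The RMS is assumed to be $\sqrt{\mathrm{RSS}/n}$.
- The braced factor in (8) is computed through the scalar form $1 + x_0'(T'T)^{-1}x_0$. This is equivalent by the Sherman-Morrison identity when the design row at $t_0$ is treated as a vector $x_0$.
- The factor $(n-p)^{-1/2}s_p^{1/2}$ is written as $\sqrt{\mathrm{RSS}/(n-p)}$.

```python
import math


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def t_ppf(q, df, steps=1000):
    """Quantile (q > 0.5) of Student's t: Simpson-integrated CDF inverted by bisection."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    def cdf(x):
        h = x / steps  # steps must be even for Simpson's rule
        s = pdf(0.0) + pdf(x) + sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
        return 0.5 + s * h / 3

    lo, hi = 0.0, 1.0
    while cdf(hi) < q:  # bracket the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def design_row(t, p):
    """Row (1, t, t^2, ..., t^p) of the polynomial design matrix T."""
    return [t ** j for j in range(p + 1)]


def best_predictive_degree(t, y, k, beta=0.05, grid=201):
    """Return (degree minimising max L_p, degree minimising RMS, {p: (max L_p, RMS)})."""
    n = len(t)
    results = {}
    for p in range(k + 1):
        rows = [design_row(ti, p) for ti in t]
        ttt = [[sum(r[i] * r[j] for r in rows) for j in range(p + 1)] for i in range(p + 1)]
        tty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p + 1)]
        theta = solve(ttt, tty)  # eq. (7) via the normal equations
        fitted = [sum(a * b for a, b in zip(r, theta)) for r in rows]
        rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # eq. (9)

        def leverage(t0):  # x0'(T'T)^{-1} x0 for the design row x0 at t0
            x0 = design_row(t0, p)
            return sum(a * b for a, b in zip(x0, solve(ttt, x0)))

        # eq. (10): maximum over [-1, 1], approximated on a uniform grid
        h_max = max(leverage(-1 + 2 * i / (grid - 1)) for i in range(grid))
        df = n - p  # degrees of freedom as written in eq. (8)
        length = 2 * t_ppf(1 - beta / 2, df) * math.sqrt(rss / df) * math.sqrt(1 + h_max)
        results[p] = (length, math.sqrt(rss / n))  # RMS = sqrt(RSS / n) is an assumption
    best_pred = min(results, key=lambda q: results[q][0])
    best_rms = min(results, key=lambda q: results[q][1])
    return best_pred, best_rms, results


if __name__ == "__main__":
    # Synthetic series (NOT the Thermopylae data): quadratic trend plus a deterministic wiggle
    ts = [-1 + 2 * i / 40 for i in range(41)]
    ys = [1 + 2 * ti - ti ** 2 + 0.3 * math.sin(37 * i) for i, ti in enumerate(ts)]
    p_pred, p_rms, table = best_predictive_degree(ts, ys, k=7)
    print("degree chosen by the max-length criterion:", p_pred)
    print("degree chosen by the RMS criterion:", p_rms)
```

The two criteria behave differently as p grows:
- For nested least-squares fits, RSS cannot increase when a higher-degree term is added. An RMS of the form $\sqrt{\mathrm{RSS}/n}$ is therefore non-increasing in p, which is consistent with the tendency of distance criteria to pick the largest order.
- The maximum length in (8) is penalized as p grows. The t quantile increases as $n-p$ shrinks, and the maximum leverage $x_0'(T'T)^{-1}x_0$ does not decrease for nested designs.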
In short, the method consists of three steps.
The first step is to read the transmitted data. The regressor X = T concerns time, and the time interval to be analyzed has to be normalized to the range [-1, 1]. Next, the transmitted data Y concerning pH, radon, redox potential and conductivity are inserted into the algorithm in matrix form.

The second step is to evaluate the vector θ with the GLM estimators for a p-th order polynomial. For every p = 0, ..., k we evaluate

$$ \hat{\theta} = (T'T)^{-1}T'Y \tag{7} $$

The third step is to evaluate the length $L_p$ at the point $t_0$:

$$ L_p(t_0) = 2\, t_{n-p;\,1-\beta/2}\,(n-p)^{-1/2}\, s_p^{1/2} \left\{ \left( I - T_{0p}'\,(T'T + T_{0p}'T_{0p})^{-1}\,T_{0p} \right)^{-1} \right\}^{1/2} \tag{8} $$

where

$$ s_p = \mathrm{RSS} = (Y - T\hat{\theta})'(Y - T\hat{\theta}) \tag{9} $$

and $t_0$ is the point at which the following maximum is attained:

$$ \max_{-1 \le t \le 1} \left[ T_{0p}'\,(T'T)^{-1}\,T_{0p} \right] \tag{10} $$

Then, choosing the minimum of the stored maximum "lengths", we find the corresponding value of p, which is the degree of the response function of the best predictive model.

In order to compare with the ordinary best-fitting model, we finally also evaluate the RMS of the best-fitted polynomial of p-th order. The best-fitting model according to the conventional method is the one with the minimum RMS.

For all measured quantities, we have applied the method to evaluate the best-for-prediction polynomial for several different time durations. The results reveal that for many datasets the best-for-prediction polynomial differs from the one suggested by the ordinary method.

As an example, we present the best-fitting model for the Redox potential over a randomly chosen time interval of two days. A detailed analysis based on the algorithm explained above, using polynomials up to 7th order as models, provides interesting results. Following the procedure, the value $\min_p \{\max_t [L_p(t)]\} = 135$ is obtained for $p = 5$. Hence, the corresponding best model for future prediction is the fifth-order polynomial:

$$ Y = -274.4 - 192.8t + 192.8t^2 + 489t^3 - 155.5t^4 - 364.2t^5 \tag{12} $$

This is the model that, according to the proposed prediction criterion (based on the tolerance-region framework), "best" predicts future observations; see Fig. 3. However, it is not the model chosen by the traditional distance (RMS) criterion, which selects the seventh-order polynomial. Most methods that use a distance criterion to choose the most appropriate polynomial tend to pick the largest possible polynomial order. Even in this very simple case, the proposed method leads to a nontrivial difference. If the designer of an experiment wishes to pick the polynomial that best fits the data as far as prediction is concerned, the discussed method gives distinctive and correct results.

The applied algorithm differs nontrivially from all other methods that best fit the data with a distance criterion. Furthermore, it was demonstrated that in many cases it suggests different models from those derived with the ordinary method.

In summary, the numerical study of the methodology reveals that the polynomial model selected according to the best-prediction criterion does not inherently tend to pick the largest-order polynomial as the best model. That tendency is a well-known problem of methods using distance criteria (RMS). Furthermore, the analysis shows that for data with large dispersion the proposed method always picks a polynomial of a different order from the one suggested by an RMS criterion.

[Fig. 3 (plot): Redox Potential versus Normalised Time, from -1.0 to 1.0.]
Fig. 3. Two days of Redox Potential data and their best-fitting model for prediction.

IV. CONCLUSIONS
In the present work, first results are reported concerning the integrated measuring station placed at Thermopylae, Greece. In particular, the path loss modeling of the communication unit is
analysed and presented in detail, as it plays a central role in the functioning of the whole system.

For the LOS wireless system, a study was performed in order to determine the link-budget parameterization, and the analysis of the path loss modeling was presented. To provide reliable predictions of the local mean value of the received signal power, path loss modeling in outdoor propagation topologies requires knowledge of the geographical irregularities and the intrinsic channel characteristics. Two fundamental path loss models, the Free Space and the Log-Distance model, were employed to predict the local mean value for transmitter-receiver (T-R) separations beyond 20 km.

The study of the transmitted measurement data led to an affirmative conclusion on the use of the proposed method in the case under study. There is a strong theoretical background [15]-[18] that ensures the success of the method in any applied field. It was found that for several datasets the selected polynomial differs from the commonly selected one when the choice follows the criterion of "the best predictive model".

As a result, in cases such as those outlined in this study, finding a curve that lies as close as possible to the data is not considered reasonable and correct modeling. Instead, the curve fitted to the data should be the polynomial most adequate for predicting the Y value at a given X (within the experimental region) with a specified, high probability.

In future work, it would also be interesting to generalize the proposed method to problems with multiple independent variables or to cases such as [20], [21]. Another interesting investigation would be to develop a prescription for handling extrapolation and time-series analysis. Extrapolation needs attention in the proposed method, since it requires an extension of the experiment/measurement space.

ACKNOWLEDGMENT
The authors acknowledge that this work has been financially supported by the research program entitled "Measurement of Environment Physical-Chemical Parameters by Development Autonomous Data Collection Processing Transmission Systems with use of green Power and most optimal management", MIS 380360, within the research activity "Archimedes III", funded by the NSRF 2007-2013.

REFERENCES
[1] A. Goldsmith, Wireless Communications. Cambridge: Cambridge University Press, 2005.
[2] J. D. Parsons, The Mobile Radio Propagation Channel. Hoboken, NJ: Wiley Interscience, 2000.
[3] T. Rappaport, Wireless Communications: Principles & Practice. Upper Saddle River, NJ: Prentice Hall, 1999.
[4] C. Chrysanthou and H. L. Bertoni, "Variability of sector averaged signals for UHF propagation in cities," IEEE Transactions on Vehicular Technology, vol. 39, no. 4, pp. 352-358, November 1990.
[5] V. Erceg, L. J. Greenstein, S. Y. Tjandra, S. R. Parkoff, A. Gupta, B. Kulic, A. A. Julius, and R. Bianchi, "An empirically based path loss model for wireless channels in suburban environments," IEEE Journal on Selected Areas in Communications, vol. 17, no. 7, July 1999.
[6] Y. Oda, R. Tsuchihashi, K. Tsunekawa, and M. Hata, "Measured path loss and multipath propagation characteristics in UHF and microwave frequency bands for urban mobile communications," in Proc. IEEE VTS 53rd Vehicular Technology Conference (VTC 2001 Spring), vol. 1, pp. 337-341, 6-9 May 2001.
[7] J. Salo, L. Vuokko, H. M. El-Sallabi, and P. Vainikainen, "An additive model as a physical basis for shadow fading," IEEE Transactions on Vehicular Technology, vol. 56, no. 1, pp. 13-26, January 2007.
[8] T. Chrysikos and S. Kotsopoulos, "Site-specific validation of path loss models and large-scale fading characterization for a complex urban propagation topology at 2.4 GHz," in The 2013 IAENG International Conference on Communication Systems and Applications (IMECS 2013), Hong Kong, March 13-15, 2013.
[9] J. Seybold, Introduction to RF Propagation. Hoboken, NJ: Wiley Interscience, 2005.
[10] W. C. Jakes (Ed.), Microwave Mobile Communications. New York, NY: Wiley Interscience, 1974.
[11] M. Hata, "Empirical formula for propagation loss in land mobile radio services," IEEE Transactions on Vehicular Technology, vol. 29, no. 3, pp. 317-325, August 1980.
[12] Y. Okumura, E. Ohmori, T. Kawano, and K. Fukuda, "Field strength and its variability in VHF and UHF land-mobile radio service," Review of the Electrical Communication Laboratory, vol. 16, no. 9-10, pp. 825-873, September-October 1968.
[13] G. Maddala, Introduction to Econometrics, 2nd ed. New York, USA: Macmillan, 1992, 663 pp.
[14] S. M. Stigler, "Gauss and the invention of least squares," The Annals of Statistics, vol. 9, no. 3, pp. 465-474, 1981.
[15] C. P. Kitsos, "An algorithm for construct the best predictive model," in Softstat '93: Advances in Statistical Software, F. Faulbaum (Ed.). Stuttgart, New York, 1994, pp. 535-539.
[16] R. R. W. Ellerton, C. P. Kitsos, and S. Rinco, "Choosing the optimal order of a response polynomial: structural approach with minimax criterion," Commun. Stat. Theory Meth., vol. 15, pp. 129-136, 1986.
[17] C. H. Muller and C. P. Kitsos, "Optimal design criteria based on tolerance regions," in mODa 7: Advances in Model-Oriented Design and Analysis, A. Bucchianico, H. Lauter, and H. P. Wynn (Eds.). Physica-Verlag, USA, 2004, pp. 107-115.
[18] C. P. Kitsos and V. Zarikas, "On the best predictive general linear model for data analysis: a tolerance region algorithm for prediction," Journal of Applied Sciences, vol. 13, no. 4, pp. 513-524, 01/2013, DOI: 10.3923/jas.2013.513.524.
[19] J. Shao, "Linear model selection by cross-validation," Journal of the American Statistical Association, vol. 88, no. 422, pp. 486-494, 1993.
[20] V. Gikas and J. Stratakos, "A novel geodetic engineering method for accurate and automated road/railway centerline geometry extraction based on the bearing diagram and fractal behavior," IEEE Trans. Intell. Transp. Syst., vol. 13, pp. 115-126, 2012.
[21] V. Zarikas, V. Gikas, and C. P. Kitsos, "Evaluation of the optimal design 'cosinor model' for enhancing the potential of robotic theodolite kinematic observations," Measurement, vol. 43, no. 10, pp. 1416-1424, December 2010.