https://doi.org/10.1007/s10055-020-00468-0
ORIGINAL ARTICLE
Abstract
The realm of virtual reality now abounds with high-quality visuals capable of simulating real-world scenes. The use of intelligence in virtual reality systems, however, is a milestone yet to be achieved for seamless realism in a virtual environment. This paper presents a model, rational ubiquitous navigation (RUN), to improve the believability of a virtual environment. The model augments the maturity of a virtual agent by inculcating in it a human-like learning capability. A novel approach for automated navigation and searching is proposed by incorporating machine learning in virtual reality. An intelligent virtual agent learns objects of interest along with the paths followed for navigation. A mental map is molded dynamically as a user navigates in the environment. The agent follows this map during self-directed navigation to access any known object. After reaching the location where an object of interest resides, the required object is selected on the basis of its front-facet feature. The model is implemented in a case-study project, learn objects on path (LOOP). Twelve users evaluated the model in the immersive maze-like environment of LOOP. The evaluation results confirm the applicability of the model in various cross-modality applications.
Keywords Machine learning in VR · Automated navigation · Object-based searching · Intelligent virtual reality systems
Virtual Reality
frame may contain more than one virtual object; therefore, selection of the required object is performed in a second step. The K-nearest neighbors (KNN) classifier is used in the second step to select objects on the basis of the front-facet feature. In real-world scenarios, the name of an object or place is important for recognizing and remembering it. To simulate this in VR, an explicit entry for the object name is required once an object is discovered. Similarly, by feeding a name, the IVA initiates autonomous navigation to find a discovered object in the application phase. The proposed model is implemented in a case-study project, learn objects on path (LOOP). Using the LOOP project, the model is evaluated in terms of accuracy and applicability in IVRS.

The paper is organized into five sections. Related work is discussed in Sect. 2. Section 3 elaborates the proposed RUN model. Section 4 covers implementation and evaluation details. The last section, Sect. 5, presents conclusions and future work.

markers. The system works within a specific lighting condition and depends on the shape and size of the markers. In another path planning approach for navigation (Li and Ting 2000), a suitable path is selected from a number of possible paths, but a user has to specify checkpoints before starting navigation. The path-finding model of Badler et al. (1996) suggests intelligence-based animation in a VR setup. Some state-of-the-art research works have successfully combined algorithms of AI with VR. The behavioral animation model (Conde et al. 2003) uses ML algorithms to enforce rules inside VR, such as keeping distance from neighbors and attaining a specific velocity. Machine vision has also been utilized in human–computer interfacing, where the position, pose and actions of users are used for interaction (Hämäläinen and Johanna 2002). The KNN model of Cai et al. (2010) categorizes objects or places into different classes using group analysis and pattern recognition.
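The two-step search described above, first reaching the location that holds a named object and then picking it out by its front facet, can be sketched as a simple lookup. Everything below (the `ON` table contents, locations, facet images and the function name) is an illustrative stand-in, not the paper's implementation.

```python
# Minimal sketch of the two-step selection: a learned name resolves to a
# 3D location (step 1) and a stored front-facet contour image (step 2).
# All data here is hypothetical example data.

ON = {"cube": 0, "sphere": 1}                            # name -> index (ON structure)
LOCATIONS = {0: (2.0, 0.0, 5.0), 1: (7.0, 0.0, 1.0)}     # index -> OL(x, y, z)
FACETS = {0: [[0, 1], [1, 0]], 1: [[1, 1], [1, 1]]}      # index -> OOIImg contour

def find_object(name):
    """Return (location, facet image) for a known name, or None (passive mode)."""
    idx = ON.get(name)
    if idx is None:
        return None                                      # unknown name: stay passive
    return LOCATIONS[idx], FACETS[idx]

print(find_object("cube"))    # ((2.0, 0.0, 5.0), [[0, 1], [1, 0]])
print(find_object("lamp"))    # None
```

An unknown name returning `None` mirrors the paper's passive mode, in which no action is taken in the VE.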
[Figure: architecture of the RUN model. The virtual scene is observed through a virtual eye; the virtual brain maintains the mental map (M. Map) with distance (D) and vector (V) entries, an SVM classifier that maps to object locations OL(x, y, z) for navigation, and a KNN classifier that matches OOIImg contour images against the user's name input.]
Distance Vector

M = (N, E)    (1)

N = {v_1, ..., v_n}    (2)
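A sketch of the mental map M = (N, E) built during exploration follows, under the assumption that each edge stores the distance travelled (an entry of D) and the turn vector (an entry of V); the class and identifiers are illustrative, not from the paper.

```python
# Sketch of the mental map M = (N, E): nodes are visited way-points,
# edges carry the distance d_i and direction vector v_i between them.

class MentalMap:
    def __init__(self):
        self.nodes = []        # N = {v_1, ..., v_n}: visited way-points
        self.edges = []        # E: (from, to, distance d_i, vector v_i)

    def add_step(self, node, distance, vector):
        """Record a way-point reached after moving `distance` along `vector`."""
        if self.nodes:
            self.edges.append((self.nodes[-1], node, distance, vector))
        self.nodes.append(node)

m = MentalMap()
m.add_step("start", 0.0, (0, 0, 0))
m.add_step("corner-1", 4.0, (0, 0, 1))    # 4 units along +z
m.add_step("corner-2", 2.5, (1, 0, 0))    # then 2.5 units along +x
print(len(m.nodes), len(m.edges))         # 3 2
```

Replaying the edge list in order reproduces the user's path, which is how the DV structure later drives self-directed navigation.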
3.1.2 Extraction of OOI
level T. The constant T is calculated as the mean of the intensities of the background and foreground pixels:

μB = ( Σ_{r=1}^{n} Σ_{c=1}^{n} Fp(r, c) ∈ Background ) / (r · c)    (5)

μF = ( Σ_{r=1}^{n} Σ_{c=1}^{n} Fp(r, c) ∈ Foreground ) / (r · c)    (6)

T = (μB + μF) / 2    (7)

The Ω image is obtained as

Ω(x, y) = 0 if ψ(x, y) < T, 1 otherwise

From the Ω of the scene, OOIImg with rows 'm' and columns 'n' is extracted as

OOIImg = ( ⋃_{r=Tm−5}^{Bm+5} (Ω), ⋃_{c=Lm−5}^{Rm+5} (Ω) )    (8)

where Tm, Lm, Rm and Bm represent the top-most, left-most, right-most and bottom-most white pixels of the object's contours. In order to fully capture the object, five extra pixels are included at each boundary. The entire process of extracting OOIImg from a frame image (ψ) is shown in Fig. 7.

3.1.3 The ON data structure

Akin to the use of names in the real world, a unique name is required to learn, recall and reference a discovered OOI. With the ON data structure, the name information about the OOIs is stored at run time. Once an object is discovered by the IVA, the system queries the user for a string input. In the data structure, the string input is saved under the name attribute (Object_Name). To uniquely represent each entry of ON, an index (a whole number) is assigned dynamically. The data structure is updated each time an OOI is traced. In the virtual brain, the binary contour image (OOIImg) representing the front facet of an OOI is saved under the same input name. The ON structure is shown below.

Index    Object_Name
0        Input-String1
...      ...
N        Input-Stringn

3.1.4 Identification and classification algorithms

To learn and recall objects, the virtual brain makes use of two well-known classifiers, KNN and SVM. The KNN classifier is used to recognize an OOI by its front-facet shape and is therefore trained on OOIImg. The SVM classifier deals with 3D location and is trained on the OLs. One unique label per OOI represents the image class in KNN and its respective 3D position in the SVM.

3.1.4.1 The SVM classifier

SVM is an efficient ML classifier (Hu et al. 2016) which learns sets of prototypes based on a separating hyperplane. The classifier is trained with features x_i ∈ R^d and class labels y_i ∈ Y = {1, 2, …, n}. After training, the classifier predicts a class from a set S = {(x_1, y_1), …, (x_n, y_n)}. On the basis of the obtained features, the classifier builds an optimal hyperplane to predict a class label y_i | y_i ∈ {+1, −1} for i = {1, 2, …, n}. By design, the classifier predicts a unique class label y_x if most of the unknown features belonging to y_x lie on one side of the hyperplane (Lu and Qihao 2007). The classifier computes an inner product space X : X ⊆ ℝ^d for x_i ∈ X and y_i ∈ Y = {1, 2, …, n} for S of cardinality n. If H is the prototype space and x_i ∈ X an input instance, then the scoring function f for SVM is given as

f : X × H → ℝ
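The facet-extraction step of Eqs. (5)–(8) can be sketched in NumPy as below: threshold the frame at the mean of background and foreground intensities, then crop the contour image around the outermost white pixels with a five-pixel margin. The provisional foreground mask used to seed the two means is an assumption of this sketch, as are all names.

```python
# Sketch of Eqs. (5)-(8): mean-of-means threshold, binarize, crop with margin.
import numpy as np

def extract_ooi_img(psi, margin=5):
    seed = psi > psi.mean()                       # provisional foreground (assumption)
    mu_f = psi[seed].mean()                       # Eq. (6): foreground mean
    mu_b = psi[~seed].mean()                      # Eq. (5): background mean
    t = (mu_b + mu_f) / 2.0                       # Eq. (7): threshold T
    omega = (psi >= t).astype(np.uint8)           # binary contour image Omega
    rows, cols = np.nonzero(omega)                # white pixels of the contour
    r0, r1 = rows.min() - margin, rows.max() + margin   # Eq. (8): row bounds
    c0, c1 = cols.min() - margin, cols.max() + margin   # Eq. (8): column bounds
    return omega[max(r0, 0):r1 + 1, max(c0, 0):c1 + 1]

frame = np.zeros((40, 40))
frame[15:25, 15:25] = 200.0                       # synthetic bright object
print(extract_ooi_img(frame).shape)               # (20, 20)
```

The 10×10 synthetic object comes out as a 20×20 crop: ten object pixels plus the five-pixel margin on each side, matching the paper's "five extra pixels at each boundary".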
In the case of multiclass SVM, where cardinality n > 2, a search in the prototype matrix H ∈ ℝ^{|ρ|×d} is made for an instance of the feature vector x_i ∈ ρ. A class label y_x is predicted if ω(x_x) results in a positive prototype ρ for the class y_x | ρ = {x_x ∈ ρ : y_ix = 1}, where 1 ≤ i ≤ n.

As the RUN model deals with learning multiple 3D objects, multiclass SVM is used to learn a variable number of OOIs. Therefore, the set of classes is given as,

Y = {1, 2, …, OOI_ξ}    (10)

where OOI_ξ is the total number of traced 3D objects.

To train the classifier, object names are used as class labels and 3D positions as features. The dataset D of the classifier associates the discovered object OOI_i with its location OL_i, where OL_i ∈ R^3 for i = 1, 2, …, OOI_ξ:

D = { (OL_i, OOI_i) | OL_i ∈ R^3 }_{i=1}^{OOI_ξ}    (11)

A total of OOI_ξ(OOI_ξ − 1)/2 hyperplanes are set for OOI_ξ classes to separate each class from the rest. Each hyperplane of points x satisfies the equation,

w · x + b = 0    (12)

where w is the weight vector and b the intercept of the hyperplane. For an input instance of voxel φ_i, the decision function f is given as,

f(φ_i) = arg max_i [ w_i · φ_i + b_i ]    (13)

where i = {1, 2, …, OOI_ξ}.

3.1.4.2 The KNN classifier

K-nearest neighbors is a simple learning algorithm with a high accuracy rate for image-based analysis (Gonzalez and Richard 2002; Roberts et al. 2007). As the KNN classifier suits pattern recognition (Franti et al. 2006) and image analysis (Stork et al. 2012) well, it is adopted in the proposed model for facet recognition. With KNN, if β represents the set of features and γ the target class, then for an unknown β_x, KNN follows the function h to conditionally predict y:

h(β_x) : β → y

For a set of classes C = {c_1(x_1), c_2(x_2), …, c_n(x_n)} representing the trained dataset, the solution for an instance β_x is computed by the KNN algorithm as,

h(β_x) = arg max_{c∈C} [ { y | y ∈ KNN(β_x), h(y) = c } ]    (14)

In the proposed model, the feature data points are binary tuples (BT) representing pixels of the OOIimg. Let C be the collection of a total of ξ binary images representing the discovered OOIs; then

C = { OOIImg_1, OOIImg_2, …, OOIImg_ξ }    (15)

The features of an OOIImg are represented in the form of a fixed-size BT. A BT with rows 'r' and columns 'c' represents the pixel values of an OOIimg. The BT of an OOI is stored in a classification file (CF) as given below,

CF = ( BT1(0, 0), … BT1(0, c)
       BT1(1, 0), … BT1(1, c)
       …
       BT1(r, 0), … BT1(r, c) )    (16)

To query an image ℚ, {BTx_i}_{i=1}^{ξ} ∈ ℚ, the KNN classifier searches in the set ℙ, {BT_i}_{i=1}^{ξ} ∈ ℙ. The CF entry having the closest resemblance with the query image is returned using the following formula,

d(ℚ, ℙ) = sqrt( Σ_{i=1}^{ξ} (ℚ_i − ℙ_i)^2 )    (17)

3.2 Application phase

The twofold classification is used to reduce the chance of error in searching out an OOI in the VE. After exploration of the VE, a user needs only to input the name of any known object to initiate auto-navigation and locate the object. With the SE module, the string input (name) is recursively compared with the Object_Name entries in the ON data structure, see Fig. 8. On successful matching, the name entry of the desired object is fed to the SVM classifier. In case of failed matching, the process is repeated by incrementing the index to check the next entry.

The designed VE is treated as a voxel grid across the x-, y- and z-axes, as shown in Fig. 9. To reach the location (OL) of an object, the DV structure is followed for self-directed navigation. The current entries of the distance and vector arrays are followed to travel along a path and to take turns, respectively. The coordinates of the virtual camera, CC(x, y, z), are changed according to the vector entries v_i ∈ V to travel distance d_i ∈ D.

The self-directed navigation of the IVA is stopped whenever the camera's coordinates CC(x, y, z) match the OL of the target object. The algorithm followed for automated navigation is shown in Fig. 10. To locate the required object in the rendered viewport, classification on the basis of front-facet features is performed by the KNN classifier. The OOIImg from the virtual brain is checked against the contours image (Ω) of the last rendered frame. The section of Ω having maximum similarity with the OOIImg is thus selected.
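The nearest-image match of Eq. (17) can be sketched in a few lines of pure Python: the query facet is compared against every stored binary tuple (BT), and the entry with the smallest Euclidean distance wins. The 8-pixel BTs and their names are illustrative example data, not from the paper.

```python
# Sketch of Eq. (17): return the stored BT closest to the query image.
import math

def closest_facet(query, classification_file):
    """Return the name whose stored BT has minimum Euclidean distance to `query`."""
    def dist(bt):
        return math.sqrt(sum((q - v) ** 2 for q, v in zip(query, bt)))
    return min(classification_file, key=lambda name: dist(classification_file[name]))

cf = {
    "cube":   [1, 1, 1, 1, 0, 0, 0, 0],   # illustrative flattened 8-pixel BTs
    "sphere": [0, 1, 1, 0, 0, 1, 1, 0],
}
print(closest_facet([1, 1, 1, 0, 0, 0, 0, 0], cf))   # cube
```

With k = 1 neighbor, this distance-minimizing lookup is exactly the classification step the paper uses to pick the viewport region most similar to the stored OOIImg.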
[Fig. 8 Graphic representation of the search engine (SE) module: the set C files feed the SVM (yielding OL) and the KNN (yielding OOIImg)]

[Fig. 10 Flowchart of the navigation and panning algorithm: the test OL(x) > CC(x)? selects panning about +x or −x and navigation about +z or −z]
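The axis-by-axis decision in the navigation-and-panning flowchart can be sketched as a simple stepping loop, assuming a voxel-grid VE in which the camera moves one unit per step and stays at a fixed height (the model does not support jumping or flying). The function and variable names are illustrative.

```python
# Sketch of the Fig. 10 decision: compare CC with OL per axis, panning
# along x and navigating along z, until the camera reaches the target.

def step_toward(cc, ol):
    """Return the next camera position, one unit closer to the target OL."""
    x, y, z = cc
    if x != ol[0]:                      # OL(x) > CC(x)? pan about +x, else -x
        x += 1 if ol[0] > x else -1
    elif z != ol[2]:                    # then navigate about +z or -z
        z += 1 if ol[2] > z else -1
    return (x, y, z)

cc, ol = (0, 0, 0), (2, 0, -3)          # camera start and target location
steps = 0
while cc != ol:                         # stop when CC(x, y, z) matches OL
    cc = step_toward(cc, ol)
    steps += 1
print(cc, steps)                        # (2, 0, -3) 5
```

Two panning steps along +x followed by three navigation steps along −z reach the target in five moves, after which the KNN facet check takes over in the rendered viewport.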
4.1 Interface of the project
[Fig. 12 State machine of the LOOP project: Rout-1, Rout-2, Rout-3]

box for name feeding is cleared. The ON data structure is updated with the new entry, while the features, OL and OOIimg, for the selected OOI are extracted for training. On hitting the Escape key, the system switches to application phase. At the top left, the label of the input box changes to 'Enter name of Object to navigate to.' By pressing the Enter key, the input string (name) is incrementally compared with the entries under the Name field in the ON data structure. The SE module obtains OL and OOIimg by calling the SVM and KNN classifiers. Next, the map M is followed for automated navigation. On reaching the object's location (OL) in the VE, navigation is stopped. On successful classification by KNN, the required OOI is highlighted by a rectangle around its edges. The state machine of the LOOP project is shown in Fig. 12.

4.2 Evaluation tasks

The 3D VE of the LOOP application contains fifteen objects of varying surface attributes, such as solid-filled, wire-filled and textured facets, as shown in Fig. 13. Participants were asked to select and name any five 3D objects in the learning phase. In the designed environment, there are exactly four objects on each track (straight, left and right). Therefore, to select five objects, each participant had to take at least one turn during exploration. In the application phase, the system was examined by randomly feeding names of the known objects. As retraining is required only once (in the learning phase), users availed the option of pressing the Escape key to repeat the process of searching for different OOIs.

4.3 Accuracy assessment

With the case-study project, the model was evaluated a total of sixty times. Each time, the classifiers were trained and tested with different combinations of the 3D objects. The overall accuracy rate for automated navigation and searching, as shown in Table 1, was 90.8%. Incorrect navigation or selection (false positive and false negative) (Najadat et al. 2019) was counted as false detection.

For an unknown object or incorrect name, the system remains in passive mode. In passive mode, no action is performed in the VE; however, the user is informed by a displayed message to enter a valid name. To properly evaluate the use of the SVM and KNN classifiers in the VR setup, the outcomes of the classifiers are analyzed separately by the confusion matrices, see Figs. 14 and 15.

Three 3D positions were wrongly classified by the SVM: false negative (FN = 3).
Some objects in the environment have similar facet features; therefore, the KNN classifier falsely identified eight objects: false positive (FP = 8).

Fig. 14 Confusion matrix of the SVM classifier:

                 Predicted 1    Predicted 0    Total
Actual 1         57             3              60
Actual 0         0              0              0
Total            57             3              60

Fig. 15 Confusion matrix of the KNN classifier:

                 Predicted 1    Predicted 0    Total
Actual 1         52             0              52
Actual 0         8              0              8
Total            60             0              60

The calculated average precision ratio (APR), average recall ratio (ARR) and average cross-validation ratio (ACVR = 0.90) for evaluating the classifier response (Caceres 2014) are shown in Fig. 16.

[Fig. 16 Accuracy results of the classifiers: APR and ARR for SVM and KNN]

4.3.1 Subjective analysis

After the evaluation session, a three-factor questionnaire was presented to the participants. The factors assessed were ease of use, suitability in IVRS and naturalism. The users' responses about the three factors are shown in Fig. 17. The post-assessment questionnaire is shown in Table 2.

[Fig. 17 Result of the four-factor subjective assessment]

5 Conclusion and future work

AI aids in augmenting 3D interaction (Dobrzański and Rafal 2016) in a VE. With this research work, we propose a model to enhance interactivity with an IVRS. The proposed model intends to enhance the ability of an IVA such that it learns navigational experience from a VR user. As a human learns by exploring an unknown place, the IVA learns different objects along with different tracks. Once an object is discovered, a name entry is fed, which is used by the IVA as a reference to trace the object. In the application phase, the IVA accesses any of the known objects when the name of the desired object is input.

Following a mental map of the scene, the IVA performs auto-navigation to the position of the desired object. The model was implemented and evaluated to test its applicability in VR systems. The satisfactory accuracy of the LOOP project confirms the wider applicability of the model. For a novice user, the system ensures feasible viewing and visiting of objects with less chance of disorientation. The model introduces a path-finding approach that can be followed to simplify navigational tasks in complex VEs such as virtual representations of the brain, the structure of DNA, or galaxies. Moreover, the model can be extended for the automation of other 3D tasks inside a VE.

Although the model is appropriate for autonomous navigation, the front facets of objects should be distinguishable. The accuracy of object selection degrades as the number of similar objects in a single rendered frame increases. Moreover, the system does not support jumping or flying of the IVA. Additional research is required to overcome these challenges. In the future, we intend to enhance the model for emerging augmented and mixed reality setups. The authors also plan to enhance the model so that the knowledge of one IVA can be shared with other IVAs. With this, an IVA will be able to intelligently respond to any query about a 3D VE.
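The per-classifier precision and recall follow directly from the reported counts (SVM: FN = 3 out of 60; KNN: FP = 8 against 52 correct). A small sketch of that arithmetic, with the helper function being illustrative:

```python
# Precision/recall from the confusion-matrix counts in Figs. 14 and 15.

def precision_recall(tp, fp, fn):
    """Standard precision = TP/(TP+FP) and recall = TP/(TP+FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

svm_p, svm_r = precision_recall(tp=57, fp=0, fn=3)    # Fig. 14 (SVM)
knn_p, knn_r = precision_recall(tp=52, fp=8, fn=0)    # Fig. 15 (KNN)
print(round(svm_p, 3), round(svm_r, 3))   # 1.0 0.95
print(round(knn_p, 3), round(knn_r, 3))   # 0.867 1.0
```

The two classifiers err in complementary ways: the SVM misses some locations (lower recall), while the KNN selects some look-alike facets (lower precision), consistent with the roughly 90% combined accuracy the paper reports.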
Pontil M, Verri A (1998) Support vector machines for 3D object recognition. IEEE Trans Pattern Anal Mach Intell 20(6):637–646. https://doi.org/10.1109/34.683777
Raees M, Ullah S, Rahman SU, Rabbi I (2016) Image based recognition of Pakistan sign language. J Eng Res 4(1):22–41. https://doi.org/10.7603/s40632-016-0002-6
Rivas E, Koutarou K, Kazuhisa M, Genci C (2015) Image-based navigation for the snoweater robot using a low-resolution USB camera. Robotics 4(2):120–140. https://doi.org/10.3390/robotics4020120
Roberts A, McMillan L, Wang W, Parker J, Rusyn I et al (2007) Inferring missing genotypes in large SNP panels using fast nearest-neighbor searches over sliding windows. Bioinformatics 23(13):i401–i407. https://doi.org/10.1093/bioinformatics/btm220
Senger S (2005) Visualizing volumetric data sets using a wireless handheld computer. Stud Health Technol Inform: 447–450
Sheridan TB (2000) Interaction, imagination and immersion: some research needs. In: Proceedings of the ACM symposium on virtual reality software and technology, ACM, pp 1–7. https://doi.org/10.1145/502391.502392
Stork DG, Duda RO, Hart PE (2012) Pattern classification. John Wiley & Sons, New Jersey. https://doi.org/10.1007/s00357-007-0015-9
Van L, Jeroen, Anton N (2001) A dialogue agent for navigation support in virtual reality. In: CHI'01 extended abstracts on human factors in computing systems, pp 117–118. https://doi.org/10.1145/634133.634138

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.