3 Cheatsheets: Simple Select MATCH ('Full-Text Query Expression') Group by Insert

3 Cheatsheets
SELECT

Simple select
SELECT * FROM myindex

MATCH('full-text query expression')
SELECT * FROM myindex WHERE MATCH('find me')

GROUP BY
SELECT * FROM myindex WHERE MATCH('...') GROUP BY col1;

GROUP n BY
return n rows from each group
SELECT * FROM myindex WHERE MATCH('...') GROUP 5 BY col1;

WITHIN GROUP ORDER BY
non-standard SQL, allows ordering rows inside a group
SELECT * FROM myindex WHERE MATCH('...') GROUP BY col1 WITHIN GROUP ORDER BY col2 DESC ORDER BY col3 ASC

HAVING
SELECT * FROM myindex WHERE MATCH('...') GROUP BY col1 HAVING col2>10

ORDER BY
SELECT * FROM myindex WHERE MATCH('...') ORDER BY WEIGHT() DESC, col1 ASC

LIMIT
SELECT * FROM myindex LIMIT 10,20

FACET {expr_list} [BY {expr_list}] [ORDER BY {expr | FACET()} {ASC | DESC}] [LIMIT [offset,] count]
returns additional result sets by using grouping on the matched and filtered result set.
SELECT * FROM test FACET brand_name BY brand_id ORDER BY brand_name ASC FACET property;

OPTION
fine-tuning query settings
SELECT * FROM myindex LIMIT 10,20 OPTION opt1=value1,opt2=value2

Query additional information and profiling

SHOW META [ LIKE pattern ]
shows additional meta-information about the latest query

SHOW WARNINGS
retrieves the warning produced by the latest query

SHOW PLAN
displays the execution plan of the previous SELECT statement (requires SET profiling=1)

SHOW PROFILE
detailed execution profile of the previous SQL statement (requires SET profiling=1)

Connecting using SphinxQL interface

$ mysql -P9306 -h0
or
$ mysql -P9306 -h127.0.0.1
No credentials are needed. Database selection is not needed; USE dbname; is a dummy command in Manticore.
SphinxQL uses the MySQL protocol as transport, but doesn't implement the full set of MySQL commands.

Index manipulation

ALTER TABLE index {ADD|DROP} COLUMN column_name [{INTEGER|INT|BIGINT|FLOAT|BOOL|MULTI|MULTI64|JSON|STRING}]
Change index structure

ATTACH INDEX diskindex TO RTINDEX rtindex
Move data from a regular disk index to an RT index

ALTER TABLE delta KILLLIST_TARGET='main'
Change killlist_target of an index

RELOAD INDEX idx [ FROM '/path/to/index_files' ]
Rotate specific index

RELOAD INDEXES
Rotate indexes

TRUNCATE RTINDEX rtindex [WITH RECONFIGURE]
Truncate RealTime data

OPTIMIZE INDEX index_name
Optimize a RealTime index

{DESC | DESCRIBE} index [ LIKE pattern ]
Lists index columns and their associated types

{DESC | DESCRIBE} index TABLE
Lists expected document schema (PQ only)

FLUSH RTINDEX rtindex
Flushes RT index RAM chunk contents to disk

FLUSH RAMCHUNK rtindex
Create new disk chunk from RT index RAM chunk

DBA

SHOW STATUS [ LIKE pattern ]
Display daemon performance counters

SHOW AGENT ['agent'|'index'] STATUS [ LIKE pattern ]
Displays statistics of remote agents or of a distributed index

SHOW INDEX index_name STATUS
Displays various per-index statistics

SHOW INDEX index_name[.N | CHUNK N] SETTINGS
Displays per-index settings in a sphinx.conf compliant file format. For an RT index, a particular chunk number can also be specified.

SHOW TABLES [ LIKE pattern ]
Display all active indexes

SET [GLOBAL] server_variable_name = value
Set a global variable

SHOW THREADS [ OPTION columns=width|format=sphinxql ]
List active client threads

SHOW [{GLOBAL | SESSION}] VARIABLES [WHERE variable_name='xxx']
Display global variables

FLUSH HOSTNAMES
Renew IPs associated with agent host names

FLUSH LOGS
Initiate reopen of log files

FLUSH ATTRIBUTES
Flushes all in-memory attribute updates

SHOW PLUGINS
Displays all the loaded plugins and UDFs

CREATE FUNCTION udf_name RETURNS {INT | INTEGER | BIGINT | FLOAT | STRING} SONAME 'udf_lib_file'
Installs a user-defined function (UDF) with the given name and type from the given library file. The library file must reside in a trusted plugin_dir directory.

CREATE PLUGIN plugin_name TYPE 'plugin_type' SONAME 'plugin_library'
Loads the given library (if it is not loaded yet) and loads the specified plugin from it.

DROP FUNCTION udf_name
Deinstalls a user-defined function (UDF) with the given name.

DROP PLUGIN plugin_name TYPE 'plugin_type'
Marks the specified plugin for unloading.

Data manipulation

INSERT
INSERT INTO myindex(id,field1,field2,col1,col2) VALUES(1,'title','content',10,100);

REPLACE
REPLACE INTO myindex(id,field1,field2,col1,col2) VALUES (1,'title','content',10,100);

UPDATE
UPDATE myindex SET col2=200 WHERE id=1;

DELETE
DELETE FROM myindex WHERE id=1;

Special statements

CALL PQ(data, index[, opt_value AS opt_name[, ...]])
performs percolate query

CALL {SUGGEST|QSUGGEST}(word, index [,options])
performs word correction

CALL KEYWORDS(text, index [, options])
tokenization tester

CALL SNIPPETS(data, index, query[, opt_value AS opt_name[, ...]])
highlighting tool

Special SELECT functions

GEODIST(lat1, lon1, lat2, lon2, { option=value, ... })
Calculate geodistance between 2 sets of coordinates

SNIPPET(('documents'), 'my.query', 'limit=100')
wrapper for snippets functionality, similar to the CALL SNIPPETS statement

EXIST ( "attr-name", default-value )
replaces non-existent columns with default values

WEIGHT()
returns relevancy score. Note that the relevancy score is, by its nature, relative to the query.

GROUP_CONCAT()
produces a comma-separated list of the attribute values of all documents in the group

REMOVE_REPEATS ( result_set, column, offset, limit )
removes repeated adjacent rows with the same ‘column’ value.

ZONESPANLIST()
Returns pairs of matched zone spans in case the ZONESPAN match operator is used.

SELECT * FROM (SELECT … ORDER BY cond1 LIMIT X) ORDER BY cond2 LIMIT Y
Manticore allows limited subselects. The outer select can only have ordering and limit.

Replication commands

CREATE CLUSTER name ['/path/' as path [,'ip1:port1,ip2:port2' as nodes]]
Initialize a cluster

JOIN CLUSTER name AT 'ip1:port1'
Join a cluster by connecting to one of its nodes

JOIN CLUSTER name 'ip1:port1,ip2:port2' as nodes, '/path/to' as path
Join a cluster by explicitly defining the cluster nodes and a custom path to metadata files

ALTER CLUSTER name UPDATE nodes
Update the list of active nodes across the cluster

DELETE CLUSTER name
Delete a cluster (all nodes are released)

ALTER CLUSTER name ADD|DROP index
Add or remove an index from a cluster

SET CLUSTER name GLOBAL 'setting'='value'
Set a galera-cluster setting
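
The limited subselect form listed under Special SELECT functions (inner ORDER BY cond1 LIMIT X, outer ORDER BY cond2 LIMIT Y) is essentially a two-stage rerank. The Python sketch below only emulates that evaluation order on an in-memory list; it does not talk to Manticore, and the function name, row fields and ascending sort direction are illustrative assumptions.

```python
from operator import itemgetter


def limited_subselect(rows, inner_key, inner_limit, outer_key, outer_limit):
    """Emulate SELECT * FROM (SELECT ... ORDER BY inner_key LIMIT X)
    ORDER BY outer_key LIMIT Y, with both orderings ascending."""
    # Stage 1: the inner select sorts the full set and keeps the first X rows.
    inner = sorted(rows, key=itemgetter(inner_key))[:inner_limit]
    # Stage 2: the outer select may only reorder and trim what stage 1 kept.
    return sorted(inner, key=itemgetter(outer_key))[:outer_limit]


# Hypothetical sample rows, for illustration only.
rows = [
    {"id": 1, "price": 30, "rating": 2},
    {"id": 2, "price": 10, "rating": 5},
    {"id": 3, "price": 20, "rating": 1},
    {"id": 4, "price": 40, "rating": 9},
]

top = limited_subselect(rows, "price", 3, "rating", 2)
print([r["id"] for r in top])  # [3, 1]
```

Row 4 has the highest rating but never reaches the outer stage, because the inner LIMIT already dropped it. That is the main thing to keep in mind when choosing X.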
© 2017-2019 Manticore Search, version v3

https://github.com/manticoresoftware/manticoresearch
Interested in getting help? Check our support plans at https://manticoresearch.com/professional-support/
contact@manticoresearch.com

Relevancy rankers (for OPTION ranker=xyz)

proximity_bm25
sum(lcs*user_weight)*1000+bm25

bm25
bm25

none
1

wordcount
sum(hit_count*user_weight)

proximity
sum(lcs*user_weight)

matchany
sum((word_count+(lcs-1)*max_lcs)*user_weight)

fieldmask
field_mask

sph04
sum((4*lcs+2*(min_hit_pos==1)+exact_hit)*user_weight)*1000+bm25

expr
user custom ranking formula

export
same as expr, but calculates all available field-level factors and outputs them with RANKFACTORS()

Date and time functions

DAY()
Returns the integer day of month (in 1..31 range) from a timestamp argument, according to the current timezone.

MONTH()
Returns the integer month (in 1..12 range) from a timestamp argument, according to the current timezone.

NOW()
Returns the current timestamp as an INTEGER.

YEAR()
Returns the integer year (in 1969..2038 range) from a timestamp argument, according to the current timezone.

YEARMONTH()
Returns the integer year and month code (in 196912..203801 range) from a timestamp argument, according to the current timezone.

YEARMONTHDAY()
Returns the integer year, month, and date code (in 19691231..20380119 range) from a timestamp argument, according to the current timezone.

SECOND()
Returns the integer second (in 0..59 range) from a timestamp argument, according to the current timezone.

MINUTE()
Returns the integer minute (in 0..59 range) from a timestamp argument, according to the current timezone.

HOUR()
Returns the integer hour (in 0..23 range) from a timestamp argument, according to the current timezone.

Comparison functions

IF()
Takes 3 arguments, checks whether the 1st argument is equal to 0.0, and returns the 2nd argument if it is not zero, or the 3rd one when it is. Note that unlike comparison operators, IF() does not use a threshold.

IN(expr,val1,val2,…)
takes 2 or more arguments, and returns 1 if 1st argument (expr) is equal to any of the other arguments (val1..valN), or 0 otherwise.

INTERVAL(expr,point1,point2,point3,…)
takes 2 or more arguments, and returns the index of the argument that is less than the first argument

Full-text search operators

operator AND
always implicit
Hello World

operator OR
has higher precedence than AND
Hello | World

operator MAYBE
similar to OR, but does not return documents that match only the right subtree expression
Hello MAYBE World

field search operator
limit subsequent search to a given field
@title hello @body world

multiple-field search operator
@(title,body) hello world

ignore field search operator
@!title hello world

ignore non-existing field name (specify at the beginning of the query)
@@relaxed @nosuchfield my query

all-field search operator (implicit)
@* hello

phrase search operator
"hello world"

proximity search operator
"hello world"~10

quorum matching operator
"the world is a wonderful place"/3

strict order operator
aaa << bbb << ccc

exact form modifier
raining =cats and =dogs

field-start and field-end modifier
^hello world$

keyword IDF boost modifier
boosted^1.234 boostedfieldend$^1.234

NEAR, generalized proximity operator
hello NEAR/3 world NEAR/4 "my test"

NOTNEAR, negative assertion operator
Church NOTNEAR/3 street

SENTENCE operator
all SENTENCE words SENTENCE "in one sentence"

PARAGRAPH operator
"Bill Gates" PARAGRAPH "Steve Jobs"

ZONE limit operator
ZONE:(h3,h4) only in the titles

ZONESPAN limit operator
ZONESPAN:(h2) only in a (single) title

Numeric functions

ABS()
Returns the absolute value of the argument.

BITDOT()
BITDOT(mask, w0, w1, …) returns the sum of products of each bit of a mask multiplied by its weight.

CEIL()
Returns the smallest integer value greater than or equal to the argument.

CONTAINS(polygon, x, y)
checks whether the (x,y) point is within the given polygon, and returns 1 if true, or 0 if false. The polygon has to be specified using either the POLY2D() function or the GEOPOLY2D() function.

COS()
Returns the cosine of the argument.

DOUBLE()
Forcibly promotes given argument to floating point type.

EXP()
Returns the exponent of the argument.

FIBONACCI(N)
Returns the N-th Fibonacci number, where N is the integer argument.

FLOOR()
Returns the largest integer value less than or equal to the argument.

GEOPOLY2D(x1,y1,x2,y2,x3,y3…)
Produces a polygon to be used with the CONTAINS() function. This function takes into account the Earth’s curvature by tessellating the polygon into smaller ones. The function expects coordinates to be in degrees.

IDIV()
Returns the result of an integer division of the first argument by the second argument.

LN()
Returns the natural logarithm of the argument.

LOG10()
Returns the common logarithm of the argument.

LOG2()
Returns the binary logarithm of the argument.

MAX()
Returns the bigger of two arguments.

MIN()
Returns the smaller of two arguments.

POLY2D(x1,y1,x2,y2,x3,y3…)
Produces a polygon to be used with the CONTAINS() function.

POW()
Returns the first argument raised to the power of the second argument.

SIN()
Returns the sine of the argument.

SQRT()
Returns the square root of the argument.

UINT()
Forcibly reinterprets given argument to 64-bit unsigned type.

String functions

CONCAT()
concatenates two or more strings into one. Non-string arguments must be explicitly converted to string using the TO_STRING() function

SUBSTRING_INDEX(string,delimiter,number)
returns the substring of a string before a specified number of delimiter occurrences

REGEX(attr,expr)
performs regular expression matching on an attribute value
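
The INTERVAL() and BITDOT() entries above are terse, so here is a small Python sketch of how they can be read. It is an emulation for illustration, not the engine's code. The boundary convention used for INTERVAL (0 when expr < point1, 1 when point1 <= expr < point2, and so on, with points in ascending order) is an assumption that the cheatsheet entry does not spell out, and the function names are hypothetical.

```python
from bisect import bisect_right


def interval(expr, *points):
    """Emulate INTERVAL(expr, point1, ..., pointN).

    Assumes ascending points and returns how many of them are <= expr:
    0 if expr < point1, 1 if point1 <= expr < point2, ..., N if expr >= pointN.
    """
    return bisect_right(points, expr)


def bitdot(mask, *weights):
    """Emulate BITDOT(mask, w0, w1, ...): sum the weights whose bit is set in mask."""
    return sum(w for i, w in enumerate(weights) if (mask >> i) & 1)


print(interval(5, 1, 10, 100))   # 1
print(interval(10, 1, 10, 100))  # 2
print(bitdot(5, 10, 20, 30))     # 40  (bits 0 and 2 are set: 10 + 30)
```

INTERVAL is handy for bucketing a numeric attribute (price ranges, age bands) before grouping, while BITDOT turns a bitmask attribute into a weighted score.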

CALL PQ options

docs
provide per-query matched documents in the result set (default 0, disabled)

docs_id
use a property from the JSON doc as the document id (default disabled, only to be used with docs=1)

docs_json
input documents are raw strings (0) or JSON objects (1, default)

CALL PQ ('index_name', ('multiple documents', 'go this way'), 0 as docs_json );

CALL PQ ('pq', ('{"title":"header text", "body":"post context", "timestamp":11 }','{"title":"short post", "counter":7 }') );

mode
how to distribute docs among members of a distributed index. Possible values: 'sparsed' (default) and 'sharded'

query
provide all stored query fields, such as query, tags, filters (default 0, disabled)

skip_bad_json
in case of a bad JSON document, skip it and continue (1) or end with an error (0, default)

skip_empty
in case of an empty JSON document, silently ignore it and continue (1) or treat it depending on skip_bad_json (0, default)

shift
a number to be added to doc ids if no 'docs_id' is defined. Only for distributed setups

verbose
provide extended info on matching in SHOW META (default 0, disabled)

CALL SUGGEST options

limit
return N top matches, default is 5

max_edits
keep only dictionary words whose Levenshtein distance is less than or equal to this value, default is 4

result_stats
provide Levenshtein distance and document count of the found words, default is 1 (enabled)

delta_len
keep only dictionary words whose length difference is less than this value, default is 3

max_matches
number of matches to keep, default is 25

reject
defaults to 4; rejected words are matches that are not better than those already in the match queue

result_line
alternate mode to display the data by returning all suggests, distances and docs each per one row, default is 0

non_char
do not skip dictionary words with non-alphabet symbols, default is 0 (skip such words)

CALL KEYWORDS options

stats
show statistics of keywords, default is 0

fold_wildcards
fold wildcards, default is 1

fold_lemmas
fold morphological lemmas, default is 0

fold_blended
fold blended words, default is 0

expansion_limit
override expansion_limit defined in configuration, default is 0 (use value from configuration)

sort_mode
sort output by 'docs' or 'hits', default no sorting

CALL SNIPPETS options

before_match
A string to insert before a keyword match. A %PASSAGE_ID% macro can be used in this string.

after_match
A string to insert after a keyword match.

chunk_separator
A string to insert between snippet chunks (passages). Default is …

limit
Maximum snippet size, in symbols (codepoints). Integer, default is 256.

around
How many words to pick around each matching keywords block. Integer, default is 5.

exact_phrase
Whether to highlight exact query phrase matches only instead of individual keywords. Boolean, default is false.

use_boundaries
Whether to additionally break passages by phrase boundary characters. Boolean, default is false.

weight_order
Whether to sort the extracted passages in order of relevance or in order of appearance in the document. Boolean, default is false.

query_mode
Whether to handle $words as a query in extended syntax, or as a bag of words (default behavior).

force_all_words
Ignores the snippet length limit until it includes all the keywords. Boolean, default is false.

limit_passages
Limits the maximum number of passages that can be included into the snippet. Integer, default is 0 (no limit).

limit_words
Limits the maximum number of words that can be included into the snippet.

start_passage_id
Specifies the starting value of the %PASSAGE_ID% macro (that gets detected and expanded in before_match, after_match strings). Integer, default is 1.

load_files
Whether to handle $docs as data to extract snippets from (default behavior), or to treat it as file names, and load data from specified files on the server side.

load_files_scattered
For distributed snippets generation with remote agents and load_files. Whether to make sure that all snippets are actually created. Boolean, default is false.

html_strip_mode
HTML stripping mode setting. Defaults to index, which means that index settings will be used. The other values are none, strip and retain.

allow_empty
Allows an empty string to be returned as the highlighting result when a snippet could not be generated. By default, the beginning of the original text would be returned instead of an empty string. Boolean, default is false.

passage_boundary
Ensures that passages do not cross a sentence, paragraph, or zone boundary. String, allowed values are sentence, paragraph, and zone.

emit_zones
Emits an HTML tag with an enclosing zone name before each passage. Boolean, default is false.

Miscellaneous functions

ALL()
ALL(cond FOR var IN json.array) applies to JSON arrays and returns 1 if the condition is true for all elements in the array and 0 otherwise.
ALL(mva) is a special constructor for multi-value attributes. When used in conjunction with comparison operators it returns 1 if all values compared are found among the MVA values.

ANY()
ANY(cond FOR var IN json.array) works similarly to ALL(), except that it returns 1 if the condition is true for any element in the array.
ANY(mva) is a special constructor for multi-value attributes. When used in conjunction with comparison operators it returns 1 if any of the values compared are found among the MVA values.

ATAN2()
Returns the arctangent function of two arguments, expressed in radians.

CRC32()
Returns the CRC32 value of a string argument.

GREATEST()
takes a JSON array or MVA as the argument, and returns the greatest value in that array.

INDEXOF(cond FOR var IN json.array)
iterates through all elements in the array and returns the index of the first element for which ‘cond’ is true, and -1 if ‘cond’ is false for every element in the array.

LEAST()
takes a JSON array or MVA as the argument, and returns the least value in that array.

LENGTH()
LENGTH(attr_mva) returns the number of elements in an MVA set.
LENGTH(attr_json) returns the length of a field in JSON. The return value depends on the type of the field.

MIN_TOP_SORTVAL()
Returns the sort key value of the worst found element in the current top-N matches if the sort key is float, and 0 otherwise.

MIN_TOP_WEIGHT()
Returns the weight of the worst found element in the current top-N matches.

PACKEDFACTORS()
can be used in queries, either to just see all the weighting factors calculated when doing the matching, or to provide a binary attribute that can be used to write a custom ranking UDF. Works only with expression rankers.

REMAP(condition, expression, (cond1, cond2, …), (expr1, expr2, …))
allows making exceptions to expression values depending on condition values. The condition expression should always result in an integer; the expression can result in an integer or float.

GEODIST(lat1, lon1, lat2, lon2, { option=value, ... })

in = {deg | degrees | rad | radians}
specifies the input units

out = {m | meters | km | kilometers | ft | feet | mi | miles}
specifies the output units

method = {adaptive | haversine}
specifies the geodistance calculation method. Default is 'adaptive', more precise and faster than 'haversine'

Type conversion functions

BIGINT()
Forcibly promotes the integer argument to 64-bit type, and does nothing on floating point argument.

INTEGER()
Forcibly promotes given argument to 64-bit signed type.

SINT()
Forcibly reinterprets its 32-bit unsigned integer argument as signed, and also expands it to 64-bit type (because 32-bit type is unsigned).
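
To make the GEODIST() options listed above (in, out, method) more concrete, the following Python sketch computes a great-circle distance with the haversine formula and applies the same kind of unit handling. It is an illustration only: the Earth radius constant, the defaults (radians in, meters out) and the function name are assumptions, and the 'adaptive' method used by default in Manticore is not reproduced here.

```python
import math

# Assumed mean Earth radius in meters; the daemon may use a different constant.
EARTH_RADIUS_M = 6371008.8

# Multipliers from meters to the requested output unit.
_OUT_FACTORS = {
    "m": 1.0, "meters": 1.0,
    "km": 0.001, "kilometers": 0.001,
    "ft": 1.0 / 0.3048, "feet": 1.0 / 0.3048,
    "mi": 1.0 / 1609.344, "miles": 1.0 / 1609.344,
}


def geodist_haversine(lat1, lon1, lat2, lon2, in_units="rad", out="m"):
    """Great-circle distance between two points using the haversine formula."""
    if in_units in ("deg", "degrees"):
        lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    elif in_units not in ("rad", "radians"):
        raise ValueError("unknown input units: %r" % in_units)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin() domain.
    meters = 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))
    return meters * _OUT_FACTORS[out]


# Two antipodal points on the equator are half a circumference apart.
print(geodist_haversine(0, 0, 0, 180, in_units="deg", out="km"))
```

A common pitfall is storing coordinates in degrees while leaving the input units at their default, so it is worth passing the in option explicitly in queries.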


Specific RealTime index settings

rt_mem_limit
RAM chunk size limit. Optional, default is 128M

rt_field
Full-text field. Multi-value, mandatory

rt_attr_bigint
BigInt attribute. Multi-value, optional

rt_attr_bool
Boolean attribute. Multi-value, optional

rt_attr_float
Float attribute. Multi-value, optional

rt_attr_json
JSON attribute. Multi-value, optional

rt_attr_multi
MVA attribute. Multi-value, optional

rt_attr_multi_64
BigInt MVA attribute. Multi-value, optional

rt_attr_string
String attribute. Multi-value, optional

rt_attr_uint
Unsigned integer attribute. Multi-value, optional

Specific Distributed index settings

agent_blackhole
Remote blackhole agent declaration in the distributed index. Multi-value, optional, default is empty.

agent_connect_timeout
Remote agent connection timeout, in milliseconds. Optional, default is 1000 (i.e. 1 second).

agent_persistent
Persistently connected remote agent declaration. Multi-value, optional, default is empty.

agent_query_timeout
Remote agent query timeout, in milliseconds. Optional, default is 3000 (i.e. 3 seconds).

agent_retry_count/mirror_retry_count
Integer, specifies how many times Manticore will try to connect and query remote agents before reporting a fatal query error.

agent
Remote agent declaration in the distributed index. Multi-value, optional, default is empty.

ha_strategy
Agent mirror selection strategy, for load balancing. Possible values: random (default), roundrobin, nodeads, noerrors

local
Local index declaration in the distributed index. Multi-value, optional, default is empty.

Specific Plain index settings

dict
Keywords dictionary type, only for plain index, can be keywords (default) or crc (deprecated)

killlist_target
List of indexes to have kill-list applied. It can operate in 3 modes: index:kl (kill-list is used), index:id (document ids are used), index (both kill-list and document ids are used)

infix_fields
The list of full-text fields to limit infix indexing to. Applies to dict=crc only.

inplace_enable
Whether to enable in-place index inversion. Optional, default is 0 (use separate temporary files).

inplace_hit_gap
Controls preallocated hitlist gap size. Optional, default is 0.

inplace_reloc_factor
Controls relocation buffer size within indexing memory arena. Optional, default is 0.1.

inplace_write_factor
Controls in-place write buffer size within indexing memory arena. Optional, default is 0.1.

prefix_fields
The list of full-text fields to limit prefix indexing to. Applies to dict=crc only.

source
Adds document source to local index. Multi-value, mandatory.

Index common settings

type
Index type. Known values are plain, distributed, rt, template and percolate. Optional, default is ‘plain’

path
Index files path and file name (without extension). Mandatory for plain, rt and percolate.

bigram_freq_words
List of bigram words

bigram_index
Bigram index mode, can be all, first_freq or both_freq

blend_chars
Blended characters list

blend_mode
Blended tokens indexing mode, can be trim_none | trim_head | trim_tail | trim_both | skip_pure

charset_table
Accepted characters table, with case folding rules. Optional, default value is Latin and Cyrillic characters.

embedded_limit
Embedded exceptions, wordforms, or stopwords file size limit. Optional, default is 16K.

exceptions
Tokenizing exceptions file. Optional, default is empty.

expand_keywords
Expand keywords with exact forms and/or stars when possible.

global_idf
The path to a file with global (cluster-wide) keyword IDFs. Optional, default is empty (use local IDFs).

hitless_words
Hitless words list. Optional, allowed values are ‘all’, or a list file name.

html_index_attrs
A list of markup attributes to index when stripping HTML. Optional, default is empty (do not index markup attributes).

html_remove_elements
A list of HTML elements for which to strip contents along with the elements themselves. Optional, default is empty string (do not strip contents of any elements).

html_strip
Whether to strip HTML markup from incoming full-text data. Optional, default is 0. Known values are 0 (disable stripping) and 1 (enable stripping).

ignore_chars
Ignored characters list. Optional, default is empty.

index_exact_words
Whether to index the original keywords along with the stemmed/remapped versions. Optional, default is 0 (do not index).

index_field_lengths
Enables computing and storing of field lengths (both per-document and average per-index values) into the index. Optional, default is 0 (do not compute and store).

index_sp
Whether to detect and index sentence and paragraph boundaries. Optional, default is 0 (do not detect and index).

stored_fields
A list of fields to be stored in the index.

docstore_block_size
The size of the block of documents used by document storage. Optional, default is 16kb.

docstore_compression
Compression used to compress blocks of documents in document storage. Values: 'lz4', 'lz4hc' and 'none'. Optional, default is ‘lz4’.

docstore_compression_level
Compression level when lz4hc is used. Values between 1-12, default 9.

index_zones
A list of in-field HTML/XML zones to index. Optional, default is empty (do not index zones).

max_substring_len
Maximum substring (either prefix or infix) length to index. Optional, default is 0 (do not limit indexed substrings). Applies to dict=crc only.

min_infix_len
Minimum infix prefix length to index and search. Optional, default is 0 (do not index infixes), and minimum allowed non-zero value is 2.

min_prefix_len
Minimum word prefix length to index. Optional, default is 0 (do not index prefixes).

min_stemming_len
Minimum word length at which to enable stemming. Optional, default is 1 (stem everything).

min_word_len
Minimum indexed word length. Optional, default is 1 (index everything).

morphology
A list of morphology preprocessors (stemmers or lemmatizers) to apply. Optional, default is empty (do not apply any preprocessor).

morphology_skip_fields
A list of fields where morphology preprocessors do not apply. Optional, default is empty (apply preprocessors to all fields).

ngram_chars
N-gram characters list. Optional, default is empty.

ngram_len
N-gram lengths for N-gram indexing. Optional, default is 0 (disable n-gram indexing). Known values are 0 and 1 (other lengths to be implemented).

access_blob_attrs
Control how var-length attributes (string, json, mva) are accessed, pre-read and loaded into memory. Possible values: mmap, mmap_preread, mlock

access_plain_attrs
Control how numeric attributes are accessed, pre-read and loaded into memory. Possible values: mmap, mmap_preread, mlock

access_doclists
Control how the doclists file is accessed. Possible values: file, mmap

access_hitlists
Control how the hitlists file is accessed. Possible values: file, mmap

overshort_step
Position increment on overshort (less than min_word_len) keywords. Optional, allowed values are 0 and 1, default is 1.

phrase_boundary
Phrase boundary characters list. Optional, default is empty.

phrase_boundary_step
Phrase boundary word position increment. Optional, default is 0.

preopen
Whether to pre-open all index files, or open them per each query. Optional, default is 0 (do not preopen).

regexp_filter
Regular expressions (regexps) to filter the fields and queries with. Optional, multi-value, default is an empty list of regexps.

rlp_context
RLP context configuration file. Mandatory if RLP is used.

stopwords
Stopword files list (space separated). Optional, default is empty.

stopword_step
Position increment on stopwords. Optional, allowed values are 0 and 1, default is 1.

stopwords_unstemmed
Whether to apply stopwords before or after stemming. Optional, default is 0 (apply stopword filter after stemming).

wordforms
Word forms dictionary. Optional, default is empty.
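
The stopword_step, overshort_step and min_word_len settings above all affect the word positions that phrase and proximity operators later rely on. The Python sketch below is a simplified model of that bookkeeping, not the indexer's actual algorithm. The function name, the order in which the two checks are applied and the sample tokens are assumptions made for illustration.

```python
def keyword_positions(tokens, stopwords=frozenset(), min_word_len=1,
                      stopword_step=1, overshort_step=1):
    """Return (keyword, position) pairs for the tokens that would be indexed.

    Skipped tokens (overshort words and stopwords) are not indexed, but they
    advance the position counter by the corresponding *_step value (0 or 1).
    """
    position = 0
    indexed = []
    for token in tokens:
        if len(token) < min_word_len:
            position += overshort_step   # too short to index
        elif token in stopwords:
            position += stopword_step    # filtered out as a stopword
        else:
            position += 1
            indexed.append((token, position))
    return indexed


tokens = ["in", "the", "garden"]
stops = frozenset({"in", "the"})

print(keyword_positions(tokens, stops, stopword_step=1))  # [('garden', 3)]
print(keyword_positions(tokens, stops, stopword_step=0))  # [('garden', 1)]
```

With a step of 1 the gaps left by skipped words are preserved, so a phrase query still has to account for them. With a step of 0 the remaining keywords become adjacent.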


searchd directives

agent_connect_timeout
Instance-wide defaults for agent_connect_timeout parameter.

agent_query_timeout
Instance-wide defaults for agent_query_timeout parameter.

agent_retry_count
Integer, specifies how many times Manticore will try to connect and query remote agents in a distributed index before reporting a fatal query error. Default is 0 (i.e. no retries).

agent_retry_delay
Integer, in milliseconds. Specifies the delay before retrying to query a remote agent in case it fails. Default is 500.

attr_flush_period
Interval in seconds between flushes to disk of updated attributes data. Default is 0, flushing only occurs on daemon shutdown.

binlog_flush
Binary log transaction flush/sync mode. Optional, default is 2.
0, flush and sync every second.
1, flush and sync every transaction.
2, flush every transaction, sync every second.

binlog_max_log_size
Maximum binary log file size. Optional, default is 0 (do not reopen binlog file based on size).

binlog_path
Binary log (aka transaction log) files path. Optional, default is build-time configured data directory.

client_timeout
Maximum time to wait between requests (in seconds) when using persistent connections. Optional, default is five minutes.

collation_libc_locale
Server libc locale.

collation_server
Default server collation. Optional, default is libc_ci.

dist_threads
Max local worker threads to use for parallelizable requests. Optional, default is 0, which means to disable in-request parallelism.

docstore_cache_size
Maximum size of document blocks from document storage that are held in memory.

expansion_limit
The maximum number of expanded keywords for a single wildcard. Optional, default is 0 (no limit).

grouping_in_utc
Specifies whether timed grouping in API and SphinxQL will be calculated in local timezone, or in UTC. Optional, default is 0 (means ‘local tz’).

ha_period_karma
Agent mirror statistics window size, in seconds. Optional, default is 60.

ha_ping_interval
Interval between agent mirror pings, in milliseconds. Optional, default is 1000.

hostname_lookup
Hostnames renew strategy. Default is to cache at daemon start. Setting this option to ‘request’ disables the caching and queries the DNS at each query.

listen_backlog
TCP listen backlog. Optional, default is 5. (Windows only)

listen
query_log

searchd settings (continue)

net_workers
Number of network threads for workers=thread_pool mode, default is 1.

net_wait_tm
Control busy loop interval of a network thread for workers=thread_pool mode, default is 1, might be set to -1, 0, positive integer.

net_throttle_accept net_throttle_action
Control network thread for workers=thread_pool mode, default is 0.

access_blob_attrs
Control how var-length attributes (string, json, mva) are accessed, pre-read and loaded into memory. System-wide setting. Possible values: mmap, mmap_preread, mlock

access_plain_attrs
Control how numeric attributes are accessed, pre-read and loaded into memory. System-wide setting. Possible values: mmap, mmap_preread, mlock

access_doclists
Control how the doclists file is accessed. System-wide setting. Possible values: file, mmap

access_hitlists
Control how the hitlists file is accessed. System-wide setting. Possible values: file, mmap

persistent_connections_limit
The maximum number of simultaneous persistent connections to remote persistent agents.

pid_file
searchd process ID file name. Mandatory.

predicted_time_costs
Costs for the query time prediction model, in nanoseconds. Optional, default is “doc=64, hit=48, skip=2048, match=64” (without the quotes).

preopen_indexes
Whether to forcibly preopen all indexes on startup. Optional, default is 1 (preopen everything).

qcache_max_bytes
Integer, in bytes. The maximum RAM allocated for cached result sets. Default is 0, meaning disabled.

qcache_thresh_msec
Integer, in milliseconds. The minimum wall time threshold for a query result to be cached. Defaults to 3000, or 3 seconds. 0 means cache everything.

qcache_ttl_sec
Integer, in seconds. The expiration period for a cached result set. Defaults to 60, or 1 minute. The minimum possible value is 1 second.

query_log_format
Query log format. Optional, allowed values are ‘plain’ and ‘sphinxql’, default is ‘plain’.

query_log_min_msec
Limit (in milliseconds) that prevents the query from being written to the query log. Optional, default is 0 (all queries are written to the query log).

query_log
Query log file name. Optional, default is empty (do not log queries).

query_log_mode
Permission for log files, default is 600.

queue_max_length
Maximum pending queries queue length for workers=thread_pool mode, default is 0 (unlimited).

read_buffer
Per-keyword read buffer size. Optional, default is 256K.

read_timeout
Network client request read timeout, in seconds. Optional, default is 5 seconds.

searchd settings (continue)

rt_merge_maxiosize
A maximum size of an I/O operation that the RT chunks merge thread is allowed to start. Optional, default is 0 (no limit).

snippets_file_prefix
A prefix to prepend to the local file names when generating snippets. Optional, default is empty.

sphinxql_state
Path to a file where current SphinxQL state will be serialized.

sphinxql_timeout
Maximum time to wait between requests (in seconds) when using sphinxql interface. Optional, default is 15 minutes.

subtree_docs_cache
Max common subtree document cache size, per-query. Optional, default is 0 (disabled).

subtree_hits_cache
Max common subtree hit cache size, per-query. Optional, default is 0 (disabled).

thread_stack
Per-thread stack size. Optional, default is 1M.

unlink_old
Whether to unlink .old index copies on successful rotation. Optional, default is 1 (do unlink).

watchdog
Threaded server watchdog. Optional, default is 1 (watchdog enabled).

workers
Multi-processing mode (MPM). Optional; allowed values are thread_pool and threads. Default is thread_pool.

node_address
Specify network address of a node (used in replication clusters)

server_id
Unique server identification (used in replication clusters)

mysql_version_string
A server version string to return via MySQL protocol. Optional, default is empty (return Manticore version).

seamless_rotate
Prevents searchd stalls while rotating indexes with huge amounts of data to precache. Optional, default is 1 (enable seamless rotation). On Windows systems seamless rotation is disabled by default.

shutdown_timeout
searchd --stopwait wait time, in seconds. Optional, default is 3 seconds.

indexer settings

lemmatizer_cache
Lemmatizer cache size. Optional, default is 256K.

max_file_field_buffer
Maximum file field adaptive buffer size, bytes. Optional, default is 8 MB, minimum is 1 MB.

max_iops
Maximum I/O operations per second, for I/O throttling. Optional, default is 0 (unlimited).

max_iosize
Maximum allowed I/O operation size, in bytes, for I/O throttling. Optional, default is 0 (unlimited).

max_xmlpipe2_field
Maximum allowed field size for XMLpipe2 source type, bytes. Optional, default is 2 MB.
Unix-domain socket path, that searchd will listen on. Query log file name. Optional, default is empty (do not mem_limit
( address ":" port | port | path ) [ ":"
log queries). Indexing RAM usage limit. Optional, default is 128M.
protocol ] [ "_vip" ] query_log_mode on_file_field_error
listen_tfo Permission for log files, default is 600. How to handle IO errors in file fields. Optional, default
Allows using TCP_FASTOPEN on all listeners. Default is queue_max_length is ignore_field.
enabled (1) Maximum pending queries queue length for write_buffer
log workers=thread_pool mode, default is 0 (unlimited). Write buffer size, bytes. Optional, default is 1 MB.
Log file name. Optional, default is ‘searchd.log’. All read_buffer
searchd run time events will be logged in this file. Per-keyword read buffer size. Optional, default is 256K.
max_batch_queries read_timeout
Limits the amount of queries per batch. Optional, Network client request read timeout, in seconds.
default is 32. Optional, default is 5 seconds.
max_children read_unhinted
Maximum amount of worker threads (or in other Unhinted read size. Optional, default is 32K.
words, concurrent queries to run in parallel). Optional,
default is 0 (unlimited) in workers=threads, or 1.5 times rt_flush_period
the CPU cores count in workers=thread_pool mode RT indexes RAM chunk flush check period, in seconds.
Optional, default is 10 hours.
max_filters
Maximum allowed per-query filter count. Optional, rt_merge_iops
default is 256. A maximum number of I/O operations (per second)
that the RT chunks merge thread is allowed to start.
max_filter_values Optional, default is 0 (no limit).
Maximum allowed per-filter values count. Optional,
default is 4096.
max_packet_size
https://github.com/manticoresoftware/manticoresearch
Maximum allowed network packet size. Optional,
Interested on getting help?
default is 8M. Check our support plans at
attr_update_reserve https://manticoresearch.com/professional-support/
Reserved space for blob attribute updates. Optional,
default is 128K. contact@manticoresearch.com
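A minimal sketch of how a few of the searchd and indexer directives listed above might be written out. The script only generates an example file in a temporary directory; it does not start anything. The file name, the paths, the 127.0.0.1:9306 address and the mysql41 protocol are placeholders assumed for illustration, and the numeric values are the defaults quoted in the list.

```shell
#!/bin/sh
# Sketch only: writes an example configuration to a temporary file.
# Paths, address, port and file name are assumptions, not recommendations.
conf_dir=$(mktemp -d)
conf="$conf_dir/manticore.conf.example"

cat > "$conf" <<'EOF'
searchd
{
    # listen syntax: ( address ":" port | port | path ) [ ":" protocol ]
    listen              = 127.0.0.1:9306:mysql41
    # pid_file is the one directive marked as mandatory in the list
    pid_file            = /var/run/searchd.pid
    log                 = /var/log/manticore/searchd.log
    query_log           = /var/log/manticore/query.log
    query_log_format    = sphinxql
    workers             = thread_pool
    read_timeout        = 5
    client_timeout      = 300
    qcache_max_bytes    = 0
    seamless_rotate     = 1
    preopen_indexes     = 1
}

indexer
{
    mem_limit           = 128M
    write_buffer        = 1M
    max_iops            = 0
}
EOF

echo "wrote $conf"
```

Keeping the example in a scratch file makes it easy to diff against a real configuration before copying any single directive across.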

common settings
lemmatizer_base
Lemmatizer dictionaries base path.
progressive_merge
Merge Real-Time index chunks during OPTIMIZE operation from smaller to bigger. Enabled by default. If disabled, chunks are merged from first to last created.
json_autoconv_keynames
Whether and how to auto-convert key names within JSON attributes. Known value is ‘lowercase’. Optional, default value is unspecified (do not convert anything).
json_autoconv_numbers
Automatically detect and convert possible JSON strings that represent numbers, into numeric attributes. Optional, default value is 0 (do not convert strings into numbers).
on_json_attr_error
What to do if JSON format errors are found. Optional, default value is ignore_attr (ignore errors). Applies only to sql_attr_json attributes.
plugin_dir
Trusted location for the dynamic libraries (UDFs). Optional, default is empty (no location).
rlp_environment
RLP environment configuration file. Mandatory if RLP is used.
rlp_max_batch_docs
Maximum number of documents batched before processing them by the RLP. Optional, default is 50. This option has effect only if morphology = rlp_chinese_batched is specified.
rlp_max_batch_size
Maximum total size of documents batched before processing them by the RLP. Optional, default is 51200.
rlp_root
Path to the RLP root folder. Mandatory if RLP is used.

Common Searchd daemon usage
It is recommended to install and run searchd as a service, so that searchd starts at boot after a reboot.
searchd should run under a specific user. Linux packages create a specific user (manticore or manticoresearch) and also install a service file.
$ service manticore start
$ service manticore stop
or
$ systemctl start manticore
$ systemctl stop manticore
Searchd can also be run manually. Here are several commonly used commands:
$ searchd
Starts searchd using a defined configuration file. It starts daemonized, except on Windows.
$ searchd --stopwait
Synced stop
--config /path/to/sphinx.conf
Specify config file. In its absence, depending on how it was compiled, searchd can try to locate the config file in the current folder or in a specific folder set at compilation.
$ searchd --status
Prints status and performance counters (same as SHOW STATUS)
$ searchd --logdebug|--logdebugv|--logdebugvv
Enables additional debug output in the daemon log
$ searchd --iostats --cpustats
Provides io/cpu counter stats
$ searchd --install
Installs searchd as a service on Windows
Searchd supports receiving signals.
$ kill -TERM `cat /var/run/searchd.pid`
Sends a shutdown signal
$ kill -HUP `cat /var/run/searchd.pid`
Initiates index rotation. This will also reload the configuration and can be used to activate new indexes (like a new distributed index).
$ kill -USR1 `cat /var/run/searchd.pid`
Forces reopening of log files, useful for implementing log file rotation

Common Indexer daemon usage
Indexer should run under the same user that searchd runs under. This is to make sure searchd can rotate indexes created by indexer.
$ indexer --config /path/to/sphinx.conf index --rotate
The most common use of indexer is to issue an indexing followed by a rotation, while searchd is running.
If searchd is not running, the --rotate parameter should be omitted. The new index version will replace the existing one at searchd startup.
Indexer can be used to extract a dictionary of the most common words from an index, using --buildstops. --buildfreqs additionally adds the word frequencies.
$ indexer myindex --buildstops word_freq.txt 1000 --buildfreqs
Indexer can merge two indexes into one. This can be used in main+delta setups.
$ indexer --merge main delta --rotate
Other useful parameters include --print-queries for printing the SQL queries run by a plain index with SQL source and --dump-rows to dump the fetched rows from a SQL source to a file.

Configuration file
The config uses a custom plain format. Each section starts with its name, and its list of directives is enclosed in curly brackets.
Sections:
searchd
Holds searchd settings. Only one section allowed.
indexer
Holds indexer settings. Only one section allowed.
common
Holds several settings common to searchd and indexer. Only one section allowed.
index
Index definition section. Multiple allowed.
source
Plain index source definition. Multiple allowed.

Quick facts about index types
plain
Immutable text data. Requires full reindexing. Attributes can be updated.
Can use one or more defined sources
Can be converted to a RealTime index
RealTime
Starts empty; data can be added/changed/deleted, similar to a SQL table. New updates are put first in a fixed-size memory area called the RAM chunk.
When filled, the RAM chunk is saved to disk as a disk chunk (which is almost identical to a plain index).
As the number of disk chunks increases, performance is affected and OPTIMIZE needs to be run.
distributed
Holds no data
Acts as master to local and/or remote indexes, by sending queries to them and merging back the results.
In case of locals, dist_threads needs to be used to allow creating parallel query threads
Supports mirrors of remote indexes and load balancing
template
Holds no data
Can be used to test tokenization rules or highlighting incoming data
percolate
Stores queries and not documents. It's based on the RealTime type.
The only index type that supports the CALL PQ command

HTTP JSON API
/json/insert
Creates a new document
/json/replace
Replaces an existing document
/json/delete
Deletes existing documents
/json/bulk
Allows bulking /insert, /replace, /delete. Unlike other endpoints, this requires the request body in NDJSON format
/json/search
Performs searches
/json/pq
Performs percolate queries
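To illustrate the NDJSON requirement of /json/bulk from the HTTP JSON API list above, here is a hedged sketch that only builds request bodies and prints the curl commands that would send them. The index name myindex, the field names, and the 127.0.0.1:9308 address are placeholders. The payload keys (index, id, doc, query.match) reflect my understanding of Manticore's JSON format and should be checked against the documentation for your version.

```shell
#!/bin/sh
# Sketch only: builds request bodies for the HTTP JSON API; nothing is sent.
# Index name, field names, host and port are assumptions for illustration.
work_dir=$(mktemp -d)
bulk_body="$work_dir/bulk.ndjson"
search_body="$work_dir/search.json"

# /json/bulk expects NDJSON: one complete JSON object per line.
# Each line names the action (insert, replace or delete) and its payload.
cat > "$bulk_body" <<'EOF'
{"insert":{"index":"myindex","id":1,"doc":{"title":"first","col1":10}}}
{"replace":{"index":"myindex","id":2,"doc":{"title":"second","col1":20}}}
{"delete":{"index":"myindex","id":3}}
EOF

# /json/search takes a single JSON object as its body.
cat > "$search_body" <<'EOF'
{"index":"myindex","query":{"match":{"title":"find me"}},"limit":5}
EOF

# Print the curl commands instead of running them, so the sketch works
# without a running searchd. Port 9308 is an assumed HTTP listener.
base_url="http://127.0.0.1:9308"
echo "curl -s -H 'Content-Type: application/x-ndjson' --data-binary @$bulk_body $base_url/json/bulk"
echo "curl -s -H 'Content-Type: application/json' --data-binary @$search_body $base_url/json/search"
```

Using --data-binary rather than -d matters for the bulk body, because -d strips the newlines that separate NDJSON records.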


© 2017-2019 Manticore Search, version v3
