Linguistic and numerical indexing in accessing pictorial databases
By Mark A. Holmes
CSE 580.01, Winter 2016
Here I discuss various ways of efficiently
retrieving objects from pictorial databases.
Retrieval of objects from pictorial databases can be done efficiently using a
multilevel index and dense indexing, although secondary indexing and sparse
indexing could be used for fuzzy logic searches. Disk access costs and storage
space would also be considerations, of course.
Linguistic or numerical indexing alone
Maybe linguistic indexing alone is not that efficient
Linguistic indexing might include the names of
people in the picture, what or who is depicted in the picture, the type of
event, the location, the time, or the general mood of the picture. If this is
how you want to organize your images, you would want an abstract, or concise
description, of the contents of each picture.
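As a rough illustration of how abstracts could drive linguistic indexing, the sketch below builds an inverted index that maps each word in an abstract to the numbers of the images whose abstracts contain it. The image numbers, abstracts and function names are hypothetical, and a real system would also need stemming, stop-word handling and phrase search.

```python
import re
from collections import defaultdict


def build_inverted_index(abstracts):
    """Map each lowercase word to the set of image ids whose abstract contains it."""
    index = defaultdict(set)
    for image_id, text in abstracts.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(image_id)
    return index


def search(index, query):
    """Return the image ids whose abstracts contain every word of the query."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical abstracts keyed by image number
abstracts = {
    1: "A dog playing in the snow",
    2: "Sunrise over Denali",
    3: "A smiling baby with a dog",
}
index = build_inverted_index(abstracts)
print(sorted(search(index, "dog")))          # [1, 3]
print(sorted(search(index, "smiling dog")))  # [3]
```

Searching by words such as "dog" or "Denali" matches how people actually phrase queries, which is the main appeal of linguistic indexing.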
It isn’t always possible to obtain this information, and there may also be
language barriers: the information may not be in a language the searcher
understands. For example, the abstract may be in English, but the searcher may
not speak English.
Typographical errors may also impede
linguistic indexing.
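To show why typographical errors matter, the sketch below contrasts an exact lookup, which misses a misspelled term, with approximate matching from Python's standard `difflib` module. String-similarity matching is a swapped-in technique for illustration, not one the text prescribes; the vocabulary and the 0.8 similarity cutoff are hypothetical choices.

```python
import difflib

# Hypothetical vocabulary of terms taken from image abstracts
vocabulary = ["denali", "dog", "smiling", "baby", "sunrise", "snow"]


def correct_term(term, vocab, cutoff=0.8):
    """Return the closest known term, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(term.lower(), vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print("denaly" in vocabulary)              # False: an exact lookup misses the typo
print(correct_term("Denaly", vocabulary))  # denali
```

Raising the cutoff reduces false matches but lets more typos through unmatched.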
Maybe numerical indexing alone is not that efficient, either
Difficulty in remembering numbers
It would not be realistic to expect people
to remember index numbers to use as search terms. People want to search by
image attribute (“dog”, “Denali”, “smiling baby”, etc.), not by some number.
Limitations of human memory would likely require
reference-based indexing.
Update anomalies
There will be times when an image stored in the database has to be deleted or
modified. If one doesn’t want to renumber the index when the image an index
number refers to is deleted, deleting the entry requires leaving null values in
its place. Abstracts would also have to be deleted or edited when the image they
describe is deleted or modified, to avoid update anomalies in the form of
inaccurate text or “orphaned” text that refers to nothing.
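One common way to keep abstracts from being orphaned is a foreign key with `ON DELETE CASCADE`. The sketch below uses SQLite through Python's built-in `sqlite3` module; the `images` and `abstracts` tables and their columns are hypothetical. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled per connection
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE abstracts (
        image_id INTEGER NOT NULL REFERENCES images(id) ON DELETE CASCADE,
        body TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO images (id, path) VALUES (?, ?)",
    [(1, "dog.jpg"), (2, "denali.jpg"), (3, "baby.jpg")],
)
conn.executemany(
    "INSERT INTO abstracts (image_id, body) VALUES (?, ?)",
    [(1, "A dog in the snow"), (2, "Sunrise over Denali"), (3, "A smiling baby")],
)

# Deleting image 2 also deletes its abstract, so no orphaned text remains
conn.execute("DELETE FROM images WHERE id = 2")

print(conn.execute("SELECT id FROM images ORDER BY id").fetchall())                # [(1,), (3,)]
print(conn.execute("SELECT image_id FROM abstracts ORDER BY image_id").fetchall())  # [(1,), (3,)]
```

The remaining images keep their original numbers (1 and 3), so existing references to them stay valid after the deletion.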
Multiple languages
As I said earlier, not everybody in the world is a native English speaker.
Database systems such as MongoDB can store and index translations of abstracts,
where abstracts exist.
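A document database can keep several translations of an abstract inside one image record. The sketch below uses a plain Python dictionary to mimic such a document; the field names, the language codes and the English fallback are hypothetical, and the translations themselves have to be supplied rather than produced by the database.

```python
# Hypothetical document shape: one record per image, abstracts keyed by language code
image_doc = {
    "_id": 2,
    "path": "denali.jpg",
    "abstract": {
        "en": "Sunrise over Denali",
        "de": "Sonnenaufgang über dem Denali",
    },
}


def abstract_for(doc, lang, fallback="en"):
    """Return the abstract in the requested language, else the fallback, else None."""
    abstracts = doc.get("abstract", {})
    return abstracts.get(lang, abstracts.get(fallback))


print(abstract_for(image_doc, "de"))  # Sonnenaufgang über dem Denali
print(abstract_for(image_doc, "fr"))  # Sunrise over Denali (no French text, falls back)
```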
Efficiency of numerical indexing
You would probably want to use a table that maps each index number directly to
the image it identifies.
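A minimal sketch of such a table, assuming image numbers are assigned consecutively from 0 so that each number doubles as a position in the table. A deleted image leaves a `None` slot, the null value described under Update anomalies, so the other numbers stay valid. The file paths are hypothetical.

```python
from typing import List, Optional

# Hypothetical table: position i holds the storage location of image number i
locations: List[Optional[str]] = ["img/0000.jpg", "img/0001.jpg", "img/0002.jpg"]


def locate(image_number: int) -> Optional[str]:
    """Constant-time lookup: the image number is used directly as a table offset."""
    if 0 <= image_number < len(locations):
        return locations[image_number]
    return None


locations[1] = None  # image 1 deleted; images 0 and 2 keep their numbers
print(locate(2))  # img/0002.jpg
print(locate(1))  # None
```

The lookup itself is fast; the weakness, as noted above, is that people rarely know the number to look up.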
How this might work
Universal translators
As I said earlier, database systems such as MongoDB can, at least potentially,
handle translations of abstracts.
Potential problems
Image content not described by abstracts
Just because an abstract exists doesn’t
mean it completely and accurately describes the contents of the image. Poorly
written abstracts, therefore, can be an impediment to accurate image searches.
Image content described by abstracts, but inadequately
In fact, the approach commonly used now is shape-based indexing, which relies on
neither abstracts nor incrementing integer index numbers and often uses hashing.
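Real shape-based indexing methods are more involved than can be shown here. As a loose stand-in, the sketch below uses an average hash, a simple content-based image hash (a swapped-in technique, not necessarily what any particular system uses). Each pixel of a small grayscale image becomes one bit, depending on whether it is brighter than the image's mean. Similar images get similar hashes, and those hashes can serve as index keys. The 4x4 "images" are hypothetical.

```python
def average_hash(pixels):
    """One bit per pixel: 1 if brighter than the image's mean intensity, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Hypothetical 4x4 grayscale images: a and b show the same bright shape, c another
img_a = [[200, 200, 10, 10], [200, 200, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
img_b = [[190, 210, 15, 5], [205, 198, 12, 9], [11, 8, 14, 10], [9, 12, 7, 13]]
img_c = [[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 200, 200], [10, 10, 200, 200]]

# Index: hash value -> ids of the images with that hash
hash_buckets = {}
for image_id, pixels in {"a": img_a, "b": img_b, "c": img_c}.items():
    hash_buckets.setdefault(average_hash(pixels), []).append(image_id)

print(hamming(average_hash(img_a), average_hash(img_b)))  # 0
print(hamming(average_hash(img_a), average_hash(img_c)))  # 8
```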
Glossary
Dense indexing:
Dense indexing involves the use of a dense
index, a file with pairs of keys and pointers for every record in the data
file. Every key in this file is associated with a particular pointer to a
record in the sorted data file. In clustered indices with duplicate keys, the
dense index points to the first record with that key.
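A minimal in-memory sketch of a dense index, assuming the data file is a sorted Python list of hypothetical (key, record) pairs. The index holds one (key, position) entry per record. Because it is sorted, a binary search with `bisect_left` finds the first entry for a key, which matches the duplicate-key behavior described above.

```python
import bisect

# Hypothetical sorted data file of (key, record) pairs, with a duplicate key
data_file = [("baby", "r0"), ("denali", "r1"), ("denali", "r2"), ("dog", "r3"), ("snow", "r4")]


def build_dense_index(records):
    """One (key, position) entry for every record, in data-file order."""
    return [(key, pos) for pos, (key, _) in enumerate(records)]


def dense_lookup(index, key):
    """Binary-search the index; return the position of the first record with this key."""
    # (key,) sorts before every (key, position) tuple, so this lands on the first match
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None


index = build_dense_index(data_file)
print(dense_lookup(index, "denali"))  # 1 (the first of the two "denali" records)
print(dense_lookup(index, "cat"))     # None
```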
Sparse indexing:
Sparse indexing involves the use of a
sparse index, a file with
pairs of keys and pointers for every block in the data file. Every key in this file
is associated with a particular pointer to
the block in the sorted data
file. In clustered indices with duplicate keys, the sparse index points to the lowest search key in each block.
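A matching sketch of a sparse index, assuming unique keys and a hypothetical block size of two records. The index keeps only the lowest key of each block. A lookup binary-searches those keys to choose a block and then scans inside that one block, which trades a little scanning for a much smaller index.

```python
import bisect

# Hypothetical sorted data file (unique keys), split into fixed-size blocks
records = ["baby", "denali", "dog", "moose", "snow", "sunrise"]
BLOCK_SIZE = 2
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Sparse index: the lowest key of each block; list position = block number
first_keys = [block[0] for block in blocks]


def sparse_lookup(key):
    """Return (block number, offset in block) for key, or None if absent."""
    # Last block whose lowest key is <= key
    b = bisect.bisect_right(first_keys, key) - 1
    if b < 0:
        return None  # key sorts before the first block
    for offset, rec in enumerate(blocks[b]):
        if rec == key:
            return b, offset
    return None


print(sparse_lookup("moose"))  # (1, 1)
print(sparse_lookup("cat"))    # None
```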
Unique index:
A unique index does not allow any duplicate values to be inserted into the
indexed column or columns of the table. The basic syntax is as follows:
CREATE UNIQUE INDEX index_name ON table_name (column_name);
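A small SQLite sketch of this syntax, run through Python's `sqlite3` module; the `images` table, its `path` column and the index name are hypothetical. The second insert of the same path violates the unique index and raises `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_images_path ON images (path)")
conn.execute("INSERT INTO images (path) VALUES ('denali.jpg')")

try:
    conn.execute("INSERT INTO images (path) VALUES ('denali.jpg')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The unique index refuses a second row with the same path
    duplicate_rejected = True

print(duplicate_rejected)  # True
```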
Composite index:
A composite index is an index on two or more columns of a table. The basic
syntax is as follows:
CREATE INDEX index_name ON table_name (column1, column2);
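A SQLite sketch of a composite index, with hypothetical table, column and index names. `EXPLAIN QUERY PLAN` shows whether SQLite chose the index for a query that filters on both columns. Column order matters: an index on (location, taken_year) generally helps queries that filter on location, or on location and year, but not on year alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, location TEXT, taken_year INTEGER)")
conn.execute("CREATE INDEX idx_photos_location_year ON photos (location, taken_year)")

# The last column of each EXPLAIN QUERY PLAN row is a human-readable description
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM photos WHERE location = ? AND taken_year = ?",
    ("Denali", 2015),
).fetchall()
for row in plan:
    print(row[-1])  # the detail text names idx_photos_location_year
```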
Implicit index:
Implicit indexes are automatically created by the database server when an
object is created. Indexes are automatically created for primary key
constraints and unique constraints.
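A SQLite sketch of an implicit index, with a hypothetical `images` table. The `UNIQUE` constraint makes SQLite create an index without any `CREATE INDEX` statement, and the index appears in the `sqlite_master` catalog. In SQLite, an `INTEGER PRIMARY KEY` column is stored as the table's rowid and needs no separate index, so only the unique constraint shows up here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No CREATE INDEX statement: the UNIQUE constraint alone triggers an index
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT UNIQUE)")

names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'images'"
    )
]
print(names)  # ['sqlite_autoindex_images_1']
```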
Hash table indexing:
In hash table indexing, the column value is the key to the hash table, and the
value mapped to that key is a pointer to the row data in the table. The value
you look up, say, “Denali”, would be the left side of the hash table entry, and
the right side would be an address (written in hexadecimal) referring to the
table row where Denali, based on your photo’s abstract, is stored in memory. It
would look something like “Denali => 0x27799”. These keys are not stored in any
particular order and can only be used for queries that check for equality
(e.g., WHERE subject = 'Denali').
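A minimal sketch of a hash index, using a Python dictionary (itself a hash table) and a list position standing in for a row address; the rows are hypothetical. An equality lookup is a single hash probe. A range condition such as "subjects between D and F" cannot use this structure, because the keys have no order.

```python
from collections import defaultdict

# Hypothetical table rows; the list position stands in for a row address
rows = [
    {"subject": "Denali", "path": "denali_sunrise.jpg"},
    {"subject": "dog", "path": "dog_snow.jpg"},
    {"subject": "Denali", "path": "denali_winter.jpg"},
]

# Hash index: column value -> positions of the rows holding that value
hash_index = defaultdict(list)
for pos, row in enumerate(rows):
    hash_index[row["subject"]].append(pos)


def find_equal(value):
    """Equivalent of WHERE subject = value: one hash lookup, no table scan."""
    return [rows[pos]["path"] for pos in hash_index.get(value, [])]


print(find_equal("Denali"))  # ['denali_sunrise.jpg', 'denali_winter.jpg']
print(find_equal("moose"))   # []
```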
Binary indexed tree:
Also known as a Fenwick tree (not to be confused with a B-tree), after New
Zealand computer scientist Peter Fenwick, who described it in 1994. It provides
a method for calculating and manipulating the prefix sums of a table of values
(for example, a database index table). A binary indexed tree computes prefix
sums and modifies the table in $O(\log n)$ time, where $n$ is the size of the
table.
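A minimal sketch of the standard 1-based binary indexed tree. Both operations step through indices by adding or removing the lowest set bit (`i & -i`), so each touches at most about $\log_2 n$ entries. The example values are arbitrary.

```python
class FenwickTree:
    """Binary indexed tree over positions 1..n supporting prefix sums and point updates."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # index 0 is unused

    def update(self, i, delta):
        """Add delta to the value at position i (1-based)."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # move to the next node responsible for position i

    def prefix_sum(self, i):
        """Sum of the values at positions 1..i."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # drop the lowest set bit
        return total

    def range_sum(self, left, right):
        """Sum of the values at positions left..right, inclusive."""
        return self.prefix_sum(right) - self.prefix_sum(left - 1)


ft = FenwickTree(5)
for pos, value in enumerate([3, 1, 4, 1, 5], start=1):
    ft.update(pos, value)
print(ft.prefix_sum(3))  # 8  (3 + 1 + 4)
ft.update(2, 2)          # the value at position 2 becomes 3
print(ft.prefix_sum(3))  # 10
```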