Frequently Asked Questions

Do you have a support question that you would like MetaCarta to answer? Let us know.

Technical:

1. Do documents that mention multiple locations obtain multiple index entries?

Yes. MetaCarta aims to assign a longitude/latitude to each location that appears in the document. There are two main exceptions:
(a) Material can be specifically excluded as part of the header, e.g. when a user does not want each and every document they have tagged to a specific place.
(b) A location is mentioned in the text only to help the reader identify which place was meant. For example, in "Paris, TX" we geolocate only the town (making sure the long/lat is for the Paris in Texas, not the one in France) and not the state, because the town gives us more specific coordinates. Whenever specific information is available it overrides the generic information; otherwise we would have to tag almost everything as "Earth".

2. If a document mentions a city such as Springfield without any other nearby reference, does MetaCarta leave it un-indexed since there are Springfields in many states and other countries? If it does index the resource, how is it done? For example, if Massachusetts is mentioned further down the page is the assumption made that Springfield Mass. is the correct location?

In the worst case, when absolutely no other clue is given in the text, MetaCarta returns a list of Springfields, ranked in order of confidence. In such cases, confidence is determined primarily by size (population) and textual frequency (the number of times the place appears in a billion-document collection).

Such cases are rare: typically the document contains many clues as to which alternative was meant. Typical clues include:

        (a) explicit marking (as in the "Paris, TX" case)
        (b) mention of an enclosing region (as in your Springfield, Mass. example)
        (c) mention of nearby places. For example, in "The Berlin and New Britain fire departments will hold a joint fundraiser next week" we can be fairly sure that neither the big Berlin nor Britain is meant; these are two adjacent small towns in Connecticut.

MetaCarta devotes considerable algorithmic effort to readjusting the confidence ranking according to the presence, absence, and weight of such clues. In principle, all Springfields remain accessible at all times, but unless you use the "zero confidence" setting, low-confidence candidates are dropped from the output ranking.
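
As an illustration only (this is not MetaCarta's actual algorithm), the idea of a size-and-frequency baseline that is then adjusted by contextual clues can be sketched as follows. The `Candidate` fields, the `region_boost` value, and the numbers in `SPRINGFIELDS` are all hypothetical placeholders, not real statistics, and only the "enclosing region" clue is modeled.

```python
from dataclasses import dataclass
from math import log10


@dataclass(frozen=True)
class Candidate:
    name: str
    region: str          # enclosing region, e.g. a state
    population: int      # toy value, not a real statistic
    text_frequency: int  # toy count of mentions in a reference corpus


def prior(candidate):
    """Baseline score: larger and more often mentioned places rank higher."""
    return log10(candidate.population + 1) + log10(candidate.text_frequency + 1)


def rank(candidates, document_text, region_boost=5.0):
    """Order candidates by baseline score plus a boost for a region clue."""
    text = document_text.lower()
    scored = []
    for candidate in candidates:
        score = prior(candidate)
        # Clue: the enclosing region is named somewhere in the document.
        if candidate.region.lower() in text:
            score += region_boost
        scored.append((score, candidate))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored]


# Toy candidates with made-up numbers, for demonstration only.
SPRINGFIELDS = [
    Candidate("Springfield", "Illinois", 1000, 10),
    Candidate("Springfield", "Massachusetts", 2000, 20),
    Candidate("Springfield", "Missouri", 3000, 30),
]

no_clue = rank(SPRINGFIELDS, "A fire broke out in Springfield.")
with_clue = rank(SPRINGFIELDS, "Springfield officials met in Massachusetts.")
```

With no clue in the text the toy ranking falls back to the baseline, so the largest placeholder entry comes first; once "Massachusetts" appears, that candidate moves to the top. A plain substring test like this would miss abbreviations such as "Mass.", which a real system would need to handle.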

3. Does the system use any logic to distinguish a reference to a reporting organization, such as the Utica, New York Bugle (where the Bugle may be reporting on Timbuktu and Utica is irrelevant or of secondary importance) from the content of a report about Utica, New York?

Yes, but the logic is tunable. To the extent that place information appears in the header, MetaCarta already offers the option to ignore it entirely. We are developing techniques for handling typical cases, such as news report bylines, that are not explicitly tagged as such in the input, and we will offer this option to our customers in a future release. Still, some customers consider such documents relevant to Utica as well, so we have to keep open the option to tag them.

4. Are search results sorted by relevancy based on keyword usage, geospatial references, or both? If both, how is each category weighted?

Results are sorted by relevance, which combines several factors. The most important are (a) the system's confidence in a particular result, (b) the textual emphasis of the string (e.g. boldface, or being close to the top of the document), and (c) keyword relevance (classic TF-IDF). The exact combination weights are proprietary.
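
To make the three factors concrete, here is a minimal sketch of one way such a combination could look. Since the real weights are proprietary, the `weights` tuple below is invented purely for illustration, the linear combination itself is an assumption, and `tf_idf` uses one common smoothed formulation rather than anything confirmed for MetaCarta.

```python
import math


def tf_idf(term, doc_tokens, corpus):
    """Term frequency in one document times a smoothed inverse document frequency."""
    if not doc_tokens:
        return 0.0
    tf = doc_tokens.count(term) / len(doc_tokens)
    docs_with_term = sum(1 for tokens in corpus if term in tokens)
    idf = math.log((1 + len(corpus)) / (1 + docs_with_term)) + 1
    return tf * idf


def relevance(confidence, emphasis, keyword_score, weights=(0.5, 0.2, 0.3)):
    """Weighted sum of the three factors; the weights are hypothetical."""
    w_conf, w_emph, w_key = weights
    return w_conf * confidence + w_emph * emphasis + w_key * keyword_score


# Tiny tokenized corpus, for demonstration only.
corpus = [
    ["flood", "utica"],
    ["flood", "paris"],
    ["election", "paris"],
]
rare_term = tf_idf("utica", corpus[0], corpus)    # appears in one document
common_term = tf_idf("flood", corpus[0], corpus)  # appears in two documents
```

The rarer term scores higher than the more common one, which is the usual TF-IDF behavior; how that score is balanced against confidence and emphasis depends entirely on the weights chosen.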

5. If geographic coordinates are included in the text of the Web page, does MetaCarta index the document for the referenced location?

Yes. This is true for both civilian and military coordinate systems, including UTM- and MGRS-formatted coordinates. There are some syntactic variants that we currently do not handle, such as degrees and minutes spelled out in words ("thirty-six degrees and forty minutes North, seventy-five degrees five minutes West"), but these are very rare compared with the ordinary cases in which digits are used.

6. If geographic coordinates are indexed, does MetaCarta parse all the combinations of dashes, slashes, and spaces, as well as degree, minute and second symbols that are used by various data providers? If not, what convention is it able to parse?

We can never claim to cover ALL combinations, but the important cases, which account for the bulk of coordinate mentions in running text, are already covered. We are continually extending the system to handle an ever-increasing variety of formats and spelling variants, but at this point there are few unhandled variants left to find.
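
For readers curious what tolerant coordinate parsing involves, the sketch below accepts a single degrees/minutes(/seconds) value with a hemisphere letter, separated by symbols, dashes, slashes, or spaces, and converts it to decimal degrees. It is an assumption-laden illustration, not MetaCarta's parser: the pattern and the `dms_to_decimal` name are hypothetical, and it covers only a few of the formats mentioned above (no UTM, MGRS, or spelled-out numbers).

```python
import re

# One coordinate: degrees, minutes, optional seconds, then N/S/E/W.
# Fields may be separated by a degree sign, dashes, slashes, or spaces.
_PATTERN = re.compile(
    r"^\s*(?P<deg>\d{1,3})[°\s/-]+(?P<min>\d{1,2})"
    r"(?:['′\s/-]+(?P<sec>\d{1,2}(?:\.\d+)?))?"
    r"""['′"″\s]*(?P<hem>[NSEW])\s*$""",
    re.IGNORECASE,
)


def dms_to_decimal(text):
    """Return signed decimal degrees, or None if the text is not recognized."""
    match = _PATTERN.match(text)
    if match is None:
        return None
    degrees = int(match.group("deg"))
    minutes = int(match.group("min"))
    seconds = float(match.group("sec") or 0.0)
    if minutes >= 60 or seconds >= 60:
        return None
    hemisphere = match.group("hem").upper()
    value = degrees + minutes / 60 + seconds / 3600
    # Latitudes cannot exceed 90 degrees, longitudes 180 degrees.
    limit = 90 if hemisphere in "NS" else 180
    if value > limit:
        return None
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value
```

For example, `dms_to_decimal("36°40'N")` and `dms_to_decimal("75-05-00W")` both parse, while a spelled-out form such as "thirty-six degrees and forty minutes North" returns `None`.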

7. How large is your gazetteer?

More than 10 million entries and variations from a combination of public authoritative sources and MetaCarta proprietary data.

8. If a Web page makes a reference to a city, which it follows with a set of geo-coords, does MetaCarta index both the provided geo-coords and the geo-coords (which may be different) from MetaCarta's own gazetteer database or is there some logic that determines which of the geo-coords to use?

This is a complex issue: different users have expressed different preferences. We are still in the requirements-gathering stage, and the eventual solution is likely to be user-tunable, just as header parsing is. The current default behavior is:
(a) the city and the geocoords BOTH get tagged/indexed
(b) the city tag carries the geocoords computed from our gazetteer
(c) the preexisting geocoords stay as they are, except that they get normalized to decimal degrees
(d) the literal version of the preexisting geocoords is preserved in the "Anchor" attribute of the geotag, whether or not it was in decimal degrees to begin with. Nothing ever gets overwritten; tags always provide _additional_ info.
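
The default behavior described above can be pictured with a small data-structure sketch. Everything here is hypothetical: the `GeoTag` class, its field names, the `tag_city_and_coords` helper, and the "Exampleville" entry are invented for illustration and do not reflect MetaCarta's actual tag format. The coordinates found in the text are assumed to have been parsed to decimal degrees already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTag:
    source: str       # "gazetteer" or "text"
    latitude: float   # decimal degrees
    longitude: float  # decimal degrees
    anchor: str       # literal text that produced the tag, kept unchanged


def tag_city_and_coords(city, gazetteer, literal_coords, parsed_coords):
    """Emit one tag for the city and one for the coordinates found in the text."""
    tags = []
    if city in gazetteer:
        lat, lon = gazetteer[city]
        # The city tag carries the gazetteer's coordinates.
        tags.append(GeoTag("gazetteer", lat, lon, city))
    # The in-text coordinates are normalized, but the literal form is preserved.
    lat, lon = parsed_coords
    tags.append(GeoTag("text", lat, lon, literal_coords))
    return tags


# Made-up gazetteer entry and coordinates, for demonstration only.
toy_gazetteer = {"Exampleville": (10.0, 20.0)}
tags = tag_city_and_coords(
    "Exampleville", toy_gazetteer, "10°30'N 20°15'E", (10.5, 20.25)
)
```

Both tags survive side by side, so a consumer can choose whichever coordinates it prefers, and the original string remains available through `anchor`.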

9. For a Web page that is indexed to several nearby locations, if the bounding box used in the search query encloses several of the locations, do the search results display the same resource several times within the map window? If not, which location is used for the display?

When documents are very close together at a given resolution, a separate "cluster of docs" icon is used to avoid visual overcrowding, and documents about the exact same location get a separate "stack of docs" icon. As a result, this situation arises very rarely. For it to happen, two different (nearby) locations must appear in the same document with exactly the same confidence, and with relevance to the query high enough that both make it to the displayed tier. Since confidences are the result of complex computations over floating-point numbers, they are hardly ever equal.
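
The "cluster" versus "stack" distinction can be illustrated with a simple grid-bucketing sketch. This is an assumed approach, not a description of MetaCarta's display logic: the `group_for_display` function, the `cell_size_deg` parameter, and the sample hits are all hypothetical.

```python
from collections import defaultdict
from math import floor


def group_for_display(hits, cell_size_deg):
    """Group (doc_id, lat, lon) hits into one icon per grid cell.

    A cell with one hit is "single", several hits at the exact same
    location form a "stack", and several hits at different nearby
    locations form a "cluster".
    """
    cells = defaultdict(list)
    for doc_id, lat, lon in hits:
        key = (floor(lat / cell_size_deg), floor(lon / cell_size_deg))
        cells[key].append((doc_id, lat, lon))

    icons = []
    for members in cells.values():
        locations = {(lat, lon) for _, lat, lon in members}
        if len(members) == 1:
            kind = "single"
        elif len(locations) == 1:
            kind = "stack"
        else:
            kind = "cluster"
        icons.append((kind, sorted(doc_id for doc_id, _, _ in members)))
    return icons


# Made-up hits, for demonstration only.
hits = [
    ("a", 10.0, 20.0),
    ("b", 10.0, 20.0),
    ("c", 10.4, 20.4),
    ("d", 50.0, 60.0),
]
zoomed_out = group_for_display(hits, 1.0)
zoomed_in = group_for_display(hits, 0.25)
```

At the coarse cell size, documents a, b, and c share one cluster icon; at the finer cell size, a and b remain a stack because they share an exact location, while c gets its own icon.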
