Submit Documents

You are encouraged to submit content that you consider appropriate for the CiteSeerX collection. We advise checking with your co-authors before submitting.

If you do not want your documents crawled by CiteSeerX, use a robots.txt file to disallow our crawler, named "citeseerxbot". We require that all content be submitted as links to publicly accessible documents on the Web. Before submitting, make sure that you have granted the relevant permissions and that your robots.txt permits "citeseerxbot" to crawl the documents. Once we receive a link submission, the link is queued for crawling and processed dynamically. Allow several weeks for the documents to be indexed by CiteSeerX.
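As a rough illustration of the robots.txt check, the sketch below uses Python's standard urllib.robotparser module to test whether a given robots.txt would let "citeseerxbot" fetch a document. The two robots.txt bodies, the example.org URL, and the helper name crawler_allowed are made up for illustration, and the crawler's own rule matching may differ in detail from Python's parser.

```python
from urllib.robotparser import RobotFileParser

CRAWLER = "citeseerxbot"  # crawler name as given on this page


def crawler_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the robots.txt text lets CRAWLER fetch url.

    Hypothetical helper: it reflects how Python's parser reads the rules,
    which is assumed (not confirmed) to match the real crawler's behaviour.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(CRAWLER, url)


# Hypothetical robots.txt that opts out of crawling by citeseerxbot.
BLOCKING = """\
User-agent: citeseerxbot
Disallow: /
"""

# Hypothetical robots.txt that blocks other crawlers but permits citeseerxbot.
PERMITTING = """\
User-agent: citeseerxbot
Disallow:

User-agent: *
Disallow: /
"""

PAPER_URL = "https://example.org/papers/draft.pdf"  # placeholder URL

if __name__ == "__main__":
    print("blocking file allows crawl:  ", crawler_allowed(BLOCKING, PAPER_URL))
    print("permitting file allows crawl:", crawler_allowed(PERMITTING, PAPER_URL))
```

Running a check like this against your own robots.txt before submitting a link can catch a blanket "Disallow: /" rule that would otherwise keep the documents from being fetched.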

Overview

Once a URL is submitted, it will be crawled to a depth of 1 for PDF and PostScript files. These files may be compressed with zip, gzip, or UNIX compress. Any matching files will be downloaded and queued for processing within our ingestion pipeline, at which point our parsers will attempt to extract the text from the documents, filter the text for relevance, convert to PDF if necessary, and extract metadata from the document headers and reference sections.
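To preview which links on a submitted page might be picked up, the following sketch collects the links on one page and keeps those whose file extension matches the formats listed here. It is only an approximation under stated assumptions: "depth of 1" is read as "links found directly on the submitted page", matching is done purely on the extension, and the names looks_like_document and candidate_documents, the sample HTML, and the example.org URL are all hypothetical. The actual crawler's rules are not documented on this page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Extensions taken from the formats named on this page; matching on the
# extension alone is an assumption, not the crawler's documented behaviour.
DOCUMENT_EXTENSIONS = (".pdf", ".ps", ".zip", ".gz", ".z")


def looks_like_document(url: str) -> bool:
    """Return True if the URL path ends in one of the supported extensions."""
    path = urlparse(url).path.lower()  # ".Z" (UNIX compress) becomes ".z"
    return path.endswith(DOCUMENT_EXTENSIONS)


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the <a href="..."> tags of a single page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the submitted page's URL.
                self.links.append(urljoin(self.base_url, value))


def candidate_documents(base_url: str, html: str) -> list:
    """Return the links on one page that look like PDF/PS files or archives."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [url for url in collector.links if looks_like_document(url)]


# Made-up page content for illustration.
SAMPLE_PAGE = """
<a href="papers/a.pdf">Paper A</a>
<a href="papers/b.ps.gz">Paper B</a>
<a href="/about.html">About</a>
<a href="talk.pptx">Slides</a>
"""

if __name__ == "__main__":
    for url in candidate_documents("https://example.org/~author/", SAMPLE_PAGE):
        print(url)
```

Under this reading, documents that are only reachable through a second page (for example, a per-paper landing page) would not be found, so it is safer to submit the URL of the page that links to the files directly.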

Supported File Formats

  • PDF: (Recommended) We are generally able to convert PDF documents in such a way as to preserve UTF-8 character codes. We therefore recommend submitting content in this format, particularly if your files contain characters that cannot be correctly represented within the ASCII character set.
  • PS: We do support PostScript files; however, text conversion will be limited to ASCII-only due to limitations in standard PostScript text extractors.
  • ZIP | GZ | Z: Common compression formats such as zip, gzip, and UNIX compress are all supported.
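One quick way to judge whether the ASCII-only limitation of PostScript extraction matters for a given document is to look for non-ASCII characters in its text. The helper below is a minimal sketch of that check; the function name non_ascii_characters and the sample strings are illustrative only and are not part of any CiteSeerX tooling.

```python
def non_ascii_characters(text: str) -> list:
    """Return the distinct characters in text that fall outside ASCII.

    A non-empty result suggests the document is better submitted as PDF,
    since PostScript text extraction is described above as ASCII-only.
    """
    return sorted({ch for ch in text if ord(ch) > 127})


if __name__ == "__main__":
    samples = [
        "A Survey of Crawling Strategies",    # plain ASCII title
        "On Gödel's Incompleteness Theorems",  # contains a non-ASCII letter
    ]
    for title in samples:
        extra = non_ascii_characters(title)
        hint = "PDF strongly advised" if extra else "ASCII only"
        print(f"{title!r}: {hint} {extra}")
```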

*Publishers' policy on self-archiving of your publications.