Jam less (jless)

Less is one of the best text viewers. It is a successor to more. It lets you scroll forward, scroll backward, search, and so on through multiple text files.

However, it doesn't support multi-byte characters. So I made a patch that enhances it to view texts with multiple character sets using ISO 2022 code extension techniques. The patch also supports some code conversion among Japanese encoding schemes: JIS X 0208, JIS X 0213, SJIS, and UJIS.


Overview

Less is one of the best text viewers. I enhanced it to read texts that use ISO 2022 code extension techniques and multiple Japanese encoding schemes.

ISO 2022 describes encoding techniques that allow a single text to combine more than 100 character sets. These include ISO 646 IRV, ISO 646 UK, ISO 646 US, ISO 646 Swedish, ISO 646 German, ISO 8859-1, ISO 8859-2, ISO 8859-7 Greek, ISO 8859-6 Arabic, ISO 8859-8 Hebrew, GB 2312-80 Chinese, JIS X 0208:1997 Japanese, KS C 5601-1987 Korean, CNS 11643, etc. By supporting the ISO 2022 standard, jless is now able to show all of them.
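To make the idea of switching character sets concrete, here is a minimal Python sketch (not taken from the jless source) that lists the escape sequences found in an ISO-2022-JP byte stream. The function name list_designations and the small table of escape sequences are illustrative assumptions. The table covers only the four sets used by ISO-2022-JP, not the full ISO 2022 registry.

```python
# Illustrative sketch only; this is not code from jless.
# The table holds the four escape sequences used by ISO-2022-JP.
DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201 Roman",
    b"\x1b$@": "JIS C 6226-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def list_designations(data):
    """Return the character sets designated in data, in order of appearance."""
    found = []
    i = 0
    while i < len(data):
        if data[i] == 0x1B:
            # Every escape sequence in the table is exactly 3 bytes long.
            found.append(DESIGNATIONS.get(data[i:i + 3], "unknown"))
            i += 3
        else:
            i += 1
    return found


# Python's standard iso2022_jp codec switches sets with escape sequences.
encoded = "日本語 text".encode("iso2022_jp")
print(encoded)
print(list_designations(encoded))
```

The byte stream switches to JIS X 0208 before the Japanese characters and back to ASCII before the Latin text. A terminal that lacks one of the designated sets cannot show that part of the text.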

However, what you can see depends on your terminal or terminal emulator. If your terminal supports ISO 2022 and has all the character sets, you can see everything. If your terminal has only a few character sets, you can see only those. Therefore, jless also contains a code conversion mechanism that reduces the number of character sets it needs to show a text. For example, it can convert JIS C 6226-1978, JIS X 0208:1997, SJIS, and UJIS into JIS X 0213:2000. Moreover, a partial conversion from GB into JIS X 0213:2000 is possible, but it is not yet implemented.
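The Japanese encoding schemes mentioned above are closely related, which is what makes conversion among them practical. The following Python sketch shows the well-known arithmetic that maps a two-byte JIS X 0208 code to UJIS (EUC-JP) and to SJIS. It is an illustration under stated assumptions, not the conversion code used in jless: the function names jis_to_euc and jis_to_sjis are hypothetical, and only the JIS X 0208 range is handled (no JIS X 0213 plane 2, no half-width katakana).

```python
# Illustrative sketch only; this is not the conversion code in jless.
# Inputs are the two 7-bit bytes (0x21-0x7E) of a JIS X 0208 character.

def jis_to_euc(j1, j2):
    """UJIS (EUC-JP): set the high bit of both bytes."""
    return bytes([j1 | 0x80, j2 | 0x80])


def jis_to_sjis(j1, j2):
    """SJIS: fold two JIS rows into one lead byte, then shift the trail byte."""
    s1 = ((j1 - 0x21) >> 1) + 0x81
    if s1 > 0x9F:
        s1 += 0x40          # skip 0xA0-0xDF, used for half-width katakana
    if j1 & 1:              # odd first byte: trail bytes 0x40-0x9E, skipping 0x7F
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:                   # even first byte: trail bytes 0x9F-0xFC
        s2 = j2 + 0x7E
    return bytes([s1, s2])


# JIS X 0208 codes for the three characters of "日本語".
pairs = [(0x46, 0x7C), (0x4B, 0x5C), (0x38, 0x6C)]
euc = b"".join(jis_to_euc(a, b) for a, b in pairs)
sjis = b"".join(jis_to_sjis(a, b) for a, b in pairs)

# Cross-check against Python's standard codecs.
print(euc == "日本語".encode("euc_jp"))
print(sjis == "日本語".encode("shift_jis"))
```

Because all three schemes address the same character table, a viewer can normalize its input to one internal set and then emit whichever scheme the terminal understands.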

Here are the README for my enhancement and its Japanese version.

If you are interested in ISO 2022, please look at my character encoding page. jless currently supports these character sets.


Note

Version Number

I would like to explain the name and version number. The original less is called less-XXX, where XXX is its version number. I call my patch less-XXX-isoYYY, where YYY is my patch's version number.
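As a small illustration of this naming scheme, the Python sketch below splits a patch name into the two version numbers. The helper parse_patch_name is hypothetical and is not part of the patch. The sample name less-382-iso262 is built from the latest release listed under News.

```python
import re

# Hypothetical helper for illustration; it is not shipped with the patch.
PATCH_NAME = re.compile(r"^less-(\d+)-iso(\d+)$")


def parse_patch_name(name):
    """Return (less version, patch version) for a name like less-XXX-isoYYY."""
    match = PATCH_NAME.match(name)
    if match is None:
        raise ValueError("not a less-XXX-isoYYY name: " + name)
    return int(match.group(1)), int(match.group(2))


print(parse_patch_name("less-382-iso262"))
```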

Copyright and future of this patch

I have sometimes submitted my patches and asked Mark to merge them into the original tree. However, they have not been merged. He said he may merge them in the future. I also think this code is too complicated to understand without knowledge of ISO 2022, so I think it will not be merged.

On the other hand, there is a copyright issue. The original less was under a BSD-style license. It was then changed to the GPL for a time. Currently, it is distributed under both the GPL and the less license. I personally don't want to publish my libraries under the GPL, so my patch is under the BSD-style less license only.


News

Released iso262 patch for less-382 on 24 Feb. 2006.
Released iso261 patch for less-382 on 24 Feb. 2006.
Released iso260 patch for less-382 on 18 Feb. 2006.
Released iso259 patch for less-382 on 7 Feb. 2006.
Released iso258 patch for less-382 on 4 Sep. 2005.
Released iso254 patch for less-358 on 6 Dec. 2000.

Download

Latest Version

Old Versions

Other contributions

Latest History

iso233 3/10/98
Fixed typo and made multi.h.
iso234 3/12/98
Removed prewind_multi and pdone_multi because they depend on less. Added init_multi and clear_multi in their place.
iso235 3/13/98
Add unify.c for chcmp_cs function.
iso236 3/14/98
Fixed MSB_ENABLE bugs.
iso237 3/16/98
Add unification among JIS X 0208, ASCII, Cyrillic and Greek.
iso238 3/17/98
Add NULLCS to represent a terminator. Changed the character set for control characters to WRONGCS. Add chunify_cs and chconvert_cs as external functions.
iso239 3/20/98
Fixed a bug in match() and added an assertion in chunify_cs().
iso240 3/25/98
Corrected the handling of all cmdbuf and cmdcs buffers. Fixed a control character handling bug. Changed to remove padded codes from the search pattern.
iso241 4/2/98
Fixed small bugs in search.c.
iso242 5/18/98
Fixed a buffering problem in search.
iso243 7/1/98
Add elimination of wrong characters for JIS C 6226-1978, JIS X 0208-1983, and JIS X 0208:1990.
iso244 7/2/98
Add elimination of wrong characters for SJIS and UJIS.
iso245 7/2/98
Fix a bug in elimination for SJIS.
iso246 8/8/98
Add one locale for Win32, eliminate all MSB_ENABLE stuff from unify.c, and fix the elimination table for JIS C 6226-1978.
iso247 8/8/98
Add -W option, and change the point where a mark is put. Now multi.c calls the checking function, then marks wrong characters.
iso248 8/12/98
Fix a problem outputting WRONGCS. Add a checking table for JIS X 0212:1990.
iso249 10/29/00
Joined with less-358. Fixed some bugs caused by the join.
iso250 11/21/00
Support JIS X 0213:2000. Added support for cygwin. Thanks to nayuta-san.
iso251 11/22/00
Support SJIS and UJIS using JIS X 0213:2000.
iso252 11/24/00
Fixed a problem outputting JIS X 0212:1990 in jis style.
iso253 12/2/00
Fixed a problem outputting SJIS. Thanks to nayuta-san. Fixed an assertion problem in search.c. Thanks to SAKAKI Kiyotake, Tanaka Akira, and Yuichi SATO.
iso254 12/5/00
Fixed a problem outputting JIS X 0213:2000 plane 2 as SJIS. Thanks to Shinya Hanataka.
iso255 8/30/05
Joined with less-378.
iso256 8/30/05
Joined with less-381.
iso257 9/4/05
Fixed problems caused by the merge.
Changed the buffering mechanism to track the exact POSITION through code set conversion. This helped the highlighting routine and improved the running speed of less.
Changed to parse text from the beginning of the physical line when less jumps into the middle of the text. This fixed major problems with stateful text such as ISO-2022.
Fixed JIS X 0213:2000 related problems. Thanks to Takeshi WATANABE. Also fixed a problem reported by him: less no longer splits one wrong multi-byte character across lines when it does not fit on the first line. Less moves it entirely to the second line.
iso258 9/4/05
Joined with less-382.
iso259 9/6/05
Changed the algorithm that detects gaps when parsing the input stream. This fixed a problem with long JIS/English text.
Fixed '\r' problem.
iso260 9/19/05
Changed the algorithm that handles input and output character sets. Now jless uses two variables: one represents the character sets supported for the input stream, and the other represents the encoding scheme for the output stream.
Changed to support JIS X 0213:2004.
iso261 2/24/06
Changed the put_wrongmark function to make it work with the new iso260 buffering semantics. Also applied a patch provided by Takuji. Thanks to Takuji.
iso262 2/24/06
Removed the POSITION variable from the members of M_BUFDATA. It was added to make the multi-byte character buffering function work better with less. However, it lowered the abstraction level of the data structure (multi.h). This time, POSITION* is passed as an additional argument to a few functions, which keeps the data structure as simple as possible.
This modification makes regex_cs-lwp9k able to compile.
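The iso257 entry above mentions parsing from the beginning of the physical line when jumping into the middle of stateful text. The Python sketch below shows why that matters, using the standard iso2022_jp codec rather than jless's own parser. The helper decode_line_at is hypothetical. The sketch assumes the ISO-2022-JP convention that every line returns to ASCII before its newline, which general ISO 2022 text does not guarantee.

```python
# Illustrative sketch only; jless uses its own parser in C.
# Assumes each line ends in the ASCII state, as ISO-2022-JP text does.

def decode_line_at(buf, offset, codec="iso2022_jp"):
    """Decode the whole physical line that contains byte position offset."""
    start = buf.rfind(b"\n", 0, offset) + 1      # 0 when no earlier newline
    end = buf.find(b"\n", offset)
    end = len(buf) if end == -1 else end + 1     # keep the newline if present
    return buf[start:end].decode(codec)


data = "first line\nabc 日本語 def\n".encode("iso2022_jp")

# Jump to the first byte after the escape sequence that selects JIS X 0208.
offset = data.index(b"F|")

# Decoding from here misses the escape sequence, so the two-byte
# characters are misread as ASCII.
naive = data[offset:].decode("iso2022_jp")
print(repr(naive))

# Backing up to the start of the line recovers the shift state.
print(repr(decode_line_at(data, offset)))
```

Backing up to the line start costs a little extra parsing per jump, but it means the decoder never has to guess the shift state.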

Mailing List

Subscribe to jless ML in English

Subscribe to jless ML in Japanese


Jam's welcome page -- Jam@pobox.com -- last modified February 24 2006