summary refs log tree commit diff
path: root/bin/man1/dehtml.1
blob: c55c35d4543f4b1c729e311e3e7c5abd8ea8e7d5 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
.Dd September  7, 2021
.Dt DEHTML 1
.Os
.
.Sh NAME
.Nm dehtml
.Nd extract text from HTML
.
.Sh SYNOPSIS
.Nm
.Op Fl s
.Op Ar
.
.Sh DESCRIPTION
The
.Nm
utility extracts text
from HTML documents.
Text inside
.Sy <title> ,
.Sy <style>
and
.Sy <script>
tags is discarded.
Numeric and common named HTML entities
are converted.
.
.Pp
The arguments are as follows:
.Bl -tag -width Ds
.It Fl s
Collapse whitespace outside of
.Sy <pre>
tags.
.El
.
.Sh BUGS
There is no way to extract image alt text.
>June McEnroe 2022-07-26Rewrite glitch from new pngoJune McEnroe 2022-07-26Update Care with time-to-ID and piercingsJune McEnroe 2022-07-26Add -w to upJune McEnroe 2022-07-13Set push.autoSetupRemoteJune McEnroe 2022-07-08Remove TOURJune McEnroe 2022-07-03Add The Bone Shard EmperorJune McEnroe 2022-06-25Bump xterm font size to 12June McEnroe 2022-06-10Handle subshells (and functions) inside substitutionsJune McEnroe 2022-06-10Switch to jorts Install scriptJune McEnroe 2022-06-08Indicate if still reading or no resultsJune McEnroe 2022-06-08Add Maiden, Mother, CroneJune McEnroe 2022-06-05FIRST SHOW IN 2.5 YEARS BABEY!!!June McEnroe 2022-06-03Set line number on File linesJune McEnroe 2022-06-03Stop polling stdin after EOFJune McEnroe 2022-06-02Set TABSIZE=4June McEnroe 2022-06-02Do basic match highlightingJune McEnroe 2022-06-02Clean up parsing a littleJune McEnroe 2022-06-02Don't duplicate path stringJune McEnroe 2022-06-02Use stderr instead of /dev/tty, realloc buffer if lines too longJune McEnroe 2022-06-02Add initial working version of qfJune McEnroe 2022-05-29Set prompt for okshJune McEnroe