.Dd September  7, 2021
.Dt DEHTML 1
.Os
.
.Sh NAME
.Nm dehtml
.Nd extract text from HTML
.
.Sh SYNOPSIS
.Nm
.Op Fl s
.Op Ar
.
.Sh DESCRIPTION
The
.Nm
utility extracts text
from HTML documents.
Text inside
.Sy <title> ,
.Sy <style>
and
.Sy <script>
tags is discarded.
Numeric and common named HTML entities
are decoded to the characters they represent.
.
.Pp
The arguments are as follows:
.Bl -tag -width Ds
.It Fl s
Collapse whitespace outside of
.Sy <pre>
tags.
.El
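.
.Sh EXAMPLES
Print the text of an HTML file,
here given the placeholder name
.Pa page.html ,
with whitespace collapsed outside of
.Sy <pre>
tags:
.Pp
.Dl dehtml -s page.html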
.
.Sh BUGS
There is no way to extract image alt text.