Kolen Cheung
2016-10-22 09:09:32 UTC
Hi, all,
pandoc from markdown to markdown
Ever since I read issue #2814 <https://github.com/jgm/pandoc/issues/2814>,
I have found it a very useful trick.
I am now working on a project that, starting next semester, will be opened
up to about 100 GSIs to collaboratively update a series of workbooks. I
want to use this trick as a cleanup tool to normalize the
source code (in pandoc markdown with minimal raw LaTeX).
Most things work very well, but I found a few problems. Is there
any way to get around these?
1. `### Main Goals {-}` becomes `### Main Goals {#main-goals .unnumbered}`:
I want to keep using `{-}` for 2 reasons: it is shorter, and it does not depend on
the header text (which will get repeated after cat).
2. `1. abcd...` becomes `1.  abcd...`: it seems that pandoc enforces 2 spaces
after the enumerated/bullet list marker. Is there a way to change this
behavior? I suppose I could use a regex to transform it back, but that seems
too prone to error.
3. inline footnotes: I found that pandoc converts inline footnotes
to explicit footnotes with [^1], [^2].... And the use of inline_notes
cannot be enforced. I opened issue #3172
<https://github.com/jgm/pandoc/issues/3172>. I suppose I could change the
source code to use explicit footnotes only, but it seems difficult to
enforce that and tell people not to use inline footnotes.
4. `&trade;` becomes ™: after studying how the trademark sign should be typeset,
and considering that I aim at HTML+LaTeX output with no non-ASCII characters in the
source code, I chose `&trade;`. But pandoc happily converts that to ™
without my consent. I suppose other such HTML entities might behave
similarly. (By the way, `&trade;` in the markdown input comes out as ™ in the
TeX output, and pdflatex has no problem with that. The resulting PDF looks
identical to what I get with \texttrademark. Does anyone know why? I thought
pdflatex doesn't like Unicode.)
5. pipe tables become HTML tables: I believe this is a bug, so I opened issue
#3171 <https://github.com/jgm/pandoc/issues/3171>. Even more
interestingly, the pipe tables were obtained from a .docx to .md
conversion.
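For item 2, the "regex to transform it back" idea could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested cleanup tool: the function name `single_space_markers` and both patterns are hypothetical, only decimal, `#.` and bullet markers are handled, and fenced code blocks are tracked with a simple toggle. Indented code blocks and other marker styles are not handled, which is part of why this approach is error-prone.

```python
import re

# A list marker ("1.", "2)", "#.", "-", "*", "+") followed by two or more spaces.
# Hypothetical pattern: letter and roman-numeral markers are deliberately ignored.
LIST_MARKER = re.compile(r"^([ \t]*)((?:\d+|#)[.)]|[-*+]) {2,}(?=\S)")

# Opening or closing line of a fenced code block (``` or ~~~).
FENCE = re.compile(r"^[ \t]*(```|~~~)")


def single_space_markers(markdown: str) -> str:
    """Collapse the spaces after a list marker to one, outside fenced code."""
    out = []
    in_fence = False
    for line in markdown.splitlines(keepends=True):
        if FENCE.match(line):
            # Naive toggle: assumes fences are balanced and not nested.
            in_fence = not in_fence
        elif not in_fence:
            line = LIST_MARKER.sub(r"\1\2 ", line)
        out.append(line)
    return "".join(out)
```

Something like `single_space_markers(text)` would be run on pandoc's output after the markdown-to-markdown pass; lines that already have a single space after the marker are left alone.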
The command I used to enforce "pandoc style" is:
find . -maxdepth 2 -mindepth 2 -iname "*.md" -exec pandoc -f markdown+abbreviations+autolink_bare_uris+markdown_attribute+mmd_header_identifiers+mmd_link_attributes+mmd_title_block+tex_math_double_backslash-latex_macros -t markdown+raw_tex-native_spans-simple_tables-multiline_tables-grid_tables-latex_macros --normalize -s --wrap=none --atx-headers -o {} {} \;
"pandoc lint"
By the way, does anyone know how to do some sort of "pandoc lint"?
Currently I check the TeX output with chktex -q and lacheck, which
sometimes give useful typographical hints on what to correct.
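The check described above could be wrapped in a small script along these lines. This is a sketch only: the function names `lint_commands` and `run_lint` and the file names are hypothetical, and it assumes pandoc, chktex and lacheck are on the PATH (missing tools are skipped).

```python
import shutil
import subprocess


def lint_commands(markdown_path: str, tex_path: str) -> list:
    """Return the commands to run: markdown to standalone LaTeX, then both checkers."""
    return [
        ["pandoc", "-s", "-f", "markdown", "-t", "latex", "-o", tex_path, markdown_path],
        ["chktex", "-q", tex_path],
        ["lacheck", tex_path],
    ]


def run_lint(markdown_path: str, tex_path: str) -> None:
    """Run each command in order, skipping tools that are not installed."""
    for command in lint_commands(markdown_path, tex_path):
        tool = command[0]
        if shutil.which(tool) is None:
            print(f"skipping {tool}: not found on PATH")
            if tool == "pandoc":
                return  # without the TeX file there is nothing to check
            continue
        # check=False: the checkers may exit non-zero when they report warnings.
        subprocess.run(command, check=False)
```

A call such as `run_lint("workbook.md", "workbook.tex")` would print whatever the two checkers report for that file.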
And I remember reading somewhere that @jgm mentioned that any random
string should be valid markdown (part of the markdown philosophy,
kind of). In this sense it seems very difficult to enforce a "right"
syntax in markdown.
cat a lot of markdown files into one
Lastly, there's a very minor issue: if I cat lots of markdown files into
one, then the lack of enough newlines between the end of one file and the
beginning of the next might produce wrong markdown. (*e.g.* when a file
starts with a heading: some text editors (*e.g.* Atom) normalize my
trailing newlines, without my consent, to 1 empty line. The heading then
starts immediately after the last paragraph of the previous file, and
pandoc will not parse it as a heading.)
I currently get around this problem with a script that normalizes every file
to end with exactly 2 trailing empty lines.
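A minimal sketch of what such a normalization could look like (not the actual script; the function names `normalize_trailing_newlines` and `concatenate` are hypothetical, and UTF-8 input is assumed):

```python
from pathlib import Path


def normalize_trailing_newlines(text: str, blank_lines: int = 2) -> str:
    """End the text with exactly `blank_lines` empty lines.

    Trailing whitespace is stripped first, then the final line terminator
    plus `blank_lines` extra newlines are appended.
    """
    return text.rstrip() + "\n" * (blank_lines + 1)


def concatenate(paths, blank_lines: int = 2) -> str:
    """Join files so a heading that opens one file is always preceded by blank lines."""
    return "".join(
        normalize_trailing_newlines(Path(p).read_text(encoding="utf-8"), blank_lines)
        for p in paths
    )
```

Normalizing while concatenating, rather than rewriting every file on disk, keeps the result independent of what each contributor's editor does to trailing newlines.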
I suppose cating markdown files is a very common process. How
would others normally do it?
Thanks in advance,
Kolen
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+***@googlegroups.com.
To post to this group, send email to pandoc-***@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/e82e943f-604e-4a5b-a621-4b3dd82e42c0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.