Mostly Programming by Jules Jacobs. Mostly programming related.
http://julesjacobs.github.io/
Thu, 11 Apr 2019 00:09:46 +0000

<h1>Disambiguation of context free grammars</h1>
<p>A naive context free grammar for arithmetic expressions is ambiguous:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="nc">E</span> <span class="p">-></span> <span class="nc">E</span> <span class="k">'</span><span class="o">+</span><span class="k">'</span> <span class="nc">E</span> <span class="p">|</span> <span class="nc">E</span> <span class="k">'</span><span class="p">*</span><span class="k">'</span> <span class="nc">E</span> <span class="p">|</span> <span class="k">'</span><span class="n">n'</span></code></pre></figure>
<p>The input <code class="highlighter-rouge">n+n+n</code> can be parsed as <code class="highlighter-rouge">(n+n)+n</code> or <code class="highlighter-rouge">n+(n+n)</code>, and the input <code class="highlighter-rouge">n+n*n</code> can be parsed as <code class="highlighter-rouge">(n+n)*n</code> or <code class="highlighter-rouge">n+(n*n)</code>. A parser for general context free grammars can produce a parse forest that contains all the ways of parsing an input string. However, in practice we only want one parse tree, selected according to precedence and associativity rules. Parser generators support disambiguation rules to specify which parse tree you want by filtering out the trees that you don’t want. This becomes more complicated when prefix operators and if-then/if-then-else are involved. You end up filtering out all parse trees if the filtering rules are too strict, and you still end up with multiple parse trees if they are too loose. By carefully crafting the filtering mechanism you can make sure that you end up with one parse tree, but you may have to look arbitrarily deeply into a parse tree to figure out whether it should be filtered out. The article <a href="https://researchr.org/publication/AmorimSV18">Towards Zero-Overhead Disambiguation of Deep Priority Conflicts
</a> by Luís Eduardo de Souza Amorim, Michael J. Steindorfer and Eelco Visser shows how to implement those rules efficiently in an SGLR parser, by passing a bitfield up the parse tree as it is being constructed. The bitfield records which kinds of nodes appear on the left- and rightmost spines of that parse tree, which determines which parent nodes are allowed.</p>
<p>In this post I’ll investigate an alternative way to do disambiguation that is based on disambiguating individual unions <code class="highlighter-rouge">A | B</code> and sequential compositions <code class="highlighter-rouge">A B</code> in the grammar, rather than filtering certain tree patterns out of a parse forest.</p>
<p>The union <code class="highlighter-rouge">A | B</code> will produce ambiguity whenever there is an input that can be parsed by both <code class="highlighter-rouge">A</code> and <code class="highlighter-rouge">B</code>. This is easily resolved by introducing a left-biased choice <code class="highlighter-rouge">A / B</code> that will try <code class="highlighter-rouge">A</code> first and only try <code class="highlighter-rouge">B</code> if <code class="highlighter-rouge">A</code> fails. This is functionally identical to a precedence filter that filters out <code class="highlighter-rouge">B</code>-trees in favour of <code class="highlighter-rouge">A</code>-trees. The difference is our perspective: we think of <code class="highlighter-rouge">A / B</code> as an unambiguous version of <code class="highlighter-rouge">A | B</code>, rather than as a filter that filters a parse forest.</p>
<p>The sequential composition <code class="highlighter-rouge">A B</code> can also produce ambiguity. For example, if <code class="highlighter-rouge">A = B = 'x' | 'xx' | 'xxx'</code> then the string <code class="highlighter-rouge">xxxx</code> can be parsed as <code class="highlighter-rouge">(x)(xxx)</code> or <code class="highlighter-rouge">(xx)(xx)</code> or <code class="highlighter-rouge">(xxx)(x)</code>. There is ambiguity about how much of the string is parsed by <code class="highlighter-rouge">A</code> and how much by <code class="highlighter-rouge">B</code>. Hence we introduce right-biased sequential composition <code class="highlighter-rouge">A > B</code>, which always parses as much as possible with <code class="highlighter-rouge">A</code>, and left-biased sequential composition <code class="highlighter-rouge">A < B</code>, which always parses as much as possible with <code class="highlighter-rouge">B</code>. The string <code class="highlighter-rouge">xxxx</code> will be parsed as <code class="highlighter-rouge">(xxx)(x)</code> by <code class="highlighter-rouge">A > B</code> and as <code class="highlighter-rouge">(x)(xxx)</code> by <code class="highlighter-rouge">A < B</code>.</p>
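<p>To make the two split orderings concrete, here is a small standalone Python sketch (my own illustration, separate from the F# parser later in the post) that picks the split point for <code class="highlighter-rouge">A B</code> under each bias, with <code class="highlighter-rouge">A = B = 'x' | 'xx' | 'xxx'</code>:</p>

```python
# Illustration of biased sequencing on A = B = {'x', 'xx', 'xxx'}.
words = {"x", "xx", "xxx"}

def split(s, prefer_long_a):
    # A > B scans split points right-to-left (longest A first);
    # A < B scans left-to-right (shortest A first).
    ks = range(len(s) - 1, 0, -1) if prefer_long_a else range(1, len(s))
    for k in ks:
        if s[:k] in words and s[k:] in words:
            return (s[:k], s[k:])
    return None

print(split("xxxx", prefer_long_a=True))   # A > B: ('xxx', 'x')
print(split("xxxx", prefer_long_a=False))  # A < B: ('x', 'xxx')
```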
<p>That’s all well and good, but can we actually do associativity and precedence with those operators? It turns out that we can:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="nc">E</span> <span class="p">-></span> <span class="nc">E</span> <span class="p"><</span> <span class="k">'</span><span class="o">+</span><span class="k">'</span> <span class="nc">E</span> <span class="o">/</span> <span class="nc">E</span> <span class="p">></span> <span class="k">'</span><span class="p">*</span><span class="k">'</span> <span class="nc">E</span> <span class="p">|</span> <span class="k">'</span><span class="n">n'</span></code></pre></figure>
<p>This parses <code class="highlighter-rouge">+</code> as right-associative and <code class="highlighter-rouge">*</code> as left-associative, and gives <code class="highlighter-rouge">*</code> higher precedence. Note that the precedence and associativity of <code class="highlighter-rouge"><</code>, <code class="highlighter-rouge">></code>, <code class="highlighter-rouge">/</code> are such that the grammar is parsed as follows:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="nc">E</span> <span class="p">-></span> <span class="p">(</span><span class="nc">E</span> <span class="p"><</span> <span class="p">(</span><span class="k">'</span><span class="o">+</span><span class="k">'</span> <span class="nc">E</span><span class="o">))</span> <span class="o">/</span> <span class="p">(</span><span class="nc">E</span> <span class="p">></span> <span class="p">(</span><span class="k">'</span><span class="p">*</span><span class="k">'</span> <span class="nc">E</span><span class="o">))</span> <span class="p">|</span> <span class="k">'</span><span class="n">n'</span></code></pre></figure>
<p>Here’s a CYK-parser written in F# that implements this:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="k">type</span> <span class="nc">Node</span> <span class="p">=</span> <span class="nc">Str</span> <span class="k">of</span> <span class="kt">string</span> <span class="p">|</span> <span class="nc">Sym</span> <span class="k">of</span> <span class="kt">string</span> <span class="p">|</span> <span class="nc">AltL</span> <span class="k">of</span> <span class="nc">Node</span> <span class="p">*</span> <span class="nc">Node</span> <span class="p">|</span> <span class="nc">SeqL</span> <span class="k">of</span> <span class="nc">Node</span> <span class="p">*</span> <span class="nc">Node</span> <span class="p">|</span> <span class="nc">SeqR</span> <span class="k">of</span> <span class="nc">Node</span> <span class="p">*</span> <span class="nc">Node</span>
<span class="k">let</span> <span class="n">parse</span> <span class="n">grammar</span> <span class="p">(</span><span class="n">str</span><span class="p">:</span><span class="kt">string</span><span class="p">)</span> <span class="p">=</span>
<span class="k">let</span> <span class="n">cache</span> <span class="p">=</span> <span class="k">new</span> <span class="nn">System</span><span class="p">.</span><span class="nn">Collections</span><span class="p">.</span><span class="nn">Generic</span><span class="p">.</span><span class="nc">Dictionary</span><span class="o"><_,_>()</span>
<span class="k">let</span> <span class="k">rec</span> <span class="n">parseN</span> <span class="nc">N</span> <span class="n">i</span> <span class="n">j</span> <span class="p">=</span>
<span class="k">match</span> <span class="n">cache</span><span class="p">.</span><span class="nc">TryGetValue</span><span class="o">((</span><span class="nc">N</span><span class="p">,</span><span class="n">i</span><span class="p">,</span><span class="n">j</span><span class="o">))</span> <span class="k">with</span>
<span class="p">|</span> <span class="bp">true</span><span class="p">,</span> <span class="n">v</span> <span class="p">-></span> <span class="n">v</span>
<span class="p">|</span> <span class="p">_</span> <span class="p">-></span>
<span class="k">let</span> <span class="n">parseSeq</span> <span class="nc">A</span> <span class="nc">B</span> <span class="p">=</span> <span class="nn">Seq</span><span class="p">.</span><span class="n">tryPick</span> <span class="p">(</span><span class="k">fun</span> <span class="n">k</span> <span class="p">-></span>
<span class="k">match</span> <span class="p">(</span><span class="n">parseN</span> <span class="nc">A</span> <span class="n">i</span> <span class="n">k</span><span class="p">,</span><span class="n">parseN</span> <span class="nc">B</span> <span class="n">k</span> <span class="n">j</span><span class="p">)</span> <span class="k">with</span>
<span class="p">|</span> <span class="nc">Some</span> <span class="n">a</span><span class="p">,</span><span class="nc">Some</span> <span class="n">b</span> <span class="p">-></span> <span class="nc">Some</span> <span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">)</span> <span class="p">|</span> <span class="p">_</span> <span class="p">-></span> <span class="nc">None</span><span class="p">)</span>
<span class="k">let</span> <span class="n">v</span> <span class="p">=</span> <span class="k">match</span> <span class="nc">N</span> <span class="k">with</span>
<span class="p">|</span> <span class="nc">SeqL</span><span class="p">(</span><span class="nc">A</span><span class="p">,</span><span class="nc">B</span><span class="p">)</span> <span class="p">-></span> <span class="n">parseSeq</span> <span class="nc">A</span> <span class="nc">B</span> <span class="p">(</span><span class="n">seq</span><span class="p">{</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">..</span><span class="n">j</span><span class="p">-</span><span class="mi">1</span><span class="o">})</span>
<span class="p">|</span> <span class="nc">SeqR</span><span class="p">(</span><span class="nc">A</span><span class="p">,</span><span class="nc">B</span><span class="p">)</span> <span class="p">-></span> <span class="n">parseSeq</span> <span class="nc">A</span> <span class="nc">B</span> <span class="p">(</span><span class="n">seq</span><span class="p">{</span><span class="n">j</span><span class="p">-</span><span class="mi">1</span><span class="o">..-</span><span class="mi">1</span><span class="p">..</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="o">})</span>
<span class="p">|</span> <span class="nc">AltL</span><span class="p">(</span><span class="nc">A</span><span class="p">,</span><span class="nc">B</span><span class="p">)</span> <span class="p">-></span>
<span class="k">match</span> <span class="n">parseN</span> <span class="nc">A</span> <span class="n">i</span> <span class="n">j</span> <span class="k">with</span>
<span class="p">|</span> <span class="nc">Some</span> <span class="n">s</span> <span class="p">-></span> <span class="nc">Some</span> <span class="n">s</span> <span class="p">|</span> <span class="nc">None</span> <span class="p">-></span> <span class="n">parseN</span> <span class="nc">B</span> <span class="n">i</span> <span class="n">j</span>
<span class="p">|</span> <span class="nc">Str</span> <span class="n">s</span> <span class="p">-></span> <span class="k">if</span> <span class="n">j</span><span class="p">-</span><span class="n">i</span><span class="p">=</span><span class="n">s</span><span class="p">.</span><span class="nc">Length</span> <span class="p">&&</span> <span class="n">str</span><span class="o">.[</span><span class="n">i</span><span class="p">..</span><span class="n">j</span><span class="p">-</span><span class="mi">1</span><span class="o">]=</span><span class="n">s</span> <span class="k">then</span> <span class="nc">Some</span> <span class="n">s</span> <span class="k">else</span> <span class="nc">None</span>
<span class="p">|</span> <span class="nc">Sym</span> <span class="n">p</span> <span class="p">-></span> <span class="nn">Option</span><span class="p">.</span><span class="n">map</span> <span class="p">(</span><span class="n">sprintf</span> <span class="s2">"%s[%s]"</span> <span class="n">p</span><span class="p">)</span> <span class="p">(</span><span class="n">parseN</span> <span class="p">(</span><span class="nn">Map</span><span class="p">.</span><span class="n">find</span> <span class="n">p</span> <span class="n">grammar</span><span class="p">)</span> <span class="n">i</span> <span class="n">j</span><span class="p">)</span>
<span class="n">cache</span><span class="p">.</span><span class="nc">Add</span><span class="o">((</span><span class="nc">N</span><span class="p">,</span><span class="n">i</span><span class="p">,</span><span class="n">j</span><span class="o">),</span><span class="n">v</span><span class="o">);</span> <span class="n">v</span>
<span class="n">parseN</span> <span class="p">(</span><span class="nn">Map</span><span class="p">.</span><span class="n">find</span> <span class="s2">"S"</span> <span class="n">grammar</span><span class="p">)</span> <span class="mi">0</span> <span class="p">(</span><span class="nn">String</span><span class="p">.</span><span class="n">length</span> <span class="n">str</span><span class="p">)</span> </code></pre></figure>
<p>We write the arithmetic grammar as follows:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="k">let</span> <span class="n">arithmeticGrammar</span> <span class="p">=</span>
<span class="p">[</span>
<span class="s2">"S"</span><span class="p">,</span><span class="nc">AltL</span><span class="p">(</span>
<span class="nc">Str</span> <span class="s2">"n"</span><span class="p">,</span>
<span class="nc">AltL</span><span class="p">(</span>
<span class="nc">SeqL</span><span class="p">(</span><span class="nc">Sym</span> <span class="s2">"S"</span><span class="p">,</span><span class="nc">SeqL</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">" + "</span><span class="p">,</span> <span class="nc">Sym</span> <span class="s2">"S"</span><span class="o">)),</span>
<span class="nc">SeqR</span><span class="p">(</span><span class="nc">Sym</span> <span class="s2">"S"</span><span class="p">,</span><span class="nc">SeqL</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">" * "</span><span class="p">,</span> <span class="nc">Sym</span> <span class="s2">"S"</span><span class="o">))))</span>
<span class="p">]</span> <span class="p">|></span> <span class="nn">Map</span><span class="p">.</span><span class="n">ofList</span></code></pre></figure>
<p>And test it:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="p">></span> <span class="n">test</span> <span class="n">arithmeticGrammar</span> <span class="p">[</span><span class="s2">"n + n + n"</span><span class="p">;</span><span class="s2">"n * n * n"</span><span class="p">;</span><span class="s2">"n + n * n"</span><span class="p">;</span><span class="s2">"n * n + n"</span><span class="p">;</span><span class="s2">"n * * + n"</span><span class="o">];;</span>
<span class="n">n</span> <span class="o">+</span> <span class="n">n</span> <span class="o">+</span> <span class="n">n</span> <span class="o">==></span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="o">+</span> <span class="nc">S</span><span class="p">[</span><span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="o">+</span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="o">]]</span>
<span class="n">n</span> <span class="p">*</span> <span class="n">n</span> <span class="p">*</span> <span class="n">n</span> <span class="o">==></span> <span class="nc">S</span><span class="p">[</span><span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">*</span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="o">]]</span> <span class="p">*</span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span>
<span class="n">n</span> <span class="o">+</span> <span class="n">n</span> <span class="p">*</span> <span class="n">n</span> <span class="o">==></span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="o">+</span> <span class="nc">S</span><span class="p">[</span><span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">*</span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="o">]]</span>
<span class="n">n</span> <span class="p">*</span> <span class="n">n</span> <span class="o">+</span> <span class="n">n</span> <span class="o">==></span> <span class="nc">S</span><span class="p">[</span><span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">*</span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="o">]]</span> <span class="o">+</span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span>
<span class="n">n</span> <span class="p">*</span> <span class="p">*</span> <span class="o">+</span> <span class="n">n</span> <span class="o">==></span> <span class="nc">Parse</span> <span class="n">error</span></code></pre></figure>
<p>With this utility function:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="k">let</span> <span class="n">test</span> <span class="n">g</span> <span class="n">ss</span> <span class="p">=</span> <span class="n">ss</span> <span class="p">|></span> <span class="nn">Seq</span><span class="p">.</span><span class="n">iter</span> <span class="p">(</span><span class="k">fun</span> <span class="n">s</span> <span class="p">-></span> <span class="n">printf</span> <span class="s2">"%s ==> %s</span><span class="se">\n</span><span class="s2">"</span> <span class="n">s</span> <span class="p">(</span><span class="n">defaultArg</span> <span class="p">(</span><span class="n">parse</span> <span class="n">g</span> <span class="n">s</span><span class="p">)</span> <span class="s2">"Parse error"</span><span class="o">))</span></code></pre></figure>
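<p>The F# above translates fairly directly to other languages. As a cross-check, here is a rough Python port of the same memoized parser; the tuple encoding of grammar nodes is my own, and the split orderings mirror <code class="highlighter-rouge">SeqL</code> and <code class="highlighter-rouge">SeqR</code>:</p>

```python
# A rough Python port of the memoized CYK-style parser above.
# Grammar nodes are tuples:
#   ("str", s)      literal string
#   ("sym", name)   nonterminal reference
#   ("altl", a, b)  left-biased choice A / B
#   ("seql", a, b)  sequence A < B: shortest match for A wins
#   ("seqr", a, b)  sequence A > B: longest match for A wins

def parse(grammar, text):
    cache = {}

    def parse_n(node, i, j):
        key = (node, i, j)
        if key in cache:
            return cache[key]
        kind = node[0]
        if kind == "str":
            v = node[1] if text[i:j] == node[1] else None
        elif kind == "sym":
            sub = parse_n(grammar[node[1]], i, j)
            v = None if sub is None else "%s[%s]" % (node[1], sub)
        elif kind == "altl":
            v = parse_n(node[1], i, j)
            if v is None:
                v = parse_n(node[2], i, j)
        else:  # "seql" / "seqr": try split points in biased order
            ks = range(i + 1, j) if kind == "seql" else range(j - 1, i, -1)
            v = None
            for k in ks:
                a, b = parse_n(node[1], i, k), parse_n(node[2], k, j)
                if a is not None and b is not None:
                    v = a + b
                    break
        cache[key] = v
        return v

    return parse_n(grammar["S"], 0, len(text))

S = ("sym", "S")
arithmetic = {
    "S": ("altl", ("str", "n"),
          ("altl",
           ("seql", S, ("seql", ("str", " + "), S)),    # E < '+' E
           ("seqr", S, ("seql", ("str", " * "), S)))),  # E > '*' E
}

print(parse(arithmetic, "n + n * n"))  # S[n] + S[S[n] * S[n]]
```

Running it on the same test strings reproduces the F# output above, including the parse error for <code class="highlighter-rouge">n * * + n</code>.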
<p>The more difficult examples from the paper involving if-then/if-then-else/match can also be handled by adding left and right bias at appropriate points:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="k">let</span> <span class="n">prefixGrammar</span> <span class="p">=</span>
<span class="p">[</span>
<span class="s2">"S"</span><span class="p">,</span><span class="nc">AltL</span><span class="p">(</span>
<span class="nc">Str</span> <span class="s2">"n"</span><span class="p">,</span>
<span class="nc">AltL</span><span class="p">(</span>
<span class="nc">SeqR</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">"if "</span><span class="p">,</span> <span class="nc">SeqR</span><span class="p">(</span><span class="nc">Sym</span> <span class="s2">"S"</span><span class="p">,</span> <span class="nc">SeqR</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">" then "</span><span class="p">,</span> <span class="nc">Sym</span> <span class="s2">"S"</span><span class="o">))),</span>
<span class="nc">SeqL</span><span class="p">(</span><span class="nc">Sym</span> <span class="s2">"S"</span><span class="p">,</span><span class="nc">SeqL</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">" + "</span><span class="p">,</span> <span class="nc">Sym</span> <span class="s2">"S"</span><span class="o">))))</span>
<span class="p">]</span> <span class="p">|></span> <span class="nn">Map</span><span class="p">.</span><span class="n">ofList</span>
<span class="k">let</span> <span class="n">danglingElseGrammar</span> <span class="p">=</span>
<span class="p">[</span>
<span class="s2">"S"</span><span class="p">,</span><span class="nc">AltL</span><span class="p">(</span>
<span class="nc">Str</span> <span class="s2">"n"</span><span class="p">,</span>
<span class="nc">AltL</span><span class="p">(</span>
<span class="nc">SeqR</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">"if "</span><span class="p">,</span> <span class="nc">SeqR</span><span class="p">(</span><span class="nc">Sym</span> <span class="s2">"S"</span><span class="p">,</span> <span class="nc">SeqR</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">" then "</span><span class="p">,</span> <span class="nc">Sym</span> <span class="s2">"S"</span><span class="o">))),</span>
<span class="nc">SeqR</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">"if "</span><span class="p">,</span> <span class="nc">SeqR</span><span class="p">(</span><span class="nc">Sym</span> <span class="s2">"S"</span><span class="p">,</span> <span class="nc">SeqR</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">" then "</span><span class="p">,</span> <span class="nc">SeqR</span><span class="p">(</span><span class="nc">Sym</span> <span class="s2">"S"</span><span class="p">,</span> <span class="nc">SeqR</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">" else "</span><span class="p">,</span> <span class="nc">Sym</span> <span class="s2">"S"</span><span class="o">)))))))</span>
<span class="p">]</span> <span class="p">|></span> <span class="nn">Map</span><span class="p">.</span><span class="n">ofList</span>
<span class="k">let</span> <span class="n">longestMatchGrammar</span> <span class="p">=</span>
<span class="p">[</span>
<span class="s2">"S"</span><span class="p">,</span><span class="nc">AltL</span><span class="p">(</span>
<span class="nc">Str</span> <span class="s2">"n"</span><span class="p">,</span>
<span class="nc">SeqR</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">"match "</span><span class="p">,</span> <span class="nc">SeqR</span><span class="p">(</span><span class="nc">Sym</span> <span class="s2">"S"</span><span class="p">,</span> <span class="nc">SeqR</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">" with "</span><span class="p">,</span> <span class="nc">Sym</span> <span class="s2">"P+"</span><span class="o">))));</span>
<span class="s2">"P+"</span><span class="p">,</span><span class="nc">AltL</span><span class="p">(</span>
<span class="nc">Sym</span> <span class="s2">"P"</span><span class="p">,</span>
<span class="nc">SeqL</span><span class="p">(</span><span class="nc">Sym</span> <span class="s2">"P"</span><span class="p">,</span> <span class="nc">SeqL</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">" "</span><span class="p">,</span> <span class="nc">Sym</span> <span class="s2">"P+"</span><span class="o">)));</span>
<span class="s2">"P"</span><span class="p">,</span><span class="nc">SeqL</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">"id -> "</span><span class="p">,</span> <span class="nc">Sym</span> <span class="s2">"S"</span><span class="p">)</span>
<span class="p">]</span> <span class="p">|></span> <span class="nn">Map</span><span class="p">.</span><span class="n">ofList</span> </code></pre></figure>
<p>Testing gives:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="p">></span> <span class="n">test</span> <span class="n">prefixGrammar</span> <span class="p">[</span><span class="s2">"n + if n then n + n"</span><span class="o">];;</span>
<span class="n">n</span> <span class="o">+</span> <span class="k">if</span> <span class="n">n</span> <span class="k">then</span> <span class="n">n</span> <span class="o">+</span> <span class="n">n</span> <span class="o">==></span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="o">+</span> <span class="nc">S</span><span class="p">[</span><span class="k">if</span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="k">then</span> <span class="nc">S</span><span class="p">[</span><span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="o">+</span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="o">]]]</span>
<span class="p">></span> <span class="n">test</span> <span class="n">danglingElseGrammar</span> <span class="p">[</span><span class="s2">"if n then if n then n else if n then n else n"</span><span class="o">];;</span>
<span class="k">if</span> <span class="n">n</span> <span class="k">then</span> <span class="k">if</span> <span class="n">n</span> <span class="k">then</span> <span class="n">n</span> <span class="k">else</span> <span class="k">if</span> <span class="n">n</span> <span class="k">then</span> <span class="n">n</span> <span class="k">else</span> <span class="n">n</span>
<span class="o">==></span> <span class="k">if</span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="k">then</span> <span class="nc">S</span><span class="p">[</span><span class="k">if</span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="k">then</span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="k">else</span> <span class="nc">S</span><span class="p">[</span><span class="k">if</span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="k">then</span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="k">else</span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="o">]]]</span>
<span class="p">></span> <span class="n">test</span> <span class="n">longestMatchGrammar</span> <span class="p">[</span><span class="s2">"match n with id -> match n with id -> n id -> n"</span><span class="o">];;</span>
<span class="k">match</span> <span class="n">n</span> <span class="k">with</span> <span class="n">id</span> <span class="p">-></span> <span class="k">match</span> <span class="n">n</span> <span class="k">with</span> <span class="n">id</span> <span class="p">-></span> <span class="n">n</span> <span class="n">id</span> <span class="p">-></span> <span class="n">n</span>
<span class="o">==></span> <span class="k">match</span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="k">with</span> <span class="nc">P</span><span class="o">+[</span><span class="nc">P</span><span class="p">[</span><span class="n">id</span> <span class="p">-></span> <span class="nc">S</span><span class="p">[</span><span class="k">match</span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="k">with</span> <span class="nc">P</span><span class="o">+[</span><span class="nc">P</span><span class="p">[</span><span class="n">id</span> <span class="p">-></span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="o">]]</span> <span class="nc">P</span><span class="o">+[</span><span class="nc">P</span><span class="p">[</span><span class="n">id</span> <span class="p">-></span> <span class="nc">S</span><span class="p">[</span><span class="n">n</span><span class="o">]]]]]]]</span></code></pre></figure>
<p>Note that at most points in those grammars, it doesn’t matter whether we use left bias (SeqL) or right bias (SeqR), because those parts of the grammar are already unambiguous. For instance, <code class="highlighter-rouge">(X '+')</code> can be parsed in only one way, namely, for a given input string <code class="highlighter-rouge">"something+"</code>, the <code class="highlighter-rouge">"+"</code> will be consumed by <code class="highlighter-rouge">'+'</code> and the rest will be consumed by <code class="highlighter-rouge">X</code>. This toy parser only supports SeqL and SeqR, so we must choose either <code class="highlighter-rouge">(X < '+')</code> or <code class="highlighter-rouge">(X > '+')</code>, but for a real implementation we’d want to add a version of Seq written <code class="highlighter-rouge">(X '+')</code> that creates a parse forest with all alternatives, and/or raises an error if there is more than one parse tree, telling us that we need to disambiguate that particular sequential composition.</p>
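<p>Such an ambiguity-reporting <code class="highlighter-rouge">Seq</code> could look like this Python sketch (a hypothetical helper, not part of the parser above): it collects every split point where both sides succeed and complains when there is more than one:</p>

```python
def seq_checked(parse_a, parse_b, s):
    # Hypothetical unbiased sequence: succeed only when exactly one split
    # works; otherwise report that this composition needs a < or > bias.
    splits = [(s[:k], s[k:]) for k in range(len(s) + 1)
              if parse_a(s[:k]) and parse_b(s[k:])]
    if len(splits) > 1:
        raise ValueError("ambiguous sequence, add bias: %r" % splits)
    return splits[0] if splits else None

in_a = lambda t: t in {"x", "xx"}
print(seq_checked(in_a, in_a, "xxxx"))  # ('xx', 'xx'): the only valid split
# seq_checked(in_a, in_a, "xxx") raises: ('x','xx') and ('xx','x') both parse
```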
<p>For repetition <code class="highlighter-rouge">Y = A*</code>, this method of disambiguation can support leftmost-longest, leftmost-shortest, rightmost-longest, and rightmost-shortest:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="nc">Y</span> <span class="p">-></span> <span class="err">ε</span> <span class="p">|</span> <span class="nc">A</span> <span class="p">></span> <span class="nc">Y</span> <span class="c1">// leftmost-longest</span>
<span class="nc">Y</span> <span class="p">-></span> <span class="err">ε</span> <span class="p">|</span> <span class="nc">A</span> <span class="p"><</span> <span class="nc">Y</span> <span class="c1">// leftmost-shortest</span>
<span class="nc">Y</span> <span class="p">-></span> <span class="err">ε</span> <span class="p">|</span> <span class="nc">Y</span> <span class="p"><</span> <span class="nc">A</span> <span class="c1">// rightmost-longest</span>
<span class="nc">Y</span> <span class="p">-></span> <span class="err">ε</span> <span class="p">|</span> <span class="nc">Y</span> <span class="p">></span> <span class="nc">A</span> <span class="c1">// rightmost-shortest</span></code></pre></figure>
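<p>The two prefix forms already give different splits for <code class="highlighter-rouge">A -> 'xx' | 'xxx'</code> on the input <code class="highlighter-rouge">xxxxx</code>, as this small standalone Python sketch shows (my own illustration; only the prefix forms are covered, since the suffix forms are left-recursive):</p>

```python
# Repetition with the two prefix biases, for A -> 'xx' | 'xxx'.
words = {"xx", "xxx"}

def star(s, longest):
    # Y -> e | A > Y  (longest=True):  leftmost-longest
    # Y -> e | A < Y  (longest=False): leftmost-shortest
    if s == "":
        return ()
    ks = range(len(s), 0, -1) if longest else range(1, len(s) + 1)
    for k in ks:
        if s[:k] in words:
            rest = star(s[k:], longest)
            if rest is not None:
                return (s[:k],) + rest
    return None

print(star("xxxxx", longest=True))   # leftmost-longest:  ('xxx', 'xx')
print(star("xxxxx", longest=False))  # leftmost-shortest: ('xx', 'xxx')
```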
<p>Note that <code class="highlighter-rouge"><</code> and <code class="highlighter-rouge">></code> are not associative: <code class="highlighter-rouge">A < (A < A)</code> is not the same as <code class="highlighter-rouge">(A < A) < A</code> for <code class="highlighter-rouge">A -> 'aa' | 'aaa' | 'b' | 'abb' | 'b' | 'bb'</code> on the input <code class="highlighter-rouge">aaabbb</code>. The former puts more priority on making the first <code class="highlighter-rouge">A</code> as short as possible, whereas the latter puts more priority on making the last <code class="highlighter-rouge">A</code> as long as possible; hence the difference between leftmost-longest and rightmost-shortest, and between leftmost-shortest and rightmost-longest:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="k">let</span> <span class="n">nonAssociativeGrammar1</span> <span class="p">=</span>
<span class="p">[</span>
<span class="s2">"S"</span><span class="p">,</span><span class="nc">SeqL</span><span class="p">(</span><span class="nc">Sym</span> <span class="s2">"A"</span><span class="p">,</span> <span class="nc">SeqL</span><span class="p">(</span><span class="nc">Sym</span> <span class="s2">"A"</span><span class="p">,</span> <span class="nc">Sym</span> <span class="s2">"A"</span><span class="o">));</span>
<span class="s2">"A"</span><span class="p">,</span><span class="nc">AltL</span><span class="p">(</span><span class="nc">AltL</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">"aa"</span><span class="p">,</span> <span class="nc">Str</span> <span class="s2">"aaa"</span><span class="o">),</span> <span class="nc">AltL</span><span class="p">(</span><span class="nc">AltL</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">"b"</span><span class="p">,</span> <span class="nc">Str</span> <span class="s2">"abb"</span><span class="o">),</span> <span class="nc">AltL</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">"b"</span><span class="p">,</span> <span class="nc">Str</span> <span class="s2">"bb"</span><span class="o">)))</span>
<span class="p">]</span> <span class="p">|></span> <span class="nn">Map</span><span class="p">.</span><span class="n">ofList</span>
<span class="k">let</span> <span class="n">nonAssociativeGrammar2</span> <span class="p">=</span>
<span class="p">[</span>
<span class="s2">"S"</span><span class="p">,</span><span class="nc">SeqL</span><span class="p">(</span><span class="nc">SeqL</span><span class="p">(</span><span class="nc">Sym</span> <span class="s2">"A"</span><span class="p">,</span> <span class="nc">Sym</span> <span class="s2">"A"</span><span class="o">),</span> <span class="nc">Sym</span> <span class="s2">"A"</span><span class="o">);</span>
<span class="s2">"A"</span><span class="p">,</span><span class="nc">AltL</span><span class="p">(</span><span class="nc">AltL</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">"aa"</span><span class="p">,</span> <span class="nc">Str</span> <span class="s2">"aaa"</span><span class="o">),</span> <span class="nc">AltL</span><span class="p">(</span><span class="nc">AltL</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">"b"</span><span class="p">,</span> <span class="nc">Str</span> <span class="s2">"abb"</span><span class="o">),</span> <span class="nc">AltL</span><span class="p">(</span><span class="nc">Str</span> <span class="s2">"b"</span><span class="p">,</span> <span class="nc">Str</span> <span class="s2">"bb"</span><span class="o">)))</span>
<span class="p">]</span> <span class="p">|></span> <span class="nn">Map</span><span class="p">.</span><span class="n">ofList</span>
<span class="p">></span> <span class="n">test</span> <span class="n">nonAssociativeGrammar1</span> <span class="p">[</span><span class="s2">"aaabbb"</span><span class="o">];;</span>
<span class="n">aaabbb</span> <span class="o">==></span> <span class="nc">A</span><span class="p">[</span><span class="n">aa</span><span class="p">]</span><span class="nc">A</span><span class="p">[</span><span class="n">abb</span><span class="p">]</span><span class="nc">A</span><span class="p">[</span><span class="n">b</span><span class="p">]</span>
<span class="p">></span> <span class="n">test</span> <span class="n">nonAssociativeGrammar2</span> <span class="p">[</span><span class="s2">"aaabbb"</span><span class="o">];;</span>
<span class="n">aaabbb</span> <span class="o">==></span> <span class="nc">A</span><span class="p">[</span><span class="n">aaa</span><span class="p">]</span><span class="nc">A</span><span class="p">[</span><span class="n">b</span><span class="p">]</span><span class="nc">A</span><span class="p">[</span><span class="n">bb</span><span class="p">]</span></code></pre></figure>
<p>A CYK parser is not great, but any parser that can produce a parse forest annotated with an input range <code class="highlighter-rouge">i..j</code> for each node in the parse forest can be modified to support this kind of disambiguation. This method has no problems with filtering too much or too little, since it always produces a single parse tree, and works for any context free grammar. The question is whether biased choice and left and right biased sequential composition are enough to express all the disambiguation we want to do in practice. It might be that the disambiguation we want can be expressed by filtering certain tree patterns out of the parse forest, but can’t be expressed by inserting <code class="highlighter-rouge"><</code> and <code class="highlighter-rouge">></code>. In those cases we still have to rewrite the grammar to make it produce the parse tree we want.</p>
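To make the biased sequential composition concrete, here is a small Python sketch (not the post’s F# implementation). It enumerates all ways of reading “aaabbb” as three <code class="highlighter-rouge">A</code>’s and orders candidates by split point; the assumption that <code class="highlighter-rouge">SeqL</code> prefers the split that makes the left component as short as possible is my reading of the output above, not a definitive semantics:

```python
# Hypothetical sketch: a parser returns candidate parses ordered by
# preference, and seq_l prefers the leftmost split point, i.e. the
# shortest possible left component.

A_WORDS = ("aa", "aaa", "b", "abb", "bb")

def atom(s):
    # all ways to read s as a single A (here: at most one way)
    return [(s,)] if s in A_WORDS else []

def seq_l(p, q):
    # left-biased sequence: try split points from left to right, so the
    # first candidate in the result list is the preferred parse
    def parse(s):
        out = []
        for k in range(len(s) + 1):
            for left in p(s[:k]):
                for right in q(s[k:]):
                    out.append(left + right)
        return out
    return parse

g1 = seq_l(atom, seq_l(atom, atom))   # A < (A < A)
g2 = seq_l(seq_l(atom, atom), atom)   # (A < A) < A

print(g1("aaabbb")[0])  # ('aa', 'abb', 'b')
print(g2("aaabbb")[0])  # ('aaa', 'b', 'bb')
```

The two preferred parses reproduce the REPL output above: the nesting of the biased operator changes which parse wins even though the underlying parse forest is the same.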
Wed, 10 Apr 2019 00:00:00 +0000
http://julesjacobs.github.io/2019/04/10/disambiguating_cfgs.html
http://julesjacobs.github.io/2019/04/10/disambiguating_cfgs.htmlBayes’ rule from minimum relative entropy, and an alternative derivation of variational inference<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>Bayes’ rule from minimum relative entropy, and an alternative derivation of variational inference</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-AMS_CHTML-full" type="text/javascript"></script>
<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
<![endif]-->
</head>
<body>
<p>In Bayesian inference our goal is to compute the posterior distribution <span class="math display">\[\begin{aligned}
Posterior(\theta) & =\frac{P(x^{*},\theta)}{\int P(x^{*},\theta)d\theta}\end{aligned}\]</span> where <span class="math inline">\(P(x,\theta)\)</span> is the joint distribution, and <span class="math inline">\(x=x^{*}\)</span> is the observed value of <span class="math inline">\(x\)</span>, see <a href="http://julesjacobs.github.io/2019/03/24/bayes-simply.html">the previous post about Bayes’ rule</a>. The trouble with this is the integral in the denominator, which is too difficult to compute for most models. Variational inference is one approach to compute an approximate posterior by solving an optimisation problem instead of an integral. Instead of computing <span class="math inline">\(Posterior(\theta)\)</span> exactly, we choose an easy family of distributions <span class="math inline">\(D\subset\mathbb{D}\)</span>, which is a subset of all distributions <span class="math inline">\(\mathbb{D}\)</span> on <span class="math inline">\(\theta\)</span>, and then pick <span class="math inline">\(Q\in D\)</span> that minimises the relative entropy to the true posterior: <span class="math display">\[\begin{aligned}
\min_{Q\in D} & D(Q||Posterior)\end{aligned}\]</span> If we minimise over all distributions <span class="math inline">\(\mathbb{D}\)</span>, then this will give us <span class="math inline">\(Q=Posterior\)</span>, but if we minimise only over a subset of all distributions <span class="math inline">\(D\subset\mathbb{D}\)</span>, then we’ll only get an approximation. So how does this help? Don’t we need to compute the true <span class="math inline">\(Posterior\)</span> anyway, in order to even set up this minimisation problem? It turns out that we don’t. We can rewrite the relative entropy as follows: <span class="math display">\[\begin{aligned}
D(Q||Posterior) & =\mathbb{E}_{\theta\sim Q}[\log\frac{Q(\theta)}{Posterior(\theta)}]\\
& =\mathbb{E}_{\theta\sim Q}[\log\frac{Q(\theta)}{P(x^{*},\theta)}]+\log\int P(x^{*},\theta)d\theta\end{aligned}\]</span> The difficult integral pops out of the logarithm as an additive constant, so for the sake of the minimisation problem it doesn’t matter: <span class="math display">\[\begin{aligned}
\min_{Q\in D} & D(Q||Posterior)=\min_{Q\in D}\mathbb{E}_{\theta\sim Q}[\log\frac{Q(\theta)}{P(x^{*},\theta)}]\end{aligned}\]</span> The right hand side is called the ELBO, the evidence lower bound. You may ask how this problem is any easier, because the expectation is still a difficult integral. In general it is still difficult, but it becomes easy if we choose our family of distributions <span class="math inline">\(D\)</span> right. Usually the model we’re talking about has a vector of parameters <span class="math inline">\(\theta=(\theta_{1},\dots,\theta_{n})\)</span>, and we choose a distribution <span class="math inline">\(Q(\theta)=Q_{1}(\theta_{1})\cdots Q_{n}(\theta_{n})\)</span> that factorises, and usually <span class="math inline">\(P(x^{*},\theta)\)</span> comes from a graphical model, so it factorises as well. The <span class="math inline">\(\log\)</span> turns this into a sum of terms, and for each of those terms it’s (hopefully) easy to compute the expectation in closed form. We can then solve the minimisation problem using gradient descent, or a similar algorithm.</p>
<h2 id="bayes-rule-from-minimum-relative-entropy" class="unnumbered">Bayes’ rule from minimum relative entropy</h2>
<p>Instead of finding <span class="math inline">\(Q\)</span> as an approximation to the posterior, we’re going to show that the posterior itself is already the solution to a minimisation problem. The problem is this: we have the model <span class="math inline">\(P(x,\theta)\)</span> and ask ourselves <strong>what’s the distribution <span class="math inline">\(Q(x,\theta)\)</span> closest to <span class="math inline">\(P(x,\theta)\)</span>, where <span class="math inline">\(Q\)</span> is any distribution that puts all probability mass on <span class="math inline">\(x=x^{*}\)</span>?</strong> If we interpret “closest” as “with minimum relative entropy”, then <span class="math inline">\(Q\)</span> is precisely the Bayesian posterior. Let me show you. The <span class="math inline">\(Q\)</span> we’re trying to find is <span class="math display">\[\begin{aligned}
\min_{Q\in D_{x^{*}}} & D(Q||P)\end{aligned}\]</span> where <span class="math inline">\(D_{x^{*}}\)</span> is the set of distributions that put all probability mass of <span class="math inline">\(Q(x,\theta)\)</span> on <span class="math inline">\(x=x^{*}\)</span>. In other words, <span class="math inline">\(Q(x,\theta)=\delta(x-x^{*})Q(\theta)\)</span> where <span class="math inline">\(\delta\)</span> is the Dirac delta measure. Since <span class="math inline">\(D(Q||P)=\mathbb{E}_{x,\theta\sim Q}[\log\frac{Q(x,\theta)}{P(x,\theta)}]=\mathbb{E}_{\theta\sim Q}[\log\frac{Q(\theta)}{P(x^{*},\theta)}]\)</span>, we have indeed <span class="math display">\[\begin{aligned}
Posterior & =argmin_{Q\in\mathbb{D}}\mathbb{E}_{\theta\sim Q}[\log\frac{Q(\theta)}{P(x^{*},\theta)}]\end{aligned}\]</span> We have derived Bayes’ rule from the principle of minimum relative entropy. Note that the term on the right hand side is precisely the ELBO of the previous section.</p>
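For a discrete <span class="math inline">\(\theta\)</span> this claim is easy to check numerically. The following Python sketch (the joint probabilities are made up for illustration) grid-searches over all distributions <span class="math inline">\(Q=(q,1-q)\)</span> on two values of <span class="math inline">\(\theta\)</span> and confirms that the minimiser of the objective is the normalised posterior:

```python
import math

# Toy check: the minimiser of E_{theta~Q}[log Q(theta)/P(x*, theta)]
# over all Q on two values of theta is the normalised posterior.
p_joint = [0.3, 0.1]  # made-up values of P(x*, theta) for theta in {t1, t2}
posterior = [p / sum(p_joint) for p in p_joint]  # [0.75, 0.25]

def objective(q):
    # E_{theta~Q}[log Q(theta)/P(x*, theta)] for Q = (q, 1-q)
    total = 0.0
    for qt, pt in zip((q, 1.0 - q), p_joint):
        if qt > 0:
            total += qt * math.log(qt / pt)
    return total

best_q = min((i / 10000 for i in range(1, 10000)), key=objective)
print(best_q)  # ~0.75, the posterior probability of t1
```

The grid minimiser lands on the posterior probability of the first value of <span class="math inline">\(\theta\)</span>, as the derivation predicts.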
<h2 id="an-alternative-derivation-of-variational-inference" class="unnumbered">An alternative derivation of variational inference</h2>
<p>By turning the true posterior into a minimisation problem, we have an alternative motivation for variational inference. Instead of minimising over all distributions to get the true posterior, minimise over an easy family to get an approximation to the posterior. This sounds similar to the previous motivation, but it’s subtly different. In the first motivation we used Bayes’ rule to get the true posterior, and then used relative entropy to look for a distribution <span class="math inline">\(Q\)</span> that approximates the posterior, and then derived the ELBO by ignoring an additive constant. In the second motivation we derived the ELBO directly, by using relative entropy to obtain an expression for the true posterior as a minimisation problem. In summary, Bayesian inference answers the question:</p>
<ul>
<li><p><strong>What’s the distribution <span class="math inline">\(Q(x,\theta)\)</span> closest to <span class="math inline">\(P(x,\theta)\)</span>, where <span class="math inline">\(Q\in\mathbb{D}\)</span> is a distribution that puts all probability mass on <span class="math inline">\(x=x^{*}\)</span>?</strong></p></li>
</ul>
<p>Whereas variational inference is the following approximation:</p>
<ul>
<li><p><strong>What’s the distribution <span class="math inline">\(Q(x,\theta)\)</span> closest to <span class="math inline">\(P(x,\theta)\)</span>, where <span class="math inline">\(Q\in D\)</span> is a distribution that puts all probability mass on <span class="math inline">\(x=x^{*}\)</span>?</strong></p></li>
</ul>
<p>For exact Bayesian inference we optimise over the set of all distributions <span class="math inline">\(\mathbb{D}\)</span>, whereas for variational inference we only optimise over some easy family <span class="math inline">\(D\subset\mathbb{D}\)</span>.</p>
<h2 id="maximum-a-posteriori-inference" class="unnumbered">Maximum a posteriori inference</h2>
<p>As a bonus, consider what happens if for our family <span class="math inline">\(D\)</span> we pick the set of distributions <span class="math inline">\(Q_{\theta}\in D\)</span> that put all probability mass on a single point <span class="math inline">\(\theta\)</span>. The expectation <span class="math inline">\(\mathbb{E}_{Q_{\theta}}[\log\frac{Q_{\theta}(\theta)}{P(x^{*},\theta)}]=\log\frac{Q_{\theta}(\theta)}{P(x^{*},\theta)}\)</span> becomes a single term in that case. The numerator is constant <span class="math inline">\(Q_{\theta}(\theta)=1\)</span> because all probability mass is on that <span class="math inline">\(\theta\)</span> (let’s assume <span class="math inline">\(\theta\)</span> is discrete for the sake of argument), so we’re left with <span class="math display">\[\begin{aligned}
\min_{\theta}\log\frac{1}{P(x^{*},\theta)} & =\max_{\theta}\log P(x^{*},\theta)\end{aligned}\]</span> This is MAP inference, so MAP inference is Bayesian variational inference with a particular easy family of distributions.</p>
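In a discrete toy model (made-up numbers), restricting <span class="math inline">\(Q\)</span> to point masses makes this collapse visible in a few lines of Python:

```python
import math

p_joint = {"t1": 0.3, "t2": 0.1}  # made-up values of P(x*, theta)

# Variational family = point masses: Q_theta puts probability 1 on theta,
# so the ELBO objective collapses to log(1 / P(x*, theta)) per theta.
objective = {th: math.log(1.0 / p) for th, p in p_joint.items()}
map_estimate = min(objective, key=objective.get)  # minimising = MAP

print(map_estimate)  # 't1', the theta maximising P(x*, theta)
```

Minimising the collapsed objective picks exactly the <span class="math inline">\(\theta\)</span> with the largest joint probability, i.e. the MAP estimate.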
</body>
</html>
Mon, 25 Mar 2019 00:00:00 +0000
http://julesjacobs.github.io/2019/03/25/bayes-maxent.html
http://julesjacobs.github.io/2019/03/25/bayes-maxent.htmlBayes’ rule simply<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>Bayes’ rule simply</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-AMS_CHTML-full" type="text/javascript"></script>
<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
<![endif]-->
</head>
<body>
<p>Bayes’ rule is usually written <span class="math display">\[\begin{aligned}
P(\theta|x) & =P(x|\theta)\frac{P(\theta)}{P(x)}\end{aligned}\]</span></p>
<p>In practice we’re trying to learn about some model parameter <span class="math inline">\(\theta\)</span> given some observation <span class="math inline">\(x\)</span>. The model <span class="math inline">\(P(x|\theta)\)</span> tells us how observations are influenced by the model parameter. This seems simple enough, but a small change in notation reveals how simple Bayes’ rule is. Let us call <span class="math inline">\(P(\theta)\)</span> the prior on <span class="math inline">\(\theta\)</span> and <span class="math inline">\(P'(\theta)\)</span> the posterior on <span class="math inline">\(\theta\)</span>. Then Bayes’ rule says:</p>
<p><span class="math display">\[\begin{aligned}
P'(\theta) & \propto P(x|\theta)P(\theta)\end{aligned}\]</span> We got rid of the denominator <span class="math inline">\(P(x)\)</span> because it’s just a normalisation to make the total probability sum to 1, and say that <span class="math inline">\(P'(\theta)\)</span> is proportional to <span class="math inline">\(P(x|\theta)P(\theta)\)</span>. The value <span class="math inline">\(P(x|\theta)P(\theta)=P(x,\theta)\)</span> is the joint probability of seeing a given pair <span class="math inline">\((x,\theta)\)</span>, so we can also write Bayes’ rule as:</p>
<p><span class="math display">\[\begin{aligned}
P'(\theta)\propto & P(x,\theta)\end{aligned}\]</span> So up to normalisation, the posterior is just substituting the actual observation <span class="math inline">\(X=x\)</span> into the joint distribution. How can we interpret this? Imagine that we have a robot whose current state of belief is given by <span class="math inline">\(P(x,\theta)\)</span> and that <span class="math inline">\(x,\theta\)</span> only have a finite number of possible values, so that the robot has stored a finite number of probabilities <span class="math inline">\(P(x,\theta)\)</span>, one for each pair <span class="math inline">\((x,\theta)\)</span>. Suppose that the robot now learns <span class="math inline">\(X=x\)</span> by observation. What does it do to compute its posterior belief? It first sets <span class="math inline">\(P(y,\theta)=0\)</span> for all <span class="math inline">\(y\neq x\)</span> because the actual observed value is <span class="math inline">\(x\)</span>. Then it renormalises the probabilities to make <span class="math inline">\(P(x,\theta)\)</span> sum to 1 again. That’s all Bayes’ rule is: simply delete the possibilities that are incompatible with the observation, and renormalise the remainder.</p>
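The delete-and-renormalise procedure is only a few lines of code. Here is a Python sketch with a made-up joint belief table (the names and numbers are purely illustrative):

```python
# The robot's belief: a table of joint probabilities P(x, theta).
# Numbers are made up for illustration and sum to 1.
joint = {
    ("heads", "fair"):   0.25,
    ("tails", "fair"):   0.25,
    ("heads", "biased"): 0.45,
    ("tails", "biased"): 0.05,
}

def observe(joint, x_star):
    # Step 1: delete the possibilities incompatible with the observation.
    kept = {th: p for (x, th), p in joint.items() if x == x_star}
    # Step 2: renormalise the remainder so it sums to 1 again.
    total = sum(kept.values())
    return {th: p / total for th, p in kept.items()}

print(observe(joint, "heads"))
# posterior on theta: fair ~0.357, biased ~0.643
```

Observing “heads” deletes the two “tails” rows and rescales the rest; no other machinery is needed.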
</body>
</html>
Sun, 24 Mar 2019 00:00:00 +0000
http://julesjacobs.github.io/2019/03/24/bayes-simply.html
http://julesjacobs.github.io/2019/03/24/bayes-simply.htmlWhy two’s complement works<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>Why two’s complement works</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-AMS_CHTML-full" type="text/javascript"></script>
<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
<![endif]-->
</head>
<body>
<p>There are unsigned integers and signed integers. The most significant bit of a signed integer indicates whether it is negative. You’d expect that CPUs have different instructions to do arithmetic with signed integers and unsigned integers. That’s partly true: x86 has DIV to divide unsigned integers and IDIV to divide signed integers, but there is only one ADD instruction and only one SUB instruction, and the separate MUL and IMUL instructions differ only in the upper half of their double-width result. The same instructions work for both signed and unsigned integers. How can that be?!</p>
<p>This is the magic of two’s complement. The bit patterns of signed integers are precisely such that ordinary unsigned arithmetic gives the correct sign bit. For four bit numbers the representation is as follows:</p>
<table border=1>
<thead>
<tr class="header">
<th style="text-align: center;">bit pattern</th>
<th style="text-align: center;">unsigned value</th>
<th style="text-align: center;">signed value</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">0000</td>
<td style="text-align: center;">0</td>
<td style="text-align: center;">0</td>
</tr>
<tr class="even">
<td style="text-align: center;">0001</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">1</td>
</tr>
<tr class="odd">
<td style="text-align: center;">0010</td>
<td style="text-align: center;">2</td>
<td style="text-align: center;">2</td>
</tr>
<tr class="even">
<td style="text-align: center;">0011</td>
<td style="text-align: center;">3</td>
<td style="text-align: center;">3</td>
</tr>
<tr class="odd">
<td style="text-align: center;">0100</td>
<td style="text-align: center;">4</td>
<td style="text-align: center;">4</td>
</tr>
<tr class="even">
<td style="text-align: center;">0101</td>
<td style="text-align: center;">5</td>
<td style="text-align: center;">5</td>
</tr>
<tr class="odd">
<td style="text-align: center;">0110</td>
<td style="text-align: center;">6</td>
<td style="text-align: center;">6</td>
</tr>
<tr class="even">
<td style="text-align: center;">0111</td>
<td style="text-align: center;">7</td>
<td style="text-align: center;">7</td>
</tr>
<tr class="odd">
<td style="text-align: center;">1000</td>
<td style="text-align: center;">8</td>
<td style="text-align: center;">-8</td>
</tr>
<tr class="even">
<td style="text-align: center;">1001</td>
<td style="text-align: center;">9</td>
<td style="text-align: center;">-7</td>
</tr>
<tr class="odd">
<td style="text-align: center;">1010</td>
<td style="text-align: center;">10</td>
<td style="text-align: center;">-6</td>
</tr>
<tr class="even">
<td style="text-align: center;">1011</td>
<td style="text-align: center;">11</td>
<td style="text-align: center;">-5</td>
</tr>
<tr class="odd">
<td style="text-align: center;">1100</td>
<td style="text-align: center;">12</td>
<td style="text-align: center;">-4</td>
</tr>
<tr class="even">
<td style="text-align: center;">1101</td>
<td style="text-align: center;">13</td>
<td style="text-align: center;">-3</td>
</tr>
<tr class="odd">
<td style="text-align: center;">1110</td>
<td style="text-align: center;">14</td>
<td style="text-align: center;">-2</td>
</tr>
<tr class="even">
<td style="text-align: center;">1111</td>
<td style="text-align: center;">15</td>
<td style="text-align: center;">-1</td>
</tr>
</tbody>
</table>
<p>The operation +1 goes forward by one step in this table, and wraps around from the end to the start. The operation -1 goes in reverse. If we start with the value 0000 and subtract 1, we end up with 1111. For signed values that bit pattern indeed represents -1. If we keep subtracting 1, we go down to -8, which is the minimum representable signed integer, and then wrap around to +7. You see that the operations +1 and -1 work exactly the same on a bit pattern, regardless of whether it represents a signed or unsigned value. The only difference is the meaning of the bit patterns: 1111 represents 15 if it’s an unsigned value, and -1 if it’s a signed value. So if we <em>print</em> an integer, we need to know whether it’s signed or unsigned, but as long as we only do +1 and -1 we don’t need to know whether it’s signed or unsigned.</p>
<p>Adding bigger amounts, say, +3, is independent of whether it’s signed or unsigned too, because adding +3 is the same as +1+1+1, so if adding +1 is independent of whether it’s signed or unsigned, then +3 is too. Try an example: if we start with 1001 and add +2 we end up with 1011. As an unsigned value, that was 9+2=11, and as a signed value, that was -7+2=-5. Both correct! Similarly, subtracting bigger amounts, or adding negative amounts, also works. Multiplication can be expressed as repeated addition, so that’s independent too. This does <em>not</em> work for division, because division cannot be expressed as repeated addition of +1. That’s why x86 has separate DIV and IDIV instructions.</p>
<p>Technically, that argument only shows that addition <span class="math inline">\(a+b\)</span>, subtraction <span class="math inline">\(a-b\)</span>, and multiplication <span class="math inline">\(a\cdot b\)</span> are independent of whether <span class="math inline">\(a\)</span> is signed or unsigned. You’ll probably not be surprised that it’s also independent of whether <span class="math inline">\(b\)</span> is signed or unsigned. If you’re familiar with modular arithmetic, this is because <span class="math inline">\((a \bmod 16)+(b \bmod 16)=(a+b) \bmod 16\)</span> and similarly for subtraction and multiplication. The only difference between signed and unsigned is which representative from <span class="math inline">\(\mathbb{Z}\)</span> we assign to each equivalence class in <span class="math inline">\(\mathbb{Z}/16\mathbb{Z}\)</span>. This means that all the laws that hold in modular arithmetic, such as <span class="math inline">\(a\cdot(b+c)=a\cdot b+a\cdot c\)</span>, hold for both unsigned and signed arithmetic, even in the presence of overflow! That's good news for compilers; it allows them to optimise arithmetic without worrying about overflow.</p>
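The mod-16 argument can be checked exhaustively for 4-bit patterns. A small Python sketch: for every pair of bit patterns, the single “unsigned” result of +, - and × is also the correct signed result once the pattern is reinterpreted:

```python
BITS = 4
MOD = 1 << BITS  # 16

def to_signed(u):
    # reinterpret a bit pattern 0..15 as a two's complement value -8..7
    return u - MOD if u >= MOD // 2 else u

ops = [lambda x, y: x + y, lambda x, y: x - y, lambda x, y: x * y]

for a in range(MOD):
    for b in range(MOD):
        for op in ops:
            raw = op(a, b) % MOD  # the single result the ALU produces
            signed_result = op(to_signed(a), to_signed(b)) % MOD
            # the same bit pattern, read as signed, is the signed result
            # wrapped into the representable range -8..7
            assert to_signed(raw) == to_signed(signed_result)

print("4-bit add/sub/mul agree for signed and unsigned readings")
```

The loop covers all 256 pairs and all three operations, so for 4 bits the claim is not just an argument but a verified fact.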
</body>
</html>
Wed, 20 Mar 2019 00:00:00 +0000
http://julesjacobs.github.io/2019/03/20/why-twos-complement-works.html
http://julesjacobs.github.io/2019/03/20/why-twos-complement-works.htmlLeapfrog and Verlet are the same method<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>Leapfrog and Verlet are the same method</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-AMS_CHTML-full" type="text/javascript"></script>
<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
<![endif]-->
</head>
<body>
<p>Leapfrog and Verlet are two popular methods to integrate Newton’s equations of motion in physics simulations and games. These methods occupy a sweet spot between Euler’s method (the simplest method) and higher order methods. They are almost as simple as Euler’s method and use only one force calculation per time step, yet they have crucial advantages: they are second order accurate (compared to Euler’s method, which is of order one), and they are symplectic and time reversible. Because they have such similar properties I wondered if Leapfrog and Verlet are actually the same method – after all, how many second order symplectic methods can there be? It turns out that they are indeed equivalent, a fact apparently well known among numerical methods people. Their equivalence is easier to understand with code than with math notation, so that’s what I’ll do here.</p>
<p>I’m assuming that we have a function <span class="math inline">\(a(x)\)</span> to compute the forces/accelerations from the positions, and <span class="math inline">\(x\)</span> and <span class="math inline">\(v\)</span> are initialised to the initial positions and velocities. Here is Verlet:</p>
<pre><code>for(i = 0..n){
x_prev = x
x += v*dt + a(x)*dt*dt/2
v += (a(x_prev) + a(x))*dt/2
}</code></pre>
<p>And here is Leapfrog:</p>
<pre><code>for(i = 0..n){
x += v*dt
v += a(x)*dt
}</code></pre>
<p>At first sight there’s no way that they could be equivalent, because Verlet computes <span class="math inline">\(x(t)\)</span> and <span class="math inline">\(v(t)\)</span> at multiples of the time step <span class="math inline">\(t=0,1,2,\dots\)</span>, whereas Leapfrog computes <span class="math inline">\(x(t)\)</span> at <span class="math inline">\(t=0,1,2,\dots\)</span> and <span class="math inline">\(v(t)\)</span> at <span class="math inline">\(t=0+\frac{1}{2},1+\frac{1}{2},2+\frac{1}{2},\dots\)</span>, shifted one half step from each other (that’s why it’s called Leapfrog: the <span class="math inline">\(x\)</span> and <span class="math inline">\(v\)</span> values leapfrog over each other). However, at the start of the simulation we’re given <span class="math inline">\(x(0)\)</span> and <span class="math inline">\(v(0)\)</span>, whereas Leapfrog requires <span class="math inline">\(x(0)\)</span> and <span class="math inline">\(v(\frac{1}{2})\)</span>. To get Leapfrog started we must compute <span class="math inline">\(v(\frac{1}{2})\)</span> first, which we can do with one step of Euler’s method with <span class="math inline">\(\Delta t=\frac{1}{2}\)</span>. Similarly, at the end we’ve got <span class="math inline">\(x(n)\)</span> and <span class="math inline">\(v(n+\frac{1}{2})\)</span> whereas we’d like to know <span class="math inline">\(v(n)\)</span>, which we can get by taking one step of Euler’s method backward with <span class="math inline">\(\Delta t=-\frac{1}{2}\)</span>. That gives us the corrected Leapfrog:</p>
<pre><code>v += a(x)*dt/2
for(i = 0..n){
x += v*dt
v += a(x)*dt
}
v -= a(x)*dt/2</code></pre>
<p>Let’s rewrite this in an apparently silly way by splitting the <span class="math inline">\(v\)</span> update in half:</p>
<pre><code>v += a(x)*dt/2
for(i = 0..n){
x += v*dt
v += a(x)*dt/2
v += a(x)*dt/2
}
v -= a(x)*dt/2</code></pre>
<p>Think about what this is doing by unrolling this loop in your mind:</p>
<pre><code>v += a(x)*dt/2
x += v*dt
v += a(x)*dt/2
v += a(x)*dt/2
x += v*dt
v += a(x)*dt/2
v += a(x)*dt/2
... n times ...
x += v*dt
v += a(x)*dt/2
v += a(x)*dt/2
v -= a(x)*dt/2</code></pre>
<p>The last two updates cancel each other out, so we can remove both. A different way of looking at it emerges:</p>
<pre><code>v += a(x)*dt/2
x += v*dt
v += a(x)*dt/2
v += a(x)*dt/2
x += v*dt
v += a(x)*dt/2
... n times ...
v += a(x)*dt/2
x += v*dt
v += a(x)*dt/2</code></pre>
<p>In other words, the Leapfrog code is equivalent to this:</p>
<pre><code>for(i = 0..n){
v += a(x)*dt/2
x += v*dt
v += a(x)*dt/2
}</code></pre>
<p>An iteration of this loop is exactly the same as an iteration of Verlet! Can you see why? (Hint: incorporate the first <span class="math inline">\(v\)</span> update into the subsequent <span class="math inline">\(x\)</span> and <span class="math inline">\(v\)</span> updates).</p>
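The equivalence can also be checked numerically. The following Python sketch (using a harmonic oscillator <span class="math inline">\(a(x)=-x\)</span>, my choice for illustration) runs velocity Verlet and the kick-drift-kick loop side by side; the trajectories agree up to floating-point noise:

```python
# Compare velocity Verlet against the kick-drift-kick form of Leapfrog
# on a harmonic oscillator a(x) = -x (chosen for illustration).

def a(x):
    return -x

def verlet(x, v, dt, n):
    xs = []
    for _ in range(n):
        x_prev = x
        x = x + v * dt + a(x_prev) * dt * dt / 2
        v = v + (a(x_prev) + a(x)) * dt / 2
        xs.append(x)
    return xs

def leapfrog_kdk(x, v, dt, n):
    xs = []
    for _ in range(n):
        v = v + a(x) * dt / 2  # half kick
        x = x + v * dt         # drift
        v = v + a(x) * dt / 2  # half kick
        xs.append(x)
    return xs

xs1 = verlet(1.0, 0.0, 0.01, 1000)
xs2 = leapfrog_kdk(1.0, 0.0, 0.01, 1000)
print(max(abs(p - q) for p, q in zip(xs1, xs2)))  # tiny: rounding noise only
```

Mathematically the two loops perform identical updates; only the order of floating-point operations differs, which is why the trajectories match to rounding error rather than exactly.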
<h2 id="second-variant" class="unnumbered">Second variant</h2>
<p>Instead of advancing <span class="math inline">\(v\)</span> by half a timestep at the start and end of Leapfrog, we could also advance <span class="math inline">\(x\)</span> by half a timestep to get the second variant of Leapfrog:</p>
<pre><code>x += v*dt/2
for(i=0 to n){
v += a(x)*dt
x += v*dt
}
x -= v*dt/2</code></pre>
<p>We can do the same rewrite and move everything into the loop:</p>
<pre><code>for(i = 0..n){
x += v*dt/2
v += a(x)*dt
x += v*dt/2
}</code></pre>
<p>This variant has the advantage that <span class="math inline">\(a(x)\)</span> is only computed once per iteration. We don’t <em>really</em> need to compute <span class="math inline">\(a(x)\)</span> twice in the previous variant of Leapfrog-Verlet either, because the second call to <span class="math inline">\(a(x)\)</span> will be the same as the first call in the subsequent iteration, so we could save that instead of recomputing it. That complicates the code a bit, so I find this second variant nicer. By incorporating the first <span class="math inline">\(x\)</span> update into the subsequent <span class="math inline">\(v\)</span> and <span class="math inline">\(x\)</span> updates we obtain the second variant of Verlet:</p>
<pre><code>for(i = 0..n){
v_prev = v
v += a(x+v*dt/2)*dt
x += (v_prev + v)*dt/2
}</code></pre>
<p>This variant of Verlet also has the advantage of only computing <span class="math inline">\(a(x)\)</span> once. For some reason the other variant seems to be <a href="https://en.wikipedia.org/wiki/Verlet_integration#Velocity_Verlet">more popular</a>.</p>
<h2 id="conclusion" class="unnumbered">Conclusion</h2>
<p>In my opinion, the best way to write Leapfrog-Verlet is this:</p>
<pre><code>for(i = 0..n){
x += v*dt/2
v += a(x)*dt
x += v*dt/2
}</code></pre>
<p>The advantage is that it’s pretty, computes both <span class="math inline">\(x\)</span> and <span class="math inline">\(v\)</span> at <span class="math inline">\(t=0,1,2,\dots\)</span>, and doesn’t use any state other than <span class="math inline">\((x,v)\)</span>. The disadvantage is that it updates <span class="math inline">\(x\)</span> twice per iteration, instead of once as Leapfrog does. This is likely to be of negligible cost compared to computing <span class="math inline">\(a(x)\)</span>, but if you really care about it then use the second variant of Leapfrog. Just keep in mind that this variant of Leapfrog computes <span class="math inline">\(x\)</span> at shifted time steps, so if you use <span class="math inline">\(\frac{1}{2}mv^{2}+U(x)\)</span> to compute the energy you’ll get an incorrect value.</p>
</body>
</html>
Fri, 15 Mar 2019 00:00:00 +0000
http://julesjacobs.github.io/2019/03/15/leapfrog-verlet.html
http://julesjacobs.github.io/2019/03/15/leapfrog-verlet.htmlConsistent execution of imperative reactive programs<p>This is a comment in response to <a href="http://julesjacobs.github.io/2018/02/22/hooks-bring-react-closer-to-frp.html#comment-4358285816">Sandro Magi’s comments on FRP and S.js</a>, turned into a post. S.js lets you create data signals with <code class="highlighter-rouge">x = S.data(initial value)</code>, read them with <code class="highlighter-rouge">x()</code> and write to them with <code class="highlighter-rouge">x(value)</code>. It also lets you create computations with <code class="highlighter-rouge">S(() => {...})</code>. When a signal is read in a computation, the computation is re-run whenever the signal’s value changes. When you write a value to a signal, reading from the signal doesn’t immediately return the new value. Instead, time progresses in ticks, and updates from tick n are only visible in tick n+1. Could we change that behaviour so that reading from a signal always returns its most recent value?</p>
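<p>To make the tick semantics concrete, here is a minimal Python sketch (my own toy model, not S.js itself): writes are buffered, and only become visible once the tick ends.</p>

```python
class Signal:
    """Toy model of a data signal: reads see the current tick's value;
    writes only take effect in the next tick."""
    def __init__(self, value):
        self.value = value
        self.pending = None

    def read(self):
        return self.value

    def write(self, value):
        self.pending = value

def end_tick(signals):
    # promote buffered writes; this is where "tick n -> tick n+1" happens
    for s in signals:
        if s.pending is not None:
            s.value = s.pending
            s.pending = None

x = Signal(0)
x.write(5)
before = x.read()   # still 0: the write is not visible within the same tick
end_tick([x])
after = x.read()    # 5: visible in the next tick
```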
<p>This creates a scheduling problem when computations imperatively update other signals, because the order in which we run the computations may lead to different results. We could try to run the computations in an order such that computations that write to a signal are executed before computations that read from it. This could be done by running the computations in any order, keeping track of which signals each computation reads and writes, and aborting if a computation A reads from <code class="highlighter-rouge">x</code> and a computation B then writes to <code class="highlighter-rouge">x</code>. Then we repeat, now scheduling the computations that write to a signal before the computations that read from it.</p>
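<p>A rough Python sketch of that abort-and-retry idea (entirely hypothetical: each computation reports its read and write sets, and a real implementation would also have to roll back state on abort):</p>

```python
def try_schedule(computations, state):
    read_so_far = set()
    for comp in computations:
        reads, writes = comp(state)     # each computation reports (reads, writes)
        if writes & read_so_far:
            return None  # abort: a signal was written after it was already read
        read_so_far |= reads
    return state

def A(state):  # reads x
    _ = state["x"]
    return {"x"}, set()

def B(state):  # writes x
    state["x"] = 1
    return set(), {"x"}

bad = try_schedule([A, B], {"x": 0})   # A read x before B wrote it: abort
good = try_schedule([B, A], {"x": 0})  # writer scheduled first: consistent
```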
<p>I haven’t thought about this too deeply, but one danger is that this process will run in circles even if a consistent schedule exists, or at least take exponentially many restarts to find a consistent schedule. Another issue is that sometimes multiple consistent schedules exist that lead to different outcomes. Imagine a computation A that reads from x and writes y=1 if x==0, and a computation B that reads from y and writes x=1 if y==0. If x=0 and y=0 initially, then both orders AB and BA give a consistent schedule, but lead to different outcomes.</p>
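<p>That example is easy to simulate. In this Python sketch both orders leave the system in a state that no computation wants to change, but a different one each time:</p>

```python
def A(s):  # reads x, writes y
    if s["x"] == 0:
        s["y"] = 1

def B(s):  # reads y, writes x
    if s["y"] == 0:
        s["x"] = 1

def run(order):
    state = {"x": 0, "y": 0}
    for comp in order:
        comp(state)
    return state

ab = run([A, B])  # A fires first, so B sees y == 1 and does nothing
ba = run([B, A])  # B fires first, so A sees x == 1 and does nothing
```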
<p>Is an “imperative reactive” programming model therefore doomed? I don’t know.</p>
<p>This is not actually how you write S.js code, by the way. A computation <code class="highlighter-rouge">S(() => {...})</code> returns a signal based on the value that the lambda returns. Unlike when imperatively updating some existing signal, this allows S.js to understand the dataflow graph ahead of time, and schedule updates in the correct order while guaranteeing that each computation is only run once per tick. So the issue that signal reads return the old value usually doesn’t come up.</p>
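<p>The pure-computation style can be sketched in a few lines of Python (a toy dataflow graph, not S.js): because the dependencies are known up front, derived signals can be recomputed in topological order, each exactly once per tick.</p>

```python
from graphlib import TopologicalSorter  # Python 3.9+

values = {"x": 1}  # a source signal
# derived signals: name -> (dependencies, compute function)
derived = {
    "double": (["x"], lambda v: v["x"] * 2),
    "total":  (["x", "double"], lambda v: v["x"] + v["double"]),
}

def tick(values, derived):
    # map each derived signal to the set of signals it reads
    deps = {name: set(d) for name, (d, _) in derived.items()}
    for name in TopologicalSorter(deps).static_order():
        if name in derived:  # source signals keep their current value
            _, f = derived[name]
            values[name] = f(values)  # runs exactly once per tick
    return values

tick(values, derived)
```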
Sun, 10 Mar 2019 22:00:00 +0000
http://julesjacobs.github.io/2019/03/10/consistent-execution-of-imperative-reactive-programs.html
http://julesjacobs.github.io/2019/03/10/consistent-execution-of-imperative-reactive-programs.htmlProof that the calculus sin and cos functions equal the geometric sin and cos<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>Proof that the calculus sin and cos functions equal the geometric sin and cos</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-AMS_CHTML-full" type="text/javascript"></script>
<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
<![endif]-->
</head>
<body>
<p>The trigonometric functions can be defined in many different ways: as a power series, as a differential equation, using the complex exponential, and so on. That those definitions are equivalent is usually shown in a real analysis course. However, students first learn sin and cos in the context of geometry. The geometric sin and cos can also be defined in many ways, e.g. in terms of angles and side lengths of triangles, or in terms of a point moving around the unit circle. It is again easy to show that those geometric definitions are equivalent. What’s sometimes missing is a proof that the calculus definitions are equivalent to the geometric definitions. <a href="https://www.khanacademy.org/math/ap-calculus-ab/ab-differentiation-1-new/ab-2-7/a/proving-the-derivatives-of-sinx-and-cosx">Khan Academy has a nice proof using areas.</a> That argument is a little bit technical, and the proof is hard to come up with on your own. In this post I’ll give an argument using lengths that is, in my opinion, a bit easier to come up with and to remember. It is perhaps a bit less ironclad than Khan’s proof, because it requires you to accept facts about arclengths of curves, whereas Khan’s proof only involves comparing areas.</p>
<p>Here we go. We define <span class="math inline">\(\sin\)</span> and <span class="math inline">\(\cos\)</span> as the functions satisfying the differential equation <span class="math display">\[\begin{aligned}
\cos' & =-\sin\\
\sin' & =\cos\end{aligned}\]</span> with initial conditions <span class="math inline">\(\cos(0)=1\)</span> and <span class="math inline">\(\sin(0)=0\)</span>. We show that the point <span class="math inline">\((\cos(t),\sin(t))\)</span> travels around the unit circle at unit speed. To show that the point stays on the unit circle we must show <span class="math inline">\(\cos(t)^{2}+\sin(t)^{2}=1\)</span>. At <span class="math inline">\(t=0\)</span> the equation holds due to the initial conditions. We differentiate <span class="math display">\[\begin{aligned}
(\cos(t)^{2}+\sin(t)^{2})' & =2\cos(t)\cos'(t)+2\sin(t)\sin'(t)\\
& =-2\cos(t)\sin(t)+2\sin(t)\cos(t)\\
& =0\end{aligned}\]</span> So the value of <span class="math inline">\(\cos(t)^{2}+\sin(t)^{2}\)</span> doesn’t change, i.e. it stays equal to <span class="math inline">\(1\)</span>. We also need to show that the point <span class="math inline">\((\cos(t),\sin(t))\)</span> moves around the unit circle at unit speed. After all, there are many functions <span class="math inline">\(f,g\)</span> such that the point <span class="math inline">\((f(t),g(t))\)</span> stays on the unit circle that are not equal to <span class="math inline">\(\cos\)</span> and <span class="math inline">\(\sin\)</span>. The squared speed of the point is <span class="math display">\[\begin{aligned}
\cos'(t)^{2}+\sin'(t)^{2} & =(-\sin(t))^{2}+\cos(t)^{2}\\
& =\sin(t)^{2}+\cos(t)^{2}\\
& =1\end{aligned}\]</span> The point <span class="math inline">\((\cos(t),\sin(t))\)</span> indeed moves around the unit circle at unit speed. This implies that <span class="math inline">\((\cos(t),\sin(t))\)</span> give the coordinates of the point on the unit circle if we take an angle <span class="math inline">\(t\)</span> measured in radians from the <span class="math inline">\(x\)</span>-axis, because that’s how radians are defined. Therefore the calculus <span class="math inline">\(\sin\)</span> and <span class="math inline">\(\cos\)</span> indeed agree with the geometric <span class="math inline">\(\sin\)</span> and <span class="math inline">\(\cos\)</span>.</p>
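<p>The differential-equation definition is also easy to check numerically. This Python sketch integrates the system (using the standard convention <span class="math inline">\(\cos'=-\sin\)</span>, <span class="math inline">\(\sin'=\cos\)</span>) with simple Euler steps and compares against the library functions:</p>

```python
import math

c, s = 1.0, 0.0         # cos(0) = 1, sin(0) = 0
dt = 1e-4
for _ in range(10000):  # integrate to t = 1
    c, s = c - s * dt, s + c * dt  # cos' = -sin, sin' = cos

cos_err = abs(c - math.cos(1.0))  # both errors are O(dt), far below 1e-3
sin_err = abs(s - math.sin(1.0))
```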
</body>
</html>
Sun, 10 Mar 2019 00:00:00 +0000
http://julesjacobs.github.io/2019/03/10/sin-and-cos-proof.html
http://julesjacobs.github.io/2019/03/10/sin-and-cos-proof.htmlMath puzzle: how many coins do you need to toss to maximise the probability of getting 10 heads?<p>If you throw fewer than 10 coins, then the probability of getting 10 heads is zero. If you throw exactly 10 coins, then the probability is (1/2)^10. If you throw a million coins, then the probability of getting exactly 10 heads is very small. How many coins should you throw to maximise the probability of getting exactly 10 heads?</p>
<p>It’s not difficult, but can you give an intuitive argument?</p>
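<p>If you’d rather check than guess, this Python snippet computes the probability exactly (running it spoils the puzzle):</p>

```python
from fractions import Fraction
from math import comb

def p(n):
    # exact probability of exactly 10 heads in n tosses of a fair coin
    return Fraction(comb(n, 10), 2 ** n)

best = max(range(10, 200), key=p)  # print(best) to see the answer
```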
Sat, 09 Mar 2019 00:00:00 +0000
http://julesjacobs.github.io/2019/03/09/how-many-coins.html
http://julesjacobs.github.io/2019/03/09/how-many-coins.htmlWhat is contification?<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
</style>
<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
<![endif]-->
</head>
<body>
<p>Inlining is one of the most important compiler optimisations. The danger with inlining is that it can cause code size explosion, because the function that is inlined is copied to multiple call sites. However, if the function is only called in one place, then code duplication doesn’t occur, because the original function can be deleted after inlining. Another way to look at this transformation is that we remove the call instruction and replace it with a jump to the entry point of the function, and we replace the return instruction inside the function with a jump back to the point where the function is called.</p>
<p>Contification is based on the clever observation that you can do this transformation whenever a function returns to only one location, even if the function is called in multiple places. Here’s the simplest example:</p>
<pre><code>function f(y){
  return ...
}
if(cond){
  x = f(E1)
}else{
  x = f(E2)
}
<REST ...></code></pre>
<p>The function f is called in two places, but both calls return to the same point. The compiler can therefore replace each call to f with a jump to f’s entry point, and replace the return inside f with a jump to the code after the if-else:</p>
<pre><code>if(cond){
  y = E1
  goto f
}else{
  y = E2
  goto f
}
label f:
  x = ...
  goto REST
label REST: <...></code></pre>
<p>This eliminates the function call overhead, but more importantly, it allows other compiler optimisations to kick in.</p>
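<p>The same transformation can be mimicked at the source level in Python (with an invented body for f, since the original elides it): both call sites feed a value into a single shared copy of f’s body, which then continues at REST.</p>

```python
def f(y):
    return y * 10  # invented placeholder body

def original(cond):
    if cond:
        x = f(1)   # E1
    else:
        x = f(2)   # E2
    return x + 3   # <REST>

def contified(cond):
    # both call sites "jump" into the single copy of f's body below
    if cond:
        y = 1
    else:
        y = 2
    x = y * 10     # f's body, inlined once for both call sites
    return x + 3   # REST

ok = original(True) == contified(True) and original(False) == contified(False)
```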
<p>Analysing whether a function returns to only one place is a little bit tricky. The most powerful contification analyses can reason about tail calls. This allows them to optimise tail recursive functions that are only called in one place:</p>
<pre><code>function loop(x, n){
  if(n==0) return ...
  else return loop(g(x), n-1)
}
result = loop(a, k)
<REST ...></code></pre>
<p>We actually have two calls to loop: one from the outside, and one tail-recursive call from loop itself. Intuitively, this pattern is equivalent to a loop, and contification can turn it into an actual loop:</p>
<pre><code>label loop:
  if(n==0){
    result = ...
    goto REST
  }else{
    x = g(x)
    n = n-1
    goto loop
  }
label REST: <...></code></pre>
<p>It can do this because the contification analysis determines that loop transitively (through tail calls) always returns to the same point. Note that we could have eliminated the two calls to f in the previous example by inlining f twice. Simple inlining, however, does not work for the tail-recursive example: no amount of inlining will eliminate the recursive call. Contification doesn’t work when there are multiple external calls to loop, so contification is not more general than inlining, and inlining is not more general than contification. Perhaps one can come up with an optimisation that generalises both: a version of inlining that doesn’t inline a single function call, but inlines multiple calls with a single return point simultaneously. In other words, a version of inlining that inlines a function <em>return</em> rather than a function <em>call</em>.</p>
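<p>In Python the tail-recursive pattern and its contified form look like this (g and the base-case result are invented placeholders, since the original elides them):</p>

```python
def g(x):
    return x + 1  # invented placeholder step function

def loop_rec(x, n):
    if n == 0:
        return x                      # "return ..." in the example
    return loop_rec(g(x), n - 1)      # the tail call

def loop_iter(x, n):
    # what contification produces: the tail call becomes a backward jump
    while n != 0:
        x = g(x)
        n = n - 1
    return x

same = loop_rec(0, 500) == loop_iter(0, 500)
```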
<p>In the general case we have multiple functions, multiple tail calls, and multiple non-tail calls. The contification analysis tries to find out, for each function, if there exists a point in the program that the function always returns to, possibly through zero or more tail calls. If you’re interested in the details, read the paper <a href="https://www.cs.purdue.edu/homes/suresh/502-Fall2008/papers/contification.pdf">Contification Using Dominators</a> by Matthew Fluet and Stephen Weeks. They work with a functional-style IR with continuations. In this post I’ve used an imperative IR, which I hope makes the contification transformation a little bit easier to understand: it’s just replacing call and return instructions with jumps. I don’t think that the idea of contification has fully penetrated conventional compilers, but maybe it should.</p>
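<p>As a toy illustration of the analysis (a naive fixed-point sketch of my own, not the dominator-based algorithm from the paper): represent each call as (callee, kind, where), where a non-tail call records its return point and a tail call records the calling function, then propagate return points through tail calls until nothing changes. A function is contifiable when it ends up with exactly one return point.</p>

```python
def return_points(calls):
    points, tails = {}, {}
    for callee, kind, where in calls:
        if kind == "nontail":
            points.setdefault(callee, set()).add(where)  # where = return label
        else:
            tails.setdefault(callee, set()).add(where)   # where = calling function
    changed = True
    while changed:  # a tail-called function returns wherever its caller returns
        changed = False
        for callee, callers in tails.items():
            for f in callers:
                extra = points.get(f, set()) - points.get(callee, set())
                if extra:
                    points.setdefault(callee, set()).update(extra)
                    changed = True
    return points

calls = [
    ("loop", "nontail", "REST"),  # result = loop(a, k); continues at REST
    ("loop", "tail", "loop"),     # the recursive tail call
    ("helper", "tail", "loop"),   # a hypothetical helper, tail-called by loop
]
pts = return_points(calls)
# pts maps both loop and helper to the single return point {"REST"}
```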
</body>
</html>
Sat, 09 Mar 2019 00:00:00 +0000
http://julesjacobs.github.io/2019/03/09/what-is-contification.html
http://julesjacobs.github.io/2019/03/09/what-is-contification.htmlCombining probabilities? Try a logarithm.<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>Correlated probabilities? Try a logarithm.</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-AMS_CHTML-full" type="text/javascript"></script>
<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
<![endif]-->
</head>
<body>
<p>Suppose that we have a patient and are trying to determine whether his leg is broken. A general practitioner investigates the patient and a radiologist looks at a scan of his leg. Both doctors give their conclusion in the form of a probability that the leg is broken, <span class="math inline">\(P_{GP}\)</span> and <span class="math inline">\(P_{R}\)</span>. How do we combine these probabilities into one? If these were independent then we could just multiply <span class="math inline">\(P_{GP}\cdot P_{R}\)</span>, but it’s likely that both the GP and the radiologist will have the same opinion, so they are not independent. Furthermore, maybe the opinion of the radiologist is more accurate than the opinion of the general practitioner, because the radiologist looks at a scan of the leg.</p>
<p>We could forget that these are probabilities altogether, and focus on the decision of whether to operate on the patient or not. We assign a combined score <span class="math inline">\(f(P_{GP},P_{R})\)</span> in some way, and then look empirically for a decision boundary <span class="math inline">\(f(P_{GP},P_{R})<\alpha\)</span> that gives us a trade-off between the false positive and false negative rates. The question remains which <span class="math inline">\(f\)</span> we should use.</p>
<p>A linear model is usually the first you’d try, <span class="math inline">\(f(P_{GP},P_{R})=aP_{GP}+bP_{R}\)</span>, but I claim that <span class="math inline">\(f(P_{GP},P_{R})=P_{GP}^{a}\cdot P_{R}^{b}\)</span> is more natural. If the probabilities were independent then <span class="math inline">\(a=1,b=1\)</span> would give <span class="math inline">\(P_{GP}\cdot P_{R}\)</span>. Choosing different <span class="math inline">\(a,b\)</span> weighs the opinions, e.g. <span class="math inline">\(a=1/3\)</span>, <span class="math inline">\(b=2/3\)</span>.</p>
<p>This is equivalent to training a linear model on the log probabilities <span class="math inline">\(\log P_{GP}\)</span> and <span class="math inline">\(\log P_{R}\)</span>, because <span class="math inline">\(\log(P_{GP}^{a}\cdot P_{R}^{b})=a\log P_{GP}+b\log P_{R}\)</span>. The log probability is natural from the point of view of information theory: log probability is measured in bits. Probabilities get multiplied, bits get added.</p>
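<p>As a tiny sketch (the weights and probabilities are chosen arbitrarily for illustration), thresholding the log-linear score is the same as thresholding the weighted product of probabilities:</p>

```python
import math

def combined_score(p_gp, p_r, a=1/3, b=2/3):
    # a linear model on log probabilities == the log of the weighted product
    return a * math.log(p_gp) + b * math.log(p_r)

s = combined_score(0.8, 0.3)
product = 0.8 ** (1 / 3) * 0.3 ** (2 / 3)
# math.exp(s) equals the weighted product up to rounding, so any threshold
# on s corresponds to a threshold on the product
```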
<p>If you’re training a classifier on probabilities, try a logarithm.</p>
</body>
</html>
Thu, 08 Mar 2018 00:00:00 +0000
http://julesjacobs.github.io/2018/03/08/combining-probabilities.html
http://julesjacobs.github.io/2018/03/08/combining-probabilities.html