<HTML>
<HEAD>
<TITLE>SRC Modula-3: text/src/UnsafeHash.m3</TITLE>
</HEAD>
<BODY>
<A NAME="0TOP0">
<H2>text/src/UnsafeHash.m3</H2></A><HR>
<inModule>
<PRE><A HREF="../../COPYRIGHT.html">Copyright (C) 1994, Digital Equipment Corp.</A>
</PRE>      Modified On Tue Feb 15 10:00:03 PST 1994 By perl       

<P><PRE>UNSAFE MODULE <module>UnsafeHash</module> EXPORTS <A HREF="Text.i3"><implements>Text</implements></A>;

IMPORT <A HREF="TextF.i3">TextF</A>, <A HREF="../../word/src/Word.i3">Word</A>;
</PRE> <CODE>Hash</CODE> is the only unsafe procedure in the <CODE>Text</CODE> interface,
   so we move it into its own module.


<P>
We will derive an efficient version of <CODE>Text.Hash</CODE> starting from the
following simple version:
<P>
<PRE>
         res := 0;
         i := 0;
         DO i # M -&gt; 
           res[i MOD N] := res[i MOD N] XOR t[i];
           i := i + 1 
         OD
</PRE>
where
<P>
<PRE>
         t is the text to be hashed
         M is the number of bytes in t
         t[i] is byte i of t
         res is the computed result
         N is the number of bytes per word
         res[i] is byte i of res
</PRE>
The numeric value of <CODE>res</CODE> will depend on whether the
machine is big-endian or little-endian; but the value
of <CODE>res</CODE> regarded as a sequence of <CODE>N</CODE> bytes will not.
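<P>
As an illustration only, the simple version can be modelled in Python, with
<CODE>res</CODE> held as a list of <CODE>n</CODE> bytes.  The name <CODE>simple_hash</CODE>, the sample
text and the choice <CODE>n = 4</CODE> are assumptions made for this sketch and are not part
of the module.

```python
def simple_hash(t, n):
    """XOR byte i of t into byte (i mod n) of an n-byte result."""
    res = [0] * n                  # res regarded as an array of n bytes
    for i, byte in enumerate(t):   # i runs over 0 .. M-1
        res[i % n] ^= byte
    return res


res = simple_hash(b"abcdefghij", 4)
# The same byte sequence denotes different integers under the two byte orders.
as_little = int.from_bytes(bytes(res), "little")
as_big = int.from_bytes(bytes(res), "big")
print(res, as_little, as_big)
```

The two integers differ, but converting each back to bytes in its own byte
order yields the same sequence.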
<P>
We would like to derive a more efficient version that 
uses word operations.  We write
<P>
<PRE>
         rotup(w, k)  
</PRE>
to indicate the word <CODE>w</CODE> (regarded as an array of 
bytes) shifted up (towards increasing indexes) by <CODE>k</CODE> places, circularly.
<P>
<PRE>
         rotup(w, k)[(i + k) MOD N] = w[i] for all i
</PRE>
We also write
<P>
<PRE>
         rotdn(w, k)
</PRE>
to indicate <CODE>rotup(w, -k)</CODE>. 
<P>
We will also need the corresponding shift operators:
<CODE>shiftup</CODE> is like <CODE>rotup</CODE> except it shifts instead of rotates, that is:
<P>
<PRE>
       shiftup(w, k)[i + k] = w[i] 
           for all i such that i and i+k are both in [0..N-1] and all other
           bytes of shiftup(w, k) are 0.
</PRE>
and <CODE>shiftdn(w, k) = shiftup(w, -k)</CODE>.
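<P>
The four operators can be sketched in Python on lists of bytes.  This is a
model of the definitions above, not the word-level implementation; a word is
assumed to be represented as a list of <CODE>N</CODE> small integers.

```python
def rotup(w, k):
    """Rotate w towards increasing indexes by k places, circularly."""
    n = len(w)
    out = [0] * n
    for i in range(n):
        out[(i + k) % n] = w[i]      # rotup(w, k)[(i + k) MOD N] = w[i]
    return out


def rotdn(w, k):
    return rotup(w, -k)


def shiftup(w, k):
    """Like rotup, but bytes pushed past either end are lost and replaced by 0."""
    n = len(w)
    out = [0] * n
    for i in range(n):
        if (i + k) in range(n):      # both i and i+k lie in [0..N-1]
            out[i + k] = w[i]
    return out


def shiftdn(w, k):
    return shiftup(w, -k)


print(rotup([1, 2, 3, 4], 1), shiftup([1, 2, 3, 4], 1))
```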
<P>
We begin by transforming the simple loop by a change of
coordinates.  We introduce <CODE>temp</CODE>, which is <CODE>res</CODE> rotated
so that <CODE>temp[0]</CODE> corresponds to <CODE>res[i MOD N]</CODE>, that is,
<P>
<PRE>
         rotup(temp, i) = res            (Q)
</PRE>
(Note that <CODE>rotup(temp, i) = rotup(temp, i MOD N)</CODE>; in general the
second argument to <CODE>rotup</CODE> and <CODE>rotdn</CODE> only matters modulo <CODE>N</CODE>.)
<P>
This allows us to transform the simple loop into:
<P>
<PRE>
         res := 0;
         temp := 0;
         {Q}
         i := 0;
         DO i # M -&gt; 
           {Q}
           res[i MOD N] := res[i MOD N] XOR t[i];
           temp[0] := temp[0] XOR t[i];
           i := i + 1;
           temp := rotdn(temp, 1)
         OD
</PRE>
Proof that <CODE>rotdn(temp, 1)</CODE> is correct: 
<P>
<PRE>
         {Q} i := i + 1; temp := rotdn(temp, 1) {Q}
      == {Hoare Logic}
         Q =&gt; Q(i := i+1, temp := rotdn(temp, 1))
      == {Carry out the substitution}
         Q =&gt; rotup(rotdn(temp,1), i+1) = res
      == {Since rotup(rotup(x, a), b) = rotup(x, a+b) and rotdn(x, 1) = rotup(x, -1)}
         Q =&gt; rotup(temp, i) = res
      == {Definition of Q}
         Q =&gt; Q
      == {Predicate calculus}
         TRUE
    </PRE>
Now we can eliminate the work on <CODE>res</CODE>, and do it only at the end:
<P>
<PRE>
         temp := 0;
         i := 0;
         DO i # M -&gt; 
           temp[0] := temp[0] XOR t[i];
           i := i + 1;
           temp := rotdn(temp, 1)
         OD;
         {Q}
         res := rotup(temp, M)
</PRE>
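<P>
The rotated-accumulator loop can be checked against the simple version with a
small Python model.  The sketch asserts the invariant <CODE>Q</CODE> at the top of every
iteration; the function names and the byte-list representation are
assumptions of the sketch.

```python
def simple_hash(t, n):
    res = [0] * n
    for i, byte in enumerate(t):
        res[i % n] ^= byte
    return res


def rotup(w, k):
    n = len(w)
    return [w[(i - k) % n] for i in range(n)]


def rotating_hash(t, n):
    temp = [0] * n
    for i, byte in enumerate(t):
        # Invariant Q: rotup(temp, i) = res, where res is the simple
        # hash of the bytes processed so far.
        assert rotup(temp, i) == simple_hash(t[:i], n)
        temp[0] ^= byte
        temp = rotup(temp, -1)       # rotdn(temp, 1)
    return rotup(temp, len(t))       # undo the accumulated rotation


print(rotating_hash(b"abcdefghij", 4) == simple_hash(b"abcdefghij", 4))
```
<P>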
Next, we break this loop into three pieces, the first of which processes the
unaligned prefix of the text, the second of which processes the aligned full
words of the text, and the last of which processes the trailing subword fragment:
<P>
<PRE>
         temp := 0;
         i := 0;
         DO i # M AND (ADR(t[i]) MOD N) # 0 -&gt; 
           temp[0] := t[i];
           i := i + 1;
           temp := rotdn(temp, 1)
         OD;
         DO i + N &lt;= M -&gt; 
           VAR j := i IN
             DO j # i + N -&gt;
               temp[0] := temp[0] XOR t[j];
               j := j + 1;
               temp := rotdn(temp, 1)
             OD
           END;
           i := i + N
         OD;
         DO i # M -&gt; 
           temp[0] := temp[0] XOR t[i];
           i := i + 1;
           temp := rotdn(temp, 1)
         OD;
         {Q}
         res := rotup(temp, M)
</PRE>
Now we will change the first loop to use word operations.  This loop copies
into <CODE>temp</CODE> some number of bytes from a single word of memory, preserving the
order of the bytes, and leaving the bytes in <CODE>temp</CODE> so that the last byte
copied is in <CODE>temp[N-1]</CODE>.  We can achieve this with word operations by loading
the appropriate word into <CODE>temp</CODE>, shifting down to eliminate any junk bytes
that precede the relevant bytes, and then shifting up to eliminate any junk
bytes that follow the relevant bytes, if any.  In our case, the number of
preceding junk bytes (<CODE>jpre</CODE>) is just <CODE>ADR(t[0]) MOD N</CODE>, and the number of
following junk bytes (<CODE>jpost</CODE>) is zero if <CODE>M &gt; N - jpre</CODE>, otherwise it is 
<CODE>N - jpre - M</CODE>.  Thus the first loop above can be replaced by:
<P>
<PRE>
         jpre := ADR(t[0]) MOD N;
         IF jpre # 0 -&gt;
              jpost := MAX(0, N - jpre - M);
              temp := Mem[ADR(t[0])-jpre];
              temp := shiftdn(temp, jpre);
              temp := shiftup(temp, jpost+jpre);
              i := N - jpre - jpost
         [] jpre = 0 -&gt; SKIP
         FI
</PRE>
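<P>
To see that the replacement leaves <CODE>temp</CODE> and <CODE>i</CODE> in the same state as the
byte loop, one can model memory as a Python byte array.  In this sketch
<CODE>mem</CODE>, <CODE>start</CODE> (standing for <CODE>ADR(t[0])</CODE>) and the junk value <CODE>0xAA</CODE> are
assumptions chosen for illustration.

```python
def shiftup(w, k):
    n = len(w)
    return [w[i - k] if (i - k) in range(n) else 0 for i in range(n)]


def prefix_by_bytes(mem, start, m, n):
    """First loop of the three-piece version, one byte at a time."""
    temp, i = [0] * n, 0
    while i != m and (start + i) % n != 0:
        temp[0] = mem[start + i]
        i += 1
        temp = temp[1:] + temp[:1]           # rotdn(temp, 1)
    return temp, i


def prefix_by_words(mem, start, m, n):
    """The same step using one word load and two shifts."""
    temp, i = [0] * n, 0
    jpre = start % n
    if jpre != 0:
        jpost = max(0, n - jpre - m)
        temp = list(mem[start - jpre:start - jpre + n])   # Mem[ADR(t[0])-jpre]
        temp = shiftup(temp, -jpre)                       # shiftdn(temp, jpre)
        temp = shiftup(temp, jpost + jpre)
        i = n - jpre - jpost
    return temp, i


mem = bytearray(b"\xaa" * 8)     # junk surrounding the text
mem[3:5] = b"hi"                 # a two-byte text at an unaligned address
print(prefix_by_bytes(mem, 3, 2, 4), prefix_by_words(mem, 3, 2, 4))
```
<P>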
Similarly, we can change the last loop to use word operations.
<P>
<PRE>
         IF i # M -&gt;
             jpost := N - (M - i);
             VAR w := Mem[ADR(t[i])] IN
               w := shiftup(w, jpost);
               temp := rotup(temp, jpost);
               temp := temp XOR w;
             END
         [] i = M -&gt; SKIP
         FI
</PRE>
(Note that the rotation of <CODE>temp</CODE> to <CODE>rotup(temp, jpost)</CODE> could equally well
have been written <CODE>rotdn(temp, M-i)</CODE>.  The same rotation that brings <CODE>temp</CODE>
into alignment with <CODE>shiftup(w, jpost)</CODE> also matches the rotation performed by
the loop we are refining.)
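<P>
The same kind of check applies to the trailing fragment.  The sketch below
assumes that <CODE>start + i</CODE> is word-aligned and that fewer than <CODE>N</CODE> bytes
remain, which is the state left by the middle loop; all names are
illustrative.

```python
def rotup(w, k):
    n = len(w)
    return [w[(i - k) % n] for i in range(n)]


def shiftup(w, k):
    n = len(w)
    return [w[i - k] if (i - k) in range(n) else 0 for i in range(n)]


def tail_by_bytes(temp, mem, start, i, m):
    """Last loop of the three-piece version, one byte at a time."""
    temp = list(temp)
    while i != m:
        temp[0] ^= mem[start + i]
        i += 1
        temp = temp[1:] + temp[:1]                # rotdn(temp, 1)
    return temp


def tail_by_words(temp, mem, start, i, m):
    """The same step using one word load, one shift and one rotation."""
    temp = list(temp)
    n = len(temp)
    if i != m:
        jpost = n - (m - i)
        w = list(mem[start + i:start + i + n])    # Mem[ADR(t[i])]
        w = shiftup(w, jpost)                     # drop the junk after the text
        temp = rotup(temp, jpost)                 # same as rotdn(temp, m - i)
        temp = [a ^ b for a, b in zip(temp, w)]
    return temp


mem = bytearray(b"\xaa" * 8)
mem[0:6] = b"abcdef"             # 6-byte text, so 2 bytes trail the first word
print(tail_by_bytes([1, 2, 3, 4], mem, 0, 4, 6) ==
      tail_by_words([1, 2, 3, 4], mem, 0, 4, 6))
```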
<P>
Finally we change the middle loop to use word operations.  Its
inner loop rotates <CODE>temp</CODE> down by one place <CODE>N</CODE> times, and consequently
has no net rotation.  The inner loop also XORs
<P>
<PRE>
        t[i], ..., t[i+N-1]
</PRE>
into
<P>
<PRE>
        temp[0], ..., temp[N-1].
</PRE>
respectively.  Since <CODE>ADR(t[i]) MOD N = 0</CODE>, this can be 
accomplished by a single word operation.  The new version
of the middle loop is therefore:
<P>
<PRE>
         DO i + N &lt;= M -&gt; 
           temp := temp XOR Mem[ADR(t[i])];
           i := i + N
         OD;
</PRE>
In the above we write <CODE>Mem[addr]</CODE> to indicate the word whose byte addresses
range from <CODE>addr</CODE> to <CODE>addr+N-1</CODE>, regarding that word as an array of bytes.  
We have also (for the first time) used XOR on words instead of bytes.
<P>
Now we can translate the program into Modula-3.  Here is the collected guarded
command version:
<P>
<PRE>
         temp := 0;
         i := 0;
         jpre := ADR(t[0]) MOD N;
         IF jpre # 0 -&gt;
              jpost := MAX(0, N - jpre - M);
              temp := Mem[ADR(t[0])-jpre];
              temp := shiftdn(temp, jpre);
              temp := shiftup(temp, jpost+jpre);
              i := N - jpre - jpost
         [] jpre = 0 -&gt; SKIP
         FI;
         DO i + N &lt;= M -&gt; 
           temp := temp XOR Mem[ADR(t[i])];
           i := i + N
         OD;
         IF i # M -&gt;
             jpost := N - (M - i);
             VAR w := Mem[ADR(t[i])] IN
               w := shiftup(w, jpost);
               temp := rotup(temp, jpost);
               temp := temp XOR w;
             END
         [] i = M -&gt; SKIP
         FI;
         res := rotup(temp, M)
</PRE>
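<P>
Before translating, the collected guarded command program can be exercised
against the simple version in a Python model.  This is a sketch under stated
assumptions: memory is a byte array <CODE>mem</CODE>, <CODE>start</CODE> stands for <CODE>ADR(t[0])</CODE>,
and words are lists of <CODE>n</CODE> bytes rather than machine integers.

```python
def simple_hash(t, n):
    res = [0] * n
    for i, byte in enumerate(t):
        res[i % n] ^= byte
    return res


def rotup(w, k):
    n = len(w)
    return [w[(i - k) % n] for i in range(n)]


def shiftup(w, k):
    n = len(w)
    return [w[i - k] if (i - k) in range(n) else 0 for i in range(n)]


def word_hash(mem, start, m, n):
    """Hash the m bytes at mem[start .. start+m-1] using whole-word reads."""
    def word(addr):                          # Mem[addr] as a list of n bytes
        return list(mem[addr:addr + n])

    def xor(a, b):
        return [x ^ y for x, y in zip(a, b)]

    temp = [0] * n
    i = 0
    jpre = start % n
    if jpre != 0:                            # unaligned prefix
        jpost = max(0, n - jpre - m)
        temp = word(start - jpre)
        temp = shiftup(temp, -jpre)          # shiftdn(temp, jpre)
        temp = shiftup(temp, jpost + jpre)
        i = n - jpre - jpost
    while i + n <= m:                        # aligned full words
        temp = xor(temp, word(start + i))
        i += n
    if i != m:                               # trailing subword fragment
        jpost = n - (m - i)
        w = shiftup(word(start + i), jpost)
        temp = xor(rotup(temp, jpost), w)
    return rotup(temp, m)


mem = bytearray(b"\xaa" * 16)    # junk bytes around the text
mem[3:13] = b"abcdefghij"        # text placed at an unaligned address
print(word_hash(mem, 3, 10, 4) == simple_hash(b"abcdefghij", 4))
```

The junk bytes that share a word with the text never reach the result,
because the shifts discard them before any XOR.
<P>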
In Modula-3, the guarded command program becomes:

        
<P><PRE>PROCEDURE <A NAME="Hash"><procedure>Hash</procedure></A>(t: TEXT): INTEGER =
  CONST
    N = BYTESIZE(INTEGER);
  VAR
    temp := 0;
    p    := LOOPHOLE (ADR(t[0]), UNTRACED REF INTEGER);
    m    := NUMBER(t^) - 1;
    endp := p + m;
  BEGIN
    VAR jpre := Word.And(LOOPHOLE(p, INTEGER), N-1);
        jpost: INTEGER;
    BEGIN
      IF jpre # 0 THEN
        jpost := MAX(0, N - jpre - m);
        temp := LOOPHOLE(p - jpre, UNTRACED REF INTEGER)^;
        temp := Word.Shift(Word.Shift(temp, jpre * -up1), (jpost+jpre) * up1);
        INC(p, N - jpre - jpost)
      END
    END;
    WHILE p + N &lt; endp DO
      temp := Word.Xor(temp, p^);
      INC(p, N)
    END;
    IF littleEndian THEN
      IF p # endp THEN
        VAR jpost := N - (endp - p);
            w := Word.Shift(p^, Word.Shift(jpost, lgUp1));
        BEGIN
          temp := Word.Xor(Word.Rotate(temp, Word.Shift(jpost, lgUp1)), w)
        END
      END;
      RETURN Word.Plus(Word.Rotate(temp, Word.Shift(m, lgUp1)), m)
    ELSE
      IF p # endp THEN
        VAR jpost := N - (endp - p);
            w := Word.Shift(p^, -Word.Shift(jpost, lgUp1));
        BEGIN
          temp := Word.Xor(Word.Rotate(temp, -Word.Shift(jpost, lgUp1)), w)
        END
      END;
      RETURN Word.Plus(Word.Rotate(temp, -Word.Shift(m, lgUp1)), m)
    END
  END Hash;
</PRE> In the Modula-3 version we have added the text length into
   the result before returning it, in order to get a better
   hash function for texts that contain long strings of
   repeated characters.  Also, instead of multiplying
   by <CODE>up1</CODE> we have shifted by its base-two logarithm
  <CODE>lgUp1</CODE>.  These constants are computed below: 

<P><PRE>VAR
  littleEndian: BOOLEAN;
  ref := NEW(UNTRACED REF INTEGER);
  up1: INTEGER;
  lgUp1: INTEGER;

BEGIN
  &lt;* ASSERT 1 = ADRSIZE(CHAR) *&gt;
  &lt;* ASSERT 0 = Word.And(BYTESIZE(INTEGER), BYTESIZE(INTEGER)-1) *&gt;
  ref^ := 1;
  littleEndian := 1 = LOOPHOLE(ref, UNTRACED REF [0..255])^;
  IF littleEndian THEN up1 := BITSIZE(CHAR) ELSE up1 := -BITSIZE(CHAR) END;
  lgUp1 := 0;
  VAR k := 1; BEGIN
    WHILE k # ABS(up1) DO
      INC(lgUp1); k := k + k
    END
  END
END UnsafeHash.
</PRE>
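<P>
The initialization can be mirrored in Python to show why shifting by
<CODE>lgUp1</CODE> is the same as multiplying by <CODE>up1</CODE>, and how a byte rotation maps
onto a bit rotation.  The sketch assumes 8-bit characters and 8-byte words
and models only the little-endian branch; <CODE>rotate</CODE> is a stand-in written
for this example, not the <CODE>Word.Rotate</CODE> procedure itself.

```python
import struct
import sys

N = 8                        # assumption: BYTESIZE(INTEGER) = 8
BITS_PER_CHAR = 8            # assumption: BITSIZE(CHAR) = 8
WORD_BITS = N * BITS_PER_CHAR
MASK = (1 << WORD_BITS) - 1

# Store 1 in native byte order and inspect its first byte, in the spirit of
# ref^ := 1 followed by the LOOPHOLE read in the module body.
native_little = struct.pack("=i", 1)[0] == 1
print(native_little, sys.byteorder)

little_endian = True         # the rest of the sketch models this branch only
up1 = BITS_PER_CHAR if little_endian else -BITS_PER_CHAR

lg_up1 = 0                   # base-two logarithm of ABS(up1), by doubling
k = 1
while k != abs(up1):
    lg_up1 += 1
    k += k


def rotate(x, bits):
    """Rotate the WORD_BITS-bit value x left by bits (taken mod WORD_BITS)."""
    bits %= WORD_BITS
    return ((x << bits) | (x >> (WORD_BITS - bits))) & MASK


def rotup(w, places):
    n = len(w)
    return [w[(i - places) % n] for i in range(n)]


word = [1, 2, 3, 4, 5, 6, 7, 8]
x = int.from_bytes(bytes(word), "little")
for places in range(N):
    # Multiplying by up1 and shifting by lg_up1 give the same bit count.
    assert places * up1 == places << lg_up1
    # Rotating up by whole bytes is a left bit rotation on a little-endian word.
    rotated = int.from_bytes(bytes(rotup(word, places)), "little")
    assert rotated == rotate(x, places << lg_up1)
```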
</inModule>
</BODY>
</HTML>
