Simple text watermarking with Unicode

There are quite a few papers on text watermarking, and most of them are pretty complex. I wanted a less robust but simpler solution that could help track text being cross-posted on websites and blogs. The idea is to give each user the same block of text with a different watermark, so if the text later shows up on a blog, you can tell who “leaked” it.

I chose to use the spaces between words as the bits for storing the watermark, with an on bit marked by inserting a zero-width Unicode character after the space. I decided against inserting a character inside words because if the Unicode character showed up after pasting the text into a non-Unicode editor, the text would be very hard to read. Inserting between words also keeps the text searchable in the browser. If you had the text Pe[invisible unicode character]ter in your browser and searched for “Pet”, the search wouldn’t match it even though it would look just like “Peter”. Of course, search terms that span a marked space still won’t match with my approach.
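To make the scheme concrete, here is a minimal sketch of the one-bit-per-space idea. The helper names `embed_bits` and `extract_bits` are made up for this illustration and are not part of the class shown later; the sketch assumes a reasonably modern Ruby with UTF-8 strings.

```ruby
ZERO_WIDTH = "\uFEFF"  # zero width no-break space, used to mark an "on" bit

# Hypothetical helper: append ZERO_WIDTH to the i-th space when bits[i] == 1.
def embed_bits(text, bits)
  remaining = bits.dup
  text.gsub(" ") { remaining.shift == 1 ? " #{ZERO_WIDTH}" : " " }
end

# Hypothetical helper: one bit per space, 1 if the space is followed by ZERO_WIDTH.
def extract_bits(text)
  text.scan(/ #{ZERO_WIDTH}?/).map { |s| s.length == 2 ? 1 : 0 }
end

marked = embed_bits("a b c d", [1, 0, 1])
p extract_bits(marked)                                    # [1, 0, 1]

# A marker inside a word breaks substring search; a marker after a space does not,
# unless the search term itself spans the marked space.
puts "Pe#{ZERO_WIDTH}ter".include?("Pet")                 # false
puts embed_bits("Peter Pan", [1]).include?("Peter")       # true
puts embed_bits("Peter Pan", [1]).include?("Peter Pan")   # false
```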

I tried some other approaches using different Unicode space characters, but I ran into problems with all of them. This one seems to work best in Firefox and IE. There’s a crapload of Unicode code points, so there are probably a bunch of other possibilities, for example all the alternative punctuation characters.

Currently the watermark must be an unsigned integer. It would be pretty trivial to make it work with a string.
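One way a string watermark could work is to pack the string’s bytes into a single unsigned integer and unpack it again after reading. This is only a sketch: `string_to_watermark` and `watermark_to_string` are hypothetical names, the result is assumed to be UTF-8, and leading NUL bytes (and the empty string) do not round-trip. Note that every byte costs roughly eight spaces in the carrier text.

```ruby
# Hypothetical helper: fold the bytes of a string into one big unsigned integer.
def string_to_watermark(str)
  str.bytes.inject(0) { |acc, byte| (acc << 8) | byte }
end

# Hypothetical helper: peel bytes off the integer, lowest byte last.
def watermark_to_string(int)
  bytes = []
  while int > 0
    bytes.unshift(int & 0xFF)
    int >>= 8
  end
  bytes.pack("C*").force_encoding(Encoding::UTF_8)
end

id = string_to_watermark("bob")
puts id                       # 6451042
puts watermark_to_string(id)  # bob
```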

Here’s the usage:

irb(main):003:0> puts Watermark.apply_watermark('Here is a block of text inside of which a number will be hidden!', 42)
Here is a block of text inside of which a number will be hidden!
irb(main):004:0> Watermark.read_watermark('Here is a block of text inside of which a number will be hidden!')
=> 42

Note: the string in the above code actually contains the watermark, but you don’t see it. Try copying the text into a non-Unicode-aware context.
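If you don’t have such a context handy, a quick way to see the hidden characters is to swap them for something visible. The `reveal` helper below is a hypothetical name used only for this sketch.

```ruby
MARKER = "\uFEFF"  # the invisible character used as the watermark bit

# Hypothetical helper: replace each invisible marker with a visible placeholder.
def reveal(text)
  text.gsub(MARKER, "<U+FEFF>")
end

sample = "Here is #{MARKER}a block #{MARKER}of text"
puts reveal(sample)        # Here is <U+FEFF>a block <U+FEFF>of text
puts sample.count(MARKER)  # 2
```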

Just in case, I’ve also provided a method to convert the Unicode characters to HTML entities:

irb(main):011:0> Watermark.apply_watermark('Here is a block of text inside of which a number will be hidden!', 42)
=> "Here is \357\273\277a block \357\273\277of text \357\273\277inside of which a number will be hidden!"
irb(main):012:0> Watermark.escape_unicode _
=> "Here is &#xFEFF;a block &#xFEFF;of text &#xFEFF;inside of which a number will be hidden!"
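If the text comes back as HTML with the entities still in place, they need to be turned back into the raw character before the watermark can be read. A possible inverse might look like the sketch below; `unescape_unicode` is a hypothetical name and is not part of the class, and it assumes the entity is written either in hex or as the decimal equivalent 65279.

```ruby
# Hypothetical inverse of escape_unicode: turn the entity back into U+FEFF.
# Accepts the hex form (any letter case) and the decimal form of the same code point.
def unescape_unicode(html)
  html.gsub(/&#xFEFF;|&#65279;/i, "\uFEFF")
end

escaped = "Here is &#xFEFF;a block &#xfeff;of text &#65279;inside"
restored = unescape_unicode(escaped)
puts restored.count("\uFEFF")  # 3
```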

Here’s the implementation:

class Watermark
  INVISIBLE_SPACE = "\uFEFF"  # U+FEFF ZERO WIDTH NO-BREAK SPACE (UTF-8 bytes \357\273\277)
  SPACE_CHARS = [ " ", " #{INVISIBLE_SPACE}" ]
  SPACE_REGEX = Regexp.union(SPACE_CHARS[1], SPACE_CHARS[0])

  class NotEnoughSpacesError < StandardError; end
  class BadWatermarkError < StandardError; end

  class << self
    def apply_watermark(text, watermark)
      verify_watermark_format!(watermark)
      verify_enough_spaces!(text, watermark)

      bits = bit_map(watermark)
      text.gsub(/ /) { SPACE_CHARS[bits.shift || 0] }
    end

    def read_watermark(watermarked_text)
      bit_map = watermarked_text.scan(SPACE_REGEX).map {|c| SPACE_CHARS.index(c) }

      # The i-th space carries bit i of the watermark (least significant bit first).
      bit_map.each_with_index.inject(0) { |watermark, (on_off, bit)| watermark | (on_off << bit) }
    end

    def escape_unicode(text)
      text.gsub(INVISIBLE_SPACE, "&#xFEFF;")
    end

    private

    def verify_watermark_format!(watermark)
      raise(BadWatermarkError, "only unsigned integers")  unless watermark.is_a?(Integer) && watermark >= 0
    end

    def verify_enough_spaces!(text, watermark)
      spaces_count = text.scan(/ /).size
      raise NotEnoughSpacesError  if bits_needed(watermark) > spaces_count
    end

    def bits_needed(integer)
      # Integer#bit_length is exact; a floating-point log2 can round up for large values.
      [integer.bit_length, 1].max  # smallest bits_needed with integer < 2**bits_needed, at least 1
    end

    def bit_map(integer)
      Array.new(bits_needed(integer)) {|bit| integer[bit] }  # Integer#[] returns the given bit as 0 or 1
    end
  end
end
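To tie this back to the original goal of tracing a leak, here is a rough end-to-end sketch. So that it runs on its own, it uses compact stand-ins (`mark` and `read`, both hypothetical names) rather than the class above, and it skips the error checking; the user table and article text are invented for the example.

```ruby
MARK = "\uFEFF"

# Compact stand-in for apply_watermark: the i-th space carries bit i of id.
def mark(text, id)
  bit = -1
  text.gsub(" ") { id[bit += 1] == 1 ? " #{MARK}" : " " }
end

# Compact stand-in for read_watermark: rebuild the integer from the marked spaces.
def read(text)
  text.scan(/ #{MARK}?/).each_with_index.sum { |s, i| s.end_with?(MARK) ? 1 << i : 0 }
end

users   = { 1 => "alice", 2 => "bob", 3 => "carol" }   # invented example data
article = "The quick brown fox jumps over the lazy dog"

# Each user gets the same visible text with their own id hidden in it.
copies = users.keys.map { |id| [id, mark(article, id)] }.to_h

leaked = copies[2]              # pretend this copy showed up on a blog
puts users.fetch(read(leaked))  # bob
```

One design consequence worth noting: the visible text has to have at least as many spaces as the largest id has bits, which is what the `NotEnoughSpacesError` check in the class guards against.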
