Skip to content
whimsycwd edited this page Oct 11, 2014 · 9 revisions

-1. 啰嗦

  1. 为什么需要测试

  2. Pull Request 标题格式 11300240057_CHENWEIDONG_AssignmentX

  3. 做Assignment3的步骤

    • 从master新建分支branch Assignment3, (确保assignment之间的代码不混在一起),在新分支上做作业。
    • 复制TA/Assignment3到你的目录, 然后Merge 到你的master 马上 提交一次Pull Request,不要在master上开发,否则修改会持续更新到这个Pull request (确保能够分辨出你和TA提供的代码) 标题 11300240057_CHENWEIDONG_AssignmentX[Initial Commit]
    • ...
  4. 提交代码前先阅读其他同学的Pull Request(重要!)

0. 写在最前

这次的Assignment,希望同学们在能在下周一(Oct. 13)前尽量完成前两个算法。

  • 请阅读 ustc book site
  • 关注一下 String-matching automata 这个是书本中没有的
  • 强调一下, Makefile & MatcherTest.cpp 请务必阅读。最后改进修改它们。

Have fun and good luck!~ cheers.

1. Requirement

实现课本 3.5 中提到四个字符串匹配算法。

  1. BruteForceImpl (30min)
  2. KarpRabinImpl (30min)
  3. KMPImpl (5 hour)
  4. BoyerMooreImpl (5 hour)

2. Simple Definition

  1. pattern, 模式字符串,即要查找的字符串
  2. text, 文本串

example :

text =   "ababcab"
pattern =  "abc"
XImpl xImpl = new XImpl(pattern);
xImpl.find(text) should return 2 (start from 0, return the first matched substring index)

3. Application

Finding all occurrences of a pattern in a text is a problem that arises frequently in text-editing programs. Typically, the text is a document being edited, and the pattern searched for is a particular word supplied by the user. Efficient algorithms for this problem can greatly aid the responsiveness of the text-editing program. String-matching algorithms are also used, for example, to search for particular patterns in DNA sequences.

4. Formal Definition

We say that pattern P occurs with shift s in text T (or, equivalently, that pattern P occurs beginning at position s + 1 in text T) if 0 s n - m and T[s + 1 . . s + m] = P[1 . . m] (that is, if T[s + j] = P[j], for 1 j m). If P occurs with shift s in T, then we call s a valid shift; otherwise, we call s an invalid shift. The string-matching problem is the problem of finding all valid shifts with which a given pattern P occurs in a given text T.

5. Interface

接口仅仅为了方便写测试

class Matcher {
    
    public :
        const static int NOT_FOUND = -1;
        virtual int find(std::string pattern) = 0;
        virtual ~Matcher() {
        }
};

6. Usage

From MatcherTest.cpp, 请务必阅读该程序。

void find(string text, string pattern, int expect) {
    Matcher * impl = getImpl(pattern);
    ...
    int pos = impl->find(text);
    ...
    assert(pos == expect);
    ... 
    delete impl;
}

7. 建议步骤

feel free to modify MatherTest.cpp & Matcher.h & Makefile

  1. 阅读Matcher.h 以及 MatcherTest.cpp了解所待实现类的要求和用法。
  2. 阅读Makefile, TA为你实现了make (BF | KR | KMP | BM)以及数据生成make genData
  3. 实现XImpl, 比如KMPImpl , 用 make KMP来对其进行测试。
  4. 修改MatcherTest, 完善该测试 (重要)!
  5. 3-4重复完成4个实现
  6. TA做了简单 MatcherTest 测效率功能。 请结合课本比较四个算法的运行效率
  7. 设计更合理的数据,来对4个算法进行比较。

8. 阅读资源:

  1. KarpRabin Wiki
  2. KMP Wiki
  3. ByoerMoore Wiki
  4. ustc book site

Clone this wiki locally