Tuesday, June 12, 2012

Tech Review AI: Tuesday, June 12, 2012


Natural Language Processing

  • Listened to the course introduction
  • Listening to Basic Text Processing -> Regular Expressions
    • Regular expressions
      • a formal language for specifying text strings
    • regular expression disjunctions
      • letters inside square brackets []
    • pipe disjunction
      • a|b|c is the same as [abc]
    • ranges [A-Z]
    • negations [^A-Z]
    • ?
      • previous character is optional
    • .
      • any one character
    • ^
      • beginning of line
    • $
      • end of line
    • \.
      • use a backslash to match special characters literally
    • Kleene operators
      • Stephen C Kleene
      • * is 0 or more
      • + is 1 or more
    • non-alphabetics 
    • regexpal.com
    • Type I errors: false positives
      • fixed by increasing accuracy or precision (minimizing false positives)
    • Type II errors: false negatives
      • fixed by increasing coverage or recall (minimizing false negatives)
    • Type I and Type II errors are antagonistic: reducing one tends to increase the other
    • regular expressions play a surprisingly large role in NLP.
    • regular expressions are used as features in classifiers for the hard NLP tasks
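
A minimal sketch of the operators in the notes above, tried out in Python's re module. Python is my choice here; the notes only mention regexpal.com. The sample text and all variable names are made up for illustration, and \b is swapped in as a rough shorthand for "not surrounded by letters".

```python
import re

# Hypothetical sample text, made up for illustration.
text = "The other day the cat saw 3 woodchucks. Then there was 1 woodchuck!"

# Square-bracket disjunction plus ?: [wW] is either letter, s? makes the "s" optional.
woodchucks = re.findall(r"[wW]oodchucks?", text)

# Pipe disjunction: a|b|c finds the same single characters as [abc].
pipe_equals_brackets = re.findall(r"a|b|c", "cab") == re.findall(r"[abc]", "cab")

# Range and negation: [0-9] is any digit; [^a-zA-Z ] is any character
# that is neither a letter nor a space.
digits = re.findall(r"[0-9]", text)
non_alpha = re.findall(r"[^a-zA-Z ]", text)

# Anchors and escaping: ^ is beginning of line, $ is end of line,
# and \. is a literal period rather than "any one character".
starts_with_capital = re.search(r"^[A-Z]", text) is not None
ends_with_bang = re.search(r"!$", text) is not None
periods = re.findall(r"\.", text)

# Kleene operators: * is 0 or more, + is 1 or more of the previous character.
star = re.findall(r"ba*", "b ba baaa")
plus = re.findall(r"ba+", "b ba baaa")

# Type I vs Type II errors when searching for the word "the".
# The naive pattern hits "other" and "there" (false positives)
# and misses the capitalized "The" (false negative).
naive = re.findall(r"the", text)
# \b (word boundary) is a rough shorthand, swapped in here, for
# "surrounded by non-alphabetic characters".
refined = re.findall(r"\b[tT]he\b", text)

print(woodchucks, digits, non_alpha, periods)
print(star, plus)
print(naive, refined)
```

The naive pattern returns three hits, none of which is the standalone capitalized word, while the refined pattern returns only "The" and "the". Fixing the false negatives ([tT]) and the false positives (\b) are separate steps, which is the precision versus recall trade-off from the notes.
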
  • Basic Text Processing -> Regular Expressions in Practical NLP
    • Christopher Manning
    • Stanford English Tokenizer
      • a deterministic
    • JFlex (a lexer generator for Java)
    • {2,4}
      • between 2 and 4 repetitions of the previous item
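
A small sketch of the {2,4} quantifier, again in Python's re module (my choice, not something from the lecture). The token list and the names pattern and matched are hypothetical, made up for illustration.

```python
import re

# Hypothetical token list, made up for illustration.
tokens = ["a", "ab", "abc", "abcd", "abcde"]

# [a-z]{2,4} allows 2, 3, or 4 lowercase letters.
# fullmatch requires the whole token to match, not just a prefix.
pattern = re.compile(r"[a-z]{2,4}")
matched = [t for t in tokens if pattern.fullmatch(t)]

print(matched)  # ['ab', 'abc', 'abcd']
```

Using fullmatch matters here: with search or match, "abcde" would also count, because its first four letters satisfy the pattern.
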