今さら言語処理100本ノック #01

環境

Python 3.6.8

00. 文字列の逆順 Permalink

文字列”stressed”の文字を逆に（末尾から先頭に向かって）並べた文字列を得よ．

>>> "stressed"[::-1]
'desserts'

01. 「パタトクカシーー」Permalink

「パタトクカシーー」という文字列の 1,3,5,7 文字目を取り出して連結した文字列を得よ．

>>> "パタトクカシーー"[1::2]
'タクシー'

02. 「パトカー」＋「タクシー」＝「パタトクカシーー」

「パトカー」＋「タクシー」の文字を先頭から交互に連結して文字列「パタトクカシーー」を得よ．

>>> "".join([i + j for i, j in zip("パトカー", "タクシー")])
'パタトクカシーー'

03. 円周率

“Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.”という文を単語に分解し，各単語の（アルファベットの）文字数を先頭から出現順に並べたリストを作成せよ．

>>> import re
>>> s = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
>>> list(map(lambda x: len(x), re.findall("[a-zA-Z]+", s)))
[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]

04. 元素記号 Permalink

“Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.”という文を単語に分解し，1, 5, 6, 7, 8, 9, 15, 16, 19 番目の単語は先頭の 1 文字，それ以外の単語は先頭に 2 文字を取り出し，取り出した文字列から単語の位置（先頭から何番目の単語か）への連想配列（辞書型もしくはマップ型）を作成せよ．

>>> import re
>>> s = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."
>>> p = (1, 5, 6, 7, 8, 9, 15, 16, 19)
>>> {word[0] if i in p else word[:2] :i for i, word in enumerate(re.findall("[a-zA-Z]+", s), start=1)}
{'H': 1, 'He': 2, 'Li': 3, 'Be': 4, 'B': 5, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'Ne': 10, 'Na': 11, 'Mi': 12, 'Al': 13, 'Si': 14, 'P': 15, 'S': 16, 'Cl': 17, 'Ar': 18, 'K': 19, 'Ca': 20}

05. n-gramPermalink

与えられたシーケンス（文字列やリストなど）から n-gram を作る関数を作成せよ．この関数を用い，”I am an NLPer”という文から単語 bi-gram，文字 bi-gram を得よ．

>>> import re
>>> s = "I am an NLPer"
>>> n_gram = lambda x, y: [ "".join(x[i:i+y]) for i in range(len(x) - y + 1) ]
>>>
>>> n_gram(re.findall("[a-zA-Z]", s), 1)
['I', 'a', 'm', 'a', 'n', 'N', 'L', 'P', 'e', 'r']
>>> n_gram(re.findall("[a-zA-Z]", s), 2)
['Ia', 'am', 'ma', 'an', 'nN', 'NL', 'LP', 'Pe', 'er']
>>> n_gram(re.findall("[a-zA-Z]", s), 3)
['Iam', 'ama', 'man', 'anN', 'nNL', 'NLP', 'LPe', 'Per']
>>>
>>> n_gram(re.findall("[a-zA-Z]+", s), 1)
['I', 'am', 'an', 'NLPer']
>>> n_gram(re.findall("[a-zA-Z]+", s), 2)
['Iam', 'aman', 'anNLPer']
>>> n_gram(re.findall("[a-zA-Z]+", s), 3)
['Iaman', 'amanNLPer']

06. 集合 Permalink

“paraparaparadise”と”paragraph”に含まれる文字 bi-gram の集合を，それぞれ, X と Y として求め，X と Y の和集合，積集合，差集合を求めよ．さらに，’se’という bi-gram が X および Y に含まれるかどうかを調べよ．

>>> bi_gram = lambda x: [ x[i:i+2] for i in range(len(x) - 1)]
>>> X = bi_gram("paraparaparadise")
>>> Y = bi_gram("paragraph")
>>> X
['pa', 'ar', 'ra', 'ap', 'pa', 'ar', 'ra', 'ap', 'pa', 'ar', 'ra', 'ad', 'di', 'is', 'se']
>>> Y
['pa', 'ar', 'ra', 'ag', 'gr', 'ra', 'ap', 'ph']
>>>
>>> set(X) | set(Y)
{'se', 'ra', 'ag', 'ad', 'ph', 'ap', 'is', 'pa', 'di', 'ar', 'gr'}
>>> set(X) & set(Y)
{'ar', 'pa', 'ra', 'ap'}
>>> set(X) - set(Y)
{'se', 'ad', 'is', 'di'}
>>>
>>> "se" in X
True
>>> "se" in Y
False

07. テンプレートによる文生成

引数 x, y, z を受け取り「x 時の y は z」という文字列を返す関数を実装せよ．さらに，x=12, y=”気温”, z=22.4 として，実行結果を確認せよ．

>>> f = lambda x, y, z: print(f"{x} 時の {y} は {z}")
>>> f(x=12, y="気温", z=22.4)
12 時の 気温 は 22.4

08. 暗号文

与えられた文字列の各文字を，以下の仕様で変換する関数 cipher を実装せよ．

英小文字ならば(219 - 文字コード)の文字に置換
その他の文字はそのまま出力

この関数を用い，英語のメッセージを暗号化・復号化せよ．

>>> import re
>>> cipher = lambda x: "".join([chr(219-ord(e)) if re.match("[a-z]", e) else e for e in x])
>>> cipher("abcde01Aaz")
'zyxwv01Aza'
>>>
>>> s = "".join([chr(i) for i in range(97, 97+26)])
>>> cipher(s)
'zyxwvutsrqponmlkjihgfedcba'
>>> cipher(cipher(s))
'abcdefghijklmnopqrstuvwxyz'
>>>
>>> s = "".join([chr(i) if i%3!=0 else chr(i).upper() for i in range(97, 97+26)])
>>> s
'abCdeFghIjkLmnOpqRstUvwXyz'
>>> cipher(s)
'zyCwvFtsIqpLnmOkjRhgUedXba'

memo

英小文字の文字コード範囲は 79-122

219 - 文字コードの範囲は 122-79

つまり丁度反転の形をとり cipher によって a-z が z-a に対応するよう暗号化されることがわかる。

また同じく cipher を 1 度かけると暗号化, 2 度かけると複合化されることがわかる.

参考

Python 組み込み関数

Python で Unicode コードポイントと文字を相互変換（chr, ord, \x, \u, \U）

09. Typoglycemia

スペースで区切られた単語列に対して，各単語の先頭と末尾の文字は残し，それ以外の文字の順序をランダムに並び替えるプログラムを作成せよ．ただし，長さが４以下の単語は並び替えないこととする．適当な英語の文（例えば”I couldn’t believe that I could actually understand what I was reading : the phenomenal power of the human mind .”）を与え，その実行結果を確認せよ．

>>> s = "I couldn’t believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
>>> from random import sample
>>> f = lambda x: " ".join([e if len(e) < 5 else e[0] + "".join([e[i] for i in sample(range(1, len(e)-1), len(e)-2)]) + e[-1] for e in x.split()])
>>> f(s)
'I codln’ut bivleee that I cloud aaltculy usaertdnnd what I was randeig : the pnnmeoahel peowr of the hmuan mind .'
>>> f(s)
'I cl’duont belevie that I colud actaully udtsnaenrd what I was rieadng : the pmnehoeanl pewor of the huamn mind .'
>>> f(s)
'I coludn’t beevlie that I culod acltualy udestnnrad what I was rdeaing : the pnmoneehal power of the hmaun mind .'

こちらでもいけた

>>> f2 = lambda x: " ".join([e if len(e) < 5 else e[0] + "".join(sample(e[1:-1], len(e[1:-1]))) + e[-1] for e in x.split()])
>>> f2(s)
'I cnudlo’t belivee that I colud aaulclty utnansredd what I was rediang : the pahemnneol peowr of the hamun mind .'
>>> f2(s)
'I culn’odt belivee that I culod acalulty uraetndnsd what I was reaindg : the pennoeamhl peowr of the haumn mind .'
>>> f2(s)
'I codlnu’t beevile that I cloud atluclay unrntsaedd what I was rneidag : the paeehnomnl peowr of the hmaun mind .'

参考

文字列やタプルのシャッフル

おわりに

画像処理 100 本ノック!!もあるようなので言語処理 100 本ノックが終わり次第やりたい。

(まずは言語処理 100 本ノック )