Saya mencoba menganalisis beberapa data yang saya miliki tetapi ada banyak ketidakkonsistenan dalam data saya.
Saya memiliki tabel SQL yang saya coba analisis.
Tabel tersebut adalah tabel universitas dengan struktur sebagai berikut: name:string, city:string, state:string, country:string
Nama selalu ada namun kota, negara bagian, negara dapat hilang. Masalah utama saya adalah ada banyak kesalahan ketik dan deklinasi nama universitas yang berbeda. Sebagai contoh di sini adalah deklinasi Standford Unversity yang saya temukan ketika saya melakukannya SELECT "universities".* FROM "perm_universities" WHERE (name like '%stanford%')
:
stanford university - stanford - ca - united states of america
the leland stanford junior university - stanford - ca - united states of america
leland stanford jr. university - stanford - ca - united states of america
stanford university graduate school of business - stanford - ca - united states of america
the leland stanford junior university (stanford university) - stanford - ca - united states of america
leland stanford junior university - stanford - ca - united states of america
stanford university - stanford - -
leland stanford jr. university, graduate school of business - stanford - ca - united states of america
stanford law school - stanford - ca - united states of america
stanford - stanford - ca - united states of america
stanford university, graduate school of business - stanford - ca - united states of america
stanford graduate school of business - stanford - ca - united states of america
stanford univerity - stanford - ca - united states of america
stanford university (the leland stanford junior university) - stanford - ca - united states of america
the leland stanford jr. university - palo alto - ca - united states of america
leland stanford junior university, school of law - stanford - ca / n/a - united states of america
stanford universit - stanford - ca - united states of america
the leland stanford university - stanford - ca - united states of america
leland standford stanford junior university - stanford - ca - united states of america
stanford university - cambridge - ma - united states of america
the leland stanford junior university 'stanford university' - stanford - ca - united states of america
stanford university school of law - stanford - ca - united states of america
stanford univresity - stanford - ca - united states of america
the leland stanford jr. university (stanford university) - stanford - ca - united states of america
leeland stanford junior university - stanford - ca - united states of america
leland stanford junion university - - ca - united states of america
leland stanford junior university (stanford university) - stanford - ca - united states of america
the leland stanford junior university - stanford - -
stanford university - graduate school of business - stanford - ca - united states of america
graduate school of business, stanford university - stanford - ca - united states of america
stanford universoty - stanford - ca - united states of america
leland stanford junior university - stanford - -
stanford univeristy - palo alto - ca - united states of america
leland stanford university - palo alto - ca - united states of america
stanford university - stanford - ca / n/a - united states of america
the leland stanford junior university, stanford university - stanford - ca - united states of america
the leland stanford junior university graduate school of business - stanford - ca - united states of america
stanford universtiy - stanford - ca - united states of america
stanford univerisity - stanford - ca - united states of america
stanford university - stanford - ct - united states of america
stanford law scool - stanford - ca - united states of america
mba: stanford university - stanford - ca - united states of america
Mereka semua adalah universitas yang sama, tetapi beberapa memiliki kesalahan ketik, beberapa memiliki nama yang berbeda, beberapa tidak memiliki kota, beberapa memiliki kota yang salah, ... datanya tidak besar.
Jadi saya mencoba memperbaikinya. Bagaimana saya bisa menggabungkan data ini?