wekeypedia is a research project funded by Sorbonne Paris Cité and hosted at CRI (UP5) and LIAFA (UP7).
The wekeypedia Python toolkit is a set of classes and helpers written over the course of the wekeypedia project. Its main purpose is to give some shortcuts back to the scientific community. We hope it will save future data scientists and web scrapers time on the tedious parts of the work, so that they can spend more of it on the fun parts and conduct studies with Wikipedia material.
Its main features are:
Convergences is a web visualization of the convergences and divergences between the different language editions of Wikipedia, used as a proxy for distances between cultures.
Cultures and languages do not have a clear causal relationship, but this kind of measurement can still help in understanding the social representations of the Wikipedia communities and their sub-cultures. It can also provide insight into the power structure between languages. For example, the difference in the number of links on a page in two languages says something about the importance, within each language and culture, of the concepts the page involves, and about the dominance of one language over another, understood as a power differential.
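The link-count comparison mentioned above can be sketched in a few lines. This is only an illustrative sketch, not the code behind Convergences: the function name `link_count_gap`, the shape of the input dictionary and the sample link lists are all assumptions made for the example. In practice the link lists could be collected from each language edition, for instance through the MediaWiki API (`action=query` with `prop=links`).

```python
def link_count_gap(links_by_lang, lang_a, lang_b):
    """Compare how many distinct links one page has in two language editions.

    `links_by_lang` is a hypothetical mapping from a language code to the
    list of link titles found on the page in that edition.
    """
    count_a = len(set(links_by_lang[lang_a]))
    count_b = len(set(links_by_lang[lang_b]))
    return {
        "count_a": count_a,
        "count_b": count_b,
        # Positive when the page is more densely linked in lang_a.
        "difference": count_a - count_b,
        # None when lang_b has no links, to avoid dividing by zero.
        "ratio": count_a / count_b if count_b else None,
    }


# Made-up sample data, for illustration only (not real measurements).
sample_links = {
    "en": ["Emotion", "Affection", "Philosophy", "Attachment theory"],
    "fr": ["Émotion", "Affection"],
}

gap = link_count_gap(sample_links, "en", "fr")
print(gap)
```

A raw difference favours long pages, so the ratio is returned alongside it; which of the two is the better indicator of "importance" is an open interpretive choice rather than something the sketch settles.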
A small experiment that plays with the strange traces left by some "vandals" on the Wikipedia page about Love. It was also an attempt to explore the possibilities of natural language processing applied to the delta analysis of revisions, and a fun way to show that the energy of collective intelligence is not only about straightforward editing and quality. Wikipedia, after all, is a public space that resembles other spaces, particularly in terms of appropriation and unintended uses.
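As a rough illustration of what "delta analysis of revisions" can mean, the sketch below extracts the words added and removed between two revisions of a text, using Python's standard `difflib` module. It is a minimal example under stated assumptions, not the method used in the experiment: the function name `revision_delta`, the whitespace tokenization and the two sample revisions are invented for the example.

```python
import difflib


def revision_delta(old_text, new_text):
    """Return (added, removed) word lists between two revisions.

    Tokenization is a naive whitespace split, chosen for brevity.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    added, removed = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" contributes to both lists; "equal" to neither.
        if tag in ("replace", "delete"):
            removed.extend(old_words[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new_words[j1:j2])
    return added, removed


# Invented revisions, for illustration only.
before = "Love is a variety of feelings"
after = "Love is a variety of strong feelings and pizza"

added, removed = revision_delta(before, after)
print("added:", added)
print("removed:", removed)
```

The added and removed word lists are the kind of material that could then be handed to a natural language processing step, for example to look at the vocabulary that vandal edits introduce.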
A proof of concept inspired by Ben Fry's visualization of the multiple editions of Darwin's On the Origin of Species. It was used to show that the dynamics of revisions across a predetermined set of pages can be observed graphically. The input is a preprocessed text corpus, which is rendered as blocks instead of text. It is also a preview of what can be visualized by applying small-multiples techniques to samples of variations and rich content (hyperlinks, pictures, etc.).