58 skills found · Page 1 of 2
aparrish / Phonetic Similarity VectorsSource code to accompany my paper "Poetic sound similarity vectors using phonetic features"
sanusanth / Python Basic ProgramsWhat is Python? Executive Summary Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed. Often, programmers fall in love with Python because of the increased productivity it provides. Since there is no compilation step, the edit-test-debug cycle is incredibly fast. Debugging Python programs is easy: a bug or bad input will never cause a segmentation fault. Instead, when the interpreter discovers an error, it raises an exception. When the program doesn't catch the exception, the interpreter prints a stack trace. A source level debugger allows inspection of local and global variables, evaluation of arbitrary expressions, setting breakpoints, stepping through the code a line at a time, and so on. The debugger is written in Python itself, testifying to Python's introspective power. On the other hand, often the quickest way to debug a program is to add a few print statements to the source: the fast edit-test-debug cycle makes this simple approach very effective. What is Python? Python is a popular programming language. It was created by Guido van Rossum, and released in 1991. It is used for: web development (server-side), software development, mathematics, system scripting. What can Python do? Python can be used on a server to create web applications. Python can be used alongside software to create workflows. Python can connect to database systems. It can also read and modify files. Python can be used to handle big data and perform complex mathematics. Python can be used for rapid prototyping, or for production-ready software development. Why Python? Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc). Python has a simple syntax similar to the English language. Python has syntax that allows developers to write programs with fewer lines than some other programming languages. Python runs on an interpreter system, meaning that code can be executed as soon as it is written. This means that prototyping can be very quick. Python can be treated in a procedural way, an object-oriented way or a functional way. Good to know The most recent major version of Python is Python 3, which we shall be using in this tutorial. However, Python 2, although not being updated with anything other than security updates, is still quite popular. In this tutorial Python will be written in a text editor. It is possible to write Python in an Integrated Development Environment, such as Thonny, Pycharm, Netbeans or Eclipse which are particularly useful when managing larger collections of Python files. Python Syntax compared to other programming languages Python was designed for readability, and has some similarities to the English language with influence from mathematics. Python uses new lines to complete a command, as opposed to other programming languages which often use semicolons or parentheses. Python relies on indentation, using whitespace, to define scope; such as the scope of loops, functions and classes. Other programming languages often use curly-brackets for this purpose. Applications for Python Python is used in many application domains. Here's a sampling. The Python Package Index lists thousands of third party modules for Python. Web and Internet Development Python offers many choices for web development: Frameworks such as Django and Pyramid. Micro-frameworks such as Flask and Bottle. Advanced content management systems such as Plone and django CMS. Python's standard library supports many Internet protocols: HTML and XML JSON E-mail processing. Support for FTP, IMAP, and other Internet protocols. Easy-to-use socket interface. And the Package Index has yet more libraries: Requests, a powerful HTTP client library. Beautiful Soup, an HTML parser that can handle all sorts of oddball HTML. Feedparser for parsing RSS/Atom feeds. Paramiko, implementing the SSH2 protocol. Twisted Python, a framework for asynchronous network programming. Scientific and Numeric Python is widely used in scientific and numeric computing: SciPy is a collection of packages for mathematics, science, and engineering. Pandas is a data analysis and modeling library. IPython is a powerful interactive shell that features easy editing and recording of a work session, and supports visualizations and parallel computing. The Software Carpentry Course teaches basic skills for scientific computing, running bootcamps and providing open-access teaching materials. Education Python is a superb language for teaching programming, both at the introductory level and in more advanced courses. Books such as How to Think Like a Computer Scientist, Python Programming: An Introduction to Computer Science, and Practical Programming. The Education Special Interest Group is a good place to discuss teaching issues. Desktop GUIs The Tk GUI library is included with most binary distributions of Python. Some toolkits that are usable on several platforms are available separately: wxWidgets Kivy, for writing multitouch applications. Qt via pyqt or pyside Platform-specific toolkits are also available: GTK+ Microsoft Foundation Classes through the win32 extensions Software Development Python is often used as a support language for software developers, for build control and management, testing, and in many other ways. SCons for build control. Buildbot and Apache Gump for automated continuous compilation and testing. Roundup or Trac for bug tracking and project management. Business Applications Python is also used to build ERP and e-commerce systems: Odoo is an all-in-one management software that offers a range of business applications that form a complete suite of enterprise management applications. Try ton is a three-tier high-level general purpose application platform.
sanusanth / C Basic ProgramsWhat is C#? C# is pronounced "C-Sharp". It is an object-oriented programming language created by Microsoft that runs on the .NET Framework. C# has roots from the C family, and the language is close to other popular languages like C++ and Java. The first version was released in year 2002. The latest version, C# 8, was released in September 2019. C# is a modern object-oriented programming language developed in 2000 by Anders Hejlsberg, the principal designer and lead architect at Microsoft. It is pronounced as "C-Sharp," inspired by the musical notation “♯” which stands for a note with a slightly higher pitch. As it’s considered an incremental compilation of the C++ language, the name C “sharp” seemed most appropriate. The sharp symbol, however, has been replaced by the keyboard friendly “#” as a suffix to “C” for purposes of programming. Although the code is very similar to C++, C# is newer and has grown fast with extensive support from Microsoft. The fact that it’s so similar to Java syntactically helps explain why it has emerged as one of the most popular programming languages today. C# is pronounced "C-Sharp". It is an object-oriented programming language created by Microsoft that runs on the .NET Framework. C# has roots from the C family, and the language is close to other popular languages like C++ and Java. The first version was released in year 2002. The latest version, C# 8, was released in September 2019. C# is used for: Mobile applications Desktop applications Web applications Web services Web sites Games VR Database applications And much, much more! An Introduction to C# Programming C# is a general-purpose, object-oriented programming language that is structured and easy to learn. It runs on Microsoft’s .Net Framework and can be compiled on a variety of computer platforms. As the syntax is simple and easy to learn, developers familiar with C, C++, or Java have found a comfort zone within C#. C# is a boon for developers who want to build a wide range of applications on the .NET Framework—Windows applications, Web applications, and Web services—in addition to building mobile apps, Windows Store apps, and enterprise software. It is thus considered a powerful programming language and features in every developer’s cache of tools. Although first released in 2002, when it was introduced with .NET Framework 1.0, the C# language has evolved a great deal since then. The most recent version is C# 8.0, available in preview as part of Visual Studio. To get access to all of the new language features, you would need to install the latest preview version of .NET Core 3.0. C# is used for: Mobile applications Desktop applications Web applications Web services Web sites Games VR Database applications And much, much more! Why Use C#? It is one of the most popular programming language in the world It is easy to learn and simple to use It has a huge community support C# is an object oriented language which gives a clear structure to programs and allows code to be reused, lowering development costs. As C# is close to C, C++ and Java, it makes it easy for programmers to switch to C# or vice versa. The C# Environment You need the .NET Framework and an IDE (integrated development environment) to work with the C# language. The .NET Framework The .NET Framework platform of the Windows OS is required to write web and desktop-based applications using not only C# but also Visual Basic and Jscript, as the platform provides language interoperability. Besides, the .Net Framework allows C# to communicate with any of the other common languages, such as C++, Jscript, COBOL, and so on. IDEs Microsoft provides various IDEs for C# programming: Visual Studio 2010 (VS) Visual Studio Express Visual Web Developer Visual Studio Code (VSC) The C# source code files can be written using a basic text editor, like Notepad, and compiled using the command-line compiler of the .NET Framework. Alternative open-source versions of the .Net Framework can work on other operating systems as well. For instance, the Mono has a C# compiler and runs on several operating systems, including Linux, Mac, Android, BSD, iOS, Windows, Solaris, and UNIX. This brings enhanced development tools to the developer. As C# is part of the .Net Framework platform, it has access to its enormous library of codes and components, such as Common Language Runtime (CLR), the .Net Framework Class Library, Common Language Specification, Common Type System, Metadata and Assemblies, Windows Forms, ASP.Net and ASP.Net AJAX, Windows Workflow Foundation (WF), Windows Communication Foundation (WCF), and LINQ. C# and Java C# and Java are high-level programming languages that share several similarities (as well as many differences). They are both object-oriented languages much influenced by C++. But while C# is suitable for application development in the Microsoft ecosystem from the front, Java is considered best for client-side web applications. Also, while C# has many tools for programming, Java has a larger arsenal of tools to choose from in IDEs and Text Editors. C# is used for virtual reality projects like games, mobile, and web applications. It is built specifically for Microsoft platforms and several non-Microsoft-based operating systems, like the Mono Project that works with Linux and OS X. Java is used for creating messaging applications and developing web-based and enterprise-based applications in open-source ecosystems. Both C# and Java support arrays. However, each language uses them differently. In C#, arrays are a specialization of the system; in Java, they are a direct specialization of the object. The C# programming language executes on the CLR. The source code is interpreted into bytecode, which is further compiled by the CLR. Java runs on any platform with the assistance of JRE (Java Runtime Environment). The written source code is first compiled into bytecode and then converted into machine code to be executed on a JRE. C# and C++ Although C# and C++ are both C-based languages with similar code, there are some differences. For one, C# is considered a component-oriented programming language, while C++ is a partial object-oriented language. Also, while both languages are compiled languages, C# compiles to CLR and is interpreted by.NET, but C++ compiles to machine code. The size of binaries in C# is much larger than in C++. Other differences between the two include the following: C# gives compiler errors and warnings, but C++ doesn’t support warnings, which may cause damage to the OS. C# runs in a virtual machine for automatic memory management. C++ requires you to manage memory manually. C# can create Windows, .NET, web, desktop, and mobile applications, but not stand-alone apps. C++ can create server-side, stand-alone, and console applications as it can work directly with the hardware. C++ can be used on any platform, while C# is targeted toward Windows OS. Generally, C++ being faster than C#, the former is preferred for applications where performance is essential. Features of C# The C# programming language has many features that make it more useful and unique when compared to other languages, including: Object-oriented language Being object-oriented, C# allows the creation of modular applications and reusable codes, an advantage over C++. As an object-oriented language, C# makes development and maintenance easier when project size grows. It supports all three object-oriented features: data encapsulation, inheritance, interfaces, and polymorphism. Simplicity C# is a simple language with a structured approach to problem-solving. Unsafe operations, like direct memory manipulation, are not allowed. Speed The compilation and execution time in C# is very powerful and fast. A Modern programming language C# programming is used for building scalable and interoperable applications with support for modern features like automatic garbage collection, error handling, debugging, and robust security. It has built-in support for a web service to be invoked from any app running on any platform. Type-safe Arrays and objects are zero base indexed and bound checked. There is an automatic checking of the overflow of types. The C# type safety instances support robust programming. Interoperability Language interoperability of C# maximizes code reuse for the efficiency of the development process. C# programs can work upon almost anything as a program can call out any native API. Consistency Its unified type system enables developers to extend the type system simply and easily for consistent behavior. Updateable C# is automatically updateable. Its versioning support enables complex frameworks to be developed and evolved. Component oriented C# supports component-oriented programming through the concepts of properties, methods, events, and attributes for self-contained and self-describing components of functionality for robust and scalable applications. Structured Programming Language The structured design and modularization in C# break a problem into parts, using functions for easy implementation to solve significant problems. Rich Library C# has a standard library with many inbuilt functions for easy and fast development. Prerequisites for Learning C# Basic knowledge of C or C++ or any programming language or programming fundamentals. Additionally, the OOP concept makes for a short learning curve of C#. Advantages of C# There are many advantages to the C# language that makes it a useful programming language compared to other languages like Java, C, or C++. These include: Being an object-oriented language, C# allows you to create modular, maintainable applications and reusable codes Familiar syntax Easy to develop as it has a rich class of libraries for smooth implementation of functions Enhanced integration as an application written in .NET will integrate and interpret better when compared to other NET technologies As C# runs on CLR, it makes it easy to integrate with components written in other languages It’s safe, with no data loss as there is no type-conversion so that you can write secure codes The automatic garbage collection keeps the system clean and doesn’t hang it during execution As your machine has to install the .NET Framework to run C#, it supports cross-platform Strong memory backup prevents memory leakage Programming support of the Microsoft ecosystem makes development easy and seamless Low maintenance cost, as C# can develop iOS, Android, and Windows Phone native apps The syntax is similar to C, C++, and Java, which makes it easier to learn and work with C# Useful as it can develop iOS, Android, and Windows Phone native apps with the Xamarin Framework C# is the most powerful programming language for the .NET Framework Fast development as C# is open source steered by Microsoft with access to open source projects and tools on Github, and many active communities contributing to the improvement What Can C Sharp Do for You? C# can be used to develop a wide range of: Windows client applications Windows libraries and components Windows services Web applications Native iOS and Android mobile apps Azure cloud applications and services Gaming consoles and gaming systems Video and virtual reality games Interoperability software like SharePoint Enterprise software Backend services and database programs AI and ML applications Distributed applications Hardware-level programming Virus and malware software GUI-based applications IoT devices Blockchain and distributed ledger technology C# Programming for Beginners: Introduction, Features and Applications By Simplilearn Last updated on Jan 20, 2020674 C# Programming for Beginners As a programmer, you’re motivated to master the most popular languages that will give you an edge in your career. There’s a vast number of programming languages that you can learn, but how do you know which is the most useful? If you know C and C++, do you need to learn C# as well? How similar is C# to Java? Does it become more comfortable for you to learn C# if you already know Java? Every developer and wannabe programmer asks these types of questions. So let us explore C# programming: how it evolved as an extension of C and why you need to learn it as a part of the Master’s Program in integrated DevOps for server-side execution. Are you a web developer or someone interested to build a website? Enroll for the Javascript Certification Training. Check out the course preview now! What is C#? C# is a modern object-oriented programming language developed in 2000 by Anders Hejlsberg, the principal designer and lead architect at Microsoft. It is pronounced as "C-Sharp," inspired by the musical notation “♯” which stands for a note with a slightly higher pitch. As it’s considered an incremental compilation of the C++ language, the name C “sharp” seemed most appropriate. The sharp symbol, however, has been replaced by the keyboard friendly “#” as a suffix to “C” for purposes of programming. Although the code is very similar to C++, C# is newer and has grown fast with extensive support from Microsoft. The fact that it’s so similar to Java syntactically helps explain why it has emerged as one of the most popular programming languages today. An Introduction to C# Programming C# is a general-purpose, object-oriented programming language that is structured and easy to learn. It runs on Microsoft’s .Net Framework and can be compiled on a variety of computer platforms. As the syntax is simple and easy to learn, developers familiar with C, C++, or Java have found a comfort zone within C#. C# is a boon for developers who want to build a wide range of applications on the .NET Framework—Windows applications, Web applications, and Web services—in addition to building mobile apps, Windows Store apps, and enterprise software. It is thus considered a powerful programming language and features in every developer’s cache of tools. Although first released in 2002, when it was introduced with .NET Framework 1.0, the C# language has evolved a great deal since then. The most recent version is C# 8.0, available in preview as part of Visual Studio. To get access to all of the new language features, you would need to install the latest preview version of .NET Core 3.0. The C# Environment You need the .NET Framework and an IDE (integrated development environment) to work with the C# language. The .NET Framework The .NET Framework platform of the Windows OS is required to write web and desktop-based applications using not only C# but also Visual Basic and Jscript, as the platform provides language interoperability. Besides, the .Net Framework allows C# to communicate with any of the other common languages, such as C++, Jscript, COBOL, and so on. IDEs Microsoft provides various IDEs for C# programming: Visual Studio 2010 (VS) Visual Studio Express Visual Web Developer Visual Studio Code (VSC) The C# source code files can be written using a basic text editor, like Notepad, and compiled using the command-line compiler of the .NET Framework. Alternative open-source versions of the .Net Framework can work on other operating systems as well. For instance, the Mono has a C# compiler and runs on several operating systems, including Linux, Mac, Android, BSD, iOS, Windows, Solaris, and UNIX. This brings enhanced development tools to the developer. As C# is part of the .Net Framework platform, it has access to its enormous library of codes and components, such as Common Language Runtime (CLR), the .Net Framework Class Library, Common Language Specification, Common Type System, Metadata and Assemblies, Windows Forms, ASP.Net and ASP.Net AJAX, Windows Workflow Foundation (WF), Windows Communication Foundation (WCF), and LINQ. C# and Java C# and Java are high-level programming languages that share several similarities (as well as many differences). They are both object-oriented languages much influenced by C++. But while C# is suitable for application development in the Microsoft ecosystem from the front, Java is considered best for client-side web applications. Also, while C# has many tools for programming, Java has a larger arsenal of tools to choose from in IDEs and Text Editors. C# is used for virtual reality projects like games, mobile, and web applications. It is built specifically for Microsoft platforms and several non-Microsoft-based operating systems, like the Mono Project that works with Linux and OS X. Java is used for creating messaging applications and developing web-based and enterprise-based applications in open-source ecosystems. Both C# and Java support arrays. However, each language uses them differently. In C#, arrays are a specialization of the system; in Java, they are a direct specialization of the object. The C# programming language executes on the CLR. The source code is interpreted into bytecode, which is further compiled by the CLR. Java runs on any platform with the assistance of JRE (Java Runtime Environment). The written source code is first compiled into bytecode and then converted into machine code to be executed on a JRE. C# and C++ Although C# and C++ are both C-based languages with similar code, there are some differences. For one, C# is considered a component-oriented programming language, while C++ is a partial object-oriented language. Also, while both languages are compiled languages, C# compiles to CLR and is interpreted by.NET, but C++ compiles to machine code. The size of binaries in C# is much larger than in C++. Other differences between the two include the following: C# gives compiler errors and warnings, but C++ doesn’t support warnings, which may cause damage to the OS. C# runs in a virtual machine for automatic memory management. C++ requires you to manage memory manually. C# can create Windows, .NET, web, desktop, and mobile applications, but not stand-alone apps. C++ can create server-side, stand-alone, and console applications as it can work directly with the hardware. C++ can be used on any platform, while C# is targeted toward Windows OS. Generally, C++ being faster than C#, the former is preferred for applications where performance is essential. Features of C# The C# programming language has many features that make it more useful and unique when compared to other languages, including: Object-oriented language Being object-oriented, C# allows the creation of modular applications and reusable codes, an advantage over C++. As an object-oriented language, C# makes development and maintenance easier when project size grows. It supports all three object-oriented features: data encapsulation, inheritance, interfaces, and polymorphism. Simplicity C# is a simple language with a structured approach to problem-solving. Unsafe operations, like direct memory manipulation, are not allowed. Speed The compilation and execution time in C# is very powerful and fast. A Modern programming language C# programming is used for building scalable and interoperable applications with support for modern features like automatic garbage collection, error handling, debugging, and robust security. It has built-in support for a web service to be invoked from any app running on any platform. Type-safe Arrays and objects are zero base indexed and bound checked. There is an automatic checking of the overflow of types. The C# type safety instances support robust programming. Interoperability Language interoperability of C# maximizes code reuse for the efficiency of the development process. C# programs can work upon almost anything as a program can call out any native API. Consistency Its unified type system enables developers to extend the type system simply and easily for consistent behavior. Updateable C# is automatically updateable. Its versioning support enables complex frameworks to be developed and evolved. Component oriented C# supports component-oriented programming through the concepts of properties, methods, events, and attributes for self-contained and self-describing components of functionality for robust and scalable applications. Structured Programming Language The structured design and modularization in C# break a problem into parts, using functions for easy implementation to solve significant problems. Rich Library C# has a standard library with many inbuilt functions for easy and fast development. Full Stack Java Developer Course The Gateway to Master Web DevelopmentEXPLORE COURSEFull Stack Java Developer Course Prerequisites for Learning C# Basic knowledge of C or C++ or any programming language or programming fundamentals. Additionally, the OOP concept makes for a short learning curve of C#. Advantages of C# There are many advantages to the C# language that makes it a useful programming language compared to other languages like Java, C, or C++. These include: Being an object-oriented language, C# allows you to create modular, maintainable applications and reusable codes Familiar syntax Easy to develop as it has a rich class of libraries for smooth implementation of functions Enhanced integration as an application written in .NET will integrate and interpret better when compared to other NET technologies As C# runs on CLR, it makes it easy to integrate with components written in other languages It’s safe, with no data loss as there is no type-conversion so that you can write secure codes The automatic garbage collection keeps the system clean and doesn’t hang it during execution As your machine has to install the .NET Framework to run C#, it supports cross-platform Strong memory backup prevents memory leakage Programming support of the Microsoft ecosystem makes development easy and seamless Low maintenance cost, as C# can develop iOS, Android, and Windows Phone native apps The syntax is similar to C, C++, and Java, which makes it easier to learn and work with C# Useful as it can develop iOS, Android, and Windows Phone native apps with the Xamarin Framework C# is the most powerful programming language for the .NET Framework Fast development as C# is open source steered by Microsoft with access to open source projects and tools on Github, and many active communities contributing to the improvement What Can C Sharp Do for You? C# can be used to develop a wide range of: Windows client applications Windows libraries and components Windows services Web applications Native iOS and Android mobile apps Azure cloud applications and services Gaming consoles and gaming systems Video and virtual reality games Interoperability software like SharePoint Enterprise software Backend services and database programs AI and ML applications Distributed applications Hardware-level programming Virus and malware software GUI-based applications IoT devices Blockchain and distributed ledger technology Who Should Learn the C# Programming Language and Why? C# is one of the most popular programming languages as it can be used for a variety of applications: mobile apps, game development, and enterprise software. What’s more, the C# 8.0 version is packed with several new features and enhancements to the C# language that can change the way developers write their C# code. The most important new features available are ‘null reference types,’ enhanced ‘pattern matching,’ and ‘async streams’ that help you to write more reliable and readable code. As you’re exposed to the fundamental programming concepts of C# in this course, you can work on projects that open the doors for you as a Full Stack Java Developer. So, upskill and master the C# language for a faster career trajectory and salary scope.
jun-zeng / TailorLearning graph-based code representations for source-level functional similarity detection. ICSE'23
micheletufano / AutoenCODEAutoenCODE is a Deep Learning infrastructure that allows to encode source code fragments into vector representations, which can be used to learn similarities.
TylerJaacks / ItsJustACoincidenceProfessor"It's just a coincidence professor!" is a plagiarism checker for source code. It uses the Wagner–Fischer algorithm to precisely and accurately determine percentage similarity of two given strings. We also cross reference common sites like GitHub and Stackoverflow, for potential cheating.
src-d / GeminiAdvanced similarity and duplicate source code at scale.
zealscott / ST2VecSource code for Spatio-Temporal Trajectory Similarity Learning in Road Networks. KDD 2022.
src-d / ApolloAdvanced similarity and duplicate source code proof of concept for our research efforts.
SOYJUN / TCP Socket Client ServerThe aim of this assignment is to have you do TCP socket client / server programming using I/O multiplexing, child processes and threads. It also aims at getting you to familiarize yourselves with the inetd superserver daemon, the ‘exec’ family of functions, various socket error scenarios, some socket options, and some basic domain name / IP address conversion functions. Apart from the material in Chapters 1 to 6 covered in class, you will also need to refer to the following : the exec family of functions (Section 4.7 of Chapter 4) using pipes for interprocess communication (IPC) in Unix error scenarios induced by process terminations & host crashes (Sections 5.11 to 5.16, Chapter 5) setsockopt function & SO_REUSEADDR socket option (Section 7.2 & pp.210-213, Chapter 7) gethostbyname & gethostbyaddr functions (Sections 11.3 & 11.4, Chapter 11) the basic structure of inetd (Section 13.5, Chapter 13) programming with threads (Sections 26.1 to 26.5, Chapter 26) Overview I shall present an overview of this assignment and discuss some of the specification details given below in class on Wednesday, September 17 & Monday, September 22. Client The client is evoked with a command line argument giving either the server IP address in dotted decimal notation, or the server domain name. The client has to be able to handle either mode and figure out which of the two is being passed to it. If it is given the IP address, it calls the gethostbyaddr function to get the domain name, which it then prints out to the user in the form of an appropriate message (e.g., ‘The server host is compserv1.cs.stonybrook.edu’). The function gethostbyname, on the other hand, returns the IP address that corresponds to a given domain name. The client then enters an infinite loop in which it queries the user which service is being requested. There are two options : echo and time (note that time is a slightly modified version of the daytime service – see below). The client then forks off a child. After the child is forked off, the parent process enters a second loop in which it continually reads and prints out status messages received from the child via a half-duplex pipe (see below). The parent exits the second loop when the child closes the pipe (how does the parent detect this?), and/or the SIGCHLD signal is generated when the child terminates. The parent then repeats the outer loop, querying the user again for the (next) service s/he desires. This cycle continues till the user responds to a query with quit rather than echo or time. The child process is the one which handles the actual service for the user. It execs (see Section 4.7, Chapter 4) an xterm to generate a separate window through which all interactions with server and user take place. For example, the following exec function call evokes an xterm, and gets the xterm to execute echocli, located in the current directory, passing the string 127.0.0.1 (assumed to be the IP address of the server) as the command line argument argv[1] to echocli (click on the url for further details) : execlp("xterm", "xterm", "-e", "./echocli", "127.0.0.1", (char *) 0) xterm executes one of two client programs (echocli or timecli, say) depending on the service requested. A client program establishes a TCP connection to the server at the ‘well-known port’ for the service (in reality, this port will, of course, be some ephemeral port of your choosing, the value of which is known to both server and client code). All interaction with the user, on the one hand, and with the server, on the other, takes place through the child’s xterm window, not the parent’s window. On the other hand, the child will use a half-duplex pipe to relay status information to the parent which the parent prints out in its window (see below).To terminate the echo client, the user can type in ^D (CTRL D, the EOF character). To terminate the time client, the only option is for the user to type in ^C (CTRL C). (This can also be used as an alternative means of terminating the echo client.) Note that using ^C in the context of the time service will give the server process the impression that the client process has ‘crashed’. It is your responsibility to ensure that the server process handles this correctly and closes cleanly. I shall address this further when discussing the server process. It is also part of your responsibility in this assignment to ensure that the client code is robust with respect to the server process crashing (see Sections 5.12 & 5.13, Chapter 5). Amongst other implications, this means that it would probably be a good idea for you to implement your echo client code along the lines of either : Figure 6.9, p.168 (or even Figure 6.13, p.174) which uses I/O multiplexing with the select function; or of Figure 26.2, p.680, which uses threads; rather than along the lines of Figure 5.5, p.125. When the child terminates, either normally or abnormally, its xterm window disappears instantaneously. Consequently, any status information that the child might want to communicate to the user should not be printed out on the child’s xterm window, since the user will not have time to see the final such message before the window disappears. Instead, as the parent forks off the child at the beginning, a half-duplex pipe should be established from child to parent. The child uses the pipe to send status reports to the parent, which the parent prints out in its window. I leave it up to you to decide what status information exactly should be relayed to the parent but, at a minimum, the parent should certainly be notified, in as precise terms as possible, of any abnormal termination conditions of the service provided by the child. In general, you should try to make your code as robust as possible with respect to handling errors, including confused behaviour by the user (e.g., passing an invalid command line argument; responding to a query incorrectly; trying to interact with the service through the parent process window, not the child process xterm; etc.). Amongst other things, you have to worry about EINTR errors occurring during slow system calls (such as the parent reading from the pipe, or, possibly, printing to stdout, for example) due to a SIGCHLD signal. What about other kinds of errors? Which ones can occur? How should you handle them? Server The server has to be able to handle multiple clients using threads (specifically, detached threads), not child processes (see Sections 26.1 to 26.4, Chapter 26). Furthermore, it has to be able to handle multiple types of service; in our case, two : echo and time. echo is just the standard echo service we have seen in class. time is a slightly modified version of the daytime service (see Figure 1.9, p.14) : instead of sending the client the ‘daytime’ just once and closing, the service sits in an infinite loop, sending the ‘daytime’, sleeping for 5 seconds, and repeating, ad infinitum. The server is loosely based on the way the inetd daemon works : see Figure 13.7, p.374. However, note that the differences between inetd and our server are probably more significant than the similarities: inetd forks off children, whereas our server uses threads; inetd child processes issue exec commands, which our server threads do not; etc. So you should treat Figure 13.7 (and Section 13.5, Chapter 13, generally) as a source of ideas, not as a set of specifications which you must slavishly adhere to and copy. Note, by the way, that there are some similarities between our client and inetd (primarily, forking off children which issue execs), which could be a useful source of ideas. The server creates a listening socket for each type of service that it handles, bound to the ‘well-known port’ for that service. It then uses select to await clients (Chapter 6; or, if you prefer, poll; note that pselect is not supported in Solaris 2.10). The socket on which a client connects identifies the service the client is seeking. The server accepts the connection and creates a thread which provides the service. The thread detaches itself. Meanwhile, the main thread goes back to the select to await further clients. A major concern when using threads is to make sure that operations are thread safe (see p.685 and on into Section 26.5). In this respect, Stevens’ readline function (in Stevens’ file unpv13e/lib/readline.c, see Figure 3.18, pp.91-92) poses a particular problem. On p.686, the authors give three options for dealing with this. The third option is too inefficient and should be discarded. You can implement the second option if you wish. Easiest of all would be the first option, since it involves using a thread-safe version of readline (see Figures 26.11 & 26.12) provided in file unpv13e/threads/readline.c. Whatever you do, remember that Stevens’ library, libunp.a, contains the non-thread-safe version of Figure 3.18, and that is the version that will be link-loaded to your code unless you undertake explicit steps to ensure this does not happen (libunp.a also contains the ‘wrapper’ function Readline, whose code is also in file unpv13e/lib/readline.c). Remaking your copy of libunp.a with the ‘correct’ version of readline is not a viable option because when you hand in your code, it will be compiled and link-loaded with respect to the version of libunp.a in the course account, ~cse533/Stevens/unpv13e_solaris2.10 (I do not intend to change that version since it risks creating confusion later on in the course). Also, you will probably want to use the original version of readline in the client code anyway. I am providing you with a sample Makefile which picks up the thread-safe version of readline from directory ~cse533/Stevens/unpv13e_solaris2.10/threads and uses it when making the executable for the server, but leaves the other executables it makes to link-load the non-thread-safe version from libunp.a. Again, it is part of your responsibility to make sure that your server code is as robust as possible with respect to errors, and that the server threads terminate cleanly under all circumstances. Recall, first of all, that the client user will often use ^C (CTRL C) in the xterm to terminate the service. This will appear to the server thread as if the client process has crashed. You need to think about the error conditions that will be induced (see Sections 5.11 to 5.13, Chapter 5), and how the echo and time server code is to detect and handle these conditions. For example, the time server will almost certainly experience an EPIPE error (see Section 5.13). How should the associated SIGPIPE signal be handled? Be aware that when we return out of the Stevens’ writen function with -1 (indicating an error) and check errno, errno is sometimes equal to 0, not EPIPE (value 32). This can happen under Solaris 2.10, but I am not sure under precisely what conditions nor why. Nor am I sure if it also happens under other Unix versions, or if it also happens when using write rather than writen. The point is, you cannot depend on errno to find out what has happened to the write or writen functions. My suggestion, therefore, is that the time server should use the select function. On the one hand, select’s timeout mechanism can be used to make the server sleep for the 5 seconds. On the other hand, select should also monitor the connection socket read event because, when the client xterm is ^C’ed, a FIN will be sent to the server TCP, which will prime the socket for reading; a read on the socket will then return with value 0 (see Figure 14.3, p. 385 as an example). But what about errors other than EPIPE? Which ones can occur? How should you handle them? Recall, as well, that if a thread terminates without explicitly closing the connection socket it has been using, the connection socket will remain existent until the server process itself dies (why?). Since the server process is supposed, in principle, to run for ever, you risk ending up with an ever increasing number of unused, ‘orphaned’ sockets unless you are careful. Whenever a server thread detects the termination of its client, it should print out a message giving appropriate details: e.g., “Client termination: EPIPE error detected”, “Client termination: socket read returned with value 0”, “Client termination: socket read returned with value -1, errno = . . .”, and so on. When debugging your server code, you will probably find that restarting the server very shortly after it was last running will give you trouble when it comes to bind to its ‘well-known ports’. This is because, when the server side initiates connection termination (which is what will happen if the server process crashes; or if you kill it first, before killing the client) TCP keeps the connections open in the TIME_WAIT state for 2MSLs (Sections 2.6 & 2.7, Chapter 2). This could very quickly become a major irritant. I suggest you explore the possibility of using the SO_REUSEADDR socket option (pp.210-213, Chapter 7; note that the SO_REUSEPORT socket option is not supported in Solaris 2.10), which should help keep the stress level down. You will need to use the setsockopt function (Section 7.2) to enable this option. Figure 8.24, p.263, shows an instance of server code that sets the SO_REUSEADDR socket option. Finally, you should be aware of the sort of problem, described in Section 16.6, pp.461-463, that might occur when (blocking) listening sockets are monitored using select. Such sockets should be made nonblocking, which requires use of the fcntl function after socket creates the socket, but before listen turns the socket into a listening socket.
Lab41 / AltairAssessing Source Code Semantic Similarity with Unsupervised Learning
xingsumq / Dynamic Community Detection IncNSAThe source code of a community detection method in dynamic networks for paper "IncNSA: Detecting communities incrementally from time-evolving networks based on node similarity".
fbkarsdorp / Melodic SimilaritySource code for "Learning Similarity Metrics for Melody Retrieval"
AnubisLMS / MayatExperimental AST-Based Source Code Similarity Detection Tool
jorge-martinez-gil / CodesimSource Code Clone Detection Using Unsupervised Similarity Measures
yeonjun-in / Torch SP AGCLThe official source code for Similarity Preserving Adversarial Graph Contrastive Learning (SP-AGCL) at KDD 2023.
tahmedge / CETE LRECSource code of the paper "Contextualized Embeddings based Transformer Encoder for Sentence Similarity Modeling in Answer Selection Task" published in LREC 2020.
arpit3043 / Extractive Text SummerizationSummarization systems often have additional evidence they can utilize in order to specify the most important topics of document(s). For example, when summarizing blogs, there are discussions or comments coming after the blog post that are good sources of information to determine which parts of the blog are critical and interesting. In scientific paper summarization, there is a considerable amount of information such as cited papers and conference information which can be leveraged to identify important sentences in the original paper. How text summarization works In general there are two types of summarization, abstractive and extractive summarization. Abstractive Summarization: Abstractive methods select words based on semantic understanding, even those words did not appear in the source documents. It aims at producing important material in a new way. They interpret and examine the text using advanced natural language techniques in order to generate a new shorter text that conveys the most critical information from the original text. It can be correlated to the way human reads a text article or blog post and then summarizes in their own word. Input document → understand context → semantics → create own summary. 2. Extractive Summarization: Extractive methods attempt to summarize articles by selecting a subset of words that retain the most important points. This approach weights the important part of sentences and uses the same to form the summary. Different algorithm and techniques are used to define weights for the sentences and further rank them based on importance and similarity among each other. Input document → sentences similarity → weight sentences → select sentences with higher rank. The limited study is available for abstractive summarization as it requires a deeper understanding of the text as compared to the extractive approach. Purely extractive summaries often times give better results compared to automatic abstractive summaries. This is because of the fact that abstractive summarization methods cope with problems such as semantic representation, inference and natural language generation which is relatively harder than data-driven approaches such as sentence extraction. There are many techniques available to generate extractive summarization. To keep it simple, I will be using an unsupervised learning approach to find the sentences similarity and rank them. One benefit of this will be, you don’t need to train and build a model prior start using it for your project. It’s good to understand Cosine similarity to make the best use of code you are going to see. Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them. Since we will be representing our sentences as the bunch of vectors, we can use it to find the similarity among sentences. Its measures cosine of the angle between vectors. Angle will be 0 if sentences are similar. All good till now..? Hope so :) Next, Below is our code flow to generate summarize text:- Input article → split into sentences → remove stop words → build a similarity matrix → generate rank based on matrix → pick top N sentences for summary.
Alex123425 / Geotechnical Codesit contains source code (all written in matlab), including the modified camclay (MCC), the Bounding Surface and the NorSand, simulating triaxial and cavity expansion analysis. The similarity solution from Collins & Stimpson 1994 is used for cavity expansion.
TreezzZ / LSH PyTorchSource code for paper "Similarity Search in High Dimensions via Hashing" on VLDH-1999